# Phi-Delta-Diagrams: Software Implementation of a Visual Tool for Assessing Classifier and Feature Performance


## Abstract

`Python` and `R` packages will be described.

## 1. Introduction

`SVM` classifier on the category Science, as opposed to the category Business, both taken from the Open Directory Project (ODP) (http://opendirectoryproject.org/).

`ratio` the diagram would be stretched accordingly. Note that isometrics of accuracy are in both cases straight lines parallel to the $\phi $ axis. However, in the former case, accuracy and bias are in fact “unbiased”, meaning that the actual class ratio does not affect their evaluation. When the formulas of accuracy and bias are expressed in terms of sensitivity, specificity, and class ratio, their “unbiased” counterparts are obtained by simply setting $\mathtt{ratio}=1$. We prefer the term “unbiased” to “balanced”, as these measures are calculated regardless of the imbalance of the dataset at hand, which may in fact be imbalanced or not. Other relevant isometrics are those related to specificity and sensitivity. Regardless of the adopted kind of diagram, isometrics of specificity are straight lines parallel to the upper left edge of a diagram, whereas isometrics of sensitivity are straight lines parallel to the upper right edge (for more information on specificity and sensitivity isometrics, see reference [21]).
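As a worked illustration of the statement on accuracy (a sketch under stated assumptions, not a formula quoted from the article): write $\rho $ for sensitivity and $\overline{\rho}$ for specificity, and assume that $p$ and $n$ denote the fractions of positive and negative samples, with $p+n=1$ and $\mathtt{ratio}=n/p$. Accuracy $a$ can then be written as

$$a = p\,\rho + n\,\overline{\rho} = \frac{\rho + \mathtt{ratio}\cdot \overline{\rho}}{1+\mathtt{ratio}}$$

Setting $\mathtt{ratio}=1$ gives $a=(\rho +\overline{\rho})/2$, which no longer depends on the class distribution of the dataset at hand.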

`Python` `3.x` and `R` packages. The former, made available as a GitHub project, is given in both functional and object-oriented forms, whereas the latter, made available at the Comprehensive `R` Archive Network (CRAN), also includes a web application created with `R` shiny. The remainder of this article is organized as follows: after a basic introduction to classifier and variable importance assessment with $\langle \phi ,\delta \rangle $ measures and diagrams, Section 2 outlines how to use $\langle \phi ,\delta \rangle $ diagrams on example data. Section 3 gives further details on the cited measures and on the required operational context. Materials and methods are described in Section 4, whereas Section 5 provides conclusions.

## 2. Results

`ODP`. This project, which has been for two decades the largest publicly available Web directory, catalogs a huge number of web pages by means of a suitable taxonomy, each node containing web pages related to a specific topic. Categorizing web pages is an essential activity to improve user experience [22], particularly when classes are topics [23,24] and when the page at hand must be labeled as relevant or not [25]. In this scenario, each dataset has been generated from a pair of `ODP` categories, whose samples have been preprocessed to extract the corresponding textual content.

`SVM`), which is apparently affected by the imbalance of data. The left-hand side of the figure reports the outcomes as they would be observed in the absence of imbalance, whereas the right-hand side clearly shows that the bias vanishes (on average) if the actual ratio of data is used for plotting the performances obtained by the selected learning algorithm on each fold.

## 3. Discussion

## 4. Materials and Methods

#### 4.1. Short Summary on $\langle \phi ,\delta \rangle $ Diagrams

`ratio` $n/p=1$ (i.e., when $n=p=1/2$), Equation (2) reduces to (1). Owing to a strict design choice, the $\delta $ values corresponding to oracle and anti-oracle are $+1$ and $-1$, respectively, regardless of the imposed `ratio`. On the other hand, the $\delta $ value of dummy classifiers may change according to the `ratio`. In particular, for low values of `ratio` (which imply a majority of positive samples), the dummy classifier that always answers “yes” (right-hand side) becomes progressively closer to the oracle, and vice versa for the other dummy classifier. Conversely, for high values of `ratio` (which imply a majority of negative samples), the dummy classifier that always answers “no” (left-hand side) becomes progressively closer to the oracle, and vice versa for the other dummy classifier. This should not be surprising, as it is well known that the accuracy obtained by a dummy classifier on a highly imbalanced dataset may be very good or very bad, according to the agreement or disagreement between the label of the majority of samples and the default answer of the classifier.
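The behavior of the reference classifiers described above can be checked numerically. The sketch below assumes that $\delta $ equals $2\cdot \mathrm{accuracy}-1$, a form that is consistent with the stated values for oracle and anti-oracle but is an assumption here, since Equation (2) is not reproduced in this section; the function and variable names are hypothetical.

```python
def delta_from_rates(specificity, sensitivity, ratio=1.0):
    """Return delta = 2 * accuracy - 1 for a class ratio n / p (assumed form)."""
    p = 1.0 / (1.0 + ratio)      # fraction of positive samples
    n = ratio / (1.0 + ratio)    # fraction of negative samples
    accuracy = p * sensitivity + n * specificity
    return 2.0 * accuracy - 1.0

# (specificity, sensitivity) of the four reference classifiers
CORNERS = {
    "oracle": (1.0, 1.0),
    "anti-oracle": (0.0, 0.0),
    "always yes": (0.0, 1.0),
    "always no": (1.0, 0.0),
}

def corner_deltas(ratio):
    """Return the delta value of each reference classifier for the given ratio."""
    return {
        name: delta_from_rates(spec, sens, ratio)
        for name, (spec, sens) in CORNERS.items()
    }

# low ratio: "always yes" moves towards the oracle; high ratio: "always no" does
for ratio in (0.25, 1.0, 4.0):
    print(ratio, corner_deltas(ratio))
```

Under this assumption, oracle and anti-oracle stay at $+1$ and $-1$ for every ratio, while the two dummy classifiers move in opposite directions along the $\delta $ axis.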

#### 4.2. Implementation of $\langle \phi ,\delta \rangle $ Diagrams

`Python` (https://www.python.org/) and `R` (https://www.r-project.org/) have been selected, as they are the most popular programming languages in the community of data scientists and their importance has grown substantially in recent years. Regardless of their original design (the former is a general-purpose programming language, subsequently enriched with support for array-based computation, whereas the latter was born as a statistical programming language), both languages nowadays provide great support, in terms of ad-hoc libraries, for data mining and data analysis. Having selected the cited target languages, care has been taken to keep the same interface of the main functions for both `Python` and `R`.

`data`), the latter as a one-dimensional array (say `labels`). For instance, assuming that the available samples are 500 and that each instance is described by a list of 20 feature values, `data` would be an array with 500 rows and 20 columns, whereas `labels` would be an array with 500 rows (and one column).
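The expected layout of the two inputs can be illustrated with NumPy. This is a minimal sketch on synthetic values: the class names and the helper `check_shapes` are hypothetical and not part of the package.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 500 instances, each described by 20 feature values (synthetic, for illustration only)
data = rng.random((500, 20))

# one class label per instance; the class names are made up
labels = rng.choice(np.array(["negative", "positive"]), size=500)

def check_shapes(data, labels):
    """Raise ValueError unless data is 2-D, labels is 1-D, and row counts match."""
    if data.ndim != 2 or labels.ndim != 1:
        raise ValueError("data must be 2-D and labels 1-D")
    if data.shape[0] != labels.shape[0]:
        raise ValueError("data and labels must have the same number of rows")
    return data.shape[0], data.shape[1]

print(check_shapes(data, labels))
```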

`phiDelta.convert`, `phiDelta.stats` and `phiDelta.plot`. The first should be called to convert standard performance measures (i.e., specificity and sensitivity) into $\langle \phi ,\delta \rangle $ values; the second to evaluate statistics on the data at hand (features only); the third to plot $\langle \phi ,\delta \rangle $ values in a $\langle \phi ,\delta \rangle $ diagram. With `ratio` denoting a value of class imbalance and `info` denoting structured information about the data at hand, the interface of these functions follows hereinafter (optional parameters have been marked with an asterisk):

`Python` and `R` implementations adhere to the given interface and use the same function names. The agreement between names depends in fact on the interpretation given to the character “.” in the cited languages. In particular, in `Python` it denotes an ownership or inclusion relation (e.g., between an object and one of its slots or methods, or between a package and one of its components), whereas it can be freely used in `R` as part of a variable or function name. According to the underlying semantics, in the proposed syntax, the name `phiDelta.plot` in `Python` actually denotes the function `plot` as belonging to the package `phiDelta`, whereas in `R` it denotes a function name.

- `specificity` and `sensitivity` are two one-dimensional arrays, which are used as input parameters when a conversion is invoked. They must have equal size, as, taken together, they embed $\langle \overline{\rho},\rho \rangle $ pairs.
- `phi` and `delta` are two one-dimensional arrays. They must have equal size, as, taken together, they embed $\langle \phi ,\delta \rangle $ pairs.
- `data` and `labels` denote the available input instances and labels, as two- and one-dimensional arrays, respectively. They must have the same number of rows.
- `info` is an optional data structure (i.e., a dictionary in `Python` and a list in `R`) expected to contain information about the dataset at hand. In particular, the following information should appear therein: (i) the name of the dataset, (ii) a short description, (iii) a list of features, (iv) a list of class names, and (v) an assertion of which class name(s) should be considered positive. The default value for `info` is `null`. The value `null` is used here to represent a null value, which is denoted as `None` in `Python` and as `NULL` in `R`. In the absence of explicit information, class names are retrieved from `labels`, feature names are automatically generated (as $F1,F2,F3,\dots $), and the positive class is the first class name (taken in alphabetical order).
- `names` is an optional one-dimensional array of strings, each string being intended to document the corresponding $\langle \phi ,\delta \rangle $ pair. Different semantics may hold for this parameter, depending on the underlying context. For classifier assessment, it is expected to embed classifier (or fold) names; for feature assessment, feature names. In both cases, the size of the array must be equal to the size of `phi` and `delta`. Being optional, the default value for `names` is `null`.
- `ratio` denotes the class imbalance (i.e., the ratio between negative and positive samples). As an output parameter of the function `phiDelta.stats`, it represents the actual ratio found on the dataset at hand. As an input parameter of the function `phiDelta.plot`, it controls the way a $\langle \phi ,\delta \rangle $ diagram is stretched (the default value is 1). When $\mathtt{ratio}=1$, the corresponding $\langle \phi ,\delta \rangle $ diagram is a perfect rhombus. In that case, data are drawn as if they were balanced, regardless of the actual imbalance.
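To make the conversion step and its equal-size requirement concrete, the following sketch maps paired specificity and sensitivity values to $\langle \phi ,\delta \rangle $ pairs. It assumes the standard form $\phi =\rho -\overline{\rho}$, $\delta =\rho +\overline{\rho}-1$ (Equation (1) is not reproduced in this section, so this form is an assumption), and the function `convert` below is a stand-in written for illustration, not the packaged `phiDelta.convert`.

```python
def convert(specificity, sensitivity):
    """Map paired specificity/sensitivity values to (phi, delta) lists.

    Assumed standard form: phi = sensitivity - specificity,
    delta = sensitivity + specificity - 1.
    """
    if len(specificity) != len(sensitivity):
        raise ValueError("specificity and sensitivity must have equal size")
    pairs = list(zip(specificity, sensitivity))
    phi = [sens - spec for spec, sens in pairs]
    delta = [sens + spec - 1.0 for spec, sens in pairs]
    return phi, delta

# oracle, "always yes" dummy classifier, and a random guesser
phi, delta = convert([1.0, 0.0, 0.5], [1.0, 1.0, 0.5])
print(phi)
print(delta)
```

With this convention the dummy classifier that always answers “yes” lands at $\phi =1$, i.e., on the right-hand side of the diagram, in agreement with the description given in Section 4.1.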

`ratio` as input parameter to `phiDelta.plot` allows the user to perform any sort of “guess if” analysis on the corresponding performance measures. For instance, let us suppose that the actual ratio of the dataset at hand is 4 (meaning that $n=4/5$ and $p=1/5$). Notwithstanding the actual ratio, one may want to check a situation in which the ratio is assumed to be equal to $1/4$ (meaning that $n=1/5$ and $p=4/5$). This is a typical reversal of perspective, which might be very useful when trying to estimate the performance of a given classifier in different scenarios. In other words, under the assumption of statistical significance, imposing a ratio allows a user to estimate the performance that a classifier would show if the actual ratio coincided with the one imposed as a parameter. As for the range of possible values, although in principle any positive value could be imposed, only those that fall in the interval $[{10}^{-1},{10}^{+1}]$ are allowed. There is no conceptual reason for this constraint; in practice, however, “stretching” a $\langle \phi ,\delta \rangle $ diagram outside these values would not be useful.
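The arithmetic behind the example can be sketched as follows. The helper names `class_shares` and `actual_ratio` are hypothetical; the only facts taken from the text are the definition $\mathtt{ratio}=n/p$ and the allowed interval.

```python
RATIO_MIN, RATIO_MAX = 0.1, 10.0   # allowed interval stated in the text

def class_shares(ratio):
    """Return (n, p), the fractions of negative and positive samples for ratio = n / p."""
    if not RATIO_MIN <= ratio <= RATIO_MAX:
        raise ValueError("ratio must lie in [0.1, 10]")
    p = 1.0 / (1.0 + ratio)
    return 1.0 - p, p

def actual_ratio(labels, positive):
    """Return the negative-to-positive ratio observed in a list of labels."""
    positives = sum(1 for label in labels if label == positive)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples")
    return negatives / positives

labels = ["neg"] * 8 + ["pos"] * 2
print(actual_ratio(labels, "pos"))   # 4.0
print(class_shares(4.0))             # actual scenario: n = 4/5, p = 1/5
print(class_shares(0.25))            # imposed scenario: n = 1/5, p = 4/5
```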

#### 4.3. Python Package

- Project title: Phi-Delta Diagram
- Package name: phiDelta
- Project home page (GitHub): https://github.com/garmano/phiDelta.git
- Operating system(s): Platform independent
- Programming language: Python (≥3.4.x)
- License: GPL (≥2)
- Any restrictions to use by non-academics: none

`R` and is derived by wrapping the former. Using either of them, however, is a matter of personal taste. Some details are given hereinafter on the object-oriented implementation, whereas the function-based one is illustrated in less detail. This does not necessarily mean that the object-oriented implementation should be preferred. Considering that the function-based interface is the same for `Python` and `R`, the interested reader can obtain the missing details by reading that part of the article. Both implementations are embedded in the package `phiDelta`. Let us point out that the function-based implementation, which fulfills the common requirements posed for both `Python` and `R` implementations, is in fact a wrapper for the object-oriented implementation of $\langle \phi ,\delta \rangle $ diagrams. This choice has been made to guarantee the same behavior for both implementations while avoiding unnecessary redundancy.

#### 4.3.1. Python Object-Oriented Implementation of $\langle \phi ,\delta \rangle $ Diagrams

`View`). It is worth noting that `View` is derived from `Geometry`, the latter class being focused on calculating all relevant details that characterize the shape of a $\langle \phi ,\delta \rangle $ diagram according to the given `ratio`. Furthermore, a separate class entrusted with performing statistics (i.e., `Statistics`) has also been provided. In turn, it uses the class `Feature`, which is repeatedly invoked to handle the processing of each feature.

`model` follow:

- `phidelta_std` transforms `specificity` and `sensitivity` into standard $\langle \phi ,\delta \rangle $ coordinates.
- `phidelta_std2gen` transforms standard $\langle \phi ,\delta \rangle $ coordinates into generalized ones, according to the `ratio` given as parameter.
- `make_grid` generates a grid of $N\times N$ $\langle \phi ,\delta \rangle $ points, to be subsequently plotted in a $\langle \phi ,\delta \rangle $ diagram ($N$ is an input parameter). This operation is provided for testing only. The visualization of a grid should be the first test to be made for checking the installation of the `phiDelta` package.
- `random_samples` generates $M$ random samples of $\langle \phi ,\delta \rangle $ values, which are subsequently plotted in a $\langle \phi ,\delta \rangle $ diagram ($M$ is an input parameter). This operation is provided for testing only.
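What a grid test of this kind might look like is sketched below. The function `make_grid_sketch` is hypothetical and does not reproduce the packaged `make_grid`: it assumes that the grid is laid out over the unit square of specificity and sensitivity and mapped with the standard form $\phi =\rho -\overline{\rho}$, $\delta =\rho +\overline{\rho}-1$.

```python
import numpy as np

def make_grid_sketch(n_points):
    """Return (phi, delta) arrays for an n_points x n_points grid.

    Assumption: the grid covers the unit square of (specificity, sensitivity)
    and is mapped with phi = sensitivity - specificity,
    delta = sensitivity + specificity - 1.
    """
    axis = np.linspace(0.0, 1.0, n_points)
    specificity, sensitivity = np.meshgrid(axis, axis)
    phi = (sensitivity - specificity).ravel()
    delta = (sensitivity + specificity - 1.0).ravel()
    return phi, delta

phi, delta = make_grid_sketch(5)
# every point falls inside the rhombus |phi| + |delta| <= 1
print(phi.size, float(np.max(np.abs(phi) + np.abs(delta))))
```

Plotting such a grid with $\mathtt{ratio}=1$ should fill the rhombus evenly, which makes it a convenient visual check after installation.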

`View`, which visualizes the $\langle \phi ,\delta \rangle $ diagrams:

- `plot` should be invoked to plot data in the current diagram.
- `__lshift__`, which overrides the operator `<<`, is used here to control the way a $\langle \phi ,\delta \rangle $ diagram is drawn. An example of how to use it is given hereinafter (see Listing 4).
- `__rshift__`, which overrides the operator `>>`, is used here to control the way a $\langle \phi ,\delta \rangle $ diagram is drawn. An example of how to use it is given hereinafter (see Listing 4).

`data`, `labels`, and `info` are first loaded from file (although some facilities are part of the library, in principle the user is responsible for properly implementing them). Then 30-fold cross validation is performed on the given data using a linear `SVM` (although the corresponding function is not part of the package, it is easy to implement it by importing `svm.SVC` from `sklearn`). Finally, the resulting values are plotted using a `View` object (with $\mathtt{ratio}=1.0$) created on the fly.
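The step between cross validation and plotting, i.e., turning the outcome of each fold into one $\langle \phi ,\delta \rangle $ point, can be sketched as follows. The sketch assumes that true and predicted labels are already available for every fold (for instance from a cross-validation loop), that each fold contains both classes, and that the standard form $\phi =\rho -\overline{\rho}$, $\delta =\rho +\overline{\rho}-1$ applies; the helper `fold_rates` is hypothetical.

```python
import numpy as np

def fold_rates(y_true, y_pred, positive=1):
    """Return (specificity, sensitivity) of one fold from true and predicted labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    neg = ~pos
    sensitivity = np.mean(y_pred[pos] == positive)   # true positive rate
    specificity = np.mean(y_pred[neg] != positive)   # true negative rate
    return float(specificity), float(sensitivity)

# two toy folds: (true labels, predicted labels)
folds = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 0, 0], [1, 1, 1, 0]),
]
rates = [fold_rates(t, p) for t, p in folds]
phi = [sens - spec for spec, sens in rates]          # assumed standard form
delta = [sens + spec - 1.0 for spec, sens in rates]
print(rates)
print(phi, delta)
```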

`data`, `labels`, and `info` are loaded from file. Then, statistics are calculated on the data at hand. Subsequently, the color map of the plot (i.e., `cmap`) is set to `cool` and the default option of filling the diagram with a default background is removed from the current options. The functions `options` and `settings` have been used to simplify parameter passing. In practice, the former returns a tuple containing its arguments, whereas the latter returns a dictionary containing keyword-value pairs. Finally, the resulting $\langle \phi ,\delta \rangle $ values are plotted by using a `View` object created on the fly.
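The two helpers and the operator-based style can be illustrated with a toy class. The functions `options` and `settings` below behave as described in the text (a tuple of arguments and a dictionary of keyword-value pairs). Everything else is an assumption made for illustration: `ToyView` is a hypothetical stand-in that does not reproduce the `View` class, the meaning given here to `<<` (update the drawing options) and `>>` (hand over data) is a guess, and the flag name `"nofill"` is made up.

```python
def options(*args):
    """Return the positional arguments as a tuple."""
    return args

def settings(**kwargs):
    """Return the keyword arguments as a dictionary."""
    return kwargs

class ToyView:
    """Hypothetical stand-in used only to illustrate operator overloading."""

    def __init__(self, ratio=1.0):
        self.ratio = ratio
        self.flags = set()
        self.params = {}
        self.points = []

    def __lshift__(self, other):
        # a dict updates keyword settings, a tuple adds flag-like options
        if isinstance(other, dict):
            self.params.update(other)
        else:
            self.flags.update(other)
        return self          # returning self allows chaining

    def __rshift__(self, data):
        # here ">>" merely stores (phi, delta) pairs instead of drawing them
        phi, delta = data
        self.points.extend(zip(phi, delta))
        return self

view = ToyView(ratio=1.0) << settings(cmap="cool") << options("nofill")
view >> ([0.0, 0.5], [1.0, 0.5])
print(view.params, view.flags, view.points)
```

Returning `self` from both operators is what allows several settings to be chained on a single object created on the fly.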

#### 4.3.2. Python Function-Based Implementation of $\langle \phi ,\delta \rangle $ Diagrams

`convert`) by wrapping the object-oriented one. With this idea in mind, let us describe how the wrapping has actually been performed, with the goal of mimicking the corresponding implementation in `R`. Listing 5 illustrates the definition of the functions `phiDelta.convert`, `phiDelta.stats`, and `phiDelta.plot`.

#### 4.4. `R` Package

`R` users under the following requirements:

- Project title: Phi-Delta Diagram
- Package name: phiDelta
- Project home page (CRAN): http://cran.r-project.org/web/packages/phiDelta
- Operating system(s): Platform independent
- Programming language: `R` (≥3.4.4)
- License: GPL (≥2)
- Any restrictions to use by non-academics: none

`R` package is designed similarly to the function-based `Python` implementation of the $\langle \phi ,\delta \rangle $ diagrams. Listing 6 illustrates the main functions provided in the `R` package `phiDelta`.

`phiDelta.convert` contains the transformation formulas (see Equation (1)) that convert given specificity and sensitivity values to $\langle \phi ,\delta \rangle $ values. It is conceived for classifier assessment.

`phiDelta.stats` is designed for feature assessment, i.e., $\langle \phi ,\delta \rangle $ values are calculated based on the input parameters `data` and `labels`. It provides a logical parameter `ratio_correction`. By default the parameter is true, and $\phi $ and $\delta $ are calculated with respect to the ratio between negative and positive samples in the `labels` parameter. If `ratio_correction = FALSE`, the ratio is set to one. The output of `phiDelta.stats` is a list with four objects. The first three objects are vectors of $\phi $ values, $\delta $ values, and the corresponding feature names. The last object is the class ratio calculated from the data. The outputs retrieved from `phiDelta.convert` and `phiDelta.stats` are compatible with the input parameters of the `phiDelta.plot` function, whereby `names` and `ratio` are not mandatory. In `phiDelta.plot`, $\phi $ and $\delta $ are plotted in a $\langle \phi ,\delta \rangle $ diagram with default `ratio = 1`. Several graphical parameters can be passed, such as different isometric lines or points that should be highlighted in the diagram.

`climate_data` and consists of meteorological data from a weather station in Frankfurt (Oder), Germany, from February 2016, which is included in the EFS package [27]. The class variable is the Boolean variable `RainBool`, which is 0 for no rain and 1 for rain. On the basis of `climate_data`, Listing 7 shows an exemplary usage of the `R` functions. The resulting plot of the example is shown in Figure 7.

#### 4.5. Web Application

`Python` or `R`, we also provide a web application at http://phiDelta.heiderlab.de. It contains all functionalities mentioned above. Several options are implemented for users who want to evaluate the distribution of the feature points in the $\langle \phi ,\delta \rangle $ diagram by means of a feature ranking.

`R` package, the web application provides the opportunity to graphically identify single feature points by moving the cursor over the plots and zooming in the $\langle \phi ,\delta \rangle $ diagrams. By marking features in the feature ranking table, the corresponding feature points are highlighted in pink in the diagram.

## 5. Conclusions

`Python` and `R`, which are two very popular programming languages widely acknowledged by the data mining community. Great care has been taken to guarantee the same interface for the end user, at least for the function-based interface. In addition, the `Python` library provides an object-oriented implementation of the cited measures and diagrams, whereas the `R` library is enriched with an online interface. The library to be used largely depends on personal taste or on constraints dictated by the underlying environment. Throughout the article, several source listings have been given to help users devise and experiment with the proposed performance measures. It is worth noting that, on the `Python` side, more details have been given to illustrate the object-oriented implementation, as the function-based implementation is in fact obtained by wrapping the object-oriented one. For both languages, all relevant information has been given to help users understand how the libraries are used. As for the `R` side, further details have been given to illustrate how the online interface can be accessed and used. $\langle \phi ,\delta \rangle $ diagrams are a valid alternative to ROC curves when the focus is on accuracy and/or bias rather than on specificity and sensitivity. Moreover, they are a powerful tool for studying the characteristics of any given dataset, which can be estimated as easy or difficult to classify, depending on the overall “signature” that its features depict in a $\langle \phi ,\delta \rangle $ diagram. To this end, some examples of signature generation have been described and commented. We deem that the proposed implementations of $\langle \phi ,\delta \rangle $ measures and diagrams, owing to their availability in two very popular programming languages, will be very helpful for a wide range of researchers.


## References

1. Dechêne, A.; Jochum, C.; Fingas, C.; Paul, A.; Heider, D.; Syn, W.K.; Gerken, G.; Canbay, A.; Zöpf, T. Endoscopic management is the treatment of choice for bile leaks after liver resection. Gastrointest. Endosc. **2014**, 80, 626–633.
2. Heider, D.; Appelmann, J.; Bayro, T.; Dreckmann, W.; Held, A.; Winkler, J.; Barnekow, A.; Borschbach, M. A computational approach for the identification of small GTPases based on preprocessed amino acid sequences. Technol. Cancer Res. Treat. **2009**, 8, 333–341.
3. Pyka, M.; Heider, D.; Hauke, S.; Kircher, T.; Jansen, A. Dynamic causal modeling with genetic algorithms. J. Neurosci. Methods **2011**, 194, 402–406.
4. Armano, G.; Marchesi, M.; Murru, A. A hybrid genetic-neural architecture for stock indexes forecasting. Inf. Sci. **2005**, 170, 3–33.
5. Carrasquilla, J.; Melko, R.G. Machine learning phases of matter. Nat. Phys. **2017**, 13, 431.
6. Hand, D. Construction and Assessment of Classification Rules; Wiley: Hoboken, NJ, USA, 1997.
7. Pepe, M. The Statistical Evaluation of Medical Tests for Classification and Prediction; Oxford University Press: Oxford, UK, 2004.
8. Fawcett, T. An introduction to ROC analysis. Patt. Recognit. Lett. **2006**, 27, 861–874.
9. Lever, J.; Krzywinski, M.; Altman, N. Points of Significance: Classification Evaluation. Nat. Methods **2016**, 13, 603–604.
10. Barandela, R.; Sánchez, J.S.; García, V.; Rangel, E. Strategies for learning in class imbalance problems. Patt. Recognit. **2003**, 36, 849–851.
11. Elazmeh, W.; Japkowicz, N.; Matwin, S. A framework for comparative evaluation of classifiers in the presence of class imbalance. In Proceedings of the Third Workshop on ROC Analysis in Machine Learning, Pittsburgh, PA, USA, 29 June 2006.
12. Guo, X.; Yin, Y.; Dong, C.; Yang, G.; Zhou, G. On the class imbalance problem. In Proceedings of the Fourth International Conference on Natural Computation, Jinan, China, 18–20 October 2008; Volume 4, pp. 192–201.
13. Fürnkranz, J.; Flach, P.A. Roc ‘n’ Rule Learning—Towards a Better Understanding of Covering Algorithms. Mach. Learn. **2005**, 58, 39–77.
14. Drummond, C.; Holte, R.C. Cost curves: An improved method for visualizing classifier performance. Mach. Learn. **2006**, 65, 95–130.
15. Ben-David, A. A lot of randomness is hiding in accuracy. Eng. Appl. Artif. Intell. **2007**, 20, 875–885.
16. García, S.; Fernández, A.; Luengo, J.; Herrera, F. A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput. **2009**, 13, 959.
17. Armano, G. A Direct Measure of Discriminant and Characteristic Capability for Classifier Building and Assessment. Inf. Sci. **2015**, 325, 466–483.
18. Bellman, R. Adaptive Control Processes; Princeton University Press: Princeton, NJ, USA, 1961.
19. Pearson, K. VII. Mathematical contributions to the theory of evolution.—III. Regression, heredity, and panmixia. Philos. Trans. R. Soc. A **1896**, 187, 253–318.
20. Cramer, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946; p. 575.
21. Armano, G.; Giuliani, A. A two-tiered 2D Visual Tool for Assessing Classifier Performance. Inf. Sci. **2018**, in press.
22. Kalinov, P.; Stantic, B.; Sattar, A. Building a dynamic classifier for large text data collections. In Proceedings of the Twenty-First Australasian Database Conference, Brisbane, Australia, 18–22 January 2010; pp. 113–122.
23. Kenekayoro, P.; Buckley, K.; Thelwall, M. Automatic Classification of Academic Web Page Types. Scientometrics **2014**, 101, 1015–1026.
24. Zhu, J.; Xie, Q.; Yu, S.I.; Wong, W.H. Exploiting link structure for web page genre identification. Data Min. Knowl. Discov. **2016**, 30, 550–575.
25. Mohammad, R.M.; Thabtah, F.A.; McCluskey, L. Predicting phishing websites based on self-structuring neural network. Neural Comput. Appl. **2014**, 25, 443–458.
26. Zipf, G. Human Behavior and the Principle of Least Effort; Addison Wesley: Boston, MA, USA, 1949.
27. Neumann, U.; Genze, N.; Heider, D. EFS: an ensemble feature selection tool implemented as R-package and web-application. BioData Min. **2017**, 10, 21.

**Figure 1.** Assessment of the Science category, as opposed to Business (both categories have been extracted from the `ODP`). The left-hand side of the figure shows an assessment made on an `SVM` classifier, while running a k-fold cross validation. Each point reported in the diagram corresponds to the $\langle \phi ,\delta \rangle $ value calculated on a single fold. The right-hand side of the figure shows the “class signature”, made up by plotting the $\langle \phi ,\delta \rangle $ values of its features (treated as single-feature classifiers).

**Figure 2.** Standard version of a $\langle \phi ,\delta \rangle $ diagram, with $\mathtt{ratio}=1.0$ (left-hand side) and its generalization, in this example, with $\mathtt{ratio}=0.25$ (right-hand side).

**Figure 3.** Assessment of the Hiking category, as opposed to Fishing. The corresponding class signature is reported at the left-hand side. The performance of an `SVM` classifier trained with k-fold cross validation is reported at the right-hand side (each point represents the performance of the classifier on a different fold).

**Figure 4.** Assessment of the Software category, as opposed to Hardware. The corresponding class signature is reported at the left-hand side. The performance of an `SVM` classifier trained with k-fold cross validation is reported at the right-hand side (each point represents the performance of the classifier on a different fold).

**Figure 5.** Assessment of the Software category, as opposed to Hardware, with focus on the relation between `ratio` and bias. As pointed out on the right-hand side, the bias reduces to zero in a generalized $\langle \phi ,\delta \rangle $ diagram with ratio corresponding to the actual one.

**Figure 6.** Assessment of the Software category, as opposed to Hardware, with focus on the “guess if” capability of $\langle \phi ,\delta \rangle $ diagrams.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Armano, G.; Giuliani, A.; Neumann, U.; Rothe, N.; Heider, D. Phi-Delta-Diagrams: Software Implementation of a Visual Tool for Assessing Classifier and Feature Performance. *Mach. Learn. Knowl. Extr.* **2019**, *1*, 121-137.
https://doi.org/10.3390/make1010007
