Special Issue "Machine Learning with Python"

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 20 December 2019.

Special Issue Editor

Dr. Sebastian Raschka
Guest Editor
Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
Interests: machine learning; deep learning; statistics; computational biology

Special Issue Information

Dear Colleagues,

We live in an age in which quintillions of bytes of data are generated and collected every day. Around the globe, researchers and companies are leveraging these vast amounts of data in countless application areas, ranging from drug discovery to improving transportation with self-driving cars.

As we all know, Python has evolved into the lingua franca of machine learning and artificial intelligence research over the last several years. What makes Python particularly attractive for us researchers is that it gives us access to a cohesive set of tools for scientific computing and is easy to teach and learn. As a language that bridges many different technologies and fields, Python also fosters interdisciplinary collaboration. And besides making us more productive in our research, sharing the tools we develop in Python has the potential to reach a wide audience and benefit the broader research community.

Fortunately, the last couple of years have seen a surge of introductory and original teaching material on machine learning with Python. This body of tutorials and case studies enables both young and interdisciplinary researchers to leverage the rich toolsets for machine learning research and data science applications available in Python. While most of this literature focuses on introductory topics, this Special Issue focuses on new algorithms and methods implemented in Python and on essential applications in the fields of data science, machine learning, and deep learning.

This Special Issue aims to collect a body of advanced literature, written by experts, that provides access to state-of-the-art methodology developed with Python. Its mission is to advance the existing body of literature by sharing contemporary, cutting-edge research enabled by Python. Moreover, we aim to provide representative applications of new, state-of-the-art libraries that facilitate modern problem-solving, as valuable resources for researchers who want to apply machine learning at the leading edge.

Dr. Sebastian Raschka
Guest Editor

Keywords

  • AutoML
  • data processing pipelines
  • distributed training
  • reproducible data science
  • dimensionality reduction and feature selection
  • deep learning

Published Papers

This Special Issue is now open for submission; see below for planned papers.

Planned Papers

The list below contains only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: Kernel-Based Ensemble Learning in Python
Authors: Benjamin Guedj 1, Bhargav Srinivasa Desikan 2
Affiliations: 1 Inria & University College London; 2 University of Chicago
Abstract: We propose a new supervised learning algorithm for classification and regression problems in which two or more preliminary predictors are available. We introduce KernelCobra, a non-linear learning strategy for combining an arbitrary number of initial predictors. KernelCobra builds on the COBRA algorithm introduced by Biau et al. (2016), which combines estimators based on a notion of proximity of predictions on the training data. While the COBRA algorithm uses a binary threshold to decide which training data are close enough to be used, we generalize this idea by using a kernel to better encapsulate the proximity information. Such a smoothing kernel provides more representative weights to each of the training points used to build the aggregate and final predictor, and KernelCobra systematically outperforms the COBRA algorithm. KernelCobra is included in the open-source Python package Pycobra (0.2.4 and onward), introduced by Guedj and Srinivasa Desikan (2018). Numerical experiments assess the performance (in terms of pure prediction and computational complexity) of KernelCobra on real-life and synthetic datasets.
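
The aggregation scheme described in the abstract lends itself to a compact illustration. The following is a minimal from-scratch sketch of the kernel-weighted idea, not the Pycobra API; the Gaussian kernel, bandwidth value, and toy data are assumptions made for this example.

    import numpy as np

    def kernel_cobra_predict(machine_preds_train, y_train, machine_preds_query,
                             bandwidth=1.0):
        """Kernel-weighted aggregation of preliminary predictors (illustrative)."""
        # Distance between the query point's prediction profile and each
        # training point's prediction profile, across all machines.
        dists = np.linalg.norm(machine_preds_train - machine_preds_query, axis=1)
        # A smoothing kernel (Gaussian here, an assumption of this sketch)
        # replaces COBRA's hard binary threshold with graded proximity weights.
        weights = np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2))
        weights /= weights.sum()
        # The aggregate prediction is a weighted average of the training targets.
        return float(np.dot(weights, y_train))

    # Toy usage: three preliminary predictors evaluated on five training points.
    rng = np.random.default_rng(0)
    preds_train = rng.normal(size=(5, 3))   # shape (n_train, n_machines)
    y_train = rng.normal(size=5)
    preds_query = rng.normal(size=3)        # one query point, all machines
    print(kernel_cobra_predict(preds_train, y_train, preds_query, bandwidth=0.5))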
 
Title: Albumentations: Fast and Flexible Image Augmentations
Authors: Alexandr Kalinin et al.
Affiliation: University of Michigan
Abstract: Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output annotations. In the computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks, and they are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to variations and combinations of flipping, rotating, scaling, and cropping. Moreover, image processing speed varies across existing tools for image augmentation. We present Albumentations, a fast and flexible library that offers a large and diverse set of image transform operations and also serves as an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on most commonly used image transformations. The source code for Albumentations is publicly available online.
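
As a usage illustration, the following minimal sketch builds a declarative augmentation pipeline with Albumentations. The specific transforms and probabilities are chosen for illustration only; transform names may vary slightly between library versions.

    import numpy as np
    import albumentations as A

    # A declarative pipeline: each transform fires with probability p,
    # composing geometric and photometric augmentations.
    transform = A.Compose([
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),
        A.RandomBrightnessContrast(p=0.3),
    ])

    # Albumentations operates on numpy arrays of shape (H, W, C);
    # a random uint8 image stands in for a real photo here.
    image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
    augmented = transform(image=image)["image"]
    print(augmented.shape)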

Title: Increasing User Participation and Data Quality with Machine Learning
Author: Jared M. Moore
Affiliation: School of Computing and Information Systems, Grand Valley State University
Abstract: Honey bees are a critical pollinator for crops in North America. To better support best management practices, the Bee Informed Partnership has partnered with beekeepers to electronically collect data from active hives, including hive weight. Hive weight, and its change over time, is indicative of hive health. Changes in hive weight are typically gradual, but sudden changes are possible due to management or environmental conditions. Developing automated machine learning models is further complicated by the weight variation between hives and by sudden weight-change events caused by beekeeper interaction with a hive. Once the data are collected, beekeepers can access them in a web portal and annotate weight-change events. To date, beekeepers have provided very few annotations. The lack of annotations hinders model development to support best management practices. In this paper, we present the process of implementing our initial model to address the lack of annotations. Our results show that the model predicts most events that beekeepers have previously annotated, and applying it to unannotated data identifies likely events. The model is now in production, assisting beekeepers in annotating their data and thereby facilitating future model development for best management practices.
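
The abstract does not specify the model, so the following sketch is only a generic illustration of flagging sudden hive-weight changes, using a rolling z-score on hourly weight deltas; the window size, threshold, and synthetic data are all assumptions, not the paper's method.

    import numpy as np
    import pandas as pd

    def flag_weight_events(weights, window=24, z_thresh=4.0):
        """Flag sudden changes in an hourly hive-weight series.

        A weight change that is large relative to recent local variability
        becomes a candidate event for a beekeeper to confirm or reject.
        """
        delta = pd.Series(weights).diff()
        # Rolling statistics of the hour-to-hour change; the window and
        # threshold here are illustrative choices.
        mu = delta.rolling(window, min_periods=window).mean()
        sigma = delta.rolling(window, min_periods=window).std()
        return ((delta - mu) / sigma).abs() > z_thresh

    # Toy series: gradual drift plus one abrupt drop (e.g., honey harvested).
    rng = np.random.default_rng(1)
    hive_weight = 30.0 + np.cumsum(rng.normal(0.01, 0.05, size=200))
    hive_weight[150:] -= 8.0  # sudden weight-change event
    events = flag_weight_events(hive_weight)
    print(events[events].index.tolist())  # indices of flagged hours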

Title: Responsible Machine Learning: Interpretable Models, Post-hoc Explanation, and Disparate Impact Testing
Authors: Patrick Hall 1,2, Navdeep Gill 1, Kim Montgomery 1, and Nicholas Schmidt 3 
Affiliations: 1 H2O.ai; 2 George Washington University; 3 BLDS, LLC
Abstract: This text outlines a viable approach for training and evaluating complex machine learning systems for high-stakes, human-centered, or regulated applications using common Python programming tools. The accuracy and intrinsic interpretability of two types of constrained models, monotonic gradient boosting machines (M-GBM) and explainable neural networks (XNN), a deep learning architecture well-suited for structured data, are assessed on simulated datasets with known feature importance and sociological bias characteristics and on realistic, publicly available example datasets. For maximum transparency and the potential generation of personalized adverse action notices, the constrained models are analyzed using post-hoc explanation techniques including plots of individual conditional expectation (ICE) and global and local gradient-based or Shapley feature importance. The constrained model predictions are also tested for disparate impact and other types of sociological bias using straightforward group fairness measures. By combining innovations in interpretable models, post-hoc explanation, and bias testing with accessible software tools, this text aims to provide a template workflow for important machine learning applications that require high accuracy and interpretability and low disparate impact.
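
As one example of the straightforward group fairness measures mentioned in the abstract, the following sketch computes an adverse impact ratio, the ratio of favorable-outcome rates between a protected and a reference group. The four-fifths threshold is the conventional rule of thumb, and the example data are invented for illustration.

    import numpy as np

    def adverse_impact_ratio(decisions, groups, protected, reference):
        """Ratio of favorable-outcome rates: protected vs. reference group."""
        decisions = np.asarray(decisions)
        groups = np.asarray(groups)
        rate_protected = decisions[groups == protected].mean()
        rate_reference = decisions[groups == reference].mean()
        return rate_protected / rate_reference

    # Invented binary model decisions (1 = favorable) for two groups.
    decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    groups = ["a"] * 5 + ["b"] * 5
    air = adverse_impact_ratio(decisions, groups, protected="b", reference="a")
    # The conventional four-fifths rule flags ratios below 0.8.
    print(f"AIR = {air:.2f}, flagged: {air < 0.8}")

Monotonicity constraints of the kind used for the M-GBM models are likewise accessible in common Python gradient boosting libraries (for example, via XGBoost's monotone_constraints parameter).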