Machine Learning with Python

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (20 December 2019) | Viewed by 215686

Special Issue Editor


Dr. Sebastian Raschka
Guest Editor
Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
Interests: machine learning; deep learning; statistics; computational biology

Special Issue Information

Dear Colleagues,

We live in an age in which quintillions of bytes of data are generated and collected every day. Around the globe, researchers and companies are leveraging these vast amounts of data in countless application areas, ranging from drug discovery to improving transportation with self-driving cars.

Over the last several years, Python has evolved into the lingua franca of machine learning and artificial intelligence research. What makes Python particularly attractive for researchers is that it gives us access to a cohesive set of tools for scientific computing and is easy to teach and learn. As a language that bridges many technologies and fields, Python also fosters interdisciplinary collaboration. And besides making us more productive in our research, tools we develop and share in Python have the potential to reach a wide audience and benefit the broader research community.

Fortunately, a surge of introductory and original teaching material about machine learning with Python has been written in the last couple of years. This body of tutorials and case studies enables both young and interdisciplinary researchers to leverage the rich toolsets available in Python for machine learning research and data science applications. While most of this literature focuses on introductory topics, this Special Issue focuses on new algorithms and methods implemented in Python and on essential applications in the fields of data science, machine learning, and deep learning.

This Special Issue aims to collect a body of advanced literature, written by experts, that provides access to state-of-the-art methodology developed with Python. The mission is to advance the existing body of literature by sharing contemporary, cutting-edge research enabled by Python. Moreover, we aim to provide representative applications of new, state-of-the-art libraries that facilitate modern problem-solving, as valuable resources for researchers who want to utilize machine learning at the leading edge.

Dr. Sebastian Raschka
Guest Editor

Keywords

  • AutoML
  • data processing pipelines
  • distributed training
  • reproducible data science
  • dimensionality reduction and feature selection
  • deep learning

Published Papers (7 papers)


Research


32 pages, 4670 KiB  
Article
A Responsible Machine Learning Workflow with Focus on Interpretable Models, Post-hoc Explanation, and Discrimination Testing
by Navdeep Gill, Patrick Hall, Kim Montgomery and Nicholas Schmidt
Information 2020, 11(3), 137; https://doi.org/10.3390/info11030137 - 29 Feb 2020
Cited by 15 | Viewed by 19070
Abstract
This manuscript outlines a viable approach for training and evaluating machine learning systems for high-stakes, human-centered, or regulated applications using common Python programming tools. The accuracy and intrinsic interpretability of two types of constrained models, monotonic gradient boosting machines and explainable neural networks, a deep learning architecture well-suited for structured data, are assessed on simulated data and publicly available mortgage data. For maximum transparency and the potential generation of personalized adverse action notices, the constrained models are analyzed using post-hoc explanation techniques including plots of partial dependence and individual conditional expectation and with global and local Shapley feature importance. The constrained model predictions are also tested for disparate impact and other types of discrimination using measures with long-standing legal precedents, adverse impact ratio, marginal effect, and standardized mean difference, along with straightforward group fairness measures. By combining interpretable models, post-hoc explanations, and discrimination testing with accessible software tools, this text aims to provide a template workflow for machine learning applications that require high accuracy and interpretability and that mitigate risks of discrimination.
(This article belongs to the Special Issue Machine Learning with Python)
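
As a rough illustration of the kind of workflow the paper describes, the sketch below fits a monotonically constrained gradient boosting model with XGBoost and computes a simple adverse impact ratio; the feature names, the protected-group flag, and the 0.5 cutoff are illustrative assumptions rather than the authors' actual data or code.

```python
# Hedged sketch (not the authors' exact code): a monotonically constrained
# gradient boosting model plus a simple adverse impact ratio (AIR) check.
import numpy as np
import pandas as pd
import xgboost as xgb

# Toy data: two features with an assumed monotonic relationship to the
# target, plus an illustrative protected-group flag.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "debt_to_income": rng.uniform(0, 1, 1000),
    "credit_score": rng.uniform(300, 850, 1000),
})
y = (X["debt_to_income"] - (X["credit_score"] - 300) / 550
     + rng.normal(0, 0.1, 1000) > 0).astype(int)
protected = rng.integers(0, 2, 1000).astype(bool)

# Monotone constraints: +1 forces a non-decreasing effect on the prediction,
# -1 a non-increasing one, which keeps the model intrinsically interpretable.
model = xgb.XGBClassifier(monotone_constraints=(1, -1), n_estimators=100, max_depth=3)
model.fit(X, y)

# Adverse impact ratio: selection rate of the protected group divided by the
# selection rate of the reference group (values below ~0.8 are often flagged).
approved = model.predict_proba(X)[:, 1] < 0.5  # "approved" if predicted risk is low
air = approved[protected].mean() / approved[~protected].mean()
print(f"Adverse impact ratio: {air:.2f}")
```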

20 pages, 2877 KiB  
Article
Albumentations: Fast and Flexible Image Augmentations
by Alexander Buslaev, Vladimir I. Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin and Alexandr A. Kalinin
Information 2020, 11(2), 125; https://doi.org/10.3390/info11020125 - 24 Feb 2020
Cited by 1057 | Viewed by 43458
Abstract
Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve corresponding output labels. In computer vision, image augmentations have become a common implicit regularization technique to combat overfitting in deep learning models and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to variations of flipping, rotating, scaling, and cropping. Moreover, image processing speed varies across existing image augmentation libraries. We present Albumentations, a fast and flexible open source library for image augmentation that offers a wide variety of image transform operations and also serves as an easy-to-use wrapper around other augmentation libraries. We discuss the design principles that drove the implementation of Albumentations and give an overview of its key features and distinct capabilities. Finally, we provide examples of image augmentations for different computer vision tasks and demonstrate that Albumentations is faster than other commonly used image augmentation tools on most image transform operations.
(This article belongs to the Special Issue Machine Learning with Python)
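
A minimal usage sketch of the Albumentations pipeline API is shown below; the particular transforms, probabilities, and image shape are illustrative choices, not taken from the paper.

```python
# Minimal usage sketch of the Albumentations API; the specific transforms,
# probabilities, and image shape are illustrative.
import numpy as np
import albumentations as A

# Declare an augmentation pipeline once ...
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])

# ... then apply it per image; for detection or segmentation tasks, bounding
# boxes and masks can be passed through the same call so they stay aligned.
image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]
```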

26 pages, 2450 KiB  
Article
Fastai: A Layered API for Deep Learning
by Jeremy Howard and Sylvain Gugger
Information 2020, 11(2), 108; https://doi.org/10.3390/info11020108 - 16 Feb 2020
Cited by 553 | Viewed by 70932
Abstract
fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes: a new type dispatch system for Python along with a semantic type hierarchy for tensors; a GPU-optimized computer vision library which can be extended in pure Python; an optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code; a novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training; a new data block API; and much more. We used this library to successfully create a complete deep learning course, which we were able to write more quickly than with previous approaches, and the resulting code was clearer. The library is already in wide use in research, industry, and teaching.
(This article belongs to the Special Issue Machine Learning with Python)
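
The sketch below illustrates fastai's high-level API on an image classification task, following the style of the library's documented examples; exact names (e.g., cnn_learner, which later versions rename to vision_learner) depend on the installed fastai version.

```python
# Sketch of fastai's high-level API layer on an image classification task.
from fastai.vision.all import *

# Download the Oxford-IIIT Pets images; filenames starting with an uppercase
# letter are cats, which gives a simple binary labeling function.
path = untar_data(URLs.PETS) / "images"

def is_cat(f):
    return f.name[0].isupper()

# Build DataLoaders, fine-tune a pretrained ResNet, and report the error rate.
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224),
)
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```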

14 pages, 4034 KiB  
Article
Cyber Security Tool Kit (CyberSecTK): A Python Library for Machine Learning and Cyber Security
by Ricardo A. Calix, Sumendra B. Singh, Tingyu Chen, Dingkai Zhang and Michael Tu
Information 2020, 11(2), 100; https://doi.org/10.3390/info11020100 - 11 Feb 2020
Cited by 14 | Viewed by 11495
Abstract
The cyber security toolkit, CyberSecTK, is a simple Python library for preprocessing and feature extraction of cyber-security-related data. As the digital universe expands, more and more data need to be processed using automated approaches. In recent years, cyber security professionals have seen opportunities to use machine learning approaches to help process and analyze their data. The challenge is that cyber security experts often lack the training needed to apply machine learning to their problems. The goal of this library is to help bridge this gap. In particular, we propose the development of a toolkit in Python that can process the most common types of cyber security data. This will help cyber experts to implement a basic machine learning pipeline from beginning to end. This proposed research work is our first attempt to achieve this goal. The proposed toolkit is a suite of program modules, data sets, and tutorials supporting research and teaching in cyber security and defense. Example use cases are presented and discussed. Survey results from students using some of the modules in the library are also presented.
(This article belongs to the Special Issue Machine Learning with Python)
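
The sketch below is not the CyberSecTK API; it is a generic scikit-learn example of the kind of end-to-end pipeline the toolkit is intended to support, with invented flow-record columns.

```python
# Generic end-to-end pipeline sketch (NOT the CyberSecTK API): the flow-record
# columns (duration, bytes, packets, label) are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny, made-up set of network-flow features with benign/malicious labels.
flows = pd.DataFrame({
    "duration": [0.2, 13.1, 0.4, 55.0],
    "bytes": [120, 98000, 300, 450000],
    "packets": [2, 140, 3, 610],
    "label": ["benign", "malicious", "benign", "malicious"],
})

X = flows[["duration", "bytes", "packets"]]
y = flows["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Scale the extracted features, train a classifier, and evaluate held-out accuracy.
clf = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```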

13 pages, 1535 KiB  
Article
Spectral Normalization for Domain Adaptation
by Liquan Zhao and Yan Liu
Information 2020, 11(2), 68; https://doi.org/10.3390/info11020068 - 27 Jan 2020
Cited by 4 | Viewed by 3126
Abstract
Transfer learning is used to extend an existing model to more difficult scenarios, thereby accelerating the training process and improving learning performance. The conditional adversarial domain adaptation method proposed in 2018 is a particular type of transfer learning. It uses a domain discriminator to identify which domain the extracted features belong to, where the features are obtained from a feature extraction network. The stability of the domain discriminator directly affects the classification accuracy. Here, we propose a new algorithm to improve predictive accuracy. First, we introduce the Lipschitz constraint condition into domain adaptation; if the constraint can be satisfied, the method will be stable. Second, we analyze how to make the gradient satisfy the condition, thereby deriving the modified gradient via the spectral regularization method. The modified gradient is then used to update the parameter matrix. The proposed method is compared to the ResNet-50, deep adaptation network, domain adversarial neural network, joint adaptation network, and conditional domain adversarial network methods on the Office-31, ImageCLEF-DA, and Office-Home datasets. The simulations demonstrate that the proposed method achieves higher accuracy than the other methods.
(This article belongs to the Special Issue Machine Learning with Python)
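
As a rough sketch of the general mechanism, the example below applies PyTorch's built-in spectral normalization to the layers of a domain discriminator to bound its Lipschitz constant; the layer sizes are assumptions, and this is not the authors' exact architecture or training procedure.

```python
# Sketch of constraining a domain discriminator with spectral normalization
# via PyTorch's built-in utility; layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class DomainDiscriminator(nn.Module):
    def __init__(self, feature_dim=256, hidden_dim=1024):
        super().__init__()
        # spectral_norm rescales each weight matrix by its largest singular
        # value, bounding the layer's Lipschitz constant and stabilizing
        # adversarial training of the discriminator.
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(feature_dim, hidden_dim)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden_dim, hidden_dim)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden_dim, 1)),
        )

    def forward(self, features):
        # Returns a logit indicating which domain (source vs. target)
        # the extracted features came from.
        return self.net(features)

disc = DomainDiscriminator()
logits = disc(torch.randn(8, 256))
```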

12 pages, 2194 KiB  
Article
Kernel-Based Ensemble Learning in Python
by Benjamin Guedj and Bhargav Srinivasa Desikan
Information 2020, 11(2), 63; https://doi.org/10.3390/info11020063 - 25 Jan 2020
Cited by 2 | Viewed by 3891
Abstract
We propose a new supervised learning algorithm for classification and regression problems where two or more preliminary predictors are available. We introduce KernelCobra, a non-linear learning strategy for combining an arbitrary number of initial predictors. KernelCobra builds on the COBRA algorithm introduced by Biau et al. (2016), which combined estimators based on a notion of proximity of predictions on the training data. While the COBRA algorithm used a binary threshold to declare which training data were close and to be used, we generalise this idea by using a kernel to better encapsulate the proximity information. Such a smoothing kernel provides more representative weights to each of the training points which are used to build the aggregate and final predictor, and KernelCobra systematically outperforms the COBRA algorithm. While COBRA is intended for regression, KernelCobra deals with classification and regression. KernelCobra is included as part of the open source Python package Pycobra (0.2.4 and onward), introduced by Srinivasa Desikan (2018). Numerical experiments were undertaken to assess the performance (in terms of pure prediction and computational complexity) of KernelCobra on real-life and synthetic datasets.
(This article belongs to the Special Issue Machine Learning with Python)
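
A minimal NumPy sketch of the kernel-weighted aggregation idea is given below; it is not the pycobra API, and the Gaussian kernel and bandwidth are illustrative choices.

```python
# Minimal NumPy sketch of the kernel-weighted aggregation behind KernelCobra
# (not the pycobra API): training points whose base-learner predictions are
# close to those at the query point receive larger weights.
import numpy as np

def kernel_cobra_predict(preds_train, y_train, preds_query, bandwidth=1.0):
    """preds_train: (n_train, n_machines) base-learner predictions on the
    held-out training half; preds_query: (n_machines,) predictions at the
    query point; returns the kernel-weighted aggregate of y_train."""
    # Gaussian kernel on the distance between prediction vectors, replacing
    # COBRA's binary closeness threshold with smooth weights.
    dists = np.sum((preds_train - preds_query) ** 2, axis=1)
    weights = np.exp(-dists / (2 * bandwidth ** 2))
    if weights.sum() == 0:
        return y_train.mean()  # fall back to the plain average
    return np.dot(weights, y_train) / weights.sum()

# Toy usage: two base machines, five held-out training points.
preds_train = np.array([[1.0, 1.2], [0.9, 1.1], [3.0, 2.8], [2.9, 3.1], [1.1, 0.9]])
y_train = np.array([1.0, 1.0, 3.0, 3.0, 1.0])
print(kernel_cobra_predict(preds_train, y_train, preds_query=np.array([1.0, 1.0])))
```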

Review


44 pages, 1021 KiB  
Review
Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence
by Sebastian Raschka, Joshua Patterson and Corey Nolet
Information 2020, 11(4), 193; https://doi.org/10.3390/info11040193 - 04 Apr 2020
Cited by 206 | Viewed by 60190
Abstract
Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and methods that are driving it, from processing the massive piles of data generated each day to learning from them and taking useful action. Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.
(This article belongs to the Special Issue Machine Learning with Python)
