Special Issue "Machine Learning with Python"

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 20 December 2019.

Special Issue Editor

Dr. Sebastian Raschka
Guest Editor
Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
Interests: machine learning; deep learning; statistics; computational biology

Special Issue Information

Dear Colleagues,

We live in an age in which quintillions of bytes of data are generated and collected every day. Around the globe, researchers and companies are leveraging these vast amounts of data in countless application areas, ranging from drug discovery to improving transportation with self-driving cars.

As we all know, Python has evolved into the lingua franca of machine learning and artificial intelligence research over the last several years. What makes Python particularly attractive for us researchers is that it gives us access to a cohesive set of tools for scientific computing and is easy to teach and learn. As a language that bridges many different technologies and fields, Python also fosters interdisciplinary collaboration. And besides making us more productive in our research, sharing the tools we develop in Python has the potential to reach a wide audience and benefit the broader research community.

Fortunately, the last couple of years have seen a surge of introductory and original teaching material on machine learning with Python. This body of tutorials and case studies enables both young and interdisciplinary researchers to leverage the rich toolsets for machine learning research and data science applications available in Python. While most of this literature focuses on introductory topics, this Special Issue focuses on new algorithms and methods implemented in Python and on essential applications in the fields of data science, machine learning, and deep learning.

This Special Issue aims to collect a body of advanced literature, written by experts, that provides access to state-of-the-art methodology developed with Python. Its mission is to advance the existing body of literature by sharing contemporary, cutting-edge research enabled by Python. Moreover, we aim to provide representative applications of new, state-of-the-art libraries that facilitate modern problem-solving, as valuable resources for researchers who want to apply machine learning at the leading edge.

Dr. Sebastian Raschka
Guest Editor

Keywords

  • AutoML
  • data processing pipelines
  • distributed training
  • reproducible data science
  • dimensionality reduction and feature selection
  • deep learning

Published Papers

This Special Issue is now open for submission; see below for planned papers.

Planned Papers

The list below contains only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: Kernel-Based Ensemble Learning in Python
Authors: Benjamin Guedj 1, Bhargav Srinivasa Desikan 2
Affiliations: 1 Inria & University College London; 2 University of Chicago
Abstract: We propose a new supervised learning algorithm for classification and regression problems in which two or more preliminary predictors are available. We introduce KernelCobra, a non-linear learning strategy for combining an arbitrary number of initial predictors. KernelCobra builds on the COBRA algorithm introduced by Biau et al. (2016), which combines estimators based on a notion of proximity of predictions on the training data. While the COBRA algorithm uses a binary threshold to decide which training data are close enough to be used, we generalize this idea by using a kernel to better encapsulate the proximity information. Such a smoothing kernel provides more representative weights to each of the training points used to build the aggregate and final predictor, and KernelCobra systematically outperforms the COBRA algorithm. KernelCobra is included in the open-source Python package Pycobra (0.2.4 and onward), introduced by Guedj and Srinivasa Desikan (2018). Numerical experiments assess the performance (in terms of pure prediction and computational complexity) of KernelCobra on real-life and synthetic datasets.
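
The aggregation scheme described in the abstract lends itself to a compact illustration. The following is a minimal from-scratch sketch of the kernel-weighted idea, not the Pycobra API; the Gaussian kernel, bandwidth value, and toy data are assumptions made for this example.

    import numpy as np

    def kernel_cobra_predict(machine_preds_train, y_train, machine_preds_query,
                             bandwidth=1.0):
        """Kernel-weighted aggregation of preliminary predictors (illustrative)."""
        # Distance between the query point's prediction profile and each
        # training point's prediction profile, across all machines.
        dists = np.linalg.norm(machine_preds_train - machine_preds_query, axis=1)
        # A smoothing kernel (Gaussian here, an assumption of this sketch)
        # replaces COBRA's hard binary threshold with graded proximity weights.
        weights = np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2))
        weights /= weights.sum()
        # The aggregate prediction is a weighted average of the training targets.
        return float(np.dot(weights, y_train))

    # Toy usage: three preliminary predictors evaluated on five training points.
    rng = np.random.default_rng(0)
    preds_train = rng.normal(size=(5, 3))   # shape (n_train, n_machines)
    y_train = rng.normal(size=5)
    preds_query = rng.normal(size=3)        # one query point, all machines
    print(kernel_cobra_predict(preds_train, y_train, preds_query, bandwidth=0.5))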
 
Title: Albumentations: Fast and Flexible Image Augmentations
Authors: Alexandr Kalinin et al.
Affiliation: University of Michigan
Abstract: Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output annotations. In the computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks, and they are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to variations and combinations of flipping, rotating, scaling, and cropping. Moreover, image processing speed varies across existing tools for image augmentation. We present Albumentations, a fast and flexible library that offers a large and diverse set of image transform operations and also serves as an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on most commonly used image transformations. The source code for Albumentations is publicly available online.
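
As a usage illustration, the following minimal sketch builds a declarative augmentation pipeline with Albumentations. The specific transforms and probabilities are chosen for illustration only; transform names may vary slightly between library versions.

    import numpy as np
    import albumentations as A

    # A declarative pipeline: each transform fires with probability p,
    # composing geometric and photometric augmentations.
    transform = A.Compose([
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),
        A.RandomBrightnessContrast(p=0.3),
    ])

    # Albumentations operates on numpy arrays of shape (H, W, C);
    # a random uint8 image stands in for a real photo here.
    image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
    augmented = transform(image=image)["image"]
    print(augmented.shape)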

Title: Increasing User Participation and Data Quality with Machine Learning
Author: Jared M. Moore
Affiliation: School of Computing and Information Systems, Grand Valley State University
Abstract: Honey bees are a critical pollinator for crops in North America. To better support best management practices, the Bee Informed Partnership has partnered with beekeepers to electronically collect data from active hives, including hive weight. Hive weight, and its change over time, is indicative of hive health. Changes in hive weight are typically gradual, but sudden changes are possible due to management or environmental conditions. Developing automated machine learning models is further complicated by the weight variation between hives and by sudden weight-change events caused by beekeeper interaction with a hive. Once the data are collected, beekeepers can access them in a web portal and annotate weight-change events. To date, beekeepers have provided very few annotations. The lack of annotations hinders model development to support best management practices. In this paper, we present the process of implementing our initial model to address the lack of annotations. Our results show that the model predicts most events that beekeepers have previously annotated, and applying it to unannotated data identifies likely events. The model is now in production, assisting beekeepers in annotating their data and thereby facilitating future model development for best management practices.
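
The abstract does not specify the model, so the following sketch is only a generic illustration of flagging sudden hive-weight changes, using a rolling z-score on hourly weight deltas; the window size, threshold, and synthetic data are all assumptions, not the paper's method.

    import numpy as np
    import pandas as pd

    def flag_weight_events(weights, window=24, z_thresh=4.0):
        """Flag sudden changes in an hourly hive-weight series.

        A weight change that is large relative to recent local variability
        becomes a candidate event for a beekeeper to confirm or reject.
        """
        delta = pd.Series(weights).diff()
        # Rolling statistics of the hour-to-hour change; the window and
        # threshold here are illustrative choices.
        mu = delta.rolling(window, min_periods=window).mean()
        sigma = delta.rolling(window, min_periods=window).std()
        return ((delta - mu) / sigma).abs() > z_thresh

    # Toy series: gradual drift plus one abrupt drop (e.g., honey harvested).
    rng = np.random.default_rng(1)
    hive_weight = 30.0 + np.cumsum(rng.normal(0.01, 0.05, size=200))
    hive_weight[150:] -= 8.0  # sudden weight-change event
    events = flag_weight_events(hive_weight)
    print(events[events].index.tolist())  # indices of flagged hours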

Title: Responsible Machine Learning: Interpretable Models, Post-hoc Explanation, and Disparate Impact Testing
Authors: Patrick Hall 1,2, Navdeep Gill 1, Kim Montgomery 1, and Nicholas Schmidt 3 
Affiliations: 1 H2O.ai; 2 George Washington University; 3 BLDS, LLC
Abstract: This text outlines a viable approach for training and evaluating complex machine learning systems for high-stakes, human-centered, or regulated applications using common Python programming tools. The accuracy and intrinsic interpretability of two types of constrained models, monotonic gradient boosting machines (M-GBM) and explainable neural networks (XNN), a deep learning architecture well-suited for structured data, are assessed on simulated datasets with known feature importance and sociological bias characteristics and on realistic, publicly available example datasets. For maximum transparency and the potential generation of personalized adverse action notices, the constrained models are analyzed using post-hoc explanation techniques including plots of individual conditional expectation (ICE) and global and local gradient-based or Shapley feature importance. The constrained model predictions are also tested for disparate impact and other types of sociological bias using straightforward group fairness measures. By combining innovations in interpretable models, post-hoc explanation, and bias testing with accessible software tools, this text aims to provide a template workflow for important machine learning applications that require high accuracy and interpretability and low disparate impact.
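
As one example of the straightforward group fairness measures mentioned in the abstract, the following sketch computes an adverse impact ratio, the ratio of favorable-outcome rates between a protected and a reference group. The four-fifths threshold is the conventional rule of thumb, and the example data are invented for illustration.

    import numpy as np

    def adverse_impact_ratio(decisions, groups, protected, reference):
        """Ratio of favorable-outcome rates: protected vs. reference group."""
        decisions = np.asarray(decisions)
        groups = np.asarray(groups)
        rate_protected = decisions[groups == protected].mean()
        rate_reference = decisions[groups == reference].mean()
        return rate_protected / rate_reference

    # Invented binary model decisions (1 = favorable) for two groups.
    decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    groups = ["a"] * 5 + ["b"] * 5
    air = adverse_impact_ratio(decisions, groups, protected="b", reference="a")
    # The conventional four-fifths rule flags ratios below 0.8.
    print(f"AIR = {air:.2f}, flagged: {air < 0.8}")

Monotonicity constraints of the kind used for the M-GBM models are likewise accessible in common Python gradient boosting libraries (for example, via XGBoost's monotone_constraints parameter).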