Albumentations: fast and flexible image augmentations

Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In the computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Moreover, image processing speed varies among existing tools for image augmentation. We present Albumentations, a fast and flexible library for image augmentations with a large variety of image transform operations available, which also serves as an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on most of the commonly used image transformations. The source code for Albumentations is made publicly available online at https://github.com/albu/albumentations


I. INTRODUCTION
Modern machine learning models, such as deep artificial neural networks, often have a very large number of parameters, which allows them to generalize well when trained on massive amounts of labeled data. In practice, such large labeled datasets are not always available for training, which leads to an elevated risk of overfitting. Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In the computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance on benchmark datasets [1]-[5].
While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Different domains, imaging modalities, and tasks may benefit from a wide range of different amounts and combinations of image transformations [4]-[6]. For example, very extensive image augmentations are typical in medical image analysis, where datasets are often small and expensive to acquire, and annotations are often sparse, compared to natural images [7], [8]. Even on well-studied benchmark datasets, different data augmentation strategies lead to performance variability. For example, image rotation is an effective data augmentation method on CIFAR-10, but not on MNIST, where it can negatively affect the network's ability to distinguish between the handwritten digits 6 and 9 [6]. Thus, there is a need for flexible and rich image augmentation tools that allow applying a wide range of transformations, alone or in combination, to the task at hand.
While the largest gains in efficiency of deep neural networks have come from computation with graphics processing units (GPUs) [9], which excel at the matrix and vector operations central to deep learning, data pre-processing and augmentation are typically done on a CPU. As the performance and the memory capacity of GPU hardware steadily improve, the efficiency of data augmentation operations becomes increasingly important, such that GPUs are not idling while CPUs are preparing the next mini-batch of data to pass through the network. However, image processing speed varies among existing tools for image augmentation.
In this paper we present Albumentations, a fast and flexible solution for image augmentations. Albumentations is a library built on fast implementations of a large number of image transform operations, and it also serves as an easy-to-use wrapper around other augmentation libraries. It provides a simple yet powerful interface for different tasks, including image classification, segmentation, and detection. We demonstrate that Albumentations is faster than other commonly used image augmentation tools on most of the commonly used image transformations.

II. USE CASES

A. Street-view imagery

Computer vision tasks are not limited to image classification; thus, support of augmentations for other data formats is also important. Many spatial transforms implemented in Albumentations support operating on segmentation masks and bounding boxes, which are used in object detection and tracking. Fig. 2 shows an example of applying a combination of a horizontal flip and a random sized crop to an image from the Mapillary Vistas Dataset for Semantic Understanding of Street Scenes [10]. Besides the original image, it shows the result of applying the transform to both bounding box and instance mask annotations.

B. Satellite and aerial imagery
In the analysis of satellite and aerial images, transformations that preserve the shape of objects are typically used to avoid distorting rigid objects, such as buildings. Such transform operations include cropping, rotations, reflections, and scaling. For example, one of the top-3 performing solutions in the DSTL Satellite Imagery Feature Detection challenge on Kaggle used combinations of random cropping and a random transformation from the dihedral group Dih4 [11]. High-performing solutions to other challenges also implemented similar types of image augmentations, for example, for automatic road extraction [12] and for multi-class land segmentation [13] from satellite imagery. Fig. 3 provides examples of such image transforms applied to an image from the Inria Aerial Image Labeling dataset [14]. Corresponding binary masks of buildings present in the image are also transformed in the same way as the original image.
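The Dih4 augmentation mentioned above (the eight symmetries of a square: four rotations, each optionally composed with a reflection) is simple to implement directly. A numpy-only sketch, with function names that are illustrative rather than part of any library:

```python
import numpy as np

def dih4_transforms(img):
    """All 8 elements of the dihedral group Dih4: rotations by 0/90/180/270
    degrees, plus each rotation composed with a left-right flip."""
    rotations = [np.rot90(img, k) for k in range(4)]
    return rotations + [np.fliplr(r) for r in rotations]

def random_dih4(img, rng=None):
    """Apply a uniformly random Dih4 symmetry.

    For square inputs every element preserves the image shape, which is why
    this group is popular for rigid-object imagery such as buildings."""
    rng = rng or np.random.default_rng()
    return dih4_transforms(img)[rng.integers(8)]
```

Because every element of the group is a permutation of pixels, the same index can be applied to the image and its building mask to keep them aligned.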

C. Biomedical image analysis
Image augmentations are intensively used in the analysis of biological and medical images due to the typically limited amount of available labeled data [7]. For example, combining pre-trained deep network architectures with multiple augmentation techniques enabled accurate detection of breast cancer from a very small set of histology images with less than 100 images per class [8]. Similarly, the use of medical image augmentations helped to improve the results of segmentation of hand radiographs and bone age assessment [15]. In medical computer vision tasks that deal with color images or videos, color transformations have also been shown to help deep networks generalize better, for example, in surgical [16] or endoscopic [17] video analysis. In addition to these commonly used transforms, operations like grid distortion and elastic transform can often be helpful, see Fig. 3, since medical imaging often deals with non-rigid structures that have shape variations.

III. BENCHMARKS
The quantitative comparison of image transformation speed for Albumentations and other commonly used image augmentation tools is presented in Table I. We included the framework-agnostic image augmentation library imgaug [18], as well as the augmentations provided within the Keras [19] and PyTorch [20] frameworks. For most image operations, Albumentations is consistently faster than all alternatives.
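Table I itself is not reproduced here, but the kind of measurement behind such a comparison can be sketched with a small timing harness; the function name and the choice of `np.fliplr` as a stand-in transform are illustrative, not the paper's actual benchmark code.

```python
import time
import numpy as np

def throughput(fn, images, repeats=3):
    """Best-of-N throughput in images/second for a single transform.

    Taking the fastest of several runs reduces noise from other processes;
    per-image Python call overhead is deliberately included, since it is
    also paid in a real data-loading pipeline."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for img in images:
            fn(img)
        best = min(best, time.perf_counter() - start)
    return len(images) / best

images = [np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
          for _ in range(100)]
flip_speed = throughput(np.fliplr, images)  # stand-in for a library transform
```

Comparing libraries then amounts to calling `throughput` with each library's implementation of the same transform on the same set of images.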

IV. CONCLUSIONS
We present Albumentations, a fast and flexible library for image augmentations with a large variety of image transform operations available, which also serves as an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on most of the commonly used image transformations. The source code for Albumentations is made publicly available online at https://github.com/albu/albumentations.