# Decomposition Methods for Machine Learning with Small, Incomplete or Noisy Datasets


## Abstract


## 1. Introduction

#### 1.1. ML with Low Quality Datasets: State-of-the-Art and Recent Progress

#### 1.1.1. Classical Data Augmentation

#### 1.1.2. Classical Approaches to ML with Incomplete Data

#### 1.2. Mathematical Notation and Definitions

## 2. Methods

#### 2.1. A Unified View of Data Decomposition Models for ML

#### 2.2. Subspace Approximation (PCA)

#### 2.3. Sparse Decomposition (SD)

#### 2.4. Empirical Mode Decomposition (EMD)

1. Determine the local maxima and minima of the signal $x\left(t\right)$.
2. Calculate the upper (lower) envelope by interpolating the local maxima (minima). The interpolation can be carried out in different ways (linear interpolation, spline interpolation, etc.), which can lead to slightly different results.
3. Calculate the local mean $m\left(t\right)$ by averaging the upper and lower envelopes.
4. Calculate the first IMF candidate ${h}_{1}\left(t\right)=x\left(t\right)-m\left(t\right)$.
5. Check whether the candidate ${h}_{1}\left(t\right)$ meets the criteria to be an IMF:
   - If ${h}_{1}\left(t\right)$ meets the criteria, define the first IMF as ${c}_{1}\left(t\right)={h}_{1}\left(t\right)$.
   - If ${h}_{1}\left(t\right)$ does not meet the criteria, set $x\left(t\right)={h}_{1}\left(t\right)$ and repeat from step 1.
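The sifting procedure above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used in the reviewed works: envelopes are built by linear interpolation (one of the options mentioned above), the signal end points are added as envelope knots, and the IMF test `is_imf` is a simplified stand-in whose tolerance `tol` and iteration cap `max_iter` are hypothetical parameters.

```python
import numpy as np


def local_extrema(x):
    """Return the indices of the strict local maxima and minima of a 1-D signal."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def envelope(idx, x):
    """Linearly interpolate the extrema located at `idx` over the whole time axis."""
    t = np.arange(len(x))
    # Assumption: the two end points are used as extra knots to cover the borders.
    knots = np.concatenate(([0], idx, [len(x) - 1]))
    return np.interp(t, knots, x[knots])


def is_imf(h, tol=0.05):
    """Simplified IMF test (an assumption of this sketch): extrema and zero
    crossings differ by at most one, and the mean envelope is close to zero."""
    maxima, minima = local_extrema(h)
    zero_crossings = np.sum(np.signbit(h[:-1]) != np.signbit(h[1:]))
    mean_env = 0.5 * (envelope(maxima, h) + envelope(minima, h))
    small_mean = np.mean(np.abs(mean_env)) < tol * np.mean(np.abs(h))
    return abs(len(maxima) + len(minima) - zero_crossings) <= 1 and small_mean


def first_imf(x, max_iter=50):
    """Sift x(t) until the candidate passes `is_imf` or `max_iter` is reached."""
    h = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build meaningful envelopes
        m = 0.5 * (envelope(maxima, h) + envelope(minima, h))  # local mean m(t)
        h = h - m                                              # candidate h(t)
        if is_imf(h):
            break
    return h
```

Calling `first_imf(x)` on a sampled signal returns an array of the same length holding the first mode ${c}_{1}\left(t\right)$. Replacing `np.interp` with a spline interpolator would change the envelopes and hence, slightly, the extracted mode.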

#### 2.5. Tensor Decomposition (TD)

**Low-rank Tucker decomposition:** when the core tensor is much smaller than the original, i.e., ${R}_{n}\ll {I}_{n}$ [47,48] (see Figure 1e).

**Sparse Tucker decomposition:** when the core tensor is of the same size as, or larger than, the tensor $\underline{\mathbf{X}}$ but is sparse, as illustrated in Figure 1g. In this case, by looking at Equation (7), we conclude that the Sparse Tucker model corresponds to the classical Sparse Coding model of (4) with a dictionary obtained as the Kronecker product of three mode dictionaries, i.e., $\mathsf{\Phi}={\mathbf{A}}_{3}\otimes {\mathbf{A}}_{2}\otimes {\mathbf{A}}_{1}$ [49,50]. Mode dictionaries can be chosen from classical sparsifying transforms such as wavelets, the cosine transform and others or, if enough data is available, they can be learned from a dataset, which usually provides higher levels of sparsity and compression. A Kronecker dictionary learning algorithm was introduced in [50], and a variant with orthogonality constraints was later proposed in [51].
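The equivalence between the Sparse Tucker model and sparse coding with a Kronecker dictionary can be checked numerically. The sketch below uses hypothetical sizes and random mode dictionaries, and assumes column-major (Fortran-order) vectorization, which is the convention under which $\mathsf{\Phi}={\mathbf{A}}_{3}\otimes {\mathbf{A}}_{2}\otimes {\mathbf{A}}_{1}$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)
I, R = (4, 5, 6), (6, 7, 8)  # hypothetical sizes: the core is larger than the tensor

# Random mode dictionaries A_n of size I_n x R_n (illustrative only).
A1, A2, A3 = (rng.standard_normal((i, r)) for i, r in zip(I, R))

# Sparse core tensor G with only 5 nonzero entries.
G = np.zeros(R)
support = rng.choice(G.size, size=5, replace=False)
np.put(G, support, rng.standard_normal(5))

# Tucker model: X = G x_1 A1 x_2 A2 x_3 A3.
X = np.einsum("abc,ia,jb,kc->ijk", G, A1, A2, A3)

# Sparse coding model: vec(X) = Phi vec(G) with a Kronecker dictionary.
Phi = np.kron(A3, np.kron(A2, A1))
x_kron = Phi @ G.ravel(order="F")

print(np.allclose(x_kron, X.ravel(order="F")))
```

In practice the Kronecker dictionary is never formed explicitly: the mode products on the tensor give the same result at a much lower memory cost.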

#### 2.6. Comparison of Methods for ML with Low-Quality Datasets

## 3. Results

#### 3.1. Brain Signal Classification

#### 3.1.1. BCI with Missing/Corrupted Measurements

#### 3.1.2. Efficient Data Augmentation for BCI

- Randomly select N frames from the set of frames belonging to the selected class.
- Decompose, using EMD, each one of the N frames, generating a set of IMFs per channel and frame.
- Then, select the first IMF from the first selected frame (one per channel, keeping the same position for each channel), the second IMF from the second selected frame, and so on until the Nth frame, which contributes its Nth IMF.
- Add up all the IMFs corresponding to the same channel to build each new EEG channel of the new artificial frame.
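The recombination described above can be sketched as follows, assuming the IMFs of all frames of one class have already been computed and stored in a single array. The array layout, the function name `artificial_frame`, and sampling frames without replacement are assumptions of this illustration, not details taken from the original method.

```python
import numpy as np


def artificial_frame(class_imfs, rng):
    """Build one artificial EEG frame from the IMFs of a single class.

    class_imfs: array of shape (n_frames, N, n_channels, n_samples), where
                class_imfs[f, n] holds the n-th IMF of every channel of frame f.
    """
    n_frames, n_imfs = class_imfs.shape[:2]
    # Randomly select N frames of the class (assumption: without replacement).
    chosen = rng.choice(n_frames, size=n_imfs, replace=False)
    # The n-th selected frame contributes its n-th IMF, for all channels at once.
    mixed = class_imfs[chosen, np.arange(n_imfs)]  # (N, n_channels, n_samples)
    # Add up the IMFs that belong to the same channel.
    return mixed.sum(axis=0)                       # (n_channels, n_samples)
```

A call such as `artificial_frame(class_imfs, np.random.default_rng(0))` returns one new frame; repeating the call yields as many artificial frames as needed. Because all IMFs come from frames of the same class, the new frame is assigned that class label.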

#### 3.1.3. Epileptic Focal Detection with Limited Data

- Randomly choose seven iEEG signals from the dataset and apply the DCT to obtain the spectrum.
- Segment the spectrum into the seven physiological frequency bands (Delta: 0–4 Hz, Theta: 4–8 Hz, Alpha: 8–13 Hz, Beta: 13–30 Hz, Gamma: 30–80 Hz, Ripple: 80–150 Hz, and Fast Ripple: 150 Hz–Nyquist frequency), extract one frequency band from each of the decompositions, from lowest to highest frequencies, and merge the seven extracted components (frequency bands) to create a new artificial spectrum. For example, we can extract the delta, the theta, the alpha, the beta, the gamma, the ripple, and the fast ripple bands from the first, the second, the third, the fourth, the fifth, the sixth, and the seventh signal, respectively.
- Apply the inverse DCT to the artificial spectrum in the frequency domain to obtain an artificial signal in the time-domain.
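A minimal sketch of the three steps above is given below. It assumes an orthonormal DCT-II built explicitly as a matrix (adequate for short signals), uses the fact that DCT-II coefficient $k$ of an $n$-sample signal corresponds to the frequency $k\,{f}_{s}/\left(2n\right)$, and takes the seven signals as already selected at random. The names `dct_matrix` and `artificial_signal` and the sampling rate passed as `fs` are hypothetical.

```python
import numpy as np

# Lower edges (Hz) of the seven bands: delta, theta, alpha, beta, gamma,
# ripple and fast ripple. The last band extends up to the Nyquist frequency.
BAND_EDGES_HZ = [0, 4, 8, 13, 30, 80, 150]


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, C.T @ s inverts it."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C


def artificial_signal(signals, fs):
    """signals: array (7, n_samples) of randomly chosen iEEG signals; fs in Hz."""
    n = signals.shape[1]
    C = dct_matrix(n)
    spectra = signals @ C.T                # DCT of each signal (one per row)
    freqs = np.arange(n) * fs / (2.0 * n)  # frequency of each DCT coefficient
    edges = BAND_EDGES_HZ + [fs / 2.0]     # close the last band at Nyquist
    mixed = np.zeros(n)
    for b in range(7):
        band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        mixed[band] = spectra[b, band]     # band b is taken from signal b
    return C.T @ mixed                     # inverse DCT: back to the time domain
```

The seven input rows could be drawn with, for example, `rng.choice(len(dataset), size=7, replace=False)`, where `dataset` is a placeholder for the available signals of one class. For long recordings, a fast DCT routine would replace the explicit matrix.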

#### 3.2. Classification of Noisy Faces

#### 3.3. SCADA Data Completion in Water Networks

## 4. Conclusions and Discussion

- The decomposition methods reviewed in this work for imputation of missing/corrupted values do not exploit the class label information in a supervised learning scenario. A possible further improvement of current methods is to incorporate label information into the decomposition models. We believe that missing data values could be better recovered if the class label of the corresponding data sample is known.
- EMD-based data augmentation was developed in an ad-hoc fashion. We believe that further theoretical analysis could lead to improvements, for example, by redesigning the way IMFs are calculated in order to produce class-preserving artificial samples.

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
2. Harari, Y.N. Reboot for the AI revolution. Nat. Publ. Group **2017**, 550, 324–327.
3. Fatourechi, M.; Bashashati, A.; Ward, R.K.; Birch, G.E. EMG and EOG artifacts in brain computer interface systems: A survey. Clin. Neurophysiol. **2007**, 118, 480–494.
4. Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative Image Inpainting With Contextual Attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition CVPR, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5505–5514.
5. Zhang, M.; Chen, Y. Inductive Matrix Completion Based on Graph Neural Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 4–7 May 2020.
6. Mirkes, E.M.; Coats, T.J.; Levesley, J.; Gorban, A.N. Handling missing data in large healthcare dataset: A case study of unknown trauma outcomes. Comput. Biol. Med. **2016**, 75, 203–216.
7. Schölkopf, B.; Burges, C.; Vapnik, V. Incorporating Invariances in Support Vector Learning Machines. ICANN **1996**, 1112, 47–52.
8. Decoste, D.; Schölkopf, B. Training Invariant Support Vector Machines. Mach. Learn. **2002**, 46, 161–190.
9. Cireşan, D.C.; Meier, U.; Gambardella, L.M.; Schmidhuber, J. Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. **2010**, 22, 3207–3220.
10. Dosovitskiy, A.; Fischer, P.; Springenberg, J.T.; Riedmiller, M.; Brox, T. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. **2015**, 38, 1734–1747.
11. Ratner, A.J.; Ehrenberg, H.R.; Hussain, Z.; Dunnmon, J.; Ré, C. Learning to Compose Domain-Specific Transformations for Data Augmentation. Adv. Neural Inf. Process. Syst. **2017**, 30, 3239–3249.
12. Uhlich, S.; Porcu, M.; Giron, F.; Enenkl, M.; Kemp, T.; Takahashi, N.; Mitsufuji, Y. Improving music source separation based on deep neural networks through data augmentation and network blending. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 261–265.
13. Lee, M.B.; Kim, Y.H.; Park, K.R. Conditional Generative Adversarial Network-Based Data Augmentation for Enhancement of Iris Recognition Accuracy. IEEE Access **2019**, 7, 122134–122152.
14. Hu, T.; Tang, T.; Chen, M. Data Simulation by Resampling—A Practical Data Augmentation Algorithm for Periodical Signal Analysis-Based Fault Diagnosis. IEEE Access **2016**, 7, 125133–125145.
15. Xie, F.; Wen, H.; Wu, J.; Hou, W.; Song, H.; Zhang, T.; Liao, R.; Jiang, Y. Data Augmentation for Radio Frequency Fingerprinting via Pseudo-Random Integration. IEEE Trans. Emerg. Top. Comput. Intell. **2019**, 4, 1–11.
16. Ding, J.; Chen, B.; Liu, H.; Huang, M. Convolutional Neural Network With Data Augmentation for SAR Target Recognition. IEEE Geosci. Remote Sens. Lett. **2016**, 13, 1–5.
17. Dao, T.; Gu, A.; Ratner, A.; Smith, V.; De Sa, C.; Ré, C. A Kernel Theory of Modern Data Augmentation. Proc. Mach. Learn. Res. **2019**, 97, 1528–1537.
18. García-Laencina, P.J.; Sancho-Gómez, J.L.; Figueiras-Vidal, A.R. Pattern classification with missing data: A review. Neural Comput. Appl. **2009**, 19, 263–282.
19. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: Hoboken, NJ, USA, 2014.
20. Batista, G.E.A.P.A.; Monard, M.C. A Study of K-Nearest Neighbour as an Imputation Method. Hybrid Intell. Syst. **2002**, 30, 251–260.
21. Fessant, F.; Midenet, S. Self-Organising Map for Data Imputation and Correction in Surveys. Neural Comput. Appl. **2002**, 10, 300–310.
22. Yoon, S.Y.; Lee, S.Y. Training algorithm with incomplete data for feed-forward neural networks. Neural Process. Lett. **1999**, 10, 171–179.
23. Bengio, Y.; Gingras, F. Recurrent Neural Networks for Missing or Asynchronous Data. Adv. Neural Inf. Process. Syst. **1995**, 8, 395–401.
24. Ghahramani, Z.; Jordan, M.I. Supervised learning from incomplete data via an EM approach. Adv. Neural Inf. Process. Syst. **1994**, 6, 120–127.
25. Goldberg, A.B.; Zhu, X.; Recht, B.; Xu, J.M.; Nowak, R.D. Transduction with Matrix Completion—Three Birds with One Stone. Adv. Neural Inf. Process. Syst. **2010**, 23, 757–765.
26. Hazan, E.; Livni, R.; Mansour, Y. Classification with Low Rank and Missing Data. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015.
27. Huang, S.J.; Xu, M.; Xie, M.K.; Sugiyama, M.; Niu, G.; Chen, S. Active Feature Acquisition with Supervised Matrix Completion. arXiv **2018**, arXiv:1802.05380.
28. Smieja, M.; Struski, L.; Tabor, J.; Zielinski, B.; Spurek, P. Processing of missing data by neural networks. Adv. Neural Inf. Process. Syst. **2018**, 31, 2719–2729.
29. Pearson, K. LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. **1901**, 2, 559–572.
30. Karhunen, K. Über lineare Methoden in der Wahrscheinlichkeitsrechnung; Annales Academiae Scientiarum: Helsinki, Sana, 1947.
31. Loève, M. Probability Theory; Van Nostrand: Princeton, NJ, USA, 1963.
32. Bruckstein, A.M.; Donoho, D.L.; Elad, M. From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images. SIAM Rev. **2009**, 51, 34–81.
33. Elad, M.; Figueiredo, M.A.T.; Ma, Y. On the Role of Sparse and Redundant Representations in Image Processing. Proc. IEEE **2010**, 98, 972–982.
34. Davis, G.M.; Mallat, S.G.; Zhang, Z. Adaptive Time-frequency Decompositions. Opt. Eng. **1994**, 33, 2183.
35. Tropp, J.A.; Gilbert, A.C. Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit. IEEE Trans. Inf. Theory **2007**, 53, 4655–4666.
36. Needell, D.; Tropp, J. CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples. Appl. Comput. Harmon. Anal. **2009**, 26, 301–321.
37. Chen, S.; Donoho, D.; Saunders, M. Atomic Decomposition by Basis Pursuit. SIAM Rev. **2001**, 43, 129–159.
38. Tropp, J.A.; Wright, S.J. Computational Methods for Sparse Solution of Linear Inverse Problems. Proc. IEEE **2010**, 98, 948–958.
39. Elad, M.; Aharon, M. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Trans. Image Process. **2006**, 15, 3736–3745.
40. Mairal, J.; Bach, F.R.; Ponce, J.; Sapiro, G. Online Dictionary Learning for Sparse Coding. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), Montreal, QC, Canada, 14–18 June 2009; pp. 689–696.
41. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory **2006**, 52, 1289–1306.
42. Candès, E.; Wakin, M. An Introduction to Compressive Sampling. IEEE Signal Process. Mag. **2008**, 25, 21–30.
43. Bobin, J.; Starck, J.L.; Fadili, J.; Moudden, Y. Sparsity and Morphological Diversity in Blind Source Separation. IEEE Trans. Image Process. **2007**, 16, 2662–2674.
44. Elad, M.; Starck, J.L.; Querre, P.; Donoho, D.L. Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal. **2005**, 19, 340–358.
45. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-Stationary Time Series Analysis; Royal Society of London Proceedings Series A; The Royal Society: London, UK, 1998; pp. 903–998.
46. Tucker, L.R. Some mathematical notes on three-mode factor analysis. Psychometrika **1966**, 31, 279–311.
47. Kolda, T.; Bader, B. Tensor decompositions and applications. SIAM Rev. **2009**, 51, 455–500.
48. Cichocki, A.; Mandic, D.; De Lathauwer, L.; Zhou, G.; Zhao, Q.; Caiafa, C.; Phan, A.H. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Process. Mag. **2015**, 32, 145–163.
49. Caiafa, C.F.; Cichocki, A. Computing sparse representations of multidimensional signals using Kronecker bases. Neural Comput. **2013**, 25, 186–220.
50. Caiafa, C.F.; Cichocki, A. Multidimensional compressed sensing and their applications. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. **2013**, 3, 355–380.
51. Huang, J.; Zhou, G.; Yu, G. Orthogonal tensor dictionary learning for accelerated dynamic MRI. Med. Biol. Eng. Comput. **2019**, 57, 1933–1946.
52. Dinares-Ferran, J.; Ortner, R.; Guger, C.; Solé-Casals, J. A New Method to Generate Artificial Frames Using the Empirical Mode Decomposition for an EEG-Based Motor Imagery BCI. Front. Neurosci. **2018**, 12, 1–9.
53. Zhang, Z.; Duan, F.; Solé-Casals, J.; Dinares-Ferran, J.; Cichocki, A.; Yang, Z.; Sun, Z. A Novel Deep Learning Approach With Data Augmentation to Classify Motor Imagery Signals. IEEE Access **2019**, 7, 15945–15954.
54. Classification of Epileptic IEEG Signals by CNN and Data Augmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2020; pp. 926–930.
55. Al-Baddai, S.; Marti-Puig, P.; Gallego-Jutglà, E.; Al-Subari, K.; Tomé, A.M.; Ludwig, B.; Lang, E.W.; Solé-Casals, J. A recognition-verification system for noisy faces based on an empirical mode decomposition with Green’s functions. Soft Comput. **2019**, 24, 3809–3827.
56. Akter, M.S.; Islam, M.R.; Iimura, Y.; Sugano, H.; Fukumori, K.; Wang, D.; Tanaka, T.; Cichocki, A. Multiband entropy-based feature-extraction method for automatic identification of epileptic focus based on high-frequency components in interictal iEEG. Sci. Rep. **2020**, 10, 7044.
57. Solé-Casals, J.; Caiafa, C.F.; Zhao, Q.; Cichocki, A. Brain-Computer Interface with Corrupted EEG Data: A Tensor Completion Approach. Cogn. Comput. **2018**, 10, 1062–1074.
58. Acar, E.; Dunlavy, D.M.; Kolda, T.G.; Mørup, M. Scalable tensor factorizations for incomplete data. Chemom. Intell. Lab. Syst. **2011**, 106, 41–56.
59. Liu, J.; Musialski, P.; Wonka, P.; Ye, J. Tensor Completion for Estimating Missing Values in Visual Data. IEEE Trans. Pattern Anal. Mach. Intell. **2012**, 35, 208–220.
60. Zhao, Q.; Zhang, L.; Cichocki, A. Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination. IEEE Trans. Pattern Anal. Mach. Intell. **2015**, 37, 1751–1763.
61. Marti-Puig, P.; Martí-Sarri, A.; Serra-Serra, M. Different Approaches to SCADA Data Completion in Water Networks. Water **2019**, 11, 1023.
62. Marti-Puig, P.; Martí-Sarri, A.; Serra-Serra, M. Double Tensor-Decomposition for SCADA Data Completion in Water Networks. Water **2020**, 12, 80.
63. Ramoser, H.; Müller-Gerking, J.; Pfurtscheller, G. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans. Rehabil. Eng. Publ. IEEE Eng. Med. Biol. Soc. **2000**, 8, 441–446.
64. Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. **2013**, 49, 764–766.
65. Andrzejak, R.G.; Schindler, K.; Rummel, C. Nonrandomness, nonlinear dependence, and nonstationarity of electroencephalographic recordings from epilepsy patients. Phys. Rev. E **2012**, 86, 046206.
66. Haibo, H.; Yang, B.; Garcia, E.A.; Shutao, L. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
67. Liu, X.; Tanaka, M.; Okutomi, M. Single-Image Noise Level Estimation for Blind Denoising. IEEE Trans. Image Process. **2013**, 22, 5226–5237.
68. Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the Sixth International Conference on Computer Vision, Washington, DC, USA, 4–7 January 1998; pp. 839–846.
69. Gonzalez, R.C.; Woods, R.E. Digital Image Processing; Prentice Hall: Upper Saddle River, NJ, USA, 2008.
70. Tukey, J.W. Exploratory Data Analysis; Addison-Wesley: Reading, MA, USA, 1977.
71. Lim, J.S. Two-Dimensional Signal and Image Processing; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1990.
72. Al-Baddai, S.; Al-Subari, K.; Tom, A.M.; Casals, J.S.; Lang, E.W. A Green’s Function-Based Bi-Dimensional Empirical Mode Decomposition. Inf. Sci. **2016**, 348, 1–17.

**Figure 1.** Decomposition models. (**a**) **General linear model**: a collection of vector data samples organized as columns of a matrix dataset $\mathbf{X}$ is approximated by the product of matrices $\mathsf{\Phi}$ and $\mathbf{A}$. (**b**) **Subspace approximation**: all vectors in the dataset are approximated by a linear combination of a few vectors (principal components). (**c**) **Sparse coding**: each vector in the dataset is approximated by a linear combination of atoms (columns of a dictionary $\mathsf{\Phi}$). In both (**b**,**c**), the optimal choice of matrix $\mathsf{\Phi}$ can be computed from the dataset itself by means of the SVD and a dictionary learning algorithm, respectively. (**d**) **EMD**: every single signal is decomposed as a sum of characteristic modes. Tensor decomposition models such as **Low-rank Tucker** (**e**), **Low-rank CP** (**f**) and **Sparse Tucker** (**g**) can be written as a sum of rank-1 tensors (**h**).

**Figure 2.** Training a BCI classifier (LDA) with noisy/missing EEG measurements. (**a**) Preprocessing steps: first, the positions in which the data are missing or corrupted are identified; then, a mask is created to ignore the values in those positions; and finally, the tensor model reconstructs the missing data. (**b**) Results with randomly missing entries. (**c**) Results with randomly missing channels. (Figure adapted from [57]).

**Figure 3.** EEG data augmentation: (**a**) For each new EEG signal to be generated, N available EEG signals are randomly selected and their EMDs are computed. (**b**) To generate an artificial EEG signal, IMFs from different signals are combined.

**Figure 4.** Artificial signal generation with the DCT. (**a**) Seven iEEG signals from either a focal or a non-focal area. (**b**) DCT coefficients in the spectrum domain. The spectra are segmented into seven physiological sub-bands, and the sub-band components extracted from each spectrum are merged to create an artificial spectrum. (**c**) The inverse DCT yields the resulting artificial signal.

**Figure 5.** (**a**) The proposed approach to eliminate the noise and improve the classification accuracy is based on the GiT-BEMD decomposition. The high-frequency IMFs are discarded and the (noiseless) image is reconstructed by summing up the rest of the modes. This is the image that is fed to the classifier. (**b**) Comparison of classification results using Support Vector Machine (SVM) and K-Nearest Neighbor (kNN) classifiers applied to noisy faces, filtered faces (Gaussian, Mean, Median) and GiT-BEMD processed faces.

**Figure 6.** Data tensorization of a 3-week tensor with lost data bursts of 200 samples. In (**a**), the green line shows the original data and the red line shows the lost burst. The light blue window shows the data introduced in the burst-centered tensor, which forces the burst to be in the center of the window. Panel (**b**) shows how the continuous flow of data in the light blue window is fragmented to be allocated in the tensor, as shown in panel (**c**).

**Table 1.** Comparison of methods for Machine Learning (ML) problems with low-quality datasets. Article sections in which these methods are discussed are noted in the first column and relevant references are included in the last column.

Method | Characteristics | Shortcomings | Advantages | Application | References |
---|---|---|---|---|---|
Class preserving transforms (Section 1.1.1) | Ad-hoc; mostly image oriented but some extensions to other types of data were explored | Limited theory available; difficult to apply to arbitrary types of datasets | Easy to use; widely available in deep learning platforms | Data augmentation | [7,8,9,10,11,12,13,14,15,16,17] |
Empirical Mode Decomposition (EMD) based data generation (Section 2.4, Section 3.1.2, Section 3.1.3 and Section 3.2) | Ad-hoc; based on the manipulation and recombination of Intrinsic Mode Functions (IMFs) | Lack of theoretical ground | Easy to use; captures dataset discriminative features; denoising power | Electroencephalography (EEG)/invasive EEG (iEEG) data augmentation and denoising | [45,52,53,54,55] |
Transform domain based data generation (Section 3.1.3) | Ad-hoc; based on the manipulation and recombination of spectrum domain components obtained by Discrete Cosine Transform (DCT), Wavelets, etc. | Lack of theoretical ground | Easy to use; captures dataset discriminative features | iEEG data augmentation | [45,52,53,56] |
Statistical imputation (Section 1.1.2) | Preprocessing step in ML; exploits statistical properties of data samples; wide variety of methods, from simple ones (mean) to more sophisticated ones (regression, k-Nearest Neighbor (kNN), Self Organization Map (SOM), etc.) | Does not use the class label information of data samples | Computationally efficient | ML with incomplete or corrupted data | [18,19,20,21,22,23] |
Probabilistic modelling (Section 1.1.2) | Gaussian Mixture Model (GMM) as data model; Bayesian classification; fitting model and classifiers in an Expectation-Maximization (EM) fashion; can be adapted to deep neural networks | Computationally expensive | Incorporates class label information of data samples; elegant theoretical approach | ML with incomplete or corrupted data | [19,24,28] |
Low-rank matrix completion (Section 1.1.2) | Based on Singular Value Decomposition (SVD) | Computationally very expensive; not suitable for complex boundary functions | Incorporates class label information of data samples | ML with incomplete or corrupted data | [25,26,27] |
Tensor decomposition (TD) based imputation (Section 2.5, Section 3.1.1 and Section 3.3) | Preprocessing step in ML; based on low-rank TDs (e.g., Tucker, CANDECOMP/PARAFAC (CP), etc.) or sparse TDs | Does not use the label information of data samples | Exploits intricate relationships among modes in multidimensional data | ML with incomplete or corrupted data | [50,57,58,59,60,61,62] |

**Table 2.** Dispersion ratio R computed in seven subjects (S01-S07) with Equation (8) for right (R) and left (L) classes at different levels of used artificial frames (AF). Results with $R>3$ are highlighted in red and with $2<R<3$ in orange.

AF (%) | S01 R | S01 L | S02 R | S02 L | S03 R | S03 L | S04 R | S04 L | S05 R | S05 L | S06 R | S06 L | S07 R | S07 L |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2.5 | 0.12 | 0.67 | 0.22 | 0.64 | 0.58 | 1.27 | 0.32 | 0.31 | 0.32 | 0.27 | 0.33 | 0.64 | 0.34 | 0.69 |
5.0 | 0.05 | 1.03 | 0.82 | 0.56 | 1.11 | 1.02 | 0.46 | 0.45 | 0.18 | 0.35 | 0.47 | 0.83 | 0.01 | 0.63 |
7.5 | 0.29 | 0.88 | 1.03 | 0.07 | 1.06 | 1.51 | 0.51 | 0.51 | 0.00 | 0.02 | 1.17 | 1.49 | 0.46 | 0.62 |
10.0 | 0.37 | 1.13 | 0.99 | 0.11 | 1.19 | 1.75 | 0.80 | 0.46 | 0.38 | 0.08 | 1.04 | 1.66 | 0.49 | 0.84 |
12.5 | 0.24 | 0.94 | 1.42 | 0.04 | 1.89 | 1.86 | 1.00 | 0.44 | 0.46 | 0.27 | 0.87 | 1.52 | 0.40 | 0.85 |
25.0 | 0.09 | 1.44 | 2.79 | 0.44 | 2.13 | 1.94 | 1.28 | 0.61 | 0.96 | 0.78 | 0.71 | 2.09 | 0.51 | 1.28 |
37.5 | 0.11 | 1.55 | 3.12 | 0.41 | 1.97 | 2.01 | 1.20 | 0.69 | 1.07 | 1.18 | 0.57 | 2.66 | 0.73 | 1.92 |
50.0 | 0.15 | 1.45 | 2.86 | 1.00 | 2.18 | 2.68 | 1.27 | 1.06 | 1.42 | 1.23 | 0.62 | 2.76 | 0.73 | 1.86 |

**Table 3.** Algorithms’ performance in terms of the MSE per sample. Best results are indicated in bold text.

Method | Weeks | MSE/Sample (Burst Length = 100) | MSE/Sample (Burst Length = 200) |
---|---|---|---|
Forward & Backward Predictors | - | 1.11 | 2.23 |
SingleDecomp—CP | 3 | 0.87 | 1.78 |
SingleDecomp—CP | 7 | 0.80 | 1.58 |
SingleDecomp—TK | 3 | 0.80 | 1.43 |
SingleDecomp—TK | 7 | 0.71 | 1.28 |
DoubleDecomp—CP | 3 | 0.55 | 1.05 |
DoubleDecomp—CP | 7 | 0.52 | 1.02 |
DoubleDecomp—TK | 3 | 0.55 | 1.04 |
DoubleDecomp—TK | 7 | **0.50** | **0.97** |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Caiafa, C.F.; Solé-Casals, J.; Marti-Puig, P.; Zhe, S.; Tanaka, T. Decomposition Methods for Machine Learning with Small, Incomplete or Noisy Datasets. *Appl. Sci.* **2020**, *10*, 8481.
https://doi.org/10.3390/app10238481
