# The Additive Input-Doubling Method Based on the SVR with Nonlinear Kernels: Small Data Approach


## Abstract


## 1. Introduction

- the design of an SVR-based additive input-doubling method that increases the prediction accuracy of regression modeling when processing short and very short medical datasets, together with procedures for its training and application;
- the investigation of two algorithmic implementations of the developed method, based on two different nonlinear SVR kernels (RBF and polynomial);
- the experimental determination of the optimal parameters of the developed algorithms, and the demonstration that the proposed algorithms achieve the highest prediction accuracy among the compared machine learning methods of this class.

## 2. Related Works

- ensemble learning;
- numerical data augmentation.

## 3. Support Vector Machine

## 4. Proposed Method

#### 4.1. Machine Learning in the Case of Short Datasets Using Axial Symmetry of the Response Surface

#### 4.2. SVR-Based Additive Input-Doubling Method

#### 4.2.1. Training Mode

$N^{2}$ pairs of vectors $\overline{x}_{i}\,\overline{x}_{j} \to z_{i,j}^{augm}$, $i = 1, \ldots, N$; $j = 1, \ldots, N$; $t = 1, \ldots, N^{2}$ (extension by columns), that will be formed by combining all available vectors of the training dataset, where $N$ is the number of existing observations (extension by rows).
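The pair formation described here can be illustrated with a minimal sketch in plain Python. It concatenates every ordered pair of training vectors, so $N$ observations yield $N^{2}$ rows of doubled width. The function name `double_inputs` is hypothetical, and the target rule $z = y_{i} - y_{j}$ is only an assumption made for illustration, because this excerpt does not state how $z_{i,j}^{augm}$ is computed.

```python
from itertools import product


def double_inputs(X, y):
    """Build N^2 concatenated pairs from N observations.

    X: list of N feature vectors (lists of floats); y: list of N targets.
    Returns (pairs, z), where each pair is X[i] followed by X[j].
    ASSUMPTION: z = y[i] - y[j]; the excerpt does not define this rule.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    pairs, z = [], []
    # product(..., repeat=2) enumerates all ordered index pairs (i, j).
    for i, j in product(range(len(X)), repeat=2):
        pairs.append(list(X[i]) + list(X[j]))  # extension by columns
        z.append(y[i] - y[j])                  # assumed target for the pair
    return pairs, z


# Toy data: 3 observations with 2 features each.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [10.0, 20.0, 40.0]
pairs, z = double_inputs(X, y)
print(len(pairs), len(pairs[0]))  # number of rows and width of each row
```

With three observations of two features, the sketch produces nine rows of four features, which is the row and column extension the text refers to. The resulting `pairs` and `z` could then be passed to any regressor, for example an SVR with a nonlinear kernel.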

#### 4.2.2. Application Mode

- the mutual compensation of errors of various signs;
- the principles of ensemble learning by averaging the result.
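The two ideas listed above can be sketched as follows. This is not the authors' exact formula, which this excerpt does not reproduce; it assumes that a trained model predicts the target difference between the first and second halves of a concatenated pair. The callable `predict_diff` and the function `predict_additive` are hypothetical names. For a new vector, each training observation contributes two estimates, one from the pair in each order, and all estimates are averaged.

```python
def predict_additive(x_new, X_train, y_train, predict_diff):
    """Average 2N estimates of the unknown target for x_new.

    predict_diff(pair) is a hypothetical callable returning the predicted
    target difference (first half minus second half) of a concatenated pair.
    ASSUMPTION: the pair model was trained on targets of the form y_i - y_j.
    """
    estimates = []
    for x_i, y_i in zip(X_train, y_train):
        # Pair (x_new, x_i): the model predicts y_new - y_i.
        estimates.append(y_i + predict_diff(list(x_new) + list(x_i)))
        # Pair (x_i, x_new): the model predicts y_i - y_new.
        estimates.append(y_i - predict_diff(list(x_i) + list(x_new)))
    return sum(estimates) / len(estimates)


def toy_predict_diff(pair):
    """Stand-in for a trained regressor: exact difference for y = 2x,
    plus a constant error of +0.5 on every prediction."""
    half = len(pair) // 2
    return 2.0 * (sum(pair[:half]) - sum(pair[half:])) + 0.5


X_train = [[1.0], [2.0], [4.0]]
y_train = [2.0, 4.0, 8.0]
y_hat = predict_additive([3.0], X_train, y_train, toy_predict_diff)
print(y_hat)
```

In this toy setting the constant error enters the two estimates of each pair with opposite signs (6.5 and 5.5 around the true value 6.0), so it cancels in the average. A real regressor's errors are not constant, so the cancellation would only be partial.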

## 5. Modeling and Results

## 6. Comparison and Discussion

## 7. Conclusions

- the development of input-doubling and additive input-doubling methods based on the high-speed RBF-SGTM neural-like structure and its modifications [38], which should reduce the duration of the training procedure of the developed methods;
- the development of weighted input-doubling and additive input-doubling methods by replacing expression (15) with a neural network, in particular a non-iterative, corrective SGTM neural-like structure. This would allow the results of (15) to be weighted rather than simply summed, which is expected to increase the prediction accuracy;
- the application of clustering together with input-doubling methods for the efficient processing of medium-sized datasets;
- the evaluation of the designed method on other real-world tasks in different application areas, using a large number of short datasets.

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A

**Figure A1.** Error values for all methods investigated using the second short dataset: (**a**) RMSE values; (**b**) MAE values.

## References

1. Bodyanskiy, Y.; Pirus, A.; Deineko, A. Multilayer Radial-Basis Function Network and Its Learning. In Proceedings of the 2020 IEEE 15th International Conference on Computer Sciences and Information Technologies (CSIT), Zbarazh, Ukraine, 23–26 September 2020; Volume 1, pp. 92–95.
2. Fedushko, S.; Gregus ml., M.; Ustyianovych, T. Medical Card Data Imputation and Patient Psychological and Behavioral Profile Construction. Procedia Comput. Sci. **2019**, 160, 354–361.
3. Chumachenko, D.; Sokolov, O.; Yakovlev, S. Fuzzy Recurrent Mappings in Multiagent Simulation of Population Dynamics Systems. IJC **2020**, 19, 290–297.
4. Vanpoucke, D.E.P.; van Knippenberg, O.S.J.; Hermans, K.; Bernaerts, K.V.; Mehrkanoon, S. Small Data Materials Design with Machine Learning: When the Average Model Knows Best. J. Appl. Phys. **2020**, 128, 054901.
5. Chumachenko, D.; Chumachenko, T.; Meniailov, I.; Pyrohov, P.; Kuzin, I.; Rodyna, R. On-Line Data Processing, Simulation and Forecasting of the Coronavirus Disease (COVID-19) Propagation in Ukraine Based on Machine Learning Approach. In Proceedings of the Data Stream Mining & Processing, Lviv, Ukraine, 21–25 August 2020; Springer: Cham, Switzerland, 2020; pp. 372–382.
6. Data Analytics: A Small Data Approach. Available online: https://www.routledge.com/Data-Analytics-A-Small-Data-Approach/Huang-Deng/p/book/9780367609504 (accessed on 17 January 2021).
7. Berezsky, O.; Melnyk, G.; Datsko, T.; Verbovy, S. An Intelligent System for Cytological and Histological Image Analysis. In Proceedings of the Experience of Designing and Application of CAD Systems in Microelectronics, Lviv, Ukraine, 24–27 February 2015; pp. 28–31.
8. Hekler, E.B.; Klasnja, P.; Chevance, G.; Golaszewski, N.M.; Lewis, D.; Sim, I. Why We Need a Small Data Paradigm. BMC Med. **2019**, 17, 133.
9. Fong, S.J.; Li, G.; Dey, N.; Gonzalez-Crespo, R.; Herrera-Viedma, E. Finding an Accurate Early Forecasting Model from Small Dataset: A Case of 2019-NCoV Novel Coronavirus Outbreak. IJIMAI **2020**, 6, 132.
10. Shaikhina, T.; Khovanova, N.A. Handling Limited Datasets with Neural Networks in Medical Applications: A Small-Data Approach. Artif. Intell. Med. **2017**, 75, 51–63.
11. Snow, D. DeltaPy: A Framework for Tabular Data Augmentation in Python; Social Science Research Network: Rochester, NY, USA, 2020.
12. Carvajal, R.; Orellana, R.; Katselis, D.; Escárate, P.; Agüero, J.C. A Data Augmentation Approach for a Class of Statistical Inference Problems. PLoS ONE **2018**, 13, e0208499.
13. Porcu, S.; Floris, A.; Atzori, L. Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems. Electronics **2020**, 9, 1892.
14. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data **2019**, 6, 60.
15. Salazar, A.; Vergara, L.; Safont, G. Generative Adversarial Networks and Markov Random Fields for Oversampling Very Small Training Sets. Expert Syst. Appl. **2021**, 163, 113819.
16. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling Tabular Data Using Conditional GAN. arXiv **2019**, arXiv:1907.00503.
17. Izonin, I.; Tkachenko, R.; Gregus, M.; Zub, K.; Lotoshunska, N. Input Doubling Method Based on SVR with RBF Kernel in Clinical Practice: Focus on Small Data. Procedia Comput. Sci. **2021**, in press.
18. Izonin, I.; Tkachenko, R.; Horbal, N.; Greguš, M.; Verhun, V.; Tolstyak, Y. An Approach towards Numerical Data Augmentation and Regression Modeling Using Polynomial-Kernel-Based SVR. In Lecture Notes in Networks and Systems, Proceedings of the 2nd International Conference on Data Science and Applications (ICDSA 2021), Kolkata, India, 10–11 April 2021; Springer: Berlin, Germany, 2021; in press.
19. Izonin, I.; Tkachenko, R.; Dronyuk, I.; Tkachenko, P.; Gregus, M.; Rashkevych, M. Predictive Modeling Based on Small Data in Clinical Medicine: RBF-Based Additive Input-Doubling Method. Math. Biosci. Eng. **2021**, 18, 2599–2613.
20. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. **1995**, 20, 273–297.
21. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. Adv. Neural Inf. Process. Syst. **1996**, 9, 155–161.
22. Setlak, G.; Bodyanskiy, Y.; Vynokurova, O.; Pliss, I. Deep Evolving GMDH-SVM-Neural Network and Its Learning for Data Mining Tasks. In Proceedings of the 2016 Federated Conference on Computer Science and Information Systems 2016, Gdansk, Poland, 11–14 September 2016; pp. 141–145.
23. Lateh, M.A.; Muda, A.K.; Yusof, Z.I.M.; Muda, N.A.; Azmi, M.S. Handling a Small Dataset Problem in Prediction Model by Employ Artificial Data Generation Approach: A Review. J. Phys. Conf. Ser. **2017**, 892, 012016.
24. Sklearn.Svm.SVR - Scikit-Learn 0.24.0 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html (accessed on 8 January 2021).
25. Wang, Y.; Wang, B.; Zhang, X. A New Application of the Support Vector Regression on the Construction of Financial Conditions Index to CPI Prediction. Procedia Comput. Sci. **2012**, 9, 1263–1272.
26. Alwee, R.; Hj Shamsuddin, S.M.; Sallehuddin, R. Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators. Sci. World J. **2013**, 2013, 951475.
27. Babichev, S.; Škvor, J. Technique of Gene Expression Profiles Extraction Based on the Complex Use of Clustering and Classification Methods. Diagnostics **2020**, 10, 584.
28. Babichev, S. An Evaluation of the Information Technology of Gene Expression Profiles Processing Stability for Different Levels of Noise Components. Data **2018**, 3, 48.
29. Sulkowitch Urine Test (Calcium in Urine, Qualitative Determination, Degree of Turbidity) > Consultation with a Highest-Category Physician at the Median Clinic. (In Ukrainian). Available online: https://median.kiev.ua/ua/poslugi/493-secha-za-sulkovichem-kaltsiy-v-sechi-yakisne-viznachennya-stupin-pomut (accessed on 13 March 2021).
30. Cassiède, M.; Nair, S.; Dueck, M.; Mino, J.; McKay, R.; Mercier, P.; Quémerais, B.; Lacy, P. Assessment of 1H NMR-Based Metabolomics Analysis for Normalization of Urinary Metals against Creatinine. Clin. Chim. Acta **2017**, 464, 37–43.
31. R: Urine Analysis Data. Available online: https://vincentarelbundock.github.io/Rdatasets/doc/boot/urine.html (accessed on 12 December 2020).
32. Hovorushchenko, T.O. Methodology of Evaluating the Sufficiency of Information for Software Quality Assessment According to ISO 25010. J. Inf. Organ. Sci. Online **2018**, 42, 63–85.
33. Shakhovska, N.; Yakovyna, V.; Kryvinska, N. An Improved Software Defect Prediction Algorithm Using Self-Organizing Maps Combined with Hierarchical Clustering and Data Preprocessing. In Proceedings of the Database and Expert Systems Applications, Bratislava, Slovakia, 14–17 September 2020; Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 414–424.
34. Chukhrai, N.; Grytsai, O. Diagnosing the Efficiency of Cost Management of Innovative Processes at Machine-Building Enterprises. Actual Probl. Econ. **2013**, 146, 75–80.
35. Auzinger, W.; Obelovska, K.; Stolyarchuk, R. A Modified Gomory-Hu Algorithm with DWDM-Oriented Technology. In Proceedings of the Large-Scale Scientific Computing, Sozopol, Bulgaria, 10–14 June 2019; Springer: Cham, Switzerland, 2019; pp. 547–554.
36. Dronyuk, I.; Fedevych, O.; Lipinski, P. Ateb-Prediction Simulation of Traffic Using OMNeT++ Modeling Tools. In Proceedings of the 2016 XIth International Scientific and Technical Conference Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 6–10 September 2016; pp. 96–98.
37. Duriagina, Z.; Lemishka, I.; Litvinchev, I.; Marmolejo, J.A.; Pankratov, A.; Romanova, T.; Yaskov, G. Optimized Filling of a Given Cuboid with Spherical Powders for Additive Manufacturing. J. Oper. Res. Soc. China **2020**, 1–16.
38. Tkachenko, R.; Kutucu, H.; Izonin, I.; Doroshenko, A.; Tsymbal, Y. Non-Iterative Neural-like Predictor for Solar Energy in Libya. In Proceedings of the ICTERI 2018, Kyiv, Ukraine, 14–17 May 2018; Ermolayev, V., Suárez-Figueroa, M.C., Lawrynowicz, A., Palma, R., Yakovyna, V., Mayr, H.C., Nikitchenko, M., Spivakovsky, A., Eds.; CEUR-WS.org: Kyiv, Ukraine, 2018; Volume 2105, pp. 35–45.

**Figure 2.** The error values for both training and application modes when changing the number of epochs of the training algorithm. Two algorithmic implementations of the developed method are investigated: (**a**) RMSE values for the proposed method based on the rbf kernel; (**b**) RMSE values for the proposed method based on the polynomial kernel; (**c**) MAE values for the proposed method based on the rbf kernel; (**d**) MAE values for the proposed method based on the polynomial kernel.

**Figure 5.** Error values for the additive input-doubling method and the input-doubling method in the test mode when changing the number of epochs of the training algorithm. Investigation of two different algorithmic implementations of the studied methods, other things being equal: (**a**) RMSE values for the investigated methods based on the rbf kernel; (**b**) MAE values for the investigated methods based on the rbf kernel; (**c**) RMSE values for the investigated methods based on the polynomial kernel; (**d**) MAE values for the investigated methods based on the polynomial kernel.

Variable’s Title | MIN Value | MEAN Value | MAX Value |
---|---|---|---|
The pH reading of the urine | 4.76 | 6.042 | 7.94 |
The osmolarity of the urine | 187.00 | 613.61 | 1236.00 |
Indicator of the presence of calcium oxalate crystals | 0.00 | 0.436 | 1.00 |
The urea concentration in millimoles per liter | 10.00 | 264.141 | 620.00 |
The conductivity of the urine | 5.10 | 20.901 | 38.00 |
The specific gravity of the urine | 1.01 | 1.018 | 1.04 |
The calcium concentration in millimoles per liter | 0.17 | 4.161 | 14.34 |

Method | Mode | MAE | RMSE | Training Time, Seconds |
---|---|---|---|---|
Additive SVR(rbf)-based input-doubling method | Training | 1.524 | 2.219 | 0.624 |
Additive SVR(rbf)-based input-doubling method | Test | 1.965 | 2.707 | - |
Additive SVR(poly)-based input-doubling method | Training | 1.744 | 2.296 | 0.178 |
Additive SVR(poly)-based input-doubling method | Test | 1.977 | 2.823 | - |
SVR(rbf)-based input-doubling method | Training | 1.524 | 2.219 | 0.624 |
SVR(rbf)-based input-doubling method | Test | 2.315 | 3.057 | - |
SVR(poly)-based input-doubling method | Training | 1.775 | 2.413 | 0.178 |
SVR(poly)-based input-doubling method | Test | 2.187 | 3.093 | - |
SVR(poly) | Training | 2.065 | 2.815 | 0.001 |
SVR(poly) | Test | 2.937 | 3.728 | - |
SVR(rbf) | Training | 2.029 | 2.810 | 0.002 |
SVR(rbf) | Test | 2.662 | 3.449 | - |
Adaptive Boosting * | Training | 0.449 | 0.603 | 0.239 |
Adaptive Boosting * | Test | 2.317 | 3.06 | - |
Stochastic Gradient Descent ** | Training | 2.883 | 4.075 | 0.002 |
Stochastic Gradient Descent ** | Test | 2.578 | 4.115 | - |
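
The two error measures reported in the table can be computed as in the following minimal sketch, which uses the standard definitions of MAE and RMSE. The helper names `mae` and `rmse` and the toy values are illustrative only and are unrelated to the datasets in the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy values: three exact predictions and one residual of 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE squares the residuals before averaging, a single large error raises it more than it raises MAE. This is why RMSE is never smaller than MAE on the same predictions, as in every row of the table.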

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Izonin, I.; Tkachenko, R.; Shakhovska, N.; Lotoshynska, N.
The Additive Input-Doubling Method Based on the SVR with Nonlinear Kernels: Small Data Approach. *Symmetry* **2021**, *13*, 612.
https://doi.org/10.3390/sym13040612
