# Small Stochastic Data Compactification Concept Justified in the Entropy Basis


## Abstract


## 1. Introduction

Such a grid contains $10^{20}$ points, that is, $10^{18}$ times more points compared to the original one-dimensional space. This example demonstrates why the brute-force approach is inefficient in typical machine learning problems (classification, clustering, and regression) [5,6,7,8,9]. The paradox is that these applied problems cannot be solved with a small number of parameters while still achieving adequate results. One can simply turn a blind eye to the problem of dimensionality, which is the paradigm of deep learning: non-parametric models achieve a significant increase in quality despite a colossal increase in the amount of computation, while the potential instability of the training process is accepted as an axiom. But this recipe is unacceptable in the context of the machine learning ideology. Table 1 contains a more detailed comparison of these two approaches.
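The blow-up described above is plain arithmetic: a grid that keeps a fixed number of nodes per axis grows exponentially with the dimensionality. A minimal sketch (the 100-nodes-per-axis setup is an illustrative assumption consistent with the $10^{20}$ and $10^{18}$ figures above):

```python
# Number of grid nodes needed to keep `nodes_per_axis` nodes on each
# of d axes: nodes(d) = nodes_per_axis ** d (exponential in d).
def grid_nodes(nodes_per_axis: int, d: int) -> int:
    return nodes_per_axis ** d

one_dim = grid_nodes(100, 1)    # 100 points in the original 1-D space
ten_dim = grid_nodes(100, 10)   # 10**20 points in the 10-D space

print(ten_dim)                  # 100000000000000000000
print(ten_dim // one_dim)       # 10**18 times more points
```

This is the "curse of dimensionality" in its simplest form: each added axis multiplies the grid size by another factor of 100.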

Each retained component is characterized by the variance ${\sigma}^{2}$ of the corresponding component. Based on the values of these variances, the researcher can select the required number of components. What is the best value of $\sum_{m}{\sigma}^{2}$? Some recommend maintaining the inequality $\sum_{m}{\sigma}^{2}\ge 0.90$, while others believe that $\sum_{m}{\sigma}^{2}\ge 0.50$ is sufficient. An original answer to this question is provided by Horn's parallel analysis based on Monte Carlo simulation [25]. A disadvantage of both SVD and PCA is the high computational complexity of obtaining the singular decomposition (well-known randomized algorithms [26] only slightly mitigate this limitation). A more serious limitation is the sensitivity of SVD/PCA to outliers and to the type of distribution of the original data. Most researchers believe that SVD/PCA works consistently with normally distributed data, but it has been found empirically that, as the data dimensionality increases, there are exceptions even to this rule. Therefore, SVD/PCA methods cannot guarantee the stability of the data dimensionality reduction procedure.
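The cumulative-variance rule above can be sketched with a plain SVD in NumPy: keep the smallest number of components whose cumulative share of the total variance reaches the chosen threshold (0.90 or 0.50). The data and the function name are illustrative, not from the paper:

```python
import numpy as np

def n_components_for(X: np.ndarray, threshold: float) -> int:
    """Smallest r whose components explain >= threshold of total variance."""
    Xc = X - X.mean(axis=0)                  # center the columns
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # component variances (up to 1/(m-1))
    ratio = np.cumsum(var) / var.sum()       # cumulative explained-variance share
    return int(np.searchsorted(ratio, threshold) + 1)

rng = np.random.default_rng(0)
# Rank-3 signal plus weak noise: a few components carry almost all variance.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 20)) \
    + 0.01 * rng.normal(size=(200, 20))
print(n_components_for(X, 0.90))
```

On such low-rank data the 0.90 rule selects at most three components, whereas on heavy-tailed or outlier-contaminated data the same rule can inflate the component count, which is the instability discussed above.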

- formalize the concept of calculating the variable entropy estimation of the probability distribution density of the characteristic parameters of the stochastic empirical data collection;
- formalize the process of the stochastic empirical data collection compactification with the maximization of the relative entropy between the original and compactified entities;
- justify the adequacy of the proposed mathematical apparatus and demonstrate its functionality with an example.

## 2. Models and Methods

#### 2.1. Statement of the Research

The components ${y}_{i}$ of the original empirical vector $y$ contain interference, represented by the values of a stochastic vector ${\epsilon}_{i}\in \epsilon $, $i=\overline{1,m}$, $\epsilon \in \mathrm{E}=\left\{{\epsilon}^{-}\le \epsilon \le {\epsilon}^{+}\right\}$, with the probability density function $L\left(\epsilon \right)$ of the stochastic vector $\epsilon $. Taking the interference into account, we present expression (1) as

- the optimality criterion of the compactified data matrix ${Y}_{\left(m\times r\right)}$;
- a method for calculating the elements of the optimal compactified data matrix ${Y}_{\left(m\times r\right)}$;
- a method for comparing the probability distribution densities of the outputs of models (2) and (4) as an indicator of the effectiveness of the proposed compactification concept.

#### 2.2. The Concept of Entropy-Optimal Compactification of Stochastic Empirical Data

The compactified matrix is the projection of the original data onto the space ${R}^{mr}$: ${Y}_{\left(m\times r\right)}={V}_{\left(m\times n\right)}{Q}_{\left(n\times r\right)}$. We obtain the inverse projection onto the space ${R}^{mn}$ using the matrix ${S}_{\left(r\times n\right)}$, all elements of which are positive: ${X}_{\left(m\times n\right)}={V}_{\left(m\times n\right)}{Q}_{\left(n\times r\right)}{S}_{\left(r\times n\right)}$. The dimensionality of the obtained matrix $X$ and of the original matrix $V$ is the same: $(m \times n)$.
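The forward and inverse projections can be sketched in NumPy. Here $Q$ and $S$ are stand-ins obtained from a truncated SVD purely for illustration; the paper instead derives them from the entropy-optimality criterion, and its $S$ is additionally constrained to positive elements, which this sketch does not enforce:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 8, 6, 3
V = rng.normal(size=(m, n))     # original (m x n) data matrix

# Illustrative Q (n x r) and S (r x n) from a truncated SVD; the paper
# obtains these matrices from the entropy criterion instead.
U, sing, Wt = np.linalg.svd(V, full_matrices=False)
Q = Wt[:r].T                    # projection matrix: V -> Y in R^{m r}
S = Wt[:r]                      # inverse projection: Y -> X in R^{m n}

Y = V @ Q                       # compactified matrix, shape (m, r)
X = V @ Q @ S                   # back-projected matrix, same shape as V

assert Y.shape == (m, r) and X.shape == V.shape
```

With this choice $X$ is the best rank-$r$ approximation of $V$ in the Frobenius norm, so the reconstruction error is never larger than the norm of $V$ itself.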


Here, ${P}_{d}\left(u\right)$ is the desired probability distribution density of the d-problem model, and $\mathsf{\Pi}\left(u-\epsilon \right)$ is the density of the stochastic vector $u-\epsilon $. From expression (27) we find $w={V}^{\mathrm{T}}z/\left({V}^{\mathrm{T}}V\right)$.
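Read as a matrix equation, the last formula is the usual normal-equations solution of $Vw\approx z$, i.e., $w=\left({V}^{\mathrm{T}}V\right)^{-1}{V}^{\mathrm{T}}z$ (a literal element-wise division only makes sense when $V$ is a single column). A small NumPy check on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.normal(size=(10, 4))    # illustrative design matrix
z = rng.normal(size=10)         # illustrative observation vector

# w = (V^T V)^{-1} V^T z -- the least-squares solution of V w ~= z.
w = np.linalg.solve(V.T @ V, V.T @ z)

# The dedicated solver gives the same answer.
w_lstsq, *_ = np.linalg.lstsq(V, z, rcond=None)
assert np.allclose(w, w_lstsq)
```

Solving the normal equations directly is fine for small, well-conditioned $V$; `lstsq` is the numerically safer route when $V^{\mathrm{T}}V$ is nearly singular.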

We define the indicator ${\delta}_{E}$ in terms of the relative entropy (RE). Its minimum value, ${\delta}_{E}=0$, is reached at ${\tilde{F}}_{d}\left(\lambda \right)={\tilde{F}}_{c}\left(\lambda \right)$.
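That ${\delta}_{E}=0$ exactly when the two densities coincide is the standard property of relative entropy (KL divergence): it vanishes for identical distributions and is strictly positive otherwise. A discretized sketch with illustrative distributions:

```python
import numpy as np

def relative_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """Discrete KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

print(relative_entropy(p, p))      # 0.0 -- the minimum, at coinciding densities
print(relative_entropy(p, q) > 0)  # True for any other density
```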

## 3. Results

The experiments used data of dimensionality ($10^{4}$, $10^{2}$). When processing the first dataset, we set r = {10, 9,…, 5}; when working with the second dataset, we set r = {100, 90,…, 50}. The obtained results are presented in Figure 2.

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Biswas, P.; Dandapat, S.K.; Sairam, A.S. Ripple: An approach to locate k nearest neighbours for location-based services. Inf. Syst. **2022**, 105, 101933.
- Bansal, M.; Goyal, A.; Choudhary, A. A comparative analysis of K-Nearest Neighbor, Genetic, Support Vector Machine, Decision Tree, and Long Short Term Memory algorithms in machine learning. Decis. Anal. J. **2022**, 3, 100071.
- Izonin, I.; Tkachenko, R.; Dronyuk, I.; Tkachenko, P.; Gregus, M.; Rashkevych, M. Predictive modeling based on small data in clinical medicine: RBF-based additive input-doubling method. Math. Biosci. Eng. **2021**, 18, 2599–2613.
- Izonin, I.; Tkachenko, R.; Shakhovska, N.; Lotoshynska, N. The Additive Input-Doubling Method Based on the SVR with Nonlinear Kernels: Small Data Approach. Symmetry **2021**, 13, 612.
- Kamm, S.; Veekati, S.S.; Müller, T.; Jazdi, N.; Weyrich, M. A survey on machine learning based analysis of heterogeneous data in industrial automation. Comput. Ind. **2023**, 149, 103930.
- Tymchenko, O.; Havrysh, B.; Tymchenko, O.O.; Khamula, O.; Kovalskyi, B.; Havrysh, K. Person Voice Recognition Methods. In Proceedings of the 2020 IEEE Third International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2020; IEEE: Piscataway, NJ, USA, 2020.
- Bisikalo, O.; Kovtun, O.; Kovtun, V.; Vysotska, V. Research of Pareto-Optimal Schemes of Control of Availability of the Information System for Critical Use. In Proceedings of the 2020 1st International Workshop on Intelligent Information Technologies & Systems of Information Security (IntelITSIS), Khmelnytskyi, Ukraine, 10–12 June 2020; CEUR-WS, Volume 2623, pp. 174–193.
- Bisikalo, O.V.; Kovtun, V.V.; Kovtun, O.V.; Danylchuk, O.M. Mathematical Modeling of the Availability of the Information System for Critical Use to Optimize Control of its Communication Capabilities. Int. J. Sens. Wirel. Commun. Control **2021**, 11, 505–517.
- Bisikalo, O.; Danylchuk, O.; Kovtun, V.; Kovtun, O.; Nikitenko, O.; Vysotska, V. Modeling of Operation of Information System for Critical Use in the Conditions of Influence of a Complex Certain Negative Factor. Int. J. Control Autom. Syst. **2022**, 20, 1904–1913.
- Bisikalo, O.; Bogach, I.; Sholota, V. The Method of Modelling the Mechanism of Random Access Memory of System for Natural Language Processing. In Proceedings of the 2020 IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), Lviv-Slavske, Ukraine, 25–29 February 2020; IEEE: Piscataway, NJ, USA.
- Mochurad, L.; Horun, P. Improvement Technologies for Data Imputation in Bioinformatics. Technologies **2023**, 11, 154.
- Stankevich, S.; Kozlova, A.; Zaitseva, E.; Levashenko, V. Multivariate Risk Assessment of Land Degradation by Remotely Sensed Data. In Proceedings of the 2023 International Conference on Information and Digital Technologies (IDT), Zilina, Slovakia, 20–22 June 2023.
- Kharchenko, V.; Illiashenko, O.; Fesenko, H.; Babeshko, I. AI Cybersecurity Assurance for Autonomous Transport Systems: Scenario, Model, and IMECA-Based Analysis. In Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2022; pp. 66–79.
- Izonin, I.; Tkachenko, R.; Krak, I.; Berezsky, O.; Shevchuk, I.; Shandilya, S.K. A cascade ensemble-learning model for the deployment at the edge: Case on missing IoT data recovery in environmental monitoring systems. Front. Environ. Sci. **2023**, 11, 1295526.
- Auzinger, W.; Obelovska, K.; Dronyuk, I.; Pelekh, K.; Stolyarchuk, R. A Continuous Model for States in CSMA/CA-Based Wireless Local Networks Derived from State Transition Diagrams. In Proceedings of the International Conference on Data Science and Applications; Springer: Singapore, 2021; pp. 571–579.
- Deng, P.; Li, T.; Wang, D.; Wang, H.; Peng, H.; Horng, S.-J. Multi-view clustering guided by unconstrained non-negative matrix factorization. Knowl.-Based Syst. **2023**, 266, 110425.
- De Handschutter, P.; Gillis, N.; Siebert, X. A survey on deep matrix factorizations. Comput. Sci. Rev. **2021**, 42, 100423.
- De Clercq, M.; Stock, M.; De Baets, B.; Waegeman, W. Data-driven recipe completion using machine learning methods. Trends Food Sci. Technol. **2016**, 49, 1–13.
- Shu, L.; Lu, F.; Chen, Y. Robust forecasting with scaled independent component analysis. Finance Res. Lett. **2023**, 51, 103399.
- Moneta, A.; Pallante, G. Identification of Structural VAR Models via Independent Component Analysis: A Performance Evaluation Study. J. Econ. Dyn. Control **2022**, 144, 104530.
- Zhang, R.; Dai, H. Independent component analysis-based arbitrary polynomial chaos method for stochastic analysis of structures under limited observations. Mech. Syst. Signal Process. **2022**, 173, 109026.
- Li, H.; Yin, S. Single-pass randomized algorithms for LU decomposition. Linear Algebra Appl. **2020**, 595, 101–122.
- Iwao, S. Free fermions and Schur expansions of multi-Schur functions. J. Comb. Theory Ser. A **2023**, 198, 105767.
- Terao, T.; Ozaki, K.; Ogita, T. LU-Cholesky QR algorithms for thin QR decomposition. Parallel Comput. **2020**, 92, 102571.
- Trendafilov, N.; Hirose, K. Exploratory factor analysis. In International Encyclopedia of Education, 4th ed.; Elsevier: Amsterdam, The Netherlands, 2023; pp. 600–606.
- Fu, Z.; Xi, Q.; Gu, Y.; Li, J.; Qu, W.; Sun, L.; Wei, X.; Wang, F.; Lin, J.; Li, W.; et al. Singular boundary method: A review and computer implementation aspects. Eng. Anal. Bound. Elem. **2023**, 147, 231–266.
- Roy, A.; Chakraborty, S. Support vector machine in structural reliability analysis: A review. Reliab. Eng. Syst. Saf. **2023**, 233, 109126.
- Çomak, E.; Arslan, A. A new training method for support vector machines: Clustering k-NN support vector machines. Expert Syst. Appl. **2008**, 35, 564–568.
- Chen, H.L.; Yang, B.; Wang, S.J.; Wang, G.; Liu, D.Y.; Li, H.Z.; Liu, W.B. Towards an optimal support vector machine classifier using a parallel particle swarm optimization strategy. Appl. Math. Comput. **2014**, 239, 180–197.
- Pineda, S.; Morales, J.M.; Wogrin, S. Mathematical programming for power systems. In Encyclopedia of Electrical and Electronic Power Engineering; Elsevier: Amsterdam, The Netherlands, 2023; pp. 722–733.
- Li, P.; Pei, Y.; Li, J. A comprehensive survey on design and application of autoencoder in deep learning. Appl. Soft Comput. **2023**, 138, 110176.
- Mishra, D.; Singh, S.K.; Singh, R.K. Deep Architectures for Image Compression: A Critical Review. Signal Process. **2022**, 191, 108346.
- Zheng, J.; Qu, H.; Li, Z.; Li, L.; Tang, X. A deep hypersphere approach to high-dimensional anomaly detection. Appl. Soft Comput. **2022**, 125, 109146.
- Costa, M.C.; Macedo, P.; Cruz, J.P. Neagging: An Aggregation Procedure Based on Normalized Entropy. In Proceedings of the International Conference of Numerical Analysis and Applied Mathematics (ICNAAM 2020), Crete, Greece, 19–25 September 2022.
- Bisikalo, O.; Kharchenko, V.; Kovtun, V.; Krak, I.; Pavlov, S. Parameterization of the Stochastic Model for Evaluating Variable Small Data in the Shannon Entropy Basis. Entropy **2023**, 25, 184.
- Zeng, Z.; Ma, F. An efficient gradient projection method for structural topology optimization. Adv. Eng. Softw. **2020**, 149, 102863.
- El Masri, M.; Morio, J.; Simatos, F. Improvement of the cross-entropy method in high dimension for failure probability estimation through a one-dimensional projection without gradient estimation. Reliab. Eng. Syst. Saf. **2021**, 216, 107991.
- Liu, B.; Chai, Y.; Huang, C.; Fang, X.; Tang, Q.; Wang, Y. Industrial process monitoring based on optimal active relative entropy components. Measurement **2022**, 197, 111160.
- Fujii, M.; Seo, Y. Matrix trace inequalities related to the Tsallis relative entropies of real order. J. Math. Anal. Appl. **2021**, 498, 124877.
- Makarichev, V.; Kharchenko, V. Application of Dynamic Programming Approach to Computation of Atomic Functions. In Radioelectronic and Computer Systems; no. 4; National Aerospace University-Kharkiv Aviation Institute: Kharkiv, Ukraine, 2021; pp. 36–45.
- Dotsenko, S.; Illiashenko, O.; Kharchenko, V.; Morozova, O. Integrated Information Model of an Enterprise and Cybersecurity Management System. Int. J. Cyber Warf. Terror. **2022**, 12, 1–21.

**Figure 1.** (**a**) Dependence α = f(m, Met1, r), $m=\overline{5,10}$, $r=\overline{10,5}$. (**b**) Dependence α = f(m, Met2, r), $m=\overline{5,10}$, $r=\overline{10,5}$. (**c**) Dependence α = f(m, Met3, r), $m=\overline{5,10}$, $r=\overline{10,5}$.

**Figure 2.** (**a**) Dependence α = f(Met, r, DS1), $r=\overline{10,5}$, Met = {Met1, Met2, Met3}. (**b**) Dependence α = f(Met, r, DS2), r = {100, 90,…, 50}, Met = {Met1, Met2, Met3}.

**Figure 4.** (**a**) Dependence ${\delta}_{E}=f\left(r,Met\right)$ for the DS1 dataset. (**b**) Dependence ${\delta}_{E}=f\left(r,Met\right)$ for the DS2 dataset.

**Figure 5.** (**a**) Dependence ${\delta}_{E}$ = f(r, δ) for Met3 and the DS1 dataset. (**b**) Dependence ${\delta}_{E}$ = f(r, δ) for Met3 and the DS2 dataset.

**Table 1.** Comparison of machine learning and deep learning.

| Criterion | Machine Learning | Deep Learning |
|---|---|---|
| Number of data points | Small amounts of data are enough to produce forecasts | Large volumes of training data are necessary to produce forecasts |
| Hardware dependence | Can run on low-power computers; large computing power is not required | Depends on high-performance computers, which perform a large number of matrix operations; a GPU can optimize these operations effectively |
| Feature construction | Requires features to be accurately defined and created by the user | Learns high-level representations from the data and creates new features on its own |
| Approach to training | The training process is divided into small steps, and the results of each step are combined into a single output | The problem is solved as a whole, end to end |
| Training time | Training takes relatively little time, from a few seconds to several hours | Training usually takes a long time, since a deep learning algorithm includes many layers |
| Output | The output is usually a numerical value, such as a score or a class label | The output can have several formats, such as text, a score, or sound |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kovtun, V.; Zaitseva, E.; Levashenko, V.; Grochla, K.; Kovtun, O.
Small Stochastic Data Compactification Concept Justified in the Entropy Basis. *Entropy* **2023**, *25*, 1567.
https://doi.org/10.3390/e25121567

**AMA Style**

Kovtun V, Zaitseva E, Levashenko V, Grochla K, Kovtun O.
Small Stochastic Data Compactification Concept Justified in the Entropy Basis. *Entropy*. 2023; 25(12):1567.
https://doi.org/10.3390/e25121567

**Chicago/Turabian Style**

Kovtun, Viacheslav, Elena Zaitseva, Vitaly Levashenko, Krzysztof Grochla, and Oksana Kovtun.
2023. "Small Stochastic Data Compactification Concept Justified in the Entropy Basis" *Entropy* 25, no. 12: 1567.
https://doi.org/10.3390/e25121567