Entropy 2018, 20(10), 722; doi:10.3390/e20100722
Analytic Study of Complex Fractional Tsallis’ Entropy with Applications in CNNs
Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia
School of Mathematical Sciences, Faculty of Sciences and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
Correspondence: [email protected]; Tel.: +60-012-279-7153
Both authors contributed equally to this work.
Received: 29 July 2018 / Accepted: 10 September 2018 / Published: 20 September 2018
In this paper, we study Tsallis’ fractional entropy (TFE) in a complex domain by applying the definition of complex probability functions. We study the upper and lower bounds of TFE based on some special functions. Moreover, applications in complex neural networks (CNNs) are illustrated to assess the accuracy of CNNs.
Keywords: fractional calculus; fractional operator; fractional entropy; CNNs; analytic function; unit disk
Entropy is a key quantity in information theory. It measures the amount of uncertainty involved in the value of a random variable or the outcome of a random process. In 1988, Tsallis presented the nonadditive entropy as a generalization of Boltzmann–Gibbs (BG) statistical mechanics, with the purpose of studying complex systems. It has since been applied in many fields, such as thermodynamics, chaos, artificial neural networks, image processing, complex systems and information theory (see [2,3,4,5,6,7,8,9,10,11,12,13,14,15]).
The system of axioms of probability theory laid down by Kolmogorov in 1933 can be extended to include the set of imaginary numbers by adding new axioms to his original five. Later, three additional axioms were proposed. Consequently, the complex probability domain is defined as the sum of the real set, with its corresponding real probability, and the imaginary set, with its corresponding imaginary probability. In general, the advantage of complex probability theory is that it adds a supplementary dimension (the imaginary part) to an event occurring in the real laboratory (the real part). It represents physical quantities of complex networks in terms of currents, complex potentials and impedance. Moreover, chance and luck in the real domain are replaced by total determinism in the complex domain. Finally, it extends many well-known concepts of traditional probability theory, such as expectation and variance, to complex probability theory, with more accuracy in applications. One of the important applications of complex probability theory is in quantum mechanics, for example, the two-slit experiment, in which a source emits a single particle that moves toward a wall with two slits and is detected at a position $x$ on a screen placed behind the wall. The usual argument that an interference pattern on the screen implies that the particle did not pass through either one slit or the other is ultimately an argument in probability theory: if it had, we would have $P(x) = P_1(x) + P_2(x)$, where $P_1(x)$ and $P_2(x)$ are the probabilities of detection at $x$ via the first and second slit, respectively, and such a sum cannot produce interference fringes. This critical point leads to the use of complex probability theory.
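The interference argument can be illustrated numerically. The sketch below is a toy model, not taken from the paper: it assumes two equal-magnitude complex amplitudes whose relative phase encodes the path difference to the detection point, and compares the classical sum $P_1 + P_2$ with the squared modulus of the summed amplitudes. The function name `detection_probabilities` is hypothetical.

```python
import cmath
import math
from typing import Tuple


def detection_probabilities(phase_difference: float,
                            amplitude: float = 1 / math.sqrt(2)) -> Tuple[float, float]:
    """Toy two-slit model (assumed, for illustration only).

    Each slit contributes a complex amplitude of equal magnitude; the
    relative phase encodes the path difference to the detection point x.
    Returns (classical, interference):
      classical    = P1 + P2          (particle went through one slit or the other)
      interference = |psi1 + psi2|^2  (amplitudes added before squaring)
    """
    psi1 = complex(amplitude, 0.0)
    psi2 = amplitude * cmath.exp(1j * phase_difference)
    p1 = abs(psi1) ** 2
    p2 = abs(psi2) ** 2
    classical = p1 + p2
    interference = abs(psi1 + psi2) ** 2
    return classical, interference


for phi in (0.0, math.pi / 2, math.pi):
    classical, interference = detection_probabilities(phi)
    print(f"phase={phi:.3f}  P1+P2={classical:.3f}  |psi1+psi2|^2={interference:.3f}")
```

At a dark fringe (relative phase π) the classical sum still gives a nonzero relative intensity, while the summed amplitudes cancel; this is the discrepancy on which the two-slit argument rests.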
Recently, Abou Jaoude extended Shannon’s information theory by using complex probability. The author calculated the magnitude of the chaotic factor, the channel capacities in terms of probability, and the degree of knowledge. In general, complex probability provides more complete information about processes than classical probability does [19,20]. Figure 1 shows the relation between complex analysis and information theory.
Our investigation is based on the concept of complex probability, which we use to extend Tsallis’ fractional entropy (TFE). We use techniques from the approximation theory of special functions of a complex variable, which have proven useful in information theory. We introduce upper and lower bounds of TFE and discuss their sharpness in the sequel.
Let A be an event in the complex domain $\mathcal{C} = \mathcal{R} + \mathcal{M}$. The complex probability function (CPF) combines the real and imaginary terms:
$$z = P_r + P_m, \qquad (1)$$
where $P_r$ and $P_m$ are the real probability and the imaginary probability in the real set $\mathcal{R}$ and the imaginary set $\mathcal{M}$, respectively. Following Axiom 7, we have
$$P_c = P_r + \frac{P_m}{i},$$
such that $P_m = i(1 - P_r)$ with $0 \le P_r \le 1$, and hence $P_c$ is always equal to one. Abou Jaoude et al. inferred that $z \in U$ (the open unit disk).
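These relations can be checked numerically. The sketch below assumes the standard definitions of Abou Jaoude’s complex probability paradigm: $P_m = i(1 - P_r)$, $z = P_r + P_m$, $P_c = P_r + P_m/i$, the degree of knowledge $\mathrm{DOK} = |z|^2$ and the chaotic factor $\mathrm{Chf} = 2iP_rP_m$, for which $P_c^2 = \mathrm{DOK} - \mathrm{Chf} = 1$. The function name `cpp_quantities` is hypothetical.

```python
from typing import Dict


def cpp_quantities(p_r: float) -> Dict[str, complex]:
    """Complex-probability quantities for a real probability p_r in [0, 1].

    Assumed definitions (complex probability paradigm):
      P_m = i (1 - P_r)        imaginary probability
      z   = P_r + P_m          complex probability function
      P_c = P_r + P_m / i      always equal to 1
      DOK = |z|^2              degree of our knowledge
      Chf = 2 i P_r P_m        chaotic factor (real and <= 0)
    """
    if not 0.0 <= p_r <= 1.0:
        raise ValueError("p_r must lie in [0, 1]")
    p_m = 1j * (1.0 - p_r)
    z = p_r + p_m
    return {
        "p_m": p_m,
        "z": z,
        "p_c": p_r + p_m / 1j,
        "dok": abs(z) ** 2,
        "chf": 2j * p_r * p_m,
    }


q = cpp_quantities(0.3)
print(q["z"], abs(q["z"]) < 1, q["dok"] - q["chf"])
```

For $0 < P_r < 1$ the point $z$ lies strictly inside the unit disk, since $|z|^2 = P_r^2 + (1 - P_r)^2 < 1$; at $P_r \in \{0, 1\}$ it reaches the boundary.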
Tsallis introduced an entropic formalism characterized by an index $q$, which leads to non-extensive statistics. TFE ($S_q$) is the basis of the so-called non-extensive statistical mechanics, which generalizes the Boltzmann–Gibbs theory. Tsallis statistics has been used in various fields, such as applied mathematics, physics, biology, chemistry, computer science, information theory, engineering, medicine, economics, business and geophysics. Since we study the analytic properties of TFE, we focus on its continuous form, which in general is given by:
$$S_q = \frac{1 - \int p(x)^q \, dx}{q - 1}, \qquad q \neq 1.$$
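A minimal numerical sketch of the continuous TFE, assuming the standard form $S_q = (1 - \int p(x)^q\,dx)/(q-1)$ for $q \ne 1$, with the Boltzmann–Gibbs–Shannon limit $-\int p \ln p\,dx$ at $q = 1$. The midpoint-rule integrator and the function name `tsallis_entropy` are illustrative choices, not the paper’s method.

```python
import math
from typing import Callable


def tsallis_entropy(pdf: Callable[[float], float], a: float, b: float,
                    q: float, n: int = 10_000) -> float:
    """Continuous Tsallis entropy of a density supported on [a, b].

    Uses the midpoint rule with n sub-intervals. For q == 1 the
    Boltzmann-Gibbs-Shannon (differential) entropy is returned instead.
    """
    h = (b - a) / n
    midpoints = [a + (k + 0.5) * h for k in range(n)]
    if math.isclose(q, 1.0):
        total = 0.0
        for x in midpoints:
            p = pdf(x)
            if p > 0.0:
                total -= p * math.log(p)
        return total * h
    integral = sum(pdf(x) ** q for x in midpoints) * h
    return (1.0 - integral) / (q - 1.0)


def uniform(x: float) -> float:
    """Uniform density on [0, 2]."""
    return 0.5


print(tsallis_entropy(uniform, 0.0, 2.0, q=2.0))  # approx. (1 - 2**(1 - q)) / (q - 1) = 0.5
print(tsallis_entropy(uniform, 0.0, 2.0, q=1.0))  # approx. ln 2
```

As $q \to 1$ the value for the uniform density, $(1 - 2^{1-q})/(q-1)$, approaches $\ln 2$, which is the Shannon limit.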
By applying the concept of CPF in Equation (1), we extend TFE to complex values as follows (CTFE):
For a special domain we have:
For the analytic study, we shall use the definition: where is analytic in U, having the form: It is clear that and
TFE has been maximized by using different techniques, depending on its parameter. This problem was discussed in [1,2] for a real power index and has also been studied for a complex power index. In the latter case, it was shown that the Tsallis distribution preserves its fractional power form, decorated with specific log-periodic oscillations (convergence dynamics of z-logistic maps). As a result, a complex measure of the heat capacity of the thermal bath was introduced; thus, in general, the heat capacity becomes complex as well. In this work, we approximate CTFE by some special functions in a complex domain; these functions are popular in various applications.
Next, we approximate Equation (4) by some special functions. This approximation has two advantages. First, in recognizing target functions, approximation theory studies how certain known functions (for example, special functions) can be approximated by a specific class of functions (for example, polynomials or rational functions) that often have desirable properties (inexpensive computation, continuity, integral and limit values, etc.). Second, the target function, call it $g$, may be unknown; instead of an explicit formula, only a set of points of the form $(x, g(x))$ is given. Depending on the structure of the domain and codomain of $g$, several methods for approximating $g$ may be applicable. For example, if $g$ is an operation on the complex numbers, the techniques of geometric function theory can be used.
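The second point, where only samples $(x, g(x))$ of an unknown target are available, can be made concrete with the simplest polynomial approximant. The sketch below uses Lagrange interpolation, a classical technique chosen here for illustration and not the method of the paper; it works unchanged for complex nodes, so the samples may lie on the unit circle. The target `g` in the usage lines is a hypothetical test function.

```python
from typing import Sequence


def lagrange_interpolate(xs: Sequence[complex], ys: Sequence[complex], x: complex) -> complex:
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs[i], ys[i])."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need the same, non-zero number of nodes and values")
    if len(set(xs)) != len(xs):
        raise ValueError("nodes must be distinct")
    total = 0j
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1 + 0j
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def g(z: complex) -> complex:
    """Hypothetical target; in practice only its samples would be known."""
    return z ** 3 - 2 * z


nodes = [1, 1j, -1, -1j]          # fourth roots of unity on the unit circle
samples = [g(z) for z in nodes]
z = 0.3 + 0.2j                    # a point inside the unit disk U
print(lagrange_interpolate(nodes, samples, z), g(z))
```

Because `g` is a cubic and four nodes are used, the interpolant reproduces it exactly up to rounding; for a general analytic target the result is only an approximation.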
2.1. Bernoulli Function
The function is not convex when (see Figure 3).
Series expansions at are given as follows:
Moreover, when we have:
For and for all , we have if and only if . Note that this concept is called coefficient majorization.
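The defining formula above is not recoverable from this copy, so the sketch below assumes the common coefficient form of majorization: $F(z) = \sum a_n z^n$ is majorized by $G(z) = \sum b_n z^n$ when $|a_n| \le b_n$ for every $n$. Under that assumption the triangle inequality gives $|F(z)| \le \sum |a_n|\,|z|^n \le G(|z|)$ on $U$. The coefficient lists and helper names are illustrative.

```python
import cmath
from typing import Sequence


def is_coefficient_majorized(a: Sequence[complex], b: Sequence[float]) -> bool:
    """True if |a_n| <= b_n for every n (missing coefficients are treated as 0)."""
    n = max(len(a), len(b))
    a_pad = list(a) + [0] * (n - len(a))
    b_pad = list(b) + [0] * (n - len(b))
    return all(abs(an) <= bn for an, bn in zip(a_pad, b_pad))


def evaluate(coeffs: Sequence[complex], z: complex) -> complex:
    """Horner evaluation of sum_n coeffs[n] * z**n."""
    result = 0j
    for c in reversed(coeffs):
        result = result * z + c
    return result


a = [0, 1, -0.5j, 0.25]   # illustrative coefficients of F
b = [0, 1, 0.5, 0.3]      # illustrative non-negative coefficients of G
r = 0.9                   # radius of the test circle inside U
bound = evaluate(b, r).real                      # G(|z|) with |z| = r
worst = max(abs(evaluate(a, r * cmath.exp(2j * cmath.pi * k / 360)))
            for k in range(360))                 # max of |F| sampled on |z| = r
print(is_coefficient_majorized(a, b), worst <= bound)
```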
We have the following properties (upper bounds):
For CTFE approximated by the Bernoulli function,
Then, we obtain:
Furthermore, for we have:
For CTFE approximated by the Bernoulli function, there is a probability measure μ on for all
Let ; then, we have:
In view of Theorem 1.11 in , the admits a probability measure in satisfying:
Then, by virtue of Proposition 1, there is a constant (diffusion constant) such that:
This completes the proof. ☐
2.2. Gaussian Function
The function is defined by the series:
A special case of this function is We consider CTFE approximated by Clearly, we have the following results:
For CTFE approximated by :
For CTFE approximated by , there is a probability measure μ on [0, 1].
In view of Equation (1.2-8) in , there is a probability measure on [0, 1] such that:
By Proposition 3, we have the desired assertion. ☐
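Assuming that the “Gaussian function” of this subsection is the Gauss hypergeometric function ${}_2F_1(a,b;c;z) = \sum_{n\ge 0} \frac{(a)_n (b)_n}{(c)_n\, n!} z^n$ (an assumption, since the defining series is missing from this copy), the probability measure on $[0,1]$ can be illustrated by Euler’s integral representation: for $c > b > 0$ and $z \in U$, ${}_2F_1(a,b;c;z) = \int_0^1 (1 - tz)^{-a}\, d\mu(t)$, where $\mu$ is the Beta$(b, c-b)$ distribution. The sketch compares the series with a midpoint-rule evaluation of that integral and with the special case ${}_2F_1(1,1;2;z) = -\log(1-z)/z$.

```python
import cmath
import math


def hyp2f1_series(a: float, b: float, c: float, z: complex, terms: int = 200) -> complex:
    """Partial sum of the Gauss hypergeometric series (converges for |z| < 1)."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        # ratio of consecutive terms: (a+n)(b+n) / ((c+n)(n+1)) * z
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total


def hyp2f1_euler(a: float, b: float, c: float, z: complex, n: int = 20_000) -> complex:
    """Euler integral against the Beta(b, c - b) probability measure on [0, 1].

    Requires c > b > 0; the midpoint rule avoids evaluating the endpoints.
    """
    if not c > b > 0:
        raise ValueError("need c > b > 0")
    norm = math.gamma(c) / (math.gamma(b) * math.gamma(c - b))
    h = 1.0 / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * h
        density = norm * t ** (b - 1) * (1 - t) ** (c - b - 1)
        total += density * (1 - t * z) ** (-a)
    return total * h


z = 0.5 + 0.3j
print(hyp2f1_series(1, 1, 2, z), hyp2f1_euler(1, 1, 2, z), -cmath.log(1 - z) / z)
```

For $b = 1$, $c = 2$ the Beta measure is the uniform distribution on $[0,1]$, so the integral reduces to $\int_0^1 (1 - tz)^{-1}\,dt$.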
2.3. Fractional Sigmoid Function (FSF)
CTFE can be approximated by FSF. In our investigation, we focus on functions of this type that are analytic in U. We suggest the following function (see Figure 4):
The expansions of CTFE are given as follows:
For sufficient values of a and CTFE approximated by FSF can be majorized by
3. Complex-Valued Neural Networks
CNNs are a natural extension of real-valued neural networks. They use complex-valued variables and parameters and can therefore process complex-valued information directly. They are well suited to wave phenomena and to processing related to complex amplitude. They have been used in a long list of applications, notably learning tasks, loss functions, cost functions, utility functions and combinatorial optimization.
In CNNs, the neurons in each layer are organized as a three-dimensional array rather than as a vector, as in ANNs (artificial neural networks). The first two dimensions are called spatial, and the third indexes the channels. The CNN architecture captures three principles characteristic of natural systems: locality, sharing and pooling.
Locality means that neurons depend only on their neighbors rather than on distant neurons. Sharing is the constraint that different neurons should undergo the same processing. An affine layer that satisfies both locality and sharing is a convolution layer. Pooling is used to induce invariance to small translations: a pooling layer splits each input channel into patches and replaces each patch with a single representative value in the output layer.
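The three principles can be seen in a one-dimensional toy example. In the sketch below (illustrative, not the paper’s architecture), each output of `conv1d` depends only on a window of neighbouring inputs (locality) and every window uses the same kernel (sharing); `pool1d` replaces each non-overlapping patch by its entry of largest modulus, a choice assumed here so that complex-valued signals can be pooled too.

```python
from typing import List, Sequence


def conv1d(x: Sequence[complex], kernel: Sequence[complex]) -> List[complex]:
    """'Valid' sliding-window product (cross-correlation): out[i] = sum_j kernel[j] * x[i + j]."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def pool1d(x: Sequence[complex], size: int) -> List[complex]:
    """Keep the entry of largest modulus in each non-overlapping patch of length size."""
    return [max(x[i:i + size], key=abs) for i in range(0, len(x) - size + 1, size)]


kernel = [1.0, 0.5]
x1 = [0, 0, 1, 0, 0, 0, 0, 0, 0]
x2 = [0, 0, 0, 1, 0, 0, 0, 0, 0]  # x1 shifted right by one position
print(conv1d(x1, kernel), conv1d(x2, kernel))
print(pool1d(conv1d(x1, kernel), 4), pool1d(conv1d(x2, kernel), 4))
```

The convolution output shifts together with the input, while the pooled output is unchanged because the shift stays inside one patch.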
Suppose the CNN consists of n fully connected neurons in a Hopfield-like network. The output of each neuron is a complex number:
Thus, the network state (the information of the net) is a complex vector. In this work, we use the total information given by Equation (10), whose terms are approximated by Equation (9). Therefore, a large amount of information can be obtained from this quantity, both by theoretical study and by numerical computation. The stability of Equation (10) is given by the energy equation (11), which involves the outputs and their complex conjugates. The energy provides a tool for studying the dynamics of CNNs. Figure 5 shows the steps for finding the energy. The minimum energy is bounded by a value that is prescribed during the training of the CNN.
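Equation (11) itself is not recoverable from this copy. As an assumption, the sketch below uses the standard energy of a complex-valued Hopfield network, $E(z) = -\tfrac12 \sum_{j,k} w_{jk}\,\overline{z_j}\,z_k$, with a Hermitian weight matrix ($w_{kj} = \overline{w_{jk}}$) and zero diagonal; Hermitian symmetry is what makes $E$ real. The Hebbian weights and the four-neuron patterns are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[complex]]


def hebbian_weights(pattern: Sequence[complex]) -> Matrix:
    """Hermitian weights w_jk = z_j * conj(z_k) / n storing one pattern (zero diagonal)."""
    n = len(pattern)
    return [[0j if j == k else pattern[j] * pattern[k].conjugate() / n
             for k in range(n)] for j in range(n)]


def hopfield_energy(w: Matrix, z: Sequence[complex]) -> complex:
    """E(z) = -1/2 * sum_{j,k} w_jk * conj(z_j) * z_k (real when w is Hermitian)."""
    n = len(z)
    total = sum(w[j][k] * z[j].conjugate() * z[k] for j in range(n) for k in range(n))
    return -0.5 * total


stored = [1 + 0j, 1j, -1 + 0j, -1j]    # four neurons with outputs on the unit circle
perturbed = [1 + 0j, 1j, -1 + 0j, 1j]  # phase of the last neuron flipped
w = hebbian_weights(stored)
print(hopfield_energy(w, stored), hopfield_energy(w, perturbed))
```

In this toy example the stored pattern has energy −1.5 and the perturbed one 0, so the stored state is the lower-energy (more stable) one.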
Experiments with a CNN of four neurons show that the minimum energy satisfies Equation (11) for the following outputs:
The energy is equal to one for all these values, whereas it increases for outcomes inside the unit disk. For example, consider the following output set and its energy for different parameter values:
Consider an outcome set of the CNN. To apply our algorithm, we follow these steps:
Step 1. Calculate from Equation (8) as follows: for we have:
Step 2. Compute the total information by using Equation (10):
Step 3. Estimate the energy of CNN by applying Equation (11):
One can show that, for all the values considered, the estimated energy of the set is equal to one. The algorithm stops at the previously prescribed value. In our example, we consider for all
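The three steps and the stopping rule can be organized as a loop over successive output sets (for example, one per training epoch). Because Equations (8), (10) and (11) are not recoverable from this copy, the per-output information `info_fn` and the energy `energy_fn` are passed in as parameters; the placeholder choices `squared_modulus` and `mean_energy` in the usage lines are hypothetical and are not the paper’s formulas.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

InfoFn = Callable[[complex], float]
EnergyFn = Callable[[float, Sequence[complex]], float]


def ctfe_steps(outputs: Sequence[complex], info_fn: InfoFn,
               energy_fn: EnergyFn) -> Tuple[float, float]:
    """Steps 1-3 for one outcome set; returns (total information, energy)."""
    per_output = [info_fn(z) for z in outputs]   # Step 1: information of each output
    total = sum(per_output)                      # Step 2: total information
    energy = energy_fn(total, outputs)           # Step 3: energy of the network
    return total, energy


def first_stable_epoch(epochs: Iterable[Sequence[complex]], info_fn: InfoFn,
                       energy_fn: EnergyFn, e_max: float) -> Optional[int]:
    """Index of the first outcome set whose energy does not exceed e_max, else None."""
    for index, outputs in enumerate(epochs):
        _, energy = ctfe_steps(outputs, info_fn, energy_fn)
        if energy <= e_max:
            return index
    return None


def squared_modulus(z: complex) -> float:
    """Hypothetical placeholder for the per-output information."""
    return abs(z) ** 2


def mean_energy(total: float, outputs: Sequence[complex]) -> float:
    """Hypothetical placeholder for the energy of an outcome set."""
    return total / len(outputs)


epochs = [[2, 2j], [1, 1j], [0.5, 0.5j]]
print(first_stable_epoch(epochs, squared_modulus, mean_energy, e_max=1.0))
```

Returning `None` when no outcome set reaches the bound corresponds to the case in which the CNN needs more training.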
Similarly, to estimate the energy of another outcome set, we follow the above steps:
Comparing with the CNN needs more training.
For comparison with the complex Shannon entropy, we obtain the following values for the set:
This implies total information Consequently, we have
- Equation (10) gives the amount of information in the complex system represented by the CNN. The advantage is that the CNN does not depend on the number of neurons to achieve full training of the system (see [11,12,13,14,15,26]). Furthermore, complex-valued outputs converge to the stable state faster than real-valued ones. All complex-valued outputs lie in the open unit disk $U = \{z \in \mathbb{C} : |z| < 1\}$. In this case, we may use the properties of geometric function theory (GFT). For example, the complex-valued sigmoid function has been widely studied from the viewpoint of GFT; its convexity and other geometric properties have been studied by many authors.
- The parameter $c$ of $f_c$ is the simplest non-trivial perturbation of an unperturbed complex system; a complex system (CNN) for which explicit necessary and sufficient conditions are known for the associated small-divisor problem is stable.
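- The output may induce a complex-valued function $f_c$ on the output set. In this situation, stability is determined by the first derivative of $f_c$ with respect to $z$; this type of stability is called Lyapunov stability. At a fixed point $z^*$, the multiplier is $\lambda = f_c'(z^*)$. At a periodic point $z_0$ of period ℘, the first derivative of the iterated function is $(f_c^{\wp})'(z_0) = \prod_{i=0}^{\wp-1} f_c'(z_i)$, where $z_{i+1} = f_c(z_i)$.
- At a non-periodic point, the derivative $z_n' = (f_c^{n})'(z_0)$ can be computed by the iteration $z_{n+1}' = f_c'(z_n)\, z_n'$, starting from $z_0' = 1$.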
- The above derivative can be replaced by other derivatives for functions of a complex variable, such as the Schwarzian derivative. We suggest this as future work.
- Derivative with respect to $c$ (parametric derivative): this type of derivative is used in the distance estimation method. In this case, the CNN has one output $z_0$ in the set, and it is fixed. Therefore, we suggest using the parameter plane to collect information, as follows. On the parameter plane, $c$ is a variable and $z_0$ is constant. The first derivative of $z_n = f_c^{n}(z_0)$ with respect to $c$ is $w_n = \partial z_n / \partial c$. This derivative can be computed by the iteration $w_{n+1} = \frac{\partial f_c}{\partial z}(z_n)\, w_n + \frac{\partial f_c}{\partial c}(z_n)$, starting from $w_0 = 0$.
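Both kinds of derivative can be computed by the chain rule along an orbit. The sketch below assumes, for illustration only, the quadratic family $f_c(z) = z^2 + c$, for which $f_c'(z) = 2z$ and $\partial f_c/\partial c = 1$; the output map of the CNN is not specified in this copy. `orbit_derivative` multiplies $f'(z_i)$ along the orbit, which is the derivative of the iterate at a non-periodic point and the multiplier when $z_0$ has the given period; `parameter_derivative` implements the iteration for $\partial z_n/\partial c$ with $z_0$ held fixed. Both function names are hypothetical.

```python
from typing import Callable, Tuple

ComplexMap = Callable[[complex], complex]


def quadratic(c: complex) -> Tuple[ComplexMap, ComplexMap]:
    """Illustrative family f_c(z) = z**2 + c and its z-derivative f_c'(z) = 2z."""
    return (lambda z: z * z + c), (lambda z: 2 * z)


def orbit_derivative(f: ComplexMap, df: ComplexMap, z0: complex, steps: int) -> complex:
    """Derivative of the steps-th iterate at z0: prod_{i<steps} f'(z_i).

    Equivalent to z'_{i+1} = f'(z_i) z'_i with z'_0 = 1. If z0 has period
    `steps`, this is the multiplier: |value| < 1 attracting, > 1 repelling.
    """
    derivative, z = 1 + 0j, z0
    for _ in range(steps):
        derivative *= df(z)
        z = f(z)
    return derivative


def parameter_derivative(c: complex, z0: complex = 0j, n: int = 10) -> Tuple[complex, complex]:
    """Return (z_n, dz_n/dc) for z_{k+1} = z_k**2 + c with z0 held fixed.

    Uses dz_{k+1} = 2 z_k dz_k + 1, starting from dz_0 = 0.
    """
    z, dz = z0, 0j
    for _ in range(n):
        z, dz = z * z + c, 2 * z * dz + 1  # both right-hand sides use the old z
    return z, dz


f, df = quadratic(-1.0)
print(orbit_derivative(f, df, 0j, 2))         # orbit 0 -> -1 -> 0 has period 2
print(parameter_derivative(0.1 + 0.1j, n=2))  # z_2 = c**2 + c, dz_2/dc = 2c + 1
```

The tuple assignment in `parameter_derivative` matters: updating `z` first would feed the new value into the derivative recursion and give a wrong result.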
5. Conclusions and Future Research
In the present paper, we applied the model of complex probability to Tsallis’ entropy and established a connection between the new model and the classical TFE, thereby extending the theory of information. As an application, we generalized CNNs; the result implies minimization of the energy in this complex system. Extending TFE leads to the stimulating and successful consequences illustrated in this work. We therefore call this new study in applied mathematics and analysis “the theory of complex information”.
Further development of this study, addressing convergence, convexity and concavity, is intended for subsequent work. In future research, the proposed analytic method will be elaborated further, and the complex probability model will be applied to a wide variety of stochastic processes.
Conceptualization, R.W.I. and M.D.; methodology, R.W.I.; software, R.W.I.; validation, R.W.I. and M.D.; formal analysis, R.W.I. and M.D.; investigation, R.W.I. and M.D.; writing—original draft preparation, R.W.I.; writing—review and editing, M.D.; funding acquisition, M.D.
This research was funded by Universiti Kebangsaan Malaysia grant number GUP-2017-064.
The authors would like to express their thanks to the reviewers for their important and useful comments to improve the paper. The work here is partially supported by the Universiti Kebangsaan Malaysia grant: GUP (Geran Universiti Penyelidikan)-2017-064.
Conflicts of Interest
The authors declare no conflict of interest.
The following abbreviations are used in this manuscript:
|real set of events|
|imaginary set of events|
|probability in the real set|
|probability in the imaginary set|
|U||the open unit disk|
|the degree of our knowledge of the random experiment; it is the square of the norm of z|
|the real part of CTFE|
|the upper bound of energy|
- Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Tsallis, C. The nonadditive entropy Sq and its applications in physics and elsewhere: Some remarks. Entropy 2011, 13, 1765–1804.
- Ibrahim, R.W.; Jalab, H.A. Existence of entropy solutions for nonsymmetric fractional systems. Entropy 2014, 16, 4911–4922.
- Ibrahim, R.W.; Jalab, H.A. Existence of Ulam stability for iterative fractional differential equations based on fractional entropy. Entropy 2015, 17, 3172–3181.
- Ibrahim, R.W.; Jalab, H.A.; Gani, A. Cloud entropy management system involving a fractional power. Entropy 2015, 18, 14.
- Ibrahim, R.W.; Jalab, H.A.; Gani, A. Perturbation of fractional multi-agent systems in cloud entropy computing. Entropy 2016, 18, 31.
- Jalab, H.A.; Ibrahim, R.W.; Amr, A. Image denoising algorithm based on the convolution of fractional Tsallis entropy with the Riesz fractional derivative. Neural Comput. Appl. 2017, 28, 217–223.
- Ibrahim, R.W. The maximum principle of Tsallis entropy in a complex domain. Ital. J. Pure Appl. Math. 2017, 601–606.
- Ibrahim, R.W. On new classes of analytic functions imposed via the fractional entropy integral operator. Facta Univ. Ser. Math. Inform. 2017, 32, 293–302.
- Al-Shamasneh, A.A.R.; Jalab, H.A.; Palaiahnakote, S.; Obaidellah, U.H.; Ibrahim, R.W.; El-Melegy, M.T. A new local fractional entropy-based model for kidney MRI image enhancement. Entropy 2018, 20, 344.
- Rubio, J.D.J.; Lughofer, E.; Plamen, A.; Novoa, J.F.; Meda-Campaña, J.A. A novel algorithm for the modeling of complex processes. Kybernetika 2018, 54, 79–95.
- Meda, C.; Jesus, A. On the estimation and control of nonlinear systems with parametric uncertainties and noisy outputs. IEEE Access 2018, 6, 31968–31973.
- Rubio, J. Error convergence analysis of the SUFIN and CSUFIN. Appl. Soft Comput. 2018, in press.
- Meda, C.; Jesus, A. Estimation of complex systems with parametric uncertainties using a JSSF heuristically adjusted. IEEE Lat. Am. Trans. 2018, 16, 350–357.
- De Jesús Rubio, J.; Lughofer, E.; Meda-Campaña, J.A.; Páramo, L.A.; Novoa, J.F.; Pacheco, J. Neural network updating via argument Kalman filter for modeling of Takagi-Sugeno fuzzy models. J. Intell. Fuzzy Syst. 2018, 35, 2585–2596.
- Abou Jaoude, A. The paradigm of complex probability and Chebyshev’s inequality. Syst. Sci. Control Eng. 2016, 4, 99–137.
- Youssef, S. Quantum mechanics as Bayesian complex probability theory. Mod. Phys. Lett. A 1994, 9, 2571–2586.
- Abou Jaoude, A. The paradigm of complex probability and Claude Shannon’s information theory. Syst. Sci. Control Eng. 2017, 5, 380–425.
- Abou Jaoude, A.; El-Tawil, K.; Seifedine, K. Prediction in complex dimension using Kolmogorov’s set of axioms. J. Math. Stat. 2010, 6, 116–124.
- Abou Jaoude, A. The complex probability paradigm and analytic linear prognostic for vehicle suspension systems. Am. J. Eng. Appl. Sci. 2015, 8, 147.
- Wilk, G.; Włodarczyk, Z. Tsallis distribution with complex nonextensivity parameter q. Phys. A Stat. Mech. Its Appl. 2014, 413, 53–58.
- Mocanu, P.T. Convexity of some particular functions. Studia Univ. Babes-Bolyai Math. 1984, 29, 70–73.
- Ruscheweyh, S. Convolutions in Geometric Function Theory; Presses de l’Université de Montréal: Montréal, QC, Canada, 1982.
- Miller, S.S.; Mocanu, P.T. Differential Subordinations: Theory and Applications; CRC Press: Boca Raton, FL, USA, 2000.
- Kaslik, E.; Ileana, R.R. Dynamics of complex-valued fractional-order neural networks. Neural Netw. 2017, 89, 39–49.
- Ibrahim, R.W. The fractional differential polynomial neural network for approximation of functions. Entropy 2013, 15, 4188–4198.
- Ezeafulukwe, U.A.; Darus, M.; Olubunmi, A. On analytic properties of a sigmoid function. Int. J. Math. Comput. Sci. 2018, 13, 171–178.
Figure 1. The connection of the main objectives of this research.
Figure 2. Bernoulli function .
Figure 3. Bernoulli function .
Figure 4. Sigmoid function .
Figure 5. The algorithm of using CTFE in CNNs.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).