Entropy-Based Strategies for Rapid Pre-Processing and Classification of Time Series Data from Single-Molecule Force Experiments

Recent advances in single-molecule science have revealed an astonishing number of details on the microscopic states of molecules, which in turn has created the need for simple, automated processing of numerous time-series data. In particular, large datasets of time series of single protein molecules have been obtained using laser optical tweezers. In this system, each molecular state yields a separate time series with a relatively uneven composition from the viewpoint of local descriptive statistics. In the past, decisions about uncertain data quality and the heterogeneity of molecular states were left to human experience. Because such processing knowledge is not directly transferable to a black-box framework for efficient classification, the rapid evaluation of a large number of simultaneously measured time-series samples may constitute a serious obstacle. To solve this particular problem, we have implemented a supervised learning method that combines local entropic models with the global Lehmer average. We find that this methodological combination is suitable for fast and simple categorization, enabling rapid pre-processing of the data with minimal optimization and user intervention.


Introduction
Bulk experimental methods can capture only averaged characteristics, which significantly limits our understanding of the heterogeneity of molecular states. Single-molecule techniques, on the other hand, provide deep insights into the complex dynamics of individual molecules [1]. Namely, the detection of various molecular sub-states and conformations, as well as of interstate transitions, is one of the main benefits of single-molecule techniques [2].
Hence, in general, single-molecule techniques offer the possibility to characterize molecular heterogeneity and to quantify the number of sub-states, their interconversion rates, and their occurrences. The development of advanced approaches is essential to enhance experimental resolution, which is needed for describing rare, low-populated states of molecules. It is important to note that in biological systems, such low-populated rare sub-states can have profound effects. For example, the rare, low-population infectious states of the prion protein PrP are highly consequential, as they act as a nucleated seed that recruits native PrP into fibrils that ultimately contribute to amyloid disease [3]. Importantly, single-molecule force spectroscopy of the prion protein PrP has identified and characterized low-populated rare misfolded states [4]. This example demonstrates the power of single-molecule techniques to detect relevant low-populated rare sub-states. Naturally, capturing and detecting low-populated rare sub-states using single-molecule techniques is experimentally challenging; it requires extensive time-series data collection, selection, and categorization.

Experimental Methods
The experimental procedures followed those described in [15] and [16]. Briefly, the E. coli Hsp70 nucleotide-binding domain protein construct was genetically modified to carry cysteine residues for the attachment of the required double-stranded DNA handles [15,16]. These DNA handles carried modifications on each end to ensure coupling to the 1 µm functionalized beads. The beads could be trapped in our optical tweezers setup and manipulated in a so-called passive mode (for details, see [17]). Trapped beads were calibrated according to the method of [18]; trap stiffness was between 0.25 and 0.30 pN/nm. Signals were acquired for 10-30 min at a sampling rate of 30 kHz. For the data analysis, the difference between both signals was calculated after the experiment to increase the signal-to-noise ratio [14].
Both signals were corrected for cross-talk caused by the depolarization and proximity of the beams. For the final analysis, long time traces were analyzed after resampling to 10 kHz. Glass beads (1 µm in diameter; Bangs Laboratories, Inc., Fishers, Indiana, United States), which had previously been covalently functionalized with anti-digoxigenin Fab fragments (Roche), were mixed with the protein-DNA constructs. After the addition of streptavidin-coated silica beads (1 µm in diameter; Bangs Laboratories, Inc.), the protein-DNA-bead mixture was introduced into a flow cell. Measurements were carried out at ∼28 °C in PBS (10 mM phosphate buffer, 2.7 mM potassium chloride, 137 mM sodium chloride, pH 7.4 at 25 °C), with an added oxygen scavenger system (26 U/mL glucose oxidase, 17 000 U/mL catalase, 0.65% glucose). During the single-molecule mechanical measurements, trapped beads were brought into proximity to build a bead-DNA-protein dumbbell. Protein-DNA concentrations were adjusted to sparsely cover the beads, leading mainly to single-tether formation. The trapping potentials were held at a constant separation to record passive-mode force vs. time traces.

Problem Formulation-Data Categories
We will now describe the workflow and the role of an expert in the classification. We assume that the expert has at her/his disposal a set of single-molecule force experiments (see Figure 1). For simplicity, let us consider experiments generating two types of time-series data, i.e., two types (categories) of samples, denoted A and B. For type A (category A), further detailed processing and research are necessary to gain insights into single-molecule kinetics. Experiments of type B, on the other hand, are considered the result of entirely different molecular states (e.g., a damaged molecule, or a molecule in a transient misfolded state) and are not investigated further in detail. Still, counting the experiments in category B provides numbers for statistical evaluations. Type A (category A) means that the measurement reveals only a few discrete molecular states, with visible transitions between them. With type B, the states are not sufficiently separated in space and time, or the molecule rests in a single state, and hence no transitions can be identified.
Only high-quality single-molecule data can provide reliable information on the underlying free energy landscape. Here we show that histogram analysis can play a dual role in the processing of data from single-molecule force spectroscopy. Single-molecule data pre-processing, as demonstrated in the present study, can be included at the beginning of the data-analysis pipeline. As our histogram-based pre-processing method is general and independent of the underlying energy landscape, the resulting experimental data in category A can be further processed. There are several ways to extract effective free energy landscapes from single-molecule time series using histogram analysis [19,20]. The procedure identifies a distribution of the observable associated with each local equilibrium state. By assessing how often the molecule visits and resides in a chosen state and escapes from one state to another, this analysis naturally leads to a reconstruction of the free energy landscape. In another approach, the time series of a single intramolecular distance can be analyzed by a network-based method for determining basins and barriers of complex free energy surfaces (e.g., the protein folding landscape).

Measures and Methods of Supervised Classification
In what follows, we go step by step through the main elements of the classification system described in Sections 2.2.1-2.2.3.

Time Series, Averages, Adaptive Histograms
In line with the data, we consider a time series {x_t} of real-valued subsequent observations x_t. The experimental conditions do not allow us to assume that the observations are uniformly distributed. To make the problem computationally feasible, the situation can be improved by splitting the original signal into smaller parts, i.e., time windows. The data are considered to be partially stationary within the respective window. For each window, t ∈ [T_wdn, T_wup], with T_wup − T_wdn = T_w = const., the local mean values resulting from the iterative evaluation can be obtained as presented in Algorithm 1.
Algorithm 1: Conditional mean values for given time window.
Result: three local numbers µ_L < µ_M < µ_H.
Inputs: the window data and the number of iterations N_it.
Begin with the counter c_iter ← 1; while c_iter ≤ N_it, update the conditional means and set c_iter ← c_iter + 1; end.
Because histograms change dynamically, with peak heights and valley depths varying between different time windows, we have designed a processing method which we call adaptive. In this particular framework, the shape of the bins is adapted to the immediate situation rather than just inefficiently increasing the number of breaks to achieve a certain level of complexity.
Specifically, we gain adaptability by keeping a constant number of breaks at changing positions. After the repeated stabilization and iterative improvement of the respective average values, we calculated the respective conditional probabilities π_0, π_1, π_2, π_3 (Equation (1)). For the sake of simplicity, the values π_..., x_..., µ_... are not provided with a time stamp. Another rationale for this reduction is that any possible window rearrangement at this level has no influence on the outcome. The result can be viewed as an elementary histogram with only three adaptive breakpoints µ_L, µ_M, µ_H. Adaptability is essential because data properties can change over time. A well-adapted, concise, and substantially reduced histogram can consist of only a few uneven breaks.
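As an illustration, the adaptive three-break construction can be sketched in Python. This is a minimal sketch under assumptions: the iterative update rule (splitting the window at the current middle break and re-centring it between the two conditional means) is one plausible reading of Algorithm 1, and `conditional_means`, `bin_probabilities` are illustrative names, not the authors' code.

```python
import numpy as np

def conditional_means(x, n_iter=6):
    """Iteratively refine three conditional means mu_L < mu_M < mu_H
    for one time window (one plausible reading of Algorithm 1; the
    exact update rule of the paper may differ)."""
    x = np.asarray(x, dtype=float)
    mu_M = x.mean()
    for _ in range(n_iter):
        lo, hi = x[x < mu_M], x[x >= mu_M]
        mu_L = lo.mean() if lo.size else mu_M   # mean below the middle break
        mu_H = hi.mean() if hi.size else mu_M   # mean above the middle break
        mu_M = x[(x >= mu_L) & (x <= mu_H)].mean()  # re-centre the middle break
    return mu_L, mu_M, mu_H

def bin_probabilities(x, mu_L, mu_M, mu_H):
    """Conditional probabilities pi_0..pi_3 of the four bins cut by the
    three adaptive breakpoints."""
    counts, _ = np.histogram(x, bins=[-np.inf, mu_L, mu_M, mu_H, np.inf])
    return counts / counts.sum()
```

On a clearly bimodal window the breaks settle onto the two modes and the middle break separates them, which is exactly the adaptive behaviour the text describes.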

Entropy of Histograms
Central to further considerations is the concept of entropy, a natural, integral, and universal part of the probabilistic description. The entropy measure does not highlight every detail of the histogram, but it reflects the level of organization required for the success of the pre-processing. The pre-processing information only becomes relevant when the entropy values are affected by specific control parameters. If the internal parameters (meta-parameters) of the data-mapping model are incorporated into the learning process, some of their instances may be more suitable for certain types of processed data. The T-entropy introduced by Tsallis [11,21] is an ideal parametric candidate that can provide distinguishable inter-class separation in the output values. Its form uses the real parameter q_T. An alternative is, for example, the Rényi form of the entropy. It should be noted that we do not introduce the entropy of the entire time series; instead, our proposal is a T-entropy evaluated for the different time windows. It is now useful to look at the overall computational model depicted in Figure 2, which briefly describes the structure of the data-processing flows as well as the organization of the time windows. Each data treatment is based on the exchangeable collection of T-entropy values constructed for constant T_w. The selection of T_w uniquely determines the number of non-overlapping windows, n_w = floor(Number_of_time_series_ticks / T_w). Of course, the overlaps are not ignored, as they provide additional statistical information that partially eliminates the reliance on the selection of the initial time window. The overlap effect is characterized by the independent positive integer n_ws (see details in Algorithm 2). The method described above transforms the original data series into a 2D array of local T-entropies with the structure S_T,(index of non-overlapping window, index characterizing overlap).
The statistics of S_T,(.,.) are evidently non-Gaussian due to constraints and are therefore not suitable for simple characterization by mere arithmetic means.
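To make the construction of the 2D entropy array concrete, a sketch follows. The Tsallis form S_q = (1 − Σ_i p_i^q)/(q − 1) is standard; the shift step `T_w // n_ws` between the overlapping copies and the `histogram_fn` callback (standing in for the adaptive four-bin histogram) are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def tsallis_entropy(pi, q_T):
    """Tsallis entropy S = (1 - sum(p_i^q_T)) / (q_T - 1) of a discrete
    distribution, with the convention that zero-probability bins drop out."""
    pi = np.asarray(pi, dtype=float)
    pi = pi[pi > 0]
    return (1.0 - np.sum(pi ** q_T)) / (q_T - 1.0)

def entropy_array(x, T_w, n_ws, q_T, histogram_fn):
    """2D array S_T[window, shift] of local Tsallis entropies.  Rows index
    the n_w non-overlapping windows, columns the n_ws shifted (overlapping)
    copies; the shift step T_w // n_ws and the histogram_fn callback are
    assumptions standing in for the adaptive four-bin histogram."""
    x = np.asarray(x, dtype=float)
    n_w = len(x) // T_w
    S = np.full((n_w, n_ws), np.nan)
    for i in range(n_w):
        for s in range(n_ws):
            lo = i * T_w + s * (T_w // n_ws)
            hi = lo + T_w
            if hi <= len(x):                # skip shifted windows past the end
                S[i, s] = tsallis_entropy(histogram_fn(x[lo:hi]), q_T)
    return S
```

For a uniform four-bin histogram and q_T = 2, each entry equals (1 − 4·(1/4)²)/(2 − 1) = 0.75, while a single occupied bin gives zero entropy.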

Algorithm 2: Lehmer mean of set of entropy values.
Result: the Lehmer mean of the set of entropy values, L{S_T}.
1. Update the position of the sliding windows [T_wdn, T_wup].
2. Calculate L{S_T} (see Equation (5)) for the array S_T,(.,.)(q_T); use Equation (6).
Procedure (local Tsallis entropy S_T,(.,.), with the summation loops running over the sliding time window): (ii) π_0, π_1, π_2, π_3 defined by Equation (1); (iii) partial, interval output S_T,(.,.) given by Equation (2) (see also Equation (3)); finally, summation of the Lehmer mean values of S_T,(.,.).

Let us now turn to the main physical properties of the data that we want to identify and quantify. Their details and manifestations fall within the scope of the classification, which will depend upon the decisions of the specialist. In our specific application, the classification process means that each sample is assigned to one of the defined classes (A or B). We believe that, after the series of transformations we apply, there will be a continuous separation zone between A and B that is sufficiently wide.
Of course, more experiments with the control parameters should point to the potential for higher sensitivity of our transformation in data processing. Our concept evolved mainly from the preliminary requirement that the transformation of a sample with bimodality or multimodality be adequately separated from the transformation of a sample without these statistical characteristics. However, we did not follow these requirements strictly below, because we do not want to focus too narrowly on a specific pattern. Instead, we prefer a more general approach, for which T-entropy [11] could be ideal. In the following, we assume that T-entropy on relatively small scales, or its generalized mean values (on large scales), could be effective in the classification process. We realize that what we are proposing is a more abstract, not a definitively valid strategy, but numerical analysis can ultimately reveal knowledge and bring (parametric) improvements that can be applied in the subsequent learning and optimization process. The numerical schemes we use here are in principle consistent with supervised learning methods. We note that we attempted several approaches, but only a few worked well, leading to the basic empirical version that we publish in this work.
The scheme shows how the partial blocks are organized into an overall algorithm, which acts as a kind of nonlinear filtering of the input time series. In a hypothetical inference process, a comparison can be made with the transformed elements sampled from data categories A or B. The result of the design is a nonlinear filter-classifier, which conceptually relies on a supervised learning phase.
Nevertheless, let us also mention details regarding the numerical experiments with classification that initially did not produce satisfactory results. For example, an alternative direct calculation of the so-called Sarle's b (b_Sarle), which is typically used to detect bimodality [22] (based on a combination of kurtosis and skewness), did not provide a proper segregation of A and B and was therefore not a valid distinguishing feature for the sets A and B. An obvious explanation is that the value of b_Sarle fluctuates considerably along the time series. For example, a particular window may not necessarily be in the right place to extract a statistically representative sample. In Section 3.1 we present several examples of variants for which the averaging method is of high relevance. An interesting alternative to the conventional approach to b_Sarle is described in Subsection 3.1.3.

Long-Term Transformation into Entropic Systems with Related Lehmer Means
Obviously, multimodality and bimodality can reduce entropy compared to a uniform distribution of states. However, this also applies to individual isolated distribution peaks that are not of interest. Paradoxically, therefore, entropy may seem to be a relatively general and to some extent imperfect indicator, which may not suit the needs of experts; in other words, it seems a weak alternative for identifying detailed changes in each distribution. On an empirical basis, however, we expect that the fundamental premises regarding the entropy series will be sufficient for the given classification, and that the entropy will be effective enough to enable rapid classification of the sample types.
Let us now turn to the details of the generalized averaging of the entropy series that we need. Any candidate averaging method that seeks to achieve a sufficient separation of A and B should take into account the fact that not all entropy data should be considered with the same weight. For example, the Lehmer mean can reliably characterize the asymmetric distributions of {S_T} values. To be more explicit, when considering a set {X_j | X_j ∈ R+}, the Lehmer mean [23-26] is given by

L{X}(p_L) = ( Σ_j X_j^{p_L} ) / ( Σ_j X_j^{p_L − 1} ).

There is, of course, freedom in assessing the samples using the weights ∼ X_j^{(p_L − 1)}, depending on the parameter p_L ∈ R.
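A minimal sketch of the Lehmer mean. Because strongly negative p_L (down to −150 in the numerical part) would overflow or underflow a naive power sum, the sketch evaluates both sums in the log domain; this implementation detail is ours, not the paper's.

```python
import numpy as np

def _logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

def lehmer_mean(x, p_L):
    """Lehmer mean L{X}(p_L) = sum(x^p_L) / sum(x^(p_L - 1)) of positive
    values, i.e. a weighted arithmetic mean with weights ~ x^(p_L - 1).
    Evaluated in the log domain so that strongly negative p_L stays finite."""
    logx = np.log(np.asarray(x, dtype=float))
    return np.exp(_logsumexp(p_L * logx) - _logsumexp((p_L - 1.0) * logx))
```

Familiar special cases serve as sanity checks: p_L = 1 gives the arithmetic mean, p_L = 0 the harmonic mean, and p_L → −∞ approaches the minimum, which is why negative p_L selectively weights the low-entropy windows.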
The above framework helps us to create a particular mean of the entropy sequence, equipped with a variety of window indices. The entropy events collected according to the scheme from Equation (3) lead to the mean L{S_T} of Equation (5). Since we do not yet know which component of the recognition and classification system will be more productive in terms of the projected data, we are also interested in the derivative of L{S_T} with respect to p_L (Equation (6)). Here we have intentionally omitted the summation information used in S_T,(.,.) (see Equation (5)). More specifically, it would be helpful at this point to grasp the details of the information gathered. For this purpose, Algorithm 2 is provided, giving details of how the partial contributions are summed up to determine the Lehmer mean values.
In order to differentiate the inputs using various filtering techniques, we have implemented two entropy-based weighting versions, w_1 and w_2 (Equation (7)). Their effectiveness for a given type of data is directly examined and commented on in the numerical part of the paper. Subsequently, these alternatives were incorporated into the system of effective Tsallis indices q_TEyz introduced here (Equation (8)). In the applications, we limit ourselves to the region 1 < q_m ≤ q_M; in such a case we do not need to pass through the singular point q_T = 1 (although the singularity is removable). The factor 1/z represents, in essence, an attempt at a "power-z compensation". We used only q_TEyz (z > 0) for the four variants of y, z in the implementation of the proposed method. Of course, the use of very small z should be avoided because of the poor separation effect expected. The assumption behind Equation (8) is that the corresponding q_TEyz indicator provides values of the expected order, which also implies standardization. The reason for this is that the construction is subordinated to the Tsallis concept, where q_TEyz is linked to q_T by a kind of convolution. Let us repeat, for better understanding, that q_TEyz characterizes the whole time series.
While the underlying theories do not directly predict q_T, many scientific works assume that q_T is near the Boltzmann limit q_T → 1. As we will demonstrate in the results section, this also applies to the effective version of the parameter with the weights w_1, w_2. Although the methodology we discuss can in principle provide information on the macroscopic statistical property called non-extensivity, it is not clear what happens when the series is processed by Lehmer averaging. Therefore, no attention is paid to this particular issue in the paper.

Numerical Results
For the purposes of the analysis, we have chosen the parameter values N_it = 6, n_ws = 8. There are also three primary alternatives, T_w = 500, 1000, 2000, which we later justify by examining the T_w dependencies. The behaviour of q_TEyz as a function of p_L is depicted in the partial plots of Figure 3. The common basis for the simulation is the use of the boundaries q_m = 1.01 and q_M = 6.01 (see Equation (8)). As checked by our preliminary studies, the efficiency of the separation of A and B is strongly determined by a sufficiently large choice of q_M. Initially, we approximated the quadrature by summation over 1000 evenly spaced nodes. However, we later found that a numerical quadrature based on only 10 rectangular samplings of q_T not only reduces the computational load by a factor of 100 but also preserves the separation of A and B. Exact integration in the sense of Equation (8) is therefore not necessary. In our computational approach, we work with a quick estimate by means of a strongly diluted integration grid (over q_T). Note that there is a parallel with experimental data analysis that uses only a selection of several different exponents for different regimes of the Tsallis distribution [27].
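The effect of the diluted grid can be illustrated with a generic sketch. Since Equation (8) is not reproduced here, the code reduces the effective index to a normalised first moment of an arbitrary weight over q_T; the midpoint placement of the 10 rectangles is an assumption about the partitioning, and `effective_index` is an illustrative name.

```python
import numpy as np

def effective_index(weight_fn, q_m=1.01, q_M=6.01, n_nodes=10):
    """Effective index as the normalised first moment of a weight over q_T,
    int q*w(q) dq / int w(q) dq, via an n_nodes midpoint-rectangle rule
    (a generic sketch; Equation (8) additionally carries the 1/z factor
    and the specific w_1/w_2 weights)."""
    width = (q_M - q_m) / n_nodes
    q = q_m + width * (np.arange(n_nodes) + 0.5)   # rectangle midpoints
    w = weight_fn(q)
    return np.sum(q * w) / np.sum(w)
```

For a smooth weight the 10-node and 1000-node grids agree to a fraction of a percent, which is consistent with the observation that exact integration is unnecessary for the A/B separation.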
The detailed calculations over p_L have been done for the three alternatives T_w ∈ {500, 1000, 2000}, which offer qualitatively the same results. We do not provide results for the last value here for reasons of redundancy, as there is no significant qualitative impact. We will explain later why the T_w performance comparison favors the T_w ∈ {1000, 2000} variants. (Owing to this redundancy, there is no figure for T_w = 2000, because no qualitatively new effects appear in the analyzed scenarios.) The partial plots of Figure 3 are organized according to T_w and the choices of w_1 and w_2: w_1 (case y = 1), w_2 (case y = 2), and z = 1, z = 2 (see Equation (7)). As one can see, the use of different weights and various intervals of p_L changes the separation effects of A and B. For instance, y = 1 admits substantial separation for the control parameter −150 < p_L < −50. On the other hand, the variant y = 2 (i.e., for w_2 ∼ D_{p_L}) shows no such change, although some separation may be hoped for in the −150 < p_L < −100 domain.
However, how does the size of the window affect the separation into A and B? Obviously, not all window sizes yield appropriate solutions. Systematic results for T_w ∈ [0, 1800] are summarized in Figure 4 for the four combinations of y, z, at constant p_L = −100. Prior to these calculations, we verified that above p_L = −50 the separation between A and B is blurred. In addition, somewhere above T_w = 2000, the results are burdened by considerable diversification and specimen specificity. The other extreme of the classification is the small-T_w domain (for the given data, say T_w < 200). This provides very good statistical estimates of the averages, which are, however, determined only on the basis of a series of significantly biased local entropies.

Comparison of Methods for Specific Time-Series Classification
The purpose of this subsection is to show the broader context and a specific comparison between methods. The scope and proposals of the comparison are based on the following principles and motivations:
1. evaluation with the goal of emphasizing the gains within the framework of applicability;
2. the design of new potential classifiers with a unified and specific mathematical structure;
3. the comparison of new and previously established classification schemes;
4. the identification of the proper parameters (meta-parameters) that are useful for the classification.
Three other indicators are used for comparison with the Tsallis-based strategies. Although the new effective indicators focus on specific aspects, their common feature is the use of the Lehmer average.

Classification Adapted from Kullback-Leibler Form
To relate our attempts to one of the more traditional approaches to classification, we let ourselves be inspired by the concepts of difference and dissimilarity. Therefore, in one of our alternative proposals we favor the use of the Kullback-Leibler form.
Let us consider a problem-specific form of the Kullback-Leibler divergence,

S_KL(θ_KL) = Σ_{k=0}^{3} π_k ln[ π_k / ρ_k(θ_KL) ],    (9)

measuring the difference between the original {π_k}_{k=0}^{3} and the symmetric reference distribution {ρ_k(θ_KL)}_{k=0}^{3}. The parametric form dependent on θ_KL is suggested to play a role similar to q_T. To be consistent with the previous classification by means of q_TE11, we proposed the effective parameter θ_KLE (Equations (10) and (11)). In addition to testing by means of S_KL(θ_KL), we work with the symmetrized form S_KL,sym. Then, analogously to Equation (11), we defined θ_KLE,sym. Obviously, the symmetry achieved by the exchange of distributions brings the classification process much closer to the concept of a distance.
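A sketch of the divergence computation for the four-bin distributions. The θ_KL-parametrized reference distribution is represented here by a generic argument `rho`, so the specific parametrization of Equation (9) is not reproduced.

```python
import numpy as np

def kl_divergence(pi, rho):
    """Kullback-Leibler divergence D(pi || rho); rho must be positive
    wherever pi is (rho stands in for the theta_KL-parametrized reference
    distribution of Equation (9))."""
    pi, rho = np.asarray(pi, dtype=float), np.asarray(rho, dtype=float)
    mask = pi > 0                      # convention: 0 * log(0/rho) = 0
    return np.sum(pi[mask] * np.log(pi[mask] / rho[mask]))

def kl_symmetrized(pi, rho):
    """Symmetrized variant, 0.5 * (D(pi||rho) + D(rho||pi))."""
    return 0.5 * (kl_divergence(pi, rho) + kl_divergence(rho, pi))
```

The divergence vanishes only for identical distributions and is asymmetric, which is why the symmetrized form sits closer to the notion of a distance invoked in the text.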
The numerical results obtained for θ_KLE(p_L) and θ_KLE,sym(p_L) are shown in Figure 5. They indicate that symmetrization does not produce remarkable differences in the outputs. In addition, there is some robustness in the integration process. Many simulation cycles show that the choice of [θ_m, θ_M] is less important for the global quality of the classification. This is in part due to the observed fact that specific regions of p_L may improve the accuracy of the classification process.
The classification based on θ_KLE is freely inspired by the nearest-centroid classification method (see, e.g., an application in protein detection [28]). The method is based on the premise of distances from the positions of centroids. Inspired by this approach, we have used parametrized reference distributions instead of centroids to define the possible neighbors. The concept of distance, albeit in probability space, remains the basic determinant. Nevertheless, we assume that the positions of centroids are not critical to successful classification; we replaced them with a simple reference-distribution approach. This is due to the classification refinement achieved by applying L{.} with the choices of p_L, which represent meta-optimization-type settings.

Classification which Converts the Original Time Series into Rényi Entropy Series
In analogy with the structure of the effective parameter q_TE11 defined by Equation (8), we propose an analogous effective Rényi parameter (Equation (12)). The scheme is built on the Rényi entropy, in which one parameter α_R > 0 is present. Similar to the other applications proposed here, the values of α_R are delimited by the selection of the interval [α_m, α_M]. The averaging of the entropy series represented by L{S_R}(p_L, α_R) is understood in the sense of Equation (5). Again, as in the case of q_TEyz, two 1d integrations over α_R are present in Equation (12). In agreement with the previous minimalist implementation of the integration rules, we limit ourselves to the ten function values contributing to the integration quadrature.
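For completeness, the Rényi entropy entering Equation (12) in its standard discrete form (α_R ≠ 1) can be sketched as:

```python
import numpy as np

def renyi_entropy(pi, alpha):
    """Renyi entropy S_R = log(sum(p_i^alpha)) / (1 - alpha) for alpha > 0,
    alpha != 1; zero-probability bins drop out."""
    pi = np.asarray(pi, dtype=float)
    pi = pi[pi > 0]
    return np.log(np.sum(pi ** alpha)) / (1.0 - alpha)
```

As α_R → 1 the expression approaches the Shannon entropy, and for a uniform distribution it equals log of the number of bins for any admissible α_R, which makes it a natural parametric counterpart to the T-entropy used above.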

Problem of Sarle's b Revisited
In this subsection we revisit the problem of Sarle's coefficient, which standardly serves for diagnosing bimodality. In distinction to the previous models, we do not use probability distributions, but instead conditional local statistical averages constructed as

Kurtosis(T_wdn, T_wup) = Arithm.Mean{ z_t^4 | t ∈ [T_wdn, T_wup] },
Skewness(T_wdn, T_wup) = Arithm.Mean{ z_t^3 | t ∈ [T_wdn, T_wup] },    (14)

with the auxiliary standardized variable

z_t = (x_t − µ)/σ, t ∈ [T_wdn, T_wup],    (15)

where µ and σ denote the mean and standard deviation within the window. By combining the Equation (14) terms, we get the interval (local) value

b_Sarle(T_wdn, T_wup) = [1 + Skewness²(T_wdn, T_wup)] / Kurtosis(T_wdn, T_wup)    (16)

applicable for ∀t ∈ [T_wdn, T_wup]. However, observations showed that the b_Sarle(.,.) sequence fluctuates strongly in time within the samples. This implies that some generalized form of signal averaging is required to evaluate the samples as a whole. Previous practice indicated that we must be selective in dealing with fluctuations in the different signal parts. Thus, the Lehmer average L({b_Sarle})(p_L) over the set of events {b_Sarle(T_wdn, T_wup)} is a powerful option. With this selective averaging we obtained the results depicted in Figure 5. They clearly explain why the original Sarle's indicator (its value can be roughly associated with small p_L) is not sufficient for the classification and why the modification by means of the selective Lehmer weights plays a crucial role in the classification.

Integration over the p L Values-Option for t-Testing
We assume that the result can be correctly expressed in a cumulative manner, in which a particular number is assigned to each sample. To this end, for the j-th sample we introduced the indicator I^(j)_TEyz (Equation (17)). Here label(...) is the operator that assigns the respective label sets label(A), label(B) to the possible inputs A or B. The following comments on the above formula must be made: (I) No high-precision integration over p_L is required. The approximate tool for integral calculus we use is based on standard Riemann partitioning by means of 10 uniform rectangles per [p_m, p_M]. It is important to note that it is not the precision of the integration itself, but the contribution to the level of deviations between the projections of A and B that matters most. (II) The integration boundaries p_m, p_M should be properly chosen to include the relevant negative p_L. We used p_m = −150, p_M = 0.
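The rectangle-rule integration of comment (I) can be sketched as follows, with `f` standing for the per-sample mapping p_L → q_TEyz; whether the rectangles are sampled at midpoints or left endpoints is not stated, so midpoints are assumed here.

```python
import numpy as np

def indicator(f, p_m=-150.0, p_M=0.0, n_rect=10):
    """Cumulative per-sample indicator: the integral of f(p_L) over
    [p_m, p_M] approximated by n_rect uniform rectangles (sampled at
    the midpoints; the exact sampling convention is an assumption)."""
    width = (p_M - p_m) / n_rect
    p = p_m + width * (np.arange(n_rect) + 0.5)   # rectangle midpoints
    return width * np.sum(f(p))
```

As the text notes, the coarse rule is adequate because only the relative displacement between the A and B projections matters, not the absolute precision of the integral.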
The selected statistical characteristics of {I^(j)_TEyz} for j ∈ label(A) and j ∈ label(B) are summarized in Table 1. In all investigated cases of T_w, it was unexpectedly found that the values of {I^(j)_TE21} and {I^(j)_TE22} showed higher relative median changes (approximately 10 percent) when comparing A and B. This also indirectly points to the importance of introducing w_2, which includes the derivative of L{.} (see Equation (7)). However, the illustrative summary in Table 1 does not accurately represent the role of fluctuations. In Table 2, a statistically more accurate standard view is given. It presents statistical testing based on the two-sample t-values

t_AByz = ( Mean_A{I^(j)_TEyz} − Mean_B{I^(j)_TEyz} ) / sqrt( Var_A/#A + Var_B/#B ),

where #A, #B are the respective cardinalities, while Var(...) stands for the unbiased variance. Therefore, by means of I^(j)_TEyz we use guidelines developed in hypothesis testing. The degrees of freedom df of the t-distribution are taken in consistence with the standard Welch modified statistics [29,30]. Two-sample, two-sided t-tests for the mean difference were performed; the null hypotheses t_AByz = 0 are tested against the t_AByz ≠ 0 alternatives. As a result, the significance of the p-values supports the rejection of the null hypothesis in all four I_TEyz cases. Accepting the alternative hypotheses, the conclusions from the t-test are fully consistent with the classification proposed for A and B. Interestingly, the t-test is in some contrast with the findings regarding the best practice for the choice of (y, z). The tests generally provide higher t for (y, z) ∈ {(1, 1), (1, 2)}. However, this result does not preclude the use of the (y, z) ∈ {(2, 1), (2, 2)} options, as the corresponding values of t remain very high in all situations.

Table 1. Selected statistical characteristics of the indicators I^(j)_TEyz defined by Equation (17). In line with the previous considerations, we deal with the three selected values of T_w. The respective columns Min, 1st Qu, ..., Max show the differences between the corresponding values for A and B. (Note that 1st Qu means the first quartile, while 3rd Qu labels the third quartile of the observations.) All A items are larger than the corresponding B items, indicating observable separability at different time-window sizes. Greater inter-group changes might indicate a better contrast in distinguishing between classes A and B. For clarity, the items where the relative median changes exceed 10 percent are marked with an asterisk (>10%); in such cases the corresponding, rather strongly varying indicators I_TE21, I_TE22 are marked in blue. A more passive tendency of changes is labeled by circles (<2%).

Table 2. Comparison of the A, B projections of the type I^(j)_... quantified in terms of t-statistics, calculated for the four types of I^(j)_TEyz with the variants (y, z) ∈ {(1, 1); (1, 2); (2, 1); (2, 2)}. The effective number of degrees of freedom df is calculated, which represents the input of the Student t-distribution function. Accordingly, the sufficiently small p-values imply the rejection of H0: t_AByz = 0.
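The Welch-modified two-sample test used in Table 2 is available directly in SciPy; a minimal sketch follows (`separation_test` is an illustrative wrapper name).

```python
import numpy as np
from scipy.stats import ttest_ind

def separation_test(I_A, I_B):
    """Two-sided two-sample t-test with the Welch modification
    (equal_var=False), returning the t statistic and p-value used to
    judge the A/B separation of the indicator values."""
    res = ttest_ind(np.asarray(I_A), np.asarray(I_B), equal_var=False)
    return res.statistic, res.pvalue
```

With `equal_var=False`, SciPy computes the Welch effective degrees of freedom automatically, matching the df convention described for Table 2.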

Conclusions
While rich in information, single-molecule data are often heterogeneous and extensive. Additionally, the detection of rare and slowly exchanging molecular states (category A) can be challenging due to interference with inactive, dormant states (category B). Here we have developed a specific supervised learning approach to address state-classification problems in time-series data originating in single-molecule experiments. Our approach enables a clear identification of dormant molecular states and hence makes statistical evaluations possible. Once the statistical evaluation is performed, the analysis can proceed further to evaluate and characterize the rare molecular states. While our particular method, in which entropy is an important component of the evaluation, has shown progress, it can be further developed in a variety of directions.
For example, an additional goal and next step might be to optimize the efficiency of the categorization. Thanks to the outcomes of the statistical tests, t-values can be used as an optimization criterion. In this respect, there may be different choices of w_1, w_2, which may cause variations in the efficiency of the separation of the A and B time-series classes. Thus, a further goal may also be to concentrate more systematically on the function spaces generated by the w_1, w_2 arguments.
The comparison of several methodological variants shows that Lehmer averaging has a much deeper impact on results than we originally expected. The optimality of the classification may come from different sources and effects, which is also confirmed by the fact that it manifests itself in different areas of the control parameter p L .
Using the transition probabilities for a sequence of stable molecular states, one can systematically explore the potential of the entropy-based approach. For example, a transition study will certainly offer a new perspective on updating the classification. Furthermore, the adaptive conditional averages used herein can improve the discrimination of states in the state space. These inputs can be combined with the HMM Viterbi method, which is considered standard in today's analysis. Hence, our new conceptual framework can further enhance an in-depth understanding of the dynamics of individual molecules.