A Simpliﬁed Matrix Formulation for Sensitivity Analysis of Hidden Markov Models

: In this paper, a new algorithm for sensitivity analysis of discrete hidden Markov models (HMMs) is proposed. Sensitivity analysis is a general technique for investigating the robustness of the output of a system model. Sensitivity analysis of probabilistic networks has recently been studied extensively. This has resulted in the development of mathematical relations between a parameter and an output probability of interest and also methods for establishing the effects of parameter variations on decisions. Sensitivity analysis in HMMs has usually been performed by taking small perturbations in parameter values and re-computing the output probability of interest. As recent studies show, the sensitivity analysis of an HMM can be performed using a functional relationship that describes how an output probability varies as the network’s parameters of interest change. To derive this sensitivity function, existing Bayesian network algorithms have been employed for HMMs. These algorithms are computationally inefﬁcient as the length of the observation sequence and the number of parameters increases. In this study, a simpliﬁed efﬁcient matrix-based algorithm for computing the coefﬁcients of the sensitivity function for all hidden states and all time steps is proposed and an example is presented.


Introduction
A Hidden Markov Model (HMM) is a stochastic model of the dynamic process of two related random processes that evolve over time.A set of stochastic processes that produces the sequence of observed symbols is used to infer an underlying stochastic process that is not observable (hidden states).HMMs have been widely utilized in many application areas including speech recognition [1], bioinformatics [2], finance [3], computer vision [4], and driver behavior modeling [5,6].A comprehensive survey on the applications of HMMs is presented in [7].A simple dynamic Bayesian network can be used to represent a stationary HMM with discrete variables [8,9].Consequently, algorithms and theoretical results developed for dynamic Bayesian networks can be applied to HMMs as well.
Sensitivity analysis in HMMs has been done by perturbation analysis in which a brute-force computation of the effect of small changes on the output probability of interest is done [10,11].Sensitivity analysis research has shown that a functional relationship can be established between a parameter variation and the output probability of interest.This function can represent the effect of any change in the parameter under consideration as compared to the perturbation analysis.Consequently, the goal of sensitivity analysis becomes developing techniques to create this function.This general function will also enable us to compute bounds on the output probabilities without sensitivity analysis [12][13][14][15].
The objective of this work is to develop an algorithm for sensitivity analysis in HMMs directly from the model's representation.Our focus is on parameter sensitivity analysis where the variation of the output is studied as the model parameters vary.In this case, the parameters are the (conditional) probabilities of the HMM model (initial, transition, and observation parameters) and the output is some posterior probability of interest (filtering, smoothing, predicted future observations, and most probable path).
Parametric sensitivity analysis can be used for multiple purposes.The sensitivity analysis can be used to identify those parameters which have significant impact on the output probability of interest.Consequently, the result can be used as a basis for parameter tuning and studying the robustness of the model output to changes in the parameters [16][17][18].
Many researchers study the problem of sensitivity analysis in Bayesian Networks and HMMs.There are two approaches used to compute the constants of the sensitivity function in the standard Bayesian network inference algorithms.The first method is solving systems of linear equations [19] by evaluating the sensitivity function for different values of the varying parameter θ.The second is based on a differential approach [20] by which the coefficients of a multivariate sensitivity function can be computed from partial derivatives.The tailored version of the junction tree algorithm has been used to establish the sensitivity functions in Bayesian networks more efficiently [21,22].In [23], sensitivity analysis of Bayesian networks for multiple parameter changes is presented.In [24], credal sets that are known to represent the probabilities of interest are used for sensitivity analysis of Markov chains.In [10,11], sensitivity analysis of HMMs based on small perturbations in the parameter values is presented.A tight perturbation bound is derived and it is shown that the distribution of the HMM tends to be more sensitive to perturbations in the emission probabilities than to the transition and initial distributions.A tailored approach for HMM sensitivity analysis, the Coefficient-Matrix-Fill procedure, is presented in [25,26].It directly utilizes the recursive probability expressions in an HMM.Consequently, it improves the computational complexity of applying the existing approaches for sensitivity analysis in Bayesian networks to HMMs.In [27], imprecise HMMs are presented as a tool for performing sensitivity analysis of HMMs.
In this paper, we propose a sensitivity analysis algorithm for HMMs using a simplified matrix formulation directly from the model representation based on a recently proposed technique called the Coefficient-Matrix-Fill procedure [26].Until recently, sensitivity analysis in HMMs has been performed using Bayesian network sensitivity analysis techniques.The HMM is represented as a dynamic Bayesian network unrolled for a fixed number of time slices, and the Bayesian sensitivity algorithms are used.However, these algorithms do not utilize the HMMs' recursive probability formulations.In this work, a simple algorithm based on a simplified matrix formulation is proposed.In this algorithm, to calculate the coefficients of the sensitivity functions, a series of Forward matrices F k , k = 1, ..., t are used, where k represents the time slice in the observation sequence.The matrices (Initial, Transition, and Observation) where the corresponding model parameter θ varies are decomposed into the parts independent of and dependent on θ for mathematical convenience.This enables us to compute the coefficients of the sensitivity function at each iteration in the recursive probability expression and implement the algorithm in a computer program.These matrices are computed based on a proportional co-variation assumption.The Observation Matrix O is represented as n × n diagonal matrices O t whose vth diagonal entry is P(y t e |x t v ) for each state v and whose other entries are 0 at the time t of the observation sequence.The proposed algorithm is computationally efficient as the length of the observation sequence increases.It computes the coefficients of a polynomial sensitivity function by filling coefficient matrices for all hidden states and all time steps.
The paper is organized as follows.In Section 2, the background on HMM and Sensitivity Analysis in HMM is presented.The details of the proposed algorithm for the filtering probability sensitivity function are explained in Section 3. In Section 4, the sensitivity analysis of the smoothing probability using the proposed algorithm is discussed.Finally, the paper is concluded with a summary of the results achieved and recommendations for future research works.

Background
In this section, we present the background on HMMs and Sensitivity Analysis in HMMs.First, we will discuss HMMs, HMM inference tasks and their corresponding recursive probability expressions.Then sensitivity analysis in HMMs is explained based on the assumption of proportional co-variation, and a univariate polynomial sensitivity function whose coefficients the proposed algorithm computes is defined.Here, the variables are denoted by capital letters and their values by lower case letters.

Hidden Markov Models
An HMM is a stochastic statistical model of a discrete Markov chain of a finite number of hidden variables X that can be observed by a sequence of a set of output variables Y from a sensor or other sources.The probability of transitioning from one state to another in this Markov chain is time-invariant, which makes the model stationary.The observed variables Y are stochastic with the observation (emission) probabilities, which are also time invariant.The overall HMM consists of n distinct hidden states of the Markov chain and m corresponding observable symbols.In general, the observations can be discrete or continuous, however, in this work, we focus on the discrete observations.The variable X has n ≥ 2 hidden states, denoted by x i , 1 ≤ i ≤ n, and the variable Y has m ≥ 2 observable symbols, denoted by y j , 1 ≤ j ≤ m.Formally, an HMM Λ is specified by a set of parameters (A, O, Γ) that are defined as follows: 1.The prior probability distribution (initial vector) Γ has entries γ i = p(x i ), 1 ≤ i ≤ n, which are the probabilities of state x i being the first state in the Markov chain. 2. The transition matrix A has entries a i,j = p(x j |x i ), 1 ≤ i, j ≤ n, which are the probabilities of transit from state x i to state x j in the Markov chain.

The observation matrix O has entries
which are the probabilities to observe y j if the current state is x i .
An example of an HMM where X has two states and Y three symbols is shown in Figure 1a.The two states are x 1 and x 2 and based on them three symbols y 1 , y 2 or y 3 are observed.The parameters of the HMM (including initial vector, transition matrix and observation matrix) are also shown in Figure 1a.
The dynamic Bayesian network representation of the HMM in Figure 1a is shown in Figure 1b, unrolled for T time slices [8,9].The initial vector, transition matrix and observation matrix of the HMM are represented by the labels Γ, A and O respectively, attached to the nodes in the Bayesian network graph.The superscript for the variables X and Y and their values indicate the time slice under consideration.
In e where e is a sequence of observations from the set y j , 1 ≤ j ≤ m.The representation x t i denotes the actual state x i , 1 ≤ i ≤ n, for the variable X in time slice t.

Inference in HMMs
In temporal models, inference means computing the conditional probability distribution of X at time t, given the evidence up to and including time T, that is p(X t |y 1:T e ).The main inference tasks include filtering for T = t, prediction of a future state for t > T and smoothing, that, is inferring the past for t < T. In an HMM, the Forward-Backward algorithms are used for the inference tasks and training of the model using an iterative procedure called the Baum-Welch method (Expectation Maximization) [28][29][30].The Forward-Backward algorithms computes the following probabilities for all hidden states i at time t ≤ T:

•
forward probability F(i, t) = p(x t i , y For T < t, the algorithm can be applied by taking B(i, t) = 1 and adopting the convention that the configuration of an empty set of observations is TRUE, i.e., y T+1:t e ≡ TRUE, giving F(i, t) = p(x t i , y 1:t e ) = p(x t i , y 1:T e , TRUE) = p(x t i , y 1:T e ) As shown in Equation (1), any conditional probability p(x t i |y 1:T e ) can be calculated from the joint probabilities p(x t i , y 1:T e ) for all hidden states i using Bayes' theorem.Since our objective is sensitivity analysis of the HMM output probability of interest for parameter variations, in the remainder of this paper our focus will be on the joint probabilities p(x t i , y 1:T e ) for all hidden states i.The other important inference tasks are the prediction of future observations, i.e., p(y t e |y 1:T e ) for t > T, finding the most probable explanation, i.e., arg max x 1:t i p(x 1:t i |y 1:t e ) using the Viterbi Algorithm and learning the HMM parameters as new observations come in using the Baum-Welch algorithm, which can be achieved by the Forward-Backward algorithms.The prediction of future observations can be computed as the fraction of the two probabilities p(y t e , y 1:T e ), t > T, and p(y 1:T e ), which are computed using forward probabilities.
Note that if t > T + 1, then p(y t e , y 1:T e ) can be computed by setting all in-between observations y k e , t > k > T, to TRUE as above.

Recursive Probability Expressions
In order to establish the sensitivity functions for HMMs directly from the model's representation, the recursive probability expression for the Forward-Backward algorithm should be investigated.The repetitive character of the model parameters in the Forward-Backward algorithm is used to drive the sensitivity functions.Consequently, in this section the recursive expressions of the Forward-Backward algorithm [29,31] are reviewed briefly.
Filtering.The filter probability is p(x t v , y t e ) for a specific state v of X.This probability is the same as the forward probability F(v, t) in the Forward-Backward algorithm.For time slice t = 1, we simply have that where e 1 corresponds to the symbol of Y that is observed at time 1.For time slices t > 1, we exploit the fact that The first factor in the above product corresponds to o v,e t ; conditioning the second factor on the n states of X t−1 , we find with Taken together, we find for F(v, t) = p(x t v , y 1:t e ) the recursive expression Prediction.The prediction probability p(x t v , y 1:T e ) for t > T can be handled by the the Forward-Backward algorithm by computing F(v, t) with an empty set of evidence for Y T+1:t .It can be implemented by replacing the term o v,e t by 1 for t > T in Equation ( 2).As a result, for T = 0, the prediction probability p(x t v , y 1:T e ) becomes the prior probability p(x t v ).The prediction task can be seen as a special case of the filtering task.
Smoothing.Finally, we consider a smoothing probability p(x t v , y 1:T e ) with t < T. Using Y t+1:T ⊥ Y 1:t |X t we find that The second term in this product is a filter probability; the first term is the same as the backward probability B(v, t) in the Forward-Backward algorithm.By conditioning the first term on X t+1 and exploiting the independences Taken together, we find for B(v, t) = p(y t+1:T e |x t v ) the recursive expression In this work, we present the sensitivity of the HMM for filtering and smoothing probabilities for transition, initial and observation parameter variation.Therefore, the above discussion on the recursive probability expression for the filtering probability, Equation (2), and smoothing probability Equation (3), will be used to formulate our algorithm.

Sensitivity Analysis in HMM
As shown in recent studies [13,25,32,33], sensitivity analysis is establishing a functional relationship that describes how output probability varies as the network's parameters change.A posterior marginal probabilities, y = p(v|e), where v is value of a variable V and e is the evidence available, is considered to represent the output probabilities of interest.The network parameter under consideration is represented as θ = p(v j |π) for a value v j of a variable V, and π is an arbitrary combination of the values of the set of evidences of V.So p(v|e)(θ) represents the posterior marginal probabilities as a function of θ.
Consequently, when the parameter θ = p(v j |π) varies, each of the other conditional probabilities θ = p(v i |π) from the same distribution co-vary accordingly by the ratio between the the remaining probabilities.That is, if a parameter θ = p(v j |π) for a variable V is varied, then The co-variation simplifies to p(v i |π)(θ) = 1 − θ for binary-valued V.The sensitivity function is defined based on the stated proportional co-variation assumption.The constants in the general form of the sensitivity function are generated from the network's parameters that are neither varied nor co-varied, and their values depend on the output probability of interest.A sensitivity function for a posterior probability of interest is a quotient of two polynomial functions, since p(v|e) = p(v e)/p(e), and hence a rational function.
We have the following univariate polynomial sensitivity function to define the joint probability of a hidden state and a sequence of observations as a function of a model parameter θ.
The coefficients c t v,i are derived from the network parameters that are not co-varied by assuming proportional co-variation of the parameters from the same conditional distribution.The coefficients c t v,i depend on the hidden state v and time slice t under consideration (see for details [25,26]).

Sensitivity Analysis of Filtering Probability
In this section, the simplified matrix formulation for sensitivity of the filtering probability to the variation of the initial, transition and observation parameters is proposed.Let us consider the sensitivity functions p(x t v , y 1:t e )(θ) for a filtering probability and the variation of one of the model parameters, viz., θ.From Equation (2), the forward recursive filtering probability expression, it follows that where for transition parameter variation θ = θ a a z,v otherwise and The sensitivity function p(x t v , y 1:t e )(θ) coefficients are computed by the proposed method using a series of Forward matrices F k , k = 1, ..., t.The sensitivity of the filtering probability for transition, initial and observation parameter variations is formulated in the next subsections.

Transition Parameter Variation
In this subsection, the simplified matrix formulation (SMF) for sensitivity of the filtering probability to variation of the transition parameter is proposed.Let us consider the sensitivity functions p(x t v , y 1:t e )(θ a ) for a filtering probability and transition parameter θ a = a r,s .From Equation ( 6), the recursive filtering probability sensitivity function, it follows that for t = 1 we have the constant In the next sub-subsections, the formulation of the proposed method to compute the coefficients of the sensitivity function in Equation ( 5) is presented in detail.First, the decomposition of the transition matrix into the parts of A independent of and dependent on θ a and the observation matrix representation for the simplified matrix formulation is presented.Then the SMF representation for the sensitivity of the filtering probability to transition parameter variation and its algorithm implementation are discussed.

Transition Matrix Decomposition and Observation Matrix Representation
In the proposed algorithm, to calculate the coefficients in a series of Forward matrices F k , k = 1, ..., t, the Transition Matrix A is divided into Ā and A that are independent of and dependent on θ a , respectively, for mathematical convenience.The matrices Ā and A are computed based on the proportional co-variation assumption shown in Equation ( 4).Let us explain the decomposition considering an HMM with an n × n Transition Matrix A with the transition parameter ).This makes the transition matrix A a function of θ a , and, with the proportional co-variation assumption shown in Equation (4), it becomes This matrix is divided into Ā and A that are independent of and dependent on θ a , respectively, as follows, and such that A(θ a ) = Ā + A(θ a ).
In short, this decomposition can be represented in the algorithm simply as Ā and A that are independent of and dependent on θ a , respectively.
In the simplified matrix representation, the Observation Matrix O is represented as an n × n diagonal matrices O t whose v th diagonal entry is P(y t e |x t v ) for each state v and whose other entries are 0 at the time t of the observation sequence [31].Let us explain this representation by considering the HMM with an n × m Observation Matrix O. Given the following sequence of observations: y 1 2 , y 2 1 and y 3 3 , the corresponding diagonal matrices become

SMF for Sensitivity Analysis of the Filtering Probability to the Transition Parameter Variation
In the Simplified Matrix Representation (SMF), the sensitivity function p(x t v , y 1:t e )(θ a ) shown in Equation ( 7) can be written as a simple matrix-vector multiplication.It follows that for t = 1 we have the constant p(x which is represented in SMF as The overall procedure of computing the sensitivity coefficients of p(X t , y t e )(θ a ) using the SMF is described in the following algorithm.

Algorithm Implementation
This algorithm summarizes the proposed technique, where e is the sequence of observations and θ = a r,s , and solves the recursive Equation (7).Once the Transition and Observation matrices are represented in SMF as explained above, the sensitivity coefficients are computed by filling the forward matrices The details of filling contents of the forward matrices in Algorithm 1 are explained as follows.F 1 is initialized by the matrix multiplications of O 1 , which is a diagonal matrix whose v th diagonal entry is P(y 1 e |x 1 v ), and the initial distribution vector Γ , The remaining matrices F k of size n × k, 2 ≤ k ≤ t, are initialized by filling them with zeros.Two temporary matrices Ftmp 1 and Ftmp 2 are used for computing the coefficients as explained previously.Ftmp 1 is computed as the matrix multiplication of O k , Ā (which is independent of θ a ) and F k−1 , and a zero vector of size n ( = the number of states) is appended at its end.F k−1 represents the recursion in the filtering probability.In the same way Ftmp 2 is computed as the matrix multiplication of O k , A (which is dependent on θ a ) and F k−1 , and a zero vector of size n is appended to the front.Then F k is set as the sum of Ftmp 1 and Ftmp 2 : 12:

Initial Parameter Variation
In this subsection, the sensitivity of the filtering probability to the initial parameter variation is derived based on SMF.Let us consider the sensitivity functions p(x t v , y 1:t e )(θ γ ) for a filtering probability and variation in the initial parameter θ γ = γ r .From Equation ( 6), the recursive filtering probability sensitivity expression, for t = 1, we have p(x 1 v , y 1 e )(θ γ ) = o v,e 1 • γ v (θ γ ), and for t > 1,

Initial Vector Decomposition
In the proposed algorithm, to calculate the coefficients in a series of forward matrices F k , k = 1, ..., t, the Initial Vector Γ is divided into Γ and Γ that are independent of and dependent on θ γ , respectively, for mathematical convenience.The matrices Γ and Γ are computed based on the proportional co-variation assumption shown in Equation ( 4).
Let us explain the Initial Vector Γ decomposition considering the HMM presented in Section 3.1 where the Initial Vector 2 ).This makes the Initial Vector Γ a function of θ γ , and, with the proportional co-variation assumption shown in Equation ( 4), it becomes This matrix is divided into Γ and Γ that are independent of and dependent on θ γ , respectively, as follows, and such that Γ(θ γ ) = Γ + Γ(θ γ ).
In short, this decomposition can be represented simply as Γ and Γ that are independent of and dependent on θ γ , respectively.
Once the Initial Vector Γ is decomposed as shown above, to compute the coefficients of the sensitivity function p(x t v , y 1:t e )(θ γ ) using the recursive Equation ( 8), the Observation Matrix O is represented as an n × n diagonal matrices O t .

SMF for Sensitivity Analysis of the Filtering Probability to the Initial Parameter Variation
In the simplified matrix representation, the sensitivity function p(x t v , y 1:t e )(θ γ ) shown in Equation ( 8) can be written as a simple matrix-vector multiplication.It follows that for t = 1 we have the constant p(x To solve the forward (filtering) probability sensitivity function p(x t v , y 1:t e )(θ γ ), the following algorithm implementation is presented.

Algorithm Implementation
This procedure summarizes the proposed method, where e is the sequence of observations and θ = θ γ , and solves the recursive Equation (8).
As shown in Algorithm 2, once the Initial Vector is decomposed into the parts, Γ independent of θ γ and Γ dependent on θ γ and the Observation Matrix is represented in the diagonal matrices, the sensitivity coefficients are computed by filling the forward matrices F k for k = 1 • • • t.The details of filling the forward matrices in Algorithm 2 are discussed as follows.F 1 is initialized as a sum of two temporary matrices Ftmp 1 and Ftmp 2 which are used to compute the coefficients for the parts independent of and dependent on θ γ , respectively.Ftmp 1 is computed as the matrix multiplication of O 1 and Γ , and a zero vector of size n is appended at its end to represent the zero coefficients of degree 1.In the same way, Ftmp 2 is computed as the matrix multiplication of O 1 and Γ , and a zero vector of size n is appended in front of it to represent the zero coefficients of degree 0. Then Algorithm 2 This computes the coefficients of the forward (filtering) probability sensitivity function p(x t v , y 1:t e )(θ γ ) with θ γ = γ r in forward matrices F k for k = 1, • • • , t using SMF.Input: A, O, Γ, e, and θ γ Output: The remaining matrices F k of size n × k, 2 ≤ k ≤ t, are computed using recursion.They are computed as the matrix multiplication of O k , A and F k−1 .

Observation Parameter Variation
In this subsection, the sensitivity of the filtering probability p(x t v , y 1:t e ) to the observation parameter variation is derived based on SMF.Let us consider the sensitivity functions p(x t v , y 1:t e )(θ o ) for a filtering probability and variation in the observation parameter θ o = o r,s .From Equation ( 6), the recursive filtering probability sensitivity function, it follows that for t = 1 we have p(

Observation Matrix Decomposition
In the proposed algorithm, to calculate the coefficients in a series of forward matrices F k , k = 1, ..., t, the Observation Matrix O is divided into Ō and O that are independent of and dependent on θ o , respectively, for mathematical convenience.The matrices Ō and O are computed based on the proportional co-variation assumed as shown in Equation ( 4).Let us explain the Observation Matrix O decomposition by considering the HMM presented in the previous section and let the observation parameter θ o be o 2,1 = p(y t 1 |x t 2 ).The observation matrix O becomes a function of θ o based on the proportional co-variation assumption shown in Equation ( 4) as follows.
This matrix is decomposed into Ō and O that are independent of and dependent on θ o , respectively, such that O(θ o ) = Ō + O(θ o ), just as we decomposed A(θ a ) = Ā + A(θ a ) in Section 3.1.1.

SMF for Sensitivity Analysis of the Filtering Probability to the Observation Parameter Variation
In the simplified matrix representation, the sensitivity function p(x t v , y 1:t e )(θ o ) can be written as a simple matrix-vector multiplication.From Equation ( 9), it follows that for t = 1 we have the constant which is represented as The derivation of the proposed technique for the sensitivity analysis of the forward (filtering) probability p(x t v , y

Algorithm Implementation
In this algorithm, e is the sequence of observation and θ = θ o .It solves the recursive sensitivity function Equation (9) in forward matrices F k for k = 1, • • • , t.The observation matrix is decomposed into the independent Ō and dependent O observation matrices on θ o .Then, these observation matrices are represented as diagonal matrices in the SMF computation.The detailed steps of filling the contents of the forward matrices F k are explained as follows.F 1 is initialized as a sum of two temporary matrices Ftmp 1 and Ftmp 2 , which are used to compute the coefficients for the independent and dependent parts on θ o .Ftmp 1 is computed as the matrix multiplication of Ō1 and Γ , and a zero vector of size n states is appended at its end to represent the zero coefficients of degree 1.In the same way, Ftmp 2 is computed as the matrix multiplication of O 1 and Γ , and a zero vector of size n states is appended in front of it to represent the zero coefficients of degree 0.
The remaining matrices F k of size n × k, 2 ≤ k ≤ t, are computed using recursion.Again, two temporary matrices Ftmp 1 and Ftmp 2 are used for computing the coefficients as it is explained above.Here, Ftmp 1 is computed as the matrix multiplication of the diagonal matrix Ōk that is independent of θ o , A and F k−1 , and a zero vector of size n states is appended at its end.F k−1 represents the recursion in the filtering probability.Likewise, Ftmp 2 is computed as the matrix multiplication of the diagonal matrix O k that is dependent on θ o , A and F k−1 , and a zero vector of size n states is appended in front of it.Then, F k is set as the sum of Ftmp 1 and Ftmp 2 :

Filtering Sensitivity Function Example
The proposed algorithm is illustrated by the following example.The example is also used to illustrate the computations in the Coefficient-Matrix-Fill procedure in [26].The computed coefficients using the proposed algorithm are the same as those computed using the Coefficient-Matrix-Fill procedure.However, the proposed algorithm is simplified and works for any number of hidden states and evidence (observation) variables.It is also efficient in terms of computational time as the length of the observation sequence increases.Let us compute the sensitivity functions for the filtering probability using Algorithm 1 for the two states of X t as a function of transition parameter θ a = a 2,1 = p(x t 1 |x t−1 2 ) = 0.15 given the following sequence of observations: y 1 2 , y 2 1 and y 3 1 .The simplified matrix formulation procedure divides the Transition Matrix A into Ā and A : A = 0.95 0.05 For the observation sequence given, y 1 2 , y The coefficients for the sensitivity functions are computed by the proposed procedure using the forward matrices that are constructed as follows: Then, F 2 = Ftmp 1 + Ftmp 2 , where the temporary matrices Ftmp 1 and Ftmp 2 are computed as follows: and finally, for F 3 : The sensitivity functions for probability of evidence by summing column entries of the forward matrices become: and p(y The sensitivity functions for the two filtering tasks become:

Run-time Analysis
The time for computing the coefficients of the filtering probabilities for transition parameter variation is recorded for the Coefficient-Matrix-Fill procedure and our method (SMF) using the HMM shown in Example (1).It is shown in Figure 3, where the Coefficient-Matrix-Fill procedure is labled as CMFP.In [26], the algorithm for the Coefficient-Matrix-Fill procedure is presented.It is shown for an HMM model with n = 2 hidden states and m = 2 observable symbols.The proposed SMF algorithm has shown a significant improvement in computational time compared to the Coefficient-Matrix-Fill procedure as the length of the observation sequence increases.For example, for an observation sequence length of 1000, the proposed technique takes a mean time of 0.048 s.In comparison, it takes the Coefficient-Matrix-Fill procedure 2.183 seconds.This is an improvement of 98%.The algorithms are implemented in MATLAB on a 64-bit Windows 7 Professional Workstation (Dell Precision T7600 with Intel(R) Xeon CPU E5-2680 0 @ 2.70 GHz dual core processors and 64GB RAM).
The computational complexity of our method is linear time, O(l), where l is the length of the observation sequence, whereas the Coefficient-Matrix-Fill procedure is quadratic time, O(l 2 ).In our method, the sensitivity function coefficients are computed in one for loop that runs for the length of the observation sequence, as shown in Algorithms 1-6, whereas in the Coefficient-Matrix-Fill procedure two nested for loops are used.It computes the sensitivity coefficients in the forward matrices element-by-element in the inner loop, which runs in increasing time up to the maximum of the length of the observation sequence, and the outer loop runs for the length of the observation sequence.

Sensitivity Analysis of Smoothing Probability
In this section, the sensitivity analysis of smoothing probability for variation of the transition, initial and observation parameters are presented using simplified matrix formulation.The sensitivity function p(x t v , y 1:T e )(θ) where t < T and θ is one of the model parameters, can be computed using the recursive probability expression shown in Equation ( 2) and (3).p(x t v , y 1:T e )(θ) = p(x t v , y 1:t e , y t+1:T e )(θ) = p(y t+1:T The first term in this product can be computed using the backward procedure as shown in Equation (3); the second term is the filtering probability that is computed using the forward procedure.Consequently, the sensitivity function for the smoothing probability p(x t v , y 1:T e )(θ) is the product of the polynomial sensitivity functions computed using the forward and backward procedures.The sensitivity function for the filtering probability p(x t v , y 1:t e )(θ) is discussed in Section 3. From Equation (3), the backward recursive probability expression, it follows that p(y t+1:T where To compute the coefficients of the sensitivity function p(y t+1:T e |x t v )(θ), a series of Backward matrices B k , k = t, ..., T − 1, based on the simplified matrix formulation as presented in Section 3 are used.In the next subsections, the sensitivity functions for the smoothing probability p(x t v , y 1:T e )(θ) for transition, initial and observation parameter variations are discussed.

Algorithm Implementation
In this algorithm, e is the sequence of observation and θ = θ a .It solves the recursive backward probability sensitivity function Equation (10) using the backward matrices B k for k = T, • • • , t.Once the transition matrix is decomposed into the transition matrices Ā independent of θ a and A dependent on θ a , the coefficients of the sensitivity function are computed by filling the backward matrices B k as follows.First, B T is initialized as a vector of ones for n hidden states.

B T = ones(n, 1)
The remaining matrices B k of size n × k, k = T − 1 ≤ k ≤ t are computed by filling them as follows: Once the observation matrix O k+1 is represented as a diagonal matrix, two temporary matrices Btmp 1 and Btmp 2 are used for computing the coefficients as previously explained.Btmp 1 is computed as the matrix multiplication of Ā that is independent of θ a , O k+1 and B k+1 and a zero vector of size n is appended at its end.B k+1 represents the recursion in the backward probability.The same way Btmp 2 is computed as the matrix multiplication of A that is dependent on θ a , O k+1 and B k+1 and a zero vector of size n is appended to the front.Then B k is set as the sum of Btmp 1 and Btmp 2 ,

Algorithm Implementation
As discussed above, the sensitivity function for the backward probability p(y t+1:T e |x t v )(θ γ ) with θ γ = γ r is a polynomial of degree zero.The contents of the backward matrices B k are filled as follows.B T is initialized as a vector of ones for n hidden states.

B T = ones(n, 1)
The remaining matrices B k of size n × k, k = T − 1 ≤ k ≤ t, are computed by filling them as follows.Once the observation matrix O k+1 is represented as a diagonal matrix, B k is set as the matrix multiplication of A, O k+1 and B k+1 .B k+1 represents the recursion in the backward probability.The remaining matrices B k of size n × k, k = T − 1 ≤ k ≤ t, are computed by filling them as follows: Once the independent Ōk+1 and dependent O k+1 observation matrices on θ o is represented as a diagonal matrix, two temporary matrices Btmp 1 and Btmp 2 are used for computing the coefficients of the sensitivity function as previously explained.Btmp 1 is computed as the matrix multiplication of A, Ōk+1 and B k+1 .A zero vector of size n is appended at its end.B k+1 represents the recursion in the backward probability.In the same way Btmp 2 is computed as the matrix multiplication of A, O k+1 and B k+1 and a zero vector of size n is appended to the front.Then B k is set as the sum of Btmp 1 and Btmp 2 , B k = Btmp 1 + Btmp 2

Conclusions and Future Research
This research has shown that it is more efficient to compute the coefficients for the HMM sensitivity function directly from the HMM representation.The proposed method exploits the simplified matrix formulation for HMMs.A simple algorithm is presented which computes the coefficients for the sensitivity function of filtering and smoothing probabilities for transition, initial and observation parameter variation for all hidden states, as well as all time steps.This method differs from the other approaches in that it neither depends on a specific computational architecture nor requires a Bayesian network representation of the HMM.
The future extension of this work will include sensitivity analysis of predicted future observations p(y t e |y 1:T e )(θ), t > T, and the most probable explanation for the corresponding parameter variations.
Future research on the sensitivity analysis of HMM, where different types of model parameters are varied simultaneously, as well as the case of continuous observations will be considered.

Figure 1 .
Figure 1.(a) An example of an HMM representation.(b) Its dynamic Bayesian network representation, unrolled for T time slices.

Example 1 .
Consider an HMM with binary-valued hidden state X and binary-valued evidence variable Y. Let Γ = [0.20,0.80] be the initial vector for X 1 .The transition matrix A and observation matrix O be as follows:

Figure 3 .
Figure 3.Time in seconds to compute the sensitivity coefficients for an observation sequence length from 1 to 1000 with a step size of 10.

4. 1 .Algorithm 4
Transition Parameter Variation In the simplified matrix formulation, the sensitivity function p(y t+1:T e |x t v )(θ a ) for transition parameter θ a = a r,s can be represented as follows: p(y t+1:T e |x t v )(θ a ) = A(θ a )O t+1 p(y t+2:T e |x t+1 z )(θ a ) For t = T − 1, p(y T+1:T e |x T z )(θ a ) = 1, the recursive probability expression reduces to the following: p(y T:T e |x T−1 v )(θ a ) = A(θ a )O t+1 The proposed technique for the sensitivity analysis of the backward probability sensitivity function p(y t+1:T e |x t v )(θ a ) for the transition parameter is formulated in Algorithm 4. This computes the coefficients of the backward probability sensitivity function p(y t+1:T e |x t v )(θ a ) in backward matrices B k for k = T,