Data Driven Modelling of Nuclear Power Plant Performance Data as Finite State Machines

Accurate modelling and simulation of a nuclear power plant are important factors in the strategic planning and maintenance of the plant. Several nonlinearities and multivariable couplings are associated with real-world plants; therefore, it is quite challenging to model such cyberphysical systems using conventional mathematical equations. A visual analytics approach is introduced which addresses these limitations and models both the short-term and the long-term behaviour of the system. Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA) is used to extract features from the data, and k-means clustering is applied to label the data instances. A finite state machine representation formulated from the clustered data is then used to model the behaviour of cyberphysical systems in terms of system states and state transitions. In this paper, the indicated methodology is deployed over time-series data collected from a nuclear power plant over nine years. It is observed that this approach of combining machine learning principles with finite state machine capabilities facilitates feature exploration, visual analysis, pattern discovery, and effective modelling of nuclear power plant data. In addition, the finite state machine representation supports identification of normal and abnormal operation of the plant, suggesting that the given approach captures the anomalous behaviour of the plant.


Introduction
Cyber Physical Systems (CPS) involve proper monitoring and control of physical processes through computational procedures. A feedback mechanism between the embedded computer and physical process is used for modelling and design of such systems [1]. Therefore, accurate modelling of CPS is one of the crucial factors for their efficient and economical operation [2].
Nuclear Power Plants (NPPs) belong to a key domain of CPS whose operation must not only be kept economically viable to compete with other energy sources but also kept safe to avoid any potential hazards to humanity. This requires a proper maintenance strategy that generates maximum revenue without violating safety standards [3]. The operation of the power plant is remotely monitored with the help of sensors which provide time-stamped measurements of internal and external parameters of the plant. Accurate modelling and simulation of an NPP are important factors in the strategic planning and maintenance of the plant. Physical models used for simulation are based on the physics of the processes and components of the power plant. As the plant ages, the uncertainty of a physical model in correctly representing the plant increases over time. As an alternative, a visual analytics approach to model the working principle of an NPP is presented in this paper. Visual analytics helps to provide meaningful insights into time-series data by combining computational strength with human analytical understanding. A straightforward way to visualise a large amount of time-series data is to simply plot it with a visualisation tool. However, such an approach has several drawbacks, namely: (i) it is difficult to identify correlations among data blocks far away from one another; (ii) it is difficult to identify the existence of clustering of data blocks in the display of raw data; and (iii) considerable human effort is needed to interpret raw data.
In problem domains dealing with small data space, regular time series plots are sufficient. However, when the data is recorded over longer periods, implementing common tasks such as feature extraction, pattern discovery, labelling of data, and getting summary of a compressed or uncompressed time-series data becomes challenging [4]. These issues can be addressed through interactive visualisation of such long time series, which not only aids in the extraction of meaningful information from the raw data but also helps in understanding the behaviour of the system over time.
For example, the reactivity state of the reactor can be measured as a function of time. This requires an analysis of control rod positions and any burnable poisons in the moderator or coolant, and provides information on the controllability of the reactor and the efficient use of the nuclear fuel. Usually, statistical models like ARMA and ARIMA are used for time-series data modelling, but these are mainly effective on linear and univariate data and are therefore a misfit in this domain. Hence, the application of machine learning methods to the visualisation of reactivity data collected over long periods of time can help the plant owner and operator obtain a better view of reactor control, safety, and economics.
The idea behind this approach is to transform the data in the time series domain to the feature domain by computing several statistical features of the data. Dimensionality reduction techniques are then applied to extract features that carry the highest amount of information. It ensures escaping from the issue of "curse of dimensionality" as well as facilitates better visualisation in lower dimensional space. Principal Component Analysis (PCA) is used to extract features with the highest variance ratios from the feature domain. Data corresponding to extracted features is grouped with the help of k-means clustering algorithm such that data instances with common characteristics are clustered together. However, an issue associated with PCA is that it lacks the interpretability of the resulting clusters [5]. PCA assigns high weights to the features with more significant variabilities irrespective of whether they are meaningful for the classification or not [6]. Linear Discriminant Analysis (LDA) is an alternate dimensionality reduction technique that tries to extract a feature subspace which maximises the separation between classes and deals directly with discrimination between classes [7]. Therefore, LDA is applied over the clustered data to maximise the distance between the cluster centroids. Data extracted corresponding to linear discriminants (LDs) is clustered again using k-means clustering to normalise the original clusters. A notion of the finite state machine is introduced in which system states corresponding to the clusters obtained are defined. State machine diagrams are designed for each year of the original data. Transitions between the cluster labels of consecutive data instances are defined in terms of state transitions.
The proposed visual analytics approach assists plant operators in understanding and visualising large time-series data using scatter plots and state machine diagrams. When handling long time-series data, it becomes difficult to identify patterns or the behaviour of the signal while considering several features of the data. However, applying feature extraction and dimensionality reduction techniques such as PCA/LDA provides insight into the data through its principal features. It enhances the capability to find common patterns by extracting only the key features that carry most of the information present in the data. Looking for such patterns directly by observing the time-series plot is not trivial and requires great manual effort and expertise, especially for large datasets. State machines formulated from the clustered data help in visualising both the local and the global behaviour of plant data over the years. Locally, data for each year is visualised, and its state transitions illustrate the overall nature of that particular year. On a global level, data for almost one decade is visualised, and transitions between different years represent the change in the operation of the plant over the years.
Overall, the contributions made in this study are:
1. Outline a visual analytics approach that facilitates feature exploration, visual analysis, pattern discovery, and effective modelling of the NPP time-series data.
2. Use a finite state machine representation to visualise and model the working principle of the NPP.
3. Compare the behaviour of the NPP over the years with the help of state machine diagrams and introduce the concept of normal and abnormal plant operations.
In the following sections, this paper discusses the related work done, defines the system model of NPP along with the detailed methodology, and proposes the pipelined architecture of the visual analytics approach. Finally, discussions are made over the observations after applying this approach to the NPP data.

Related Work
This section focuses on the literature pertinent to the current study. Related work is divided into four subsections, namely: (1) pattern discovery using k-means clustering, (2) dimensionality reduction techniques, (3) modelling problems using a finite state machine, and (4) nuclear power plant data modelling. An overview of each category along with some related work is discussed below.

Pattern Discovery Using k-Means Clustering
Pattern discovery is a discipline used to extract interesting patterns from raw data. It helps in properly distinguishing data instances with common properties. In a large number of scenarios, time-series data is unlabelled; hence, unsupervised learning methods such as k-means clustering help in identifying concrete clusters with similar characteristics as well as in detecting outliers whose traits deviate from the other data samples.
Mohammed et al. [4] demonstrate an approach that aids in the identification of patterns, clusters, and outliers in large time-series datasets. A Deep Convolutional Auto-Encoder (DCAE) along with k-means clustering is applied over multivariate time-series data to obtain the clusters and outliers (anomalies). A greedy version of the k-means clustering algorithm is introduced in [8] for pattern discovery in healthcare data. It emphasises a greedy approach which produces precursory centroids and utilises at most k passes over the dataset to calibrate these centre points. The Network Data Mining approach presented in [9] uses k-means clustering to separate network data into normal and anomalous traffic for real-time intrusion detection.

Dimensionality Reduction Techniques
"The curse of dimensionality" refers to the issues which arise while working with data of high dimensions [10]. Dimensionality reduction is a cure which enhances the capability of extracting patterns in data [11].
Danyang Tian [12] discusses the importance of Principal Component Analysis (PCA) as a dimensionality reduction technique in the clustering of time-series data. It is shown that applying PCA reduces the time complexity of the clustering method. Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction technique which focuses on maximising the separation between classes [7]. A study on combining the principal components (PCs) from PCA and linear discriminants (LDs) from LDA is presented in [13]. The discriminating power of LDA is improved by using PCA for feature extraction in supervised learning. Martin et al. [5] introduce a cluster vector approach in which PCA followed by LDA is used for the clustering of a biological sample. PCA ensures dimensionality reduction, whereas LDA reveals clusters. A PCA-k-means approach is investigated in [14] for the clustering of high dimensional and overlapping signals. The approach effectively reduces the dimension and clusters the signals accurately. PCA is commonly used for preprocessing the data before training neural networks to reduce the computational time and complexity of the training process [15]. Please refer to Appendixes A.1-A.3 for the mathematical explanation of PCA, LDA, and k-means clustering techniques, respectively.

Modelling Problems Using a Finite State Machine
A Finite State Machine (FSM) is a model of computation which is often used for simulation of logic or to control the flow of execution of a system. FSMs are used to model problems of several domains, including artificial intelligence, machine learning, and natural language processing [16].
Multifault prediction for industrial processes using finite state machines is studied in [17]. It is observed that state machines in conjunction with the relevance vector machine (RVM) give better prediction accuracy. The problem of handwriting recognition is addressed in [18] using finite state machines. Signatures for handwritten activities are computed, which are then used to build finite state machine recognisers. An F1 score of 0.7 or above is achieved for each activity by the recognisers.

Nuclear Power Plant Data Modelling
To ensure safe and secure operation of Nuclear Power Plant (NPP), proper modelling and analysis of the system parameters is essential. Several research studies have been performed where machine learning techniques have been applied on the time series data collected from the plant sensors to model the plant behaviour over the years.
S. N. Ahsan et al. [19] demonstrate the application of supervised machine learning techniques, namely binary trees and artificial neural networks, to predict faults in the Primary Heat Transfer system of a Canada Deuterium Uranium type reactor. Features were extracted from 11 reactor parameters related to the coolant flow rate, coolant header temperature, and the neutron power rate, and were then labelled with running, transient, and shutdown plant status. Precision, recall, and F-measure were used for model evaluation, and the highest accuracy determined for the prediction of system abnormality was 99%. Hardik A. Gohel et al. [20] aim their research at the predictive maintenance of NPP infrastructure using machine learning algorithms such as Support Vector Machines and logistic regression. A turbofan engine degradation simulation dataset was used for implementation, and engine failures for different numbers of cycles were predicted. The system gave 95% prediction accuracy. Jianping Ma et al. [21] propose a semisupervised classification method for fault diagnosis. In this paper, labelled data was generated by creating different fault scenarios on a desktop NPP simulator and a physical NPP simulator. For training the semisupervised model, new unlabelled data collected from the NPP was integrated with the generated labelled data. A. Chandrakar et al. [22] focus on uncertainty propagation and reliability estimation of passive safety systems in a nuclear power plant. The paper used an autoregressive integrated moving average (ARIMA) model to fit the atmospheric temperature, which is an independent process parameter in a passive safety system. The paper also proposed a method to generate synthetic time-series data from the modelled data with the same mean and variance. The proposed methodology comprised three stages, namely model identification, parameter estimation, and model checking.
In model identification, stationarity tests like the Mann-Kendall test were considered. If the data was found to be nonstationary, a difference transformation was applied to convert it into a stationary dataset. Next, the use of autocorrelation/partial autocorrelation (ACF/PACF) plots to determine stationarity was discussed. For the selection of the best model, criteria like the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the maximum likelihood (ML) rule were considered. Next, for parameter estimation, ML estimation was proposed. For model checking, the significance of the residuals was determined using histograms, quantile-quantile plots, and the Anderson-Darling normality test. The significance of periodicity was also determined using ACF/PACF plots of the residuals. For evaluation of the proposed model, the paper used time-series atmospheric temperature data from the Chittaugarh district in India.
However, the above ARIMA modelling technique is valid only if the data belongs to a univariate process. In our case, the collected data is multivariate in nature and the parameters are highly correlated; therefore, we do not perform that kind of analysis on our datasets. Furthermore, the aforementioned papers are mainly focused on building models for making predictions related to abnormal plant operations and have therefore reported the prediction accuracies of their evaluated models. These papers have not prioritised the visualisation and pattern analysis of the data. Therefore, the primary objective of our work is to perform a high-level visualisation of the multivariate datasets to obtain a broad understanding of the operation of a Nuclear Power Plant. The proposed methodology of combining machine learning techniques with a finite state representation is expected to provide a clearer visualisation of the patterns in Nuclear Power Plant behaviour. Building models for the classification of behaviours and making real-time forecasts of plant parameters is not considered within the current scope of this paper.

System Model
Nuclear power plants (NPPs) work on the principle of harnessing the thermal energy produced by splitting atoms, mostly of uranium. Uranium-fuelled nuclear power is a clean and efficient way of boiling water to produce steam, which drives the turbine generators and converts the mechanical energy into useful electrical energy. The following design features make it possible for a nuclear power plant to operate safely and economically.
• Fuel: The most commonly used nuclear fuels around the world are isotopes of uranium.
• Core: The core of a reactor contains the uranium fuel. It is kept in a horizontal or vertical cylindrical tank known as a calandria, based on whether it is a heavy water or light water reactor. The calandria comprises concentric fuel channels that run from one end to the other [23].
• Control rods: Made of neutron-absorbing materials, these rods are inserted into or withdrawn from the core to control the rate of reaction [23].
• Moderator: Nuclear fuels such as isotopes of uranium require a moderator to slow down neutrons so that they can be absorbed. Depending on the reactor, the moderator can be ordinary water (for light water reactors) or deuterium oxide (for heavy water reactors) [23].
• Coolant: It is used to maintain the reactor core at a safe operating temperature. It also helps in reactor cooldown to avoid a meltdown, which can halt the production of energy [23].
• Boiler Feed Pump: It increases the pressure of the feed water and then moves it to the feed water heaters.
• Feed Water Heater: High pressure feed water is preheated before being supplied to the boiler.
• Boiler Drum: Boilers are used to produce high pressure steam, which in turn generates electricity.
• Steam Generation: The high-pressure water from the reactor cooling circuit transfers heat to the feed water in the boiler, producing steam to drive the turbine. This steam is then transferred to the turbine for driving the generator to produce electrical energy.
• Reactor Headers: Several Reactor Inlet Headers (RIH) form a part of the reactor's Primary Heat Transport (PHT) System. The PHT is a closed circulating system that maintains the flow from the Inlet Headers through the reactor to the Outlet Header (ROH).

The basic arrangement of reactor headers is shown in Figure 1. Figure 2 shows the block diagram for a typical Heavy Water NPP.
The nuclear core heats the heavy water under pressure, which is then passed through the boiler and used to transform light water, at a lower pressure, from the feed water heaters to steam. The heavy water also serves as a moderator to make the nuclear core more reactive. Water in the boiler is then converted into steam. This steam is then passed to turbines using different channels. In the steam turbine, the energy from the steam moves the turbines. In the later stages, the steam which has been passed through the turbine enters the condenser where it condenses back to its original state and is circulated back to the boiler for further cycles [24].

Methodology Overview
This section discusses the methodology adopted to implement the visual analytics approach. Figure 3 shows the methodology pipeline which helps plant operators to understand, visualise, explore, and model the Nuclear Power Plant (NPP) data. The pipeline is divided into two parts based on the operation to be performed on the data. Figure 3a represents the pipeline architecture responsible for carrying out clustering of the data. The clustered representation of data helps in better visualisation and outlier analysis. In order to explore patterns and model the NPP data, the clustered representation is generalised as finite state machines. This transformation is illustrated in Figure 3b. The steps and operations associated with both pipelines are discussed below.

Clustering Pipeline
This pipeline aims at extracting meaningful information from the raw data obtained from the NPP. Data in the time domain is transformed to the feature domain by computing its statistical features. Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA) is used to extract features with maximum variance and highest class information. The extracted features are then clustered together based on shared properties with the help of the k-means clustering algorithm. The clusters obtained are visualised over scatter plots for pattern discovery and identification of outliers. Algorithm 1 outlines the operations carried out to obtain clusters of the raw data.

Data Preprocessing
The data extracted from the sensors installed at the NPP is prone to noise, outliers, and missing data during shutdown periods. Therefore, data cleaning is performed before any other processing. Missing data is imputed with suitable values; for instance, during a shutdown of the NPP, the power measurements of the plant are set to zero. The measurements from the sensors are captured on different scales; for instance, Reactor Inlet Coolant Temperature is measured in °C while Reactor Power is measured in %. Therefore, Min-Max scaling is performed to bring all parameters onto the same scale (into the range [0, 1]) for easy analysis and understanding.
Let x denote an NPP parameter, x_i the value of x at the i-th time step, and x̄_i the scaled value of x_i:

x̄_i = (x_i − min(x)) / (max(x) − min(x))

After data cleaning and scaling, the large time-series data is split into smaller chunks of months and years to avoid complexity in the processing. Data preprocessing is performed in lines 2 and 3 of Algorithm 1.
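As an illustration, the Min-Max scaling step above can be sketched in Python (a minimal sketch, not the exact implementation of Algorithm 1; the function name and sample values are hypothetical):

```python
import numpy as np

def min_max_scale(x):
    """Scale a 1-D parameter series into [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:               # constant series: avoid division by zero
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

# Hypothetical reactor inlet coolant temperature readings (deg C);
# 0.0 could be an imputed shutdown value
temps = [249.0, 251.5, 250.2, 0.0, 252.1]
scaled = min_max_scale(temps)
```

After this step every parameter lies in [0, 1] regardless of its original unit, so parameters measured in °C and % become directly comparable.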

Feature Extraction
The preprocessed data obtained from the last step is used to compute the statistical features of each chunk of the large time-series data (line 4 in Algorithm 1). This converts the data in the time domain to the feature domain. Thirteen basic analytical features are computed from the data: sum, mean, median, minimum value, maximum value, standard deviation, variance, standard error of the mean, skewness, kurtosis, and the sample quantiles at 25%, 50%, and 75%. The mean gives the arithmetic average of the data and is calculated by dividing the sum of the data by the total number of observations. The median represents the middle element of the data after it is sorted. The mean and median help to understand the central tendency of the data. The variance measures the variability in the data, and the square root of the variance gives the standard deviation, which indicates how close a typical observation lies to the mean. Dividing the standard deviation by the square root of the total number of observations gives the standard error of the mean, which represents the difference between the calculated mean and the true mean of a sample. Moreover, the quantiles divide the data into four equal probability areas. Hence, the variance, standard deviation, standard error of the mean, and quantiles help to analyse the dispersion tendency of the data. Furthermore, skewness and kurtosis help to understand the deviation of the data from a normal distribution. Skewness quantifies the asymmetry of the data based on the number of samples that are greater or less than the mean value, and kurtosis determines the degree to which outliers are extreme in the data distribution, thereby measuring the sharpness of the peaks in the data.
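The thirteen features described above can be computed per chunk with pandas, for example (a minimal sketch; the function name and sample values are hypothetical):

```python
import pandas as pd

def monthly_features(chunk):
    """Compute the thirteen statistical features for one month of data.
    `chunk` is a sequence of one parameter's daily (scaled) measurements."""
    s = pd.Series(chunk, dtype=float)
    return {
        "sum": s.sum(), "mean": s.mean(), "median": s.median(),
        "min": s.min(), "max": s.max(),
        "std": s.std(), "var": s.var(), "sem": s.sem(),
        "skew": s.skew(), "kurt": s.kurt(),
        "q25": s.quantile(0.25), "q50": s.quantile(0.50), "q75": s.quantile(0.75),
    }

# Hypothetical scaled power values for part of a month
feats = monthly_features([0.9, 0.95, 0.92, 0.0, 0.88])
```

Stacking one such dictionary per month-chunk yields the feature matrix that is later fed to PCA.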
These features are then fed to the PCA algorithm for dimensionality reduction, thus extracting only the principal features with maximum variance. The primary purpose of using PCA is to reduce the dimensionality while retaining most of the information present in the data [26]. In lines 5-8 of Algorithm 1, PCA is applied to extract i features which are capable of retaining at least δ of the original information.
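The variance-threshold selection in lines 5-8 of Algorithm 1 can be approximated with scikit-learn, where passing a fraction to `n_components` keeps the fewest components whose cumulative explained variance reaches δ (a sketch on synthetic data; the matrix size of 108 month-chunks by 13 features is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical feature matrix: 108 month-chunks x 13 statistical features
X = rng.normal(size=(108, 13))
X[:, 1] = 3.0 * X[:, 0]   # inject a redundant (correlated) feature

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches delta (here delta = 0.90)
pca = PCA(n_components=0.90)
Z = pca.fit_transform(X)
```

`Z` then holds the reduced representation that is passed on to k-means.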

Unsupervised Learning
After obtaining the principal components which retain most of the data's information, they are grouped based on their similar properties. k initial means are randomly generated such that k clusters are formed after every observation is assigned to a cluster. An observation is assigned to the cluster with the nearest mean value (based on Euclidean distance). However, choosing the value of k at the beginning of the clustering algorithm is a major challenge. Silhouette analysis is used to measure the separation between the resulting clusters. Silhouette coefficients are values in the range [−1, 1] describing how distant an observation is from neighbouring clusters. Refer to Appendix A.4 for the mathematical function involved in the silhouette value calculation. Positive silhouette values indicate good clustering, whereas negative values indicate improper clustering. Therefore, the value of k with the maximum silhouette score is selected for the clustering algorithm. Lines 9 and 10 in Algorithm 1 perform clustering of the principal components. Line 11 appends the cluster labels generated with the preprocessed data obtained from the first step of this pipeline.
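The silhouette-based choice of k can be sketched with scikit-learn as follows (synthetic data; the helper name `best_k` and the candidate range are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(points, k_range=range(2, 8)):
    """Return the k whose k-means clustering maximises the mean silhouette score."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
        scores[k] = silhouette_score(points, labels)
    return max(scores, key=scores.get)

# Hypothetical principal components with four well-separated groups
rng = np.random.default_rng(1)
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
pcs = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])
k = best_k(pcs)   # expected to recover k = 4 on this synthetic data
```

The selected k is then used for the preliminary clustering of the principal components.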

Normalising Clusters
An issue associated with PCA is that it lacks the interpretability of the resulting clusters [5]. PCA assigns high weights to the features with more significant variabilities irrespective of whether they are meaningful for the classification or not [6]. Linear Discriminant Analysis (LDA) is an alternate dimensionality reduction technique which tries to extract a feature subspace that maximises the separation between classes and deals directly with discrimination between classes [7]. Therefore, LDA is applied over the data collected from the last step to normalise the cluster labels such that the separation between the clusters is maximised (line 12 of Algorithm 1). Finally, k-means clustering is applied over the LDs extracted from LDA to generate normalised cluster labels (line 13).
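The normalisation step, LDA supervised by the preliminary labels followed by a second k-means pass, can be sketched as (synthetic data; matrix sizes and the cluster count of four are hypothetical):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Hypothetical feature matrix and preliminary k-means labels (4 clusters)
centres = np.array([[0, 0, 0], [6, 0, 0], [0, 6, 0], [0, 0, 6]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.4, size=(25, 3)) for c in centres])
prelim = np.repeat(np.arange(4), 25)

# LDA projects onto at most (n_classes - 1) discriminants that maximise
# between-cluster separation; re-clustering the LDs normalises the labels
lda = LinearDiscriminantAnalysis(n_components=3)
lds = lda.fit_transform(X, prelim)
normalised = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(lds)
```

Because LDA is driven by the preliminary labels rather than raw variance, the re-clustered labels separate the groups more cleanly than PCA alone.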

State Machine Pipeline
The clustered representation helps to visualise data instances grouped into clusters based on their similar nature. However, representing this graphically shows nothing about the interaction between clusters, i.e., how transitions occur from one cluster to another during the operation of the NPP over the years. The concept of finite state machines is introduced to represent the cluster transitions and thereby model the working of the NPP. Algorithm 2 illustrates the operations performed to build state machine diagrams from clustered data.

Cluster Analysis
Data with normalised cluster labels (χ) is filtered on the basis of years and is then grouped according to the cluster labels. Certain attributes are defined for each grouped cluster which uniquely identify that cluster. The residence time of a cluster calculates what proportion of the year's data is assigned to it. The mean value of a cluster is simply the mean of all data instances belonging to the cluster. To map the cluster transitions, the change in values (∆) of consecutive data pairs of a year is calculated along with the change in cluster labels (E). Cluster analysis is done in lines 1-10 of Algorithm 2.
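These per-cluster attributes and the transition quantities (∆, E) can be sketched with pandas (a hypothetical one-year record; the values are illustrative only, not plant data):

```python
import pandas as pd

# Hypothetical one-year record: monthly mean power with normalised cluster labels
year = pd.DataFrame({
    "power":   [0.95, 0.93, 0.94, 0.60, 0.58, 0.20, 0.0, 0.21, 0.59, 0.92, 0.94, 0.95],
    "cluster": [1, 1, 1, 2, 2, 3, 4, 3, 2, 1, 1, 1],
})

# Residence time: share of the year's instances assigned to each cluster
residence = year["cluster"].value_counts(normalize=True).sort_index()
# Mean value of each cluster
means = year.groupby("cluster")["power"].mean()

# Transitions: change in value (delta) and label pair for consecutive instances
delta = year["power"].diff().dropna()
edges = list(zip(year["cluster"].iloc[:-1], year["cluster"].iloc[1:]))
```

Here `residence` and `means` label the system states, while `delta` and `edges` supply the state transitions used in the next step.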

Finite State Machine
A system state that defines the state of the NPP's operation is built for each corresponding cluster. System states are then labelled with the attributes calculated for each cluster (V). These attributes help in describing the behaviour of the states based on their values. Now, state transitions are designed in accordance with the cluster transitions. ∆ change values along with the transitioning cluster labels (E) between consecutive data instances of a year are used to draw state transitions between the system states in a state machine diagram (Lines 11-13 of Algorithm 2).
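The state machine edges can then be tabulated from the label sequence, for instance (a sketch; the label sequence is hypothetical, not plant data):

```python
from collections import Counter

# Hypothetical sequence of monthly cluster labels for one year
# (1: high, 2: medium, 3: low, 4: shutdown)
labels = [1, 1, 1, 2, 2, 3, 4, 3, 2, 1, 1, 1]

# Each consecutive label pair is one state transition of the machine;
# counting the pairs yields the edge weights of the state machine diagram
transitions = Counter(zip(labels, labels[1:]))

# Dropping self-transitions leaves only the moves between distinct states
strip = {edge: n for edge, n in transitions.items() if edge[0] != edge[1]}
```

The full `transitions` table corresponds to the complete diagram, while `strip` keeps only inter-state moves.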

Visualisation of NPP's Power Data
The data corresponding to five parameters of the NPP was measured daily for nine years (2007-2015). Reactor Power is a key parameter for the economic operation of an NPP; hence, in this section the power data is passed through the methodology pipeline to obtain a better visualisation, explore any patterns in power levels, and model the working of the NPP with a finite state machine.
Before passing the power data into the proposed pipelines, the nature of the data is first analysed. Here, a simple time series analysis is considered as a baseline. The sensor measurements of all system parameters are scaled between 0 and 1 and plotted for a period of one year. On observing Figure 4, it is found that certain parameters have similar variation patterns with respect to time. For instance, Main Steam Line Pressure and Reactor Inlet Header Temperature (RIHT) appear to be correlated. However, this correlation cannot be modelled using linear methods. The time series power data is then decomposed into its components to look for any trend or seasonality present in it. As seen in Figure 5, no general trend or clear seasonality could be captured through the decomposition. Next, an Augmented Dickey-Fuller (ADF) test was performed on the data to check whether the data is stationary or nonstationary. The null hypothesis of the ADF test is that the series is nonstationary; it is rejected only when the test statistic is smaller (more negative) than the critical values and the p-value is below 0.05. In this case, it is observed in Table 1 that the ADF statistic is larger than all critical values and the p-value is greater than 0.05; hence, the null hypothesis cannot be rejected and the reactor power data is declared nonstationary.

The power data is now normalised (scaled), preprocessed for any missing values, and split into months and years. For further analysis, the data is chunked into calendar months, as months are a common line of reference for a number of management actions such as process and quality control, budgetary analysis, and utility studies. Next, statistical features are calculated for each month of every year. The resultant feature matrix has a high number of dimensions and therefore cannot be directly visualised graphically.
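The ADF decision rule can be made explicit in code (a sketch; the statistic, p-value, and critical values shown are illustrative, not the values from Table 1):

```python
def adf_decision(adf_stat, p_value, critical_values, alpha=0.05):
    """Interpret an Augmented Dickey-Fuller result.
    The ADF null hypothesis is a unit root (nonstationarity); it is rejected
    only when the statistic is MORE negative than the critical value and the
    p-value falls below alpha."""
    reject = adf_stat < critical_values["5%"] and p_value < alpha
    return "stationary" if reject else "nonstationary"

# Hypothetical values mirroring the situation described above: the statistic
# lies above all critical values and p > 0.05, so the null cannot be rejected
verdict = adf_decision(adf_stat=-1.2, p_value=0.67,
                       critical_values={"1%": -3.44, "5%": -2.87, "10%": -2.57})
```

With real data, the statistic, p-value, and critical values would come from a library routine such as `statsmodels.tsa.stattools.adfuller`.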
On applying PCA to the feature matrix, it is found that only three principal components are sufficient to represent the original matrix with 90% explained variance. The principal components are passed to the k-means clustering algorithm to obtain preliminary clusters. It is found that k = 4 gives the best silhouette score; hence, the algorithm groups the principal components into four clusters. These cluster labels are appended to the feature matrix, and the labelled feature matrix is passed to the LDA algorithm so that the extracted linear discriminants (LDs) are able to discriminate between the cluster labels. Three LDs are extracted from the labelled feature matrix using LDA. On applying k-means clustering over the LDs, four concrete, distinct clusters are found, as shown in Figure 6. The NPP over nine years tends to generate power at four levels in different states. Hence, the clustered representation of power data facilitates visual analysis and pattern discovery that cannot be achieved simply by observing time-series plots. However, to model how the NPP makes transitions from one power level (cluster) to another, finite state machines are built for every year. Power data is grouped on the basis of cluster labels for each year, and two primary attributes, namely the mean power capacity of the NPP and the residence time, are computed for all groups (clusters). Algorithm 2 is implemented to build state machine diagrams for all the years. Table 2 lists the state transition rules for the state machine diagram. The table shows 11 month pairs, which are labelled 1-11 in Figure 7. The mean power of the previous month is subtracted from the mean power of the current month to calculate the power transition (∆) between two consecutive months. For example, for the pair January-February, ∆ = mean power (February) − mean power (January) = −0.047, which implies that the power in February is lower than in January, but only by a very small amount.
Therefore, a self-transition 1-1 occurs and both January and February lie in system state 1 (High). Similarly, for the next pair, February-March, ∆ is again very small, so the transition again occurs from 1 to 1 and March also lies in state 1 (High). Next, for March-April, ∆ = −33.489, which implies that the mean power of March is greater than that of April by a significant amount. Therefore, a state transition from 1 to 2 (High to Medium) takes place and April falls under state 2 (Medium). Transitions for all remaining month pairs are calculated in the same way. Figure 7a represents the complete finite state machine with all system states and state transitions, whereas a simplified version is shown in Figure 7b for better visualisation and to model the high-level behaviour of the NPP. The strip representation of the state machine focuses only on transitions from one state to another and ignores self-transitions; therefore, labels 1 and 2, which cover the months January-February and February-March, are not shown. Observing the mean power capacity levels of the system states in Figure 7, it can be stated that the NPP operates in four distinct states over the year, namely high power, medium power, low power, and shutdown (zero power). The state transitions illustrate how these states communicate with one another and therefore model the working flow of the NPP. Strip representations of state machines are similarly defined for the other years, and the behaviour of the NPP is modelled across the years. Figures 8 and 9 present the strip visualisations from 2007 to 2015 based on residence time and mean power, respectively. Each state of the strip is labelled with an ordered pair (a, b), where a is the residence time of the state and b is its mean power capacity. In all years, the representation and definition of the four states remain consistent, thus validating the clustering algorithm.
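The month-to-month transition rule can be expressed as a small function. This is a simplified sketch: the ∆ threshold, the one-level-at-a-time rule, and the example mean powers are illustrative assumptions, not the paper's exact Table 2 values.

```python
# Illustrative state-transition rule derived from consecutive monthly means.
STATES = {1: "High", 2: "Medium", 3: "Low", 4: "Shutdown"}

def next_state(current, delta, threshold=10.0):
    """Move one state down (up) when the month-to-month power change is a
    significant drop (rise); otherwise stay put (a self-transition)."""
    if delta <= -threshold:
        return min(current + 1, 4)   # significant drop: one level lower
    if delta >= threshold:
        return max(current - 1, 1)   # significant rise: one level higher
    return current                   # small delta: self-transition

# Made-up monthly mean powers for Jan..Apr mimicking the worked example.
means = [95.0, 94.953, 94.9, 61.4]
state = 1
path = [state]
for prev, curr in zip(means, means[1:]):
    state = next_state(state, curr - prev)   # delta = current - previous
    path.append(state)
print(path)   # [1, 1, 1, 2]: Jan-Feb-Mar stay High, Apr drops to Medium
```

Running the rule over all eleven month pairs of a year yields the edge list of that year's finite state machine.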
The high power state always has a high mean power capacity, and the shutdown periods consistently show zero mean power across all years. Similar patterns are observed for the medium power and low power states in Figures 8 and 9. Therefore, the strip representation allows better visualisation over longer periods and extracts meaningful patterns from the data. Closer analysis of this representation shows that from 2007 to 2011 there is a shutdown of one month (8.3% of the year) every year. However, in 2012 the NPP is shut down for three months (33% of the year), and there is no shutdown period at all in 2013 and 2015. Hence, the anomalous behaviour of the NPP over the years is also captured by the strip representation of the state machines. Figure 10 shows the finite state machine representation that models the behaviour of the NPP across all years, with state transitions defined both within and across years. Each state in Figure 10 is represented as a combination of a character (power level) and digits (year). In normal operation, the plant tends to operate at high power levels for the largest proportion of time and remains in the shutdown phase for the shortest duration. In addition, the power level transitions from high to medium, medium to low, and finally from low to shutdown; the power levels are then gradually restored in the reverse order. However, in some years (2009-2010), the plant deviates from this normal operational pattern and transitions directly from high power levels to low power levels. When this happens, the plant operates at high power levels for a comparatively shorter period and tends to operate in a suboptimal power range for most of the remaining year. Therefore, the finite state machine representation of the data makes it possible to capture the anomalous behaviour of the NPP.
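The shutdown-duration anomaly described above reduces to a simple residence-time check. The fractions below are made-up stand-ins for the values read off the strip representation, and the tolerance is an illustrative choice.

```python
# Flag years whose shutdown residence time deviates from the usual
# one-month (~8.3% of the year) pattern.
shutdown_fraction = {          # fraction of each year spent in Shutdown
    2007: 0.083, 2008: 0.083, 2009: 0.083, 2010: 0.083, 2011: 0.083,
    2012: 0.330,               # three-month shutdown
    2013: 0.000,               # no shutdown at all
    2014: 0.083,
    2015: 0.000,
}

typical = 1 / 12               # one month per year
anomalous = sorted(
    y for y, f in shutdown_fraction.items()
    if abs(f - typical) > 0.04  # tolerance around the usual pattern
)
print(anomalous)   # [2012, 2013, 2015]
```

The same comparison extends to the other states, e.g. flagging years where the high power residence time is unusually short.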

Results and Discussion
Feature extraction followed by k-means clustering effectively partitions the data into separate groups, each corresponding to a unique state of the plant; this cannot be modelled using simple time-series statistical techniques. Moreover, the quality of the clustering can be validated both qualitatively and quantitatively. State machines formulated from the clustered data help in visualising both the local and the global behaviour of the plant data over the years. Locally, the data for each year is visualised, and its state transitions illustrate the overall character of that particular year. Globally, data for almost a whole decade is visualised, and transitions between different years illustrate how the plant operation changes over the years. Analysing the change in plant behaviour over several months and years helps to extract interesting patterns and to identify the demarcations between normal and abnormal operation of the plant.
The proposed pipeline architecture is data-agnostic, i.e., it can be applied to any parameter of the plant, and the behaviour of that parameter can be visualised independently. Most importantly, all the transformations in the model pipeline are automatic and require negligible manual input; this methodology can therefore be turned into an online system in which time-series data is fed into the pipeline and the corresponding parameter visualisations are produced on the fly. Dashboards for the physical system can be designed that provide a visual representation of the system behaviour over time, along with a feedback tool that raises a flag in case of any abnormal system conditions.

Conclusions
Modelling the operation of critical cyberphysical systems, such as nuclear power plants, is a challenging and yet important task. In this paper, a new data analytics technique is proposed to assist the plant operators to infer insights from raw complex data. The work considers extracting features from raw time-series signals, then applying PCA and LDA to reduce the dimensionality of the data. This allows the visualisation of information in a compressed format, thus facilitating the task of identifying patterns from the data. Furthermore, the proposed approach models the behaviour of the plants over the years using finite state machines. Such a modelling allows the operators and decision makers to visualise the behaviour of the plants on local and global scales.
In the future, we will extend this line of work by considering physics-based models of Nuclear Power Plants as baseline methodologies and compare our proposed data-driven model with them. We will further explore forecasting technologies to make predictions and compare the prediction accuracy with the baseline models.
Author Contributions: All the authors have equally contributed to this paper. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
There is no conflict of interest in performing this research and publishing the results.

Appendix A.1. Principal Component Analysis

The M eigenvectors U_M corresponding to the largest eigenvalues are retained for the final transformation, since higher eigenvalues retain more of the variance.

Appendix A.2. Linear Discriminant Analysis
Unlike PCA, linear discriminant analysis (LDA) is a supervised method in which each point in the data sample belongs to a particular category or class.
Let us consider a data sample X of dimensions d × n, where d is the number of features and n is the number of samples, and let it be categorised into K classes. The data then needs to be projected into a (K − 1) ≪ d dimensional space. This implies that a projection matrix W of dimensions d × (K − 1) needs to be calculated in order to separate the different classes well. The transformed data Y, with dimensions (K − 1) × n, can be written as:

$$Y = W^T X$$

The main aim of LDA is to find W such that the between-class scatter (S_B) is maximised while the within-class scatter (S_W) is kept constant. The scatter matrix for the mth class can be defined as:

$$S^{(m)} = \sum_{i=1}^{Q^{(m)}} \left(x_i^{(m)} - \mu^{(m)}\right)\left(x_i^{(m)} - \mu^{(m)}\right)^T$$

where x_i^{(m)} represents the ith data sample in the mth class and μ^{(m)} represents the mean of that class. The within-class scatter matrix S_W is then the summation of the scatter matrices of all classes:

$$S_W = \sum_{m=1}^{K} S^{(m)}$$

The between-class scatter matrix S_B considers the distance between every pair of classes and is represented as:

$$S_B = \sum_{m=1}^{K} Q^{(m)} \left(\mu^{(m)} - \mu\right)\left(\mu^{(m)} - \mu\right)^T$$

where Q^{(m)} is the number of samples in the mth class and μ is the mean of the data matrix X. Finally, let S_T = S_B + S_W.
To satisfy the goal of LDA, S_B needs to be maximised while S_T is kept constant. Solving this optimisation problem yields the projection matrix W as the K − 1 eigenvectors of S_W^{-1} S_B, together with their corresponding nonzero eigenvalues. The final transformation is then performed by choosing the eigenvectors with the highest eigenvalues.
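The scatter matrices and the eigen-decomposition above can be sketched directly in NumPy. This is a minimal toy implementation on synthetic three-class data, not an optimised or numerically robust LDA; the data, dimensions, and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K = 4, 3
# Three synthetic classes of 30 samples each in d = 4 dimensions.
classes = [rng.normal(loc=m, scale=0.3, size=(30, d)) for m in (0.0, 2.0, 4.0)]
X = np.vstack(classes)                       # n x d data matrix (n = 90)
mu = X.mean(axis=0)                          # overall mean

S_W = np.zeros((d, d))
S_B = np.zeros((d, d))
for Xm in classes:
    mu_m = Xm.mean(axis=0)
    diff = Xm - mu_m
    S_W += diff.T @ diff                     # within-class scatter S_W
    gap = (mu_m - mu).reshape(-1, 1)
    S_B += len(Xm) * (gap @ gap.T)           # between-class scatter S_B

# Projection matrix: top K-1 eigenvectors of S_W^{-1} S_B.
vals, vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(vals.real)[::-1]
W = vecs[:, order[:K - 1]].real              # d x (K-1) projection matrix
Y = W.T @ X.T                                # (K-1) x n transformed data
print(Y.shape)   # (2, 90)
```

Note that S_W^{-1} S_B is generally non-symmetric, so `np.linalg.eig` may return complex values with negligible imaginary parts; taking the real part is adequate for this sketch.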

Appendix A.3. K-Means Clustering
This is an unsupervised method in which data is separated into different clusters based on similarity. Each cluster has a centroid, and the k-means algorithm is designed to minimise the distance of each data point from the centroids so that the data points are mapped to the right clusters.
Let us consider a set of n data points x_1, x_2, ..., x_n. Initially, K random data points are chosen as the centroids of the K clusters. The Euclidean distance is then calculated between each data point and the K centroids, and each data point is assigned to its nearest cluster based on this distance. If each centroid is denoted by c_j, the assignment of a data point x to a cluster minimises the distance to the centroid:

$$\arg\min_{c_j} \; \mathrm{distance}(c_j, x)^2 \tag{A11}$$

To determine the optimal centre for the data points, the average of all data points belonging to a particular cluster is computed to obtain a new centroid:

$$c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$$
In the above equation, c_j is the updated centroid, |C_j| is the total number of data points belonging to the jth cluster, and x_i represents the data points present in that cluster. After a certain number of iterations, the centroids no longer change, implying that the data points have been clustered with the least possible variance within each cluster.
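The assign-then-update loop described above can be written compactly in NumPy. This is a minimal sketch of Lloyd's iteration on toy data; the initialisation scheme, seed, and blob parameters are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, then
    recompute centroids as cluster means, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)            # arg min_j ||x - c_j||^2
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)                 # keep empty clusters in place
        ])
        if np.allclose(new, centroids):       # centroids unchanged: converged
            break
        centroids = new
    return labels, centroids

# Two well-separated blobs; k-means should recover them exactly.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (25, 2)), rng.normal(5, 0.2, (25, 2))])
labels, cents = kmeans(X, k=2)
print(len(set(labels.tolist())))   # 2
```

In practice a library implementation with multiple restarts (as used in the paper's pipeline) is preferable, since a single random initialisation can converge to a poor local optimum.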

Appendix A.4. Silhouette Analysis
Silhouette analysis is a method used to measure the similarity of a data point to other data points present in the same cluster or in the neighbouring clusters. This method is based on the determination of tightness and separation of data points in the clusters.
The range of silhouette value is [−1,1]. Higher values indicate that the data point is assigned to the right cluster and is located away from its neighbouring clusters. Therefore, a higher silhouette value is desirable.
The silhouette value for a data point i present in cluster C_i is defined as follows:

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

Here, a_i is the mean distance from data point i to all other points in C_i, and b_i is the minimum mean distance from i to the points of any other cluster.
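The per-point silhouette value s_i = (b_i − a_i) / max(a_i, b_i) is straightforward to compute directly. The toy one-dimensional data below is an illustrative assumption; the function assumes each cluster contains more than one point.

```python
import numpy as np

def silhouette_point(i, X, labels):
    """Silhouette value of point i: a = mean distance to its own cluster's
    other points, b = minimum mean distance to any other cluster."""
    x = X[i]
    same = X[labels == labels[i]]
    a = np.linalg.norm(same - x, axis=1).sum() / (len(same) - 1)
    b = min(
        np.linalg.norm(X[labels == c] - x, axis=1).mean()
        for c in set(labels.tolist()) if c != labels[i]
    )
    return (b - a) / max(a, b)

# Two tight, well-separated 1-D clusters: silhouette values near 1.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
labels = np.array([0, 0, 0, 1, 1])
print(round(silhouette_point(0, X, labels), 3))   # 0.97
```

Averaging s_i over all points gives the overall silhouette score used to pick k = 4 in the clustering stage.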