Application of Electrical Network Frequency of Digital Recordings for Location-Stamp Verification

Abstract: Electrical network frequency (ENF) is a signature of a power distribution grid. It represents the deviation from the nominal frequency (50 or 60 Hz) of a power system network. The variations in ENF sequences within a grid are driven by load fluctuations within that particular grid. These ENF variations are inherently embedded in a multimedia signal that is recorded close to the grid or directly from the mains power line. Thus, the specific location of a recording can be identified by analyzing the ENF sequences of the multimedia signal in the absence of the concurrent power signal. In this article, a novel approach to location-stamp authentication based on ENF sequences of digital recordings is presented. ENF patterns are extracted from a number of power and audio signals recorded in different grid locations across the world. The extracted ENF signals are decomposed into low outliers and high outliers frequency segments, and potential feature vectors are determined for these ENF segments by statistical and signal processing analysis. Then, a multi-class support vector machine (SVM) classification model is developed to verify the location-stamp information of the recordings. The performance evaluations corroborate the efficacy of the proposed framework.


Introduction
The frequency of a power system changes instantaneously in response to load variations and control methodologies. Electrical network frequency (ENF) is the base frequency (50 or 60 Hz) of a power distribution network, and ENF sequences arise from the fluctuations of the frequency around this nominal value. These ENF sequences contain recognizable patterns of a power grid, since the ENF variations are considered uniform within a particular grid and separable from one grid to another. When a multimedia signal such as an audio signal is recorded close to a grid or directly from the power supply line, the power signatures of that specific grid location are embedded into the recording through electromagnetic interference (EMI). The audio recording can therefore be used for location forensic analysis when the concurrent power recording is absent.
Since ENF sequences carry the power signatures of a distribution grid, the location stamp of an audio recording can be verified by extracting and analyzing the ENF signal of the recording. Location-stamp investigation is a significant tool for anti-terrorist drives and for preventing and prosecuting cyber crimes.
In this article, ENF signals are extracted from a number of digital recordings of power and audio signals captured in different grid locations, and a classification model is then developed based on potential feature vectors of the estimated ENF patterns. A comprehensive ENF analysis of digital audio recordings for forensics and security applications is presented in Reference [1]. A thorough study of the factors affecting the capture of ENF sequences in audio recordings is documented in Reference [2]. A number of methodologies for ENF extraction from power and multimedia (audio or video) signals are reported in References [3][4][5][6][7][8][9][10][11][12]. In this framework, the Root MUSIC algorithm [5] is applied to extract ENF sequences from the digital recordings. The Root MUSIC algorithm determines ENF signals with high precision at a moderate computational cost.
To investigate the ENF signals more closely, the sequences are analyzed in terms of their fluctuation trends with respect to the nominal ENF values. The extracted ENF signals are decomposed into low outliers and high outliers frequency segments, where low outliers segments show smaller fluctuations in frequency from the nominal ENF than high outliers segments. Then, using statistical and signal processing methods, a number of potential feature components are determined separately from the low outliers and high outliers ENF signals. A multi-class support vector machine (SVM) classifier is developed to locate the sources of the recordings. A number of ENF based location forensics analyses are proposed in References [13][14][15][16][17][18][19][20][21][22].
The reported ENF based location-stamp verification framework decomposes the ENF sequences of digital power and audio signals according to their fluctuation trends. Potential features are then experimentally extracted from the decomposed ENF sequences, and Euclidean distance matrices are applied to verify the quality of the features. The power and multimedia signal localization methods reported in previous works do not consider an implicit analysis based on decomposition of the extracted ENF sequences in terms of frequency fluctuation trends. The presented outlier determination and feature extraction from the decomposed ENF signals yield a better understanding of the frequency behavior of power and multimedia recordings collected from different grid locations around the world. In addition, an intrinsic experimental approach is carried out to extract potential feature vectors, which can be the basis for an efficient classification model, and the quality of the proposed features is verified as well. The feature extraction process thus represents a unique premise of the presented work. Moreover, based on the extracted feature vectors, a novel multi-class SVM classification model is developed, which incorporates custom training and testing algorithms and decision making structures to classify the grid locations accurately. The classification models and decision making structures represent another novel premise of the framework. For performance analysis, power and audio recordings are localized separately, and the system consisting of both types of recordings is also tested for location authenticity. The proposed system constitutes an efficacious location-stamp verification framework, whose location authentication accuracies for both power and audio recordings are comparable with, and in most cases superior to, those reported in earlier works. Moreover, along with its reliability for location forensics analysis, the proposed system is a computationally cost-effective framework for cyber-security applications.
The proposed location forensic application system is developed and tested in MATLAB®, and the training and testing accuracies for locating the regions of the digital (power + audio) recordings are 97.42% and 92.00%, respectively.
The remainder of the manuscript proceeds as follows. Section 2 describes the ENF extraction method and the ENF database formation process. Section 3 explains the features extracted from the ENF signals, the developed classification model and its performance evaluation. Section 4 concludes the article.

ENF Extraction and Database Formation Based on Location Specific Recordings
The training dataset contains power and audio recordings collected from 9 different grids [23]. The recordings in the training dataset are sampled at 1 kHz. Three grids operate at a 60 Hz ENF, whereas six grids operate at a 50 Hz ENF. Table 1 shows the grid locations and the associated grid names of the training dataset. This section describes the separation of the recordings into power and audio types, the extraction of ENF signals and the formation of the ENF database.

Separation of Power and Audio Recordings
The recordings are separated into power and audio signals through the following sequential steps:
Step-1: A raw power or audio recording is first segmented into a number of time frames. The window length of each time frame is empirically set to 5 min. The dominant or center frequency $F_c$ of each signal component is determined using the short-time Fourier transform (STFT).
Step-2: The signal-to-noise ratios (SNRs) are then evaluated by taking $[F_c - F_b, F_c + F_b]$ as the power band of each signal. Here $F_b = 0.5$ Hz is the step size of variation from the base frequency. All bands other than this one are treated as noise. The band power values are estimated using the Welch power spectrum method. The obtained SNR values are used to decide whether the recording is a power signal or an audio signal. The underlying consideration is that, at the nominal frequency or its harmonics, the SNR values of power signals are greater than those of audio signals.
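The SNR-based decision of Step-2 can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: a direct DFT evaluated on a 0.5 Hz grid stands in for the Welch estimate, the 40-70 Hz analysis band is borrowed from the band-pass filter described later for power recordings, and the 10 dB decision level `threshold_db` as well as the synthetic test signals are assumptions made only for this example.

```python
import cmath
import math
import random

def spectrum(x, fs, f_lo, f_hi, df=0.5):
    """Squared DFT magnitude of x at f_lo, f_lo + df, ..., f_hi (in Hz)."""
    n = len(x)
    n_bins = int(round((f_hi - f_lo) / df)) + 1
    spec = {}
    for i in range(n_bins):
        f = f_lo + i * df
        acc = sum(x[k] * cmath.exp(-2j * math.pi * f * k / fs) for k in range(n))
        spec[f] = abs(acc) ** 2 / n
    return spec

def snr_db(x, fs, fc, fb=0.5, band=(40.0, 70.0)):
    """Power inside [fc - fb, fc + fb] relative to the rest of `band`, in dB."""
    spec = spectrum(x, fs, band[0], band[1])
    signal = sum(p for f, p in spec.items() if abs(f - fc) <= fb)
    noise = sum(p for f, p in spec.items() if abs(f - fc) > fb)
    return 10.0 * math.log10(signal / noise)

def recording_type(x, fs, fc, threshold_db=10.0):
    # threshold_db is a hypothetical decision level, not a value from the paper
    return "power" if snr_db(x, fs, fc) > threshold_db else "audio"

# Synthetic 2 s examples sampled at 1 kHz (assumed data, for illustration only)
random.seed(0)
fs = 1000
t = [k / fs for k in range(2 * fs)]
power_like = [math.sin(2 * math.pi * 60 * tk) + random.gauss(0, 0.01) for tk in t]
audio_like = [0.05 * math.sin(2 * math.pi * 60 * tk) + random.gauss(0, 1.0) for tk in t]
```

Calling `recording_type(power_like, fs, 60.0)` and `recording_type(audio_like, fs, 60.0)` applies the rule that a recording with a strongly dominant mains component is treated as a power signal.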

Extraction of ENF Sequences
After separating the power and audio signals, ENF sequences are extracted from the recorded signals as follows:

ENF Sequences of Power Recordings
Step-1: Each 5 min long power signal is processed through a 2nd order Butterworth band-pass filter designed with a frequency band of [40, 70] Hz. The filtered signal is then segmented into a number of time frames for ENF estimation. In this work, the length of each ENF time frame is empirically set to 5 s.
Step-2: The Root MUSIC algorithm is applied to extract the ENF sequence of each 5 s long signal segment. Thus, the power signatures embedded in a power recording are estimated.

ENF Sequences of Audio Recordings
Step-1: Each 5 min long audio signal is processed through a 2nd order Butterworth band-pass filter designed with a frequency band of $[F_c - F_b, F_c + F_b]$ Hz. The filtered signal is segmented into 15 s long time frames, each of which overlaps the previous frame by 10 s. The overlap of a time frame with its previous frame is therefore about 66.67%.
Step-2: The Root MUSIC algorithm is applied to extract the ENF sequence of each 5 s long signal segment. Thus, the power signatures embedded in an audio recording are estimated.
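The framing arithmetic in the steps above can be checked with a short sketch. The helper name `frame_bounds` is hypothetical. It only computes the sample indices of each frame for a 5 min recording sampled at 1 kHz, and does not perform the filtering or the Root MUSIC estimation.

```python
def frame_bounds(n_samples, fs, frame_s, overlap_s=0.0):
    """Return (start, end) sample indices of frames of frame_s seconds,
    where consecutive frames share overlap_s seconds."""
    frame = int(round(frame_s * fs))
    hop = int(round((frame_s - overlap_s) * fs))
    if hop <= 0:
        raise ValueError("overlap must be shorter than the frame length")
    return [(s, s + frame) for s in range(0, n_samples - frame + 1, hop)]

fs = 1000                      # sampling rate of the training recordings (Hz)
n_samples = 5 * 60 * fs        # one 5 min recording

power_frames = frame_bounds(n_samples, fs, frame_s=5)                 # no overlap
audio_frames = frame_bounds(n_samples, fs, frame_s=15, overlap_s=10)  # 10 s overlap
overlap_ratio = 10 / 15        # about 66.67 %
```

With these settings a 5 min power recording yields 60 non-overlapping frames, while the audio framing advances by 5 s per frame and yields 58 frames.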
Figures 1-4 present sample ENF sequences extracted from power and audio recordings from different grid locations. The variations in frequency depend on the load fluctuations and the power system control methodologies. Loads change from time to time during operation, and the system frequency varies accordingly. If the control techniques are effective and reliable, there are fewer fluctuations in frequency and fewer instabilities are observed in the power system network. Thus the ENF patterns can be viewed as a stability index of a particular power distribution grid. From Figure 1 it can be inferred that the approximate maximum ENF variations from the nominal 60 Hz value for power recordings collected from Texas, Eastern U.S. and Western U.S. [...] From the above analysis of ENF variations, it can be inferred that the 60 Hz grids have better controlled and more stable power systems for mitigating frequency fluctuations than the 50 Hz grids. However, the three 60 Hz grid locations are mainly in the United States of America. Among the six 50 Hz power grids, France has the least ENF variations.
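Reading the ENF pattern as a stability index amounts to measuring how far a sequence strays from its nominal value. The following is a minimal sketch of two such measures. The function names and the sample values are made up for illustration and are not taken from the recordings discussed above.

```python
from statistics import pstdev

def max_deviation(enf, nominal):
    """Largest absolute excursion of an ENF sequence from its nominal value (Hz)."""
    return max(abs(f - nominal) for f in enf)

def fluctuation_spread(enf):
    """Population standard deviation of the ENF sequence (Hz)."""
    return pstdev(enf)

# Hypothetical ENF samples (Hz), chosen only to illustrate the comparison
grid_60 = [60.00, 60.01, 59.99, 60.02, 59.98]
grid_50 = [50.00, 50.06, 49.93, 50.08, 49.95]
```

A grid with tighter frequency control shows both a smaller maximum deviation and a smaller spread, which is the sense in which the text compares the 60 Hz and 50 Hz grids.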

Formation of ENF Database
After estimating the ENF signals, a database containing the extracted values is developed through the following sequential process:
Step-1: After the ENF sequences are extracted from the power and audio recordings, the 50 Hz and 60 Hz ENF components are separated by measuring the mean values. Thus, for each power and audio recording, 50 Hz and 60 Hz ENF sequences are recognized separately.
Step-2: Each ENF pattern can be segmented into two sequences: low outliers frequency (LOF) and high outliers frequency (HOF), where LOF sequences show smaller fluctuations in frequency than HOF sequences. In each case, the LOF sequence is determined by passing the original ENF signal through a smoothening filter, and the HOF sequence is determined by subtracting the LOF sequence from the ENF signal [5]. The HOF sequence can be determined as:

$$f_{hp}(n) = f(n) - f_{lp}(n), \qquad f_{lp}(n) = \sum_{k=0}^{M} w(k)\, f(n-k)$$

Here $f(n)$ is the ENF value at sample $n$, $f_{hp}(n)$ is the HOF value, $f_{lp}(n)$ is the LOF value, and $w(\cdot)$ and $M$ are the smoothening filter coefficients and order, respectively. Thus, for 50 Hz and 60 Hz power and audio recordings, separate databases containing LOF and HOF ENF sequences are developed.
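The two database-formation steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the smoothening filter is taken to be a causal moving average with equal weights, the order of 4 is arbitrary, and the function names are hypothetical.

```python
from statistics import fmean

def nominal_frequency(enf):
    """Assign an ENF sequence to the 50 Hz or 60 Hz group from its mean value."""
    m = fmean(enf)
    return 50 if abs(m - 50.0) < abs(m - 60.0) else 60

def decompose(enf, order=4):
    """Split an ENF sequence into LOF and HOF parts.

    LOF: causal moving average over (order + 1) samples, an assumed smoothing
         filter; fewer samples are averaged at the start of the sequence.
    HOF: the ENF sequence minus its LOF part.
    """
    lof = []
    for n in range(len(enf)):
        window = enf[max(0, n - order): n + 1]
        lof.append(fmean(window))
    hof = [f - l for f, l in zip(enf, lof)]
    return lof, hof

# Hypothetical ENF samples (Hz)
enf = [60.00, 60.02, 59.97, 60.01, 60.03, 59.99]
lof, hof = decompose(enf)
```

By construction the two parts add back to the original sequence, so no information is lost by storing LOF and HOF sequences separately.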

Features Extraction, Classification Model and Performance Analysis
This section describes the extracted feature vectors, developed multi-class classification model and its performance analysis for the proposed ENF based location forensic application.

Analysis of Extracted Feature Vectors
Statistical and signal processing techniques are applied to extract potential feature vectors from the ENF sequences. Table 2 presents the feature vectors extracted from the HOF and LOF segments of the 60 Hz and 50 Hz power and audio ENF signals. The analysis shows that the mean and the median are good candidates for potential feature functions. Waveform length is another good candidate, yielding a potential feature vector $F_{v,wl}$, which is measured as

$$F_{v,wl} = \sum_{n=1}^{N-1} \left| f[n+1] - f[n] \right|$$

Here $f[\cdot]$ is an ENF sequence and $N$ is the sequence length. The crest factor (CF) and the interquartile range (IQR) are also found to be potential feature functions. For an ENF signal, CF is measured as the ratio of the peak value to the root mean square (rms) value. IQR refers to the difference between the ENF value below which 75% of the entire sequence data lie and the value below which 25% of the data lie. IQR analyzes the ENF sequence in terms of quartiles. Quartiles divide the data into four equal parts. The values that divide the parts are called the first ($Q_1$), second ($Q_2$) and third ($Q_3$) quartiles. $Q_1$ is the middle value of the first half of the sequence, $Q_2$ is the median value and $Q_3$ is the middle value of the second half of the sequence. IQR is equal to

$$\mathrm{IQR} = Q_3 - Q_1$$

Another potential feature vector $F_{v,ma}$ is derived from the modified mean absolute value function. It is defined as

$$F_{v,ma} = \frac{1}{N} \sum_{n=1}^{N} w_n \left| f[n] \right|$$

where $w_n$ is the weight applied to the $n$th sample. The Welch power spectrum method is used to measure the power spectral density of an ENF sequence, which is found to be a good feature component. A 4th order autoregressive AR(4) model of an ENF sequence can be expressed as

$$f[n] + \sum_{k=1}^{4} G_k\, f[n-k] = e[n]$$

Here $G_1$-$G_4$ are the AR coefficients, $e[n]$ is the white noise input and $H$ is the final prediction error (the variance estimate of the white noise input to the AR model). In this work, the AR parameters are estimated using the Burg method, and $G_2$ and the log of $H$ are analyzed as potential feature vectors. No potential feature component is extracted from the LOF 50 Hz power, HOF 60 Hz audio and LOF 50 Hz audio ENF segments.
Experiments are conducted to extract feature vectors from the ENF signals. The most promising candidates for the final features are then selected by analyzing the Euclidean distance between each pair of features. The larger the distance of a particular feature from the other features, the better it is as a candidate for selection. The concept and properties of Euclidean distance matrices are reported in Reference [24]. The Euclidean distance matrices of the selected features for 60 Hz power, 60 Hz audio, 50 Hz power and 50 Hz audio ENF sequences are presented in Tables 3-6, respectively. All other features extracted in the experimental analysis, such as the 1st and 3rd coefficients of the AR(4) model, kurtosis, mode, average amplitude change, r.m.s. shape factor, impulse factor, 5th and 6th order moments, entropy and so forth, have very small Euclidean distances with respect to the selected features. Therefore, they are not considered in the final feature selection. Tables 3-6 thus document the quality of the selected feature components for classifying the digital recordings, in terms of the Euclidean distance values calculated between each pair of features for 60 Hz and 50 Hz power and audio ENF sequences, respectively.
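Several of the time-domain features named above can be computed in a few lines. The sketch below is illustrative only. Waveform length is implemented with its usual definition (the sum of absolute differences between consecutive samples), which is assumed here rather than confirmed by the text. The quartiles follow the median-of-halves description given in the text, and the function names are hypothetical. The Welch and Burg based features are not covered.

```python
import math
from statistics import fmean, median

def waveform_length(x):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def crest_factor(x):
    """Peak absolute value divided by the root mean square value."""
    rms = math.sqrt(fmean([v * v for v in x]))
    return max(abs(v) for v in x) / rms

def interquartile_range(x):
    """Q3 - Q1, with Q1 and Q3 taken as the medians of the lower and upper
    halves of the sorted data (needs at least two samples)."""
    s = sorted(x)
    half = len(s) // 2
    return median(s[-half:]) - median(s[:half])

def feature_vector(x):
    """Collect the simple statistical features of one ENF segment."""
    return {
        "mean": fmean(x),
        "median": median(x),
        "waveform_length": waveform_length(x),
        "crest_factor": crest_factor(x),
        "iqr": interquartile_range(x),
    }
```

For example, `feature_vector([1, 2, 3, 4, 5, 6, 7, 8])` has a waveform length of 7 and an IQR of 4, since the medians of the two halves are 2.5 and 6.5.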
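The distance-based screening of features can be sketched as below. The text does not state how the features are scaled before distances are taken, so this example simply assumes that each feature is represented by a list of values measured over the same set of recordings. All names and numbers are hypothetical.

```python
import math

def distance_matrix(features):
    """Pairwise Euclidean distances between named feature vectors."""
    names = list(features)
    return {a: {b: math.dist(features[a], features[b]) for b in names}
            for a in names}

def rank_by_separation(features):
    """Order feature names by their mean distance to the other features,
    largest first."""
    d = distance_matrix(features)
    mean_dist = {a: sum(row.values()) / (len(row) - 1) for a, row in d.items()}
    return sorted(mean_dist, key=mean_dist.get, reverse=True)

# Hypothetical feature values over three recordings
features = {
    "median": [0.0, 0.0, 0.0],
    "crest_factor": [3.0, 4.0, 0.0],
    "kurtosis": [0.1, 0.0, 0.0],
}
d = distance_matrix(features)
ranking = rank_by_separation(features)
```

In this toy data `crest_factor` lies far from the other two features and is ranked first, whereas `median` and `kurtosis` are nearly indistinguishable, which mirrors the selection criterion described above.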

Classification Model
Based on the extracted feature components, a multi-class SVM classification model is developed. In this work, a radial basis function (RBF) kernel is used and the classification technique follows a "one-versus-one" approach. Algorithm 1 describes the training algorithm, whereas Algorithm 2 describes the testing algorithm.
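The decision logic around a one-versus-one classifier can be sketched without any SVM library. The example assumes that each pairwise model has already produced a winner and that a posterior probability for the final label is available from elsewhere. It then shows the enumeration of class pairs, the maximum-vote rule and the rejection into class N below the 0.7 threshold used by the testing algorithm. The helper names and the sample outputs are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairwise_models(classes):
    """All class pairs for which a binary model is trained (one-versus-one)."""
    return list(combinations(classes, 2))

def max_vote(pair_winners):
    """pair_winners maps each (a, b) pair to the class its binary model chose.
    The class with the most votes is returned."""
    votes = Counter(pair_winners.values())
    return votes.most_common(1)[0][0]

def reject_if_uncertain(label, posterior, threshold=0.7):
    """Return 'N' (none of the trained grids) when the posterior probability
    of the predicted grid is below the threshold."""
    return label if posterior >= threshold else "N"

# Six 50 Hz classes give fifteen pairwise models
pairs_50hz = pairwise_models(["B", "D", "E", "F", "G", "H"])

# Hypothetical outputs of three pairwise models for one test sample
winners = {("B", "D"): "B", ("B", "F"): "B", ("D", "F"): "D"}
predicted = max_vote(winners)
```

Here `predicted` is `"B"`, and `reject_if_uncertain(predicted, 0.65)` would return `"N"` because the assumed posterior falls below the threshold.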

Algorithm 1 Training Algorithm of the SVM Classifier
1: Let X be the raw audio or power signal. The center frequency $F_c$ of X is determined by applying the STFT.
2: X is classified as audio or power using the SNR of the signal.
3: The ENF sequence is extracted from X by applying the Root MUSIC algorithm. The ENF sequence is divided into two segments: a low outliers frequency (LOF) segment and a high outliers frequency (HOF) segment. LOF is determined by passing the ENF sequence through a smoothening filter, and HOF is determined by subtracting LOF from the original ENF sequence.
4: Feature vectors for both the LOF and HOF segments are determined based on the signal type (audio or power) and its center frequency $F_c$.
5: Each feature vector $F_v$ is used in the SVM classification algorithm as an input vector for training the prediction model.

Table 7 presents the trained SVM models. Here the letters A, B, C, D, E, F, G, H and I denote the 9 grids used for training the SVM models. 14 SVM models are trained in total.

Algorithm 2 Testing Algorithm of the SVM Classifier
1: Repeat steps 1-4 of the training algorithm for a testing sample Y. M is the set of trained SVM models.
2: Based on the signal type (audio or power) and $F_c$, the most appropriate model is selected from M. Let the selected model be $M_i$.
3: Each feature vector $F_v$ is fed into $M_i$.
4: The output of $M_i$ is the grid name (GN) associated with the input raw signal Y.
5: Thereby, the input signal is classified as a particular GN. However, if the posterior probability of the predicted GN is less than a specified threshold value (0.7 in this case), then the output of the classifier is N. Class N means that the input is not a sample signal from any of the grids used for training.

For the 50 Hz power dataset, there are six classes corresponding to six grids. The binary SVM classifier is applied in three ways to obtain a multi-class classification method. Firstly, the six classes are trained and tested following the one-versus-all classification approach. Secondly, all possible combinations of the six classes taken two at a time are used, and all fifteen trained models then predict new instances with the maximum vote as the decision making criterion. Both techniques are implemented, and the respective training and testing accuracies for 50 Hz power and audio data are reported in Tables 8 and 9. Thirdly, after experimenting with the fifteen trained models individually, it is found that classes B, H, E, D and G are strongly separable from class F. Therefore, five models named BF, HF, EF, DF and GF are used as a subset of the fifteen trained models in a hierarchical structure that forms the decision making system for 50 Hz power signals. Figure 5 presents the hierarchical structure for 50 Hz power signals. This hierarchical classification structure provides better results than the two conventional multi-class SVM approaches. Table 8 presents the performance comparisons of the classification techniques for 50 Hz power data. A similar hierarchical classification technique is developed and employed for 50 Hz audio data. Table 9 presents the performance comparisons of the classification methods for 50 Hz audio data, and Figure 6 presents the corresponding hierarchical structure for decision making. For the three 60 Hz grids, the trained models are developed using all possible combinations of the three classes taken two at a time.

Performance Analysis
The ground truths of the testing dataset are available in Reference [23]. Table 10 presents the classification accuracies (%) for the different systems. The results show that all the grids are identified correctly in training when only power signals are used. In testing with only power signals, all 3 grids at 60 Hz are identified with 100% accuracy, whereas approximately 95% accuracy is obtained for the 6 grids at 50 Hz. In training with only audio signals, approximately 96% accuracy is obtained for the 60 Hz grids and more than 89% accuracy is achieved for the 50 Hz grids. In testing with only audio signals, the accuracy is more than 87% for the 60 Hz grids and more than 83% for the 50 Hz grids. For the overall (power + audio) training data, the system is 97.42% accurate, whereas for the overall (power + audio) testing data, it is 92.00% accurate. Table 11 presents the confusion matrix for the power and audio ENF testing data. The confusion matrix is computed in terms of the testing accuracies (%) of grid classification. The performance evaluations underscore the efficacy and reliability of the proposed ENF based location-stamp authentication system.

Figure 2. Sample ENF sequences extracted from audio recordings from different 60 Hz grid locations.

Figure 4. Sample ENF sequences extracted from audio recordings from different 50 Hz grid locations.

Figure 5. Proposed hierarchical structure for classifying 50 Hz power data.

Figure 6. Proposed hierarchical structure for classifying 50 Hz audio data.
A, B, C, D, E, F, G, H and I are the 9 grids and N denotes 'none of the grids'.

Table 1. Grid Location-Stamp Information of the Training Dataset.

Table 2. Extracted Feature Vectors from the high outliers frequency (HOF) and low outliers frequency (LOF) Segments of the Original 60 Hz and 50 Hz ENF Sequences.

Table 3. Euclidean Distance Matrix Calculated for Selected Features (Table 2) for 60 Hz Power ENF Signals.

Table 6. Euclidean Distance Matrix Calculated for Selected Features (Table 2) for 50 Hz Audio ENF Signals.
$f^{50A}_1$: Median. $f^{50A}_2$: Modified Mean Absolute Value. $f^{50A}_3$: 2nd Coefficient $G_2$ of AR(4) Model. $f^{50A}_4$: Log of Variance of Auto Correlation Sequence.

Table 7. Trained support vector machine (SVM) Models for 60 Hz and 50 Hz Power and Audio Recordings.
A, B, C, D, E, F, G, H and I are the 9 grids used for training SVM models. There are 14 trained SVM models in total.

Table 8. Comparison of Different Classification Techniques for 50 Hz Power Data.

Table 9. Comparison of Different Classification Techniques for 50 Hz Audio Data.

Table 10. Training and Testing Accuracies (%) for 60 Hz and 50 Hz Power and Audio Recordings.