A Data Size Reduction Approach Applicable in Process Control System of Oil and Gas Plants

In oil and gas plants, the cost of the devices used in supervisory and control systems depends directly on the transmission and storage systems, which in turn depend on the data size of the process variables. In this paper, the results of frequency-domain and statistical analyses of process variables have been studied to determine whether the data size of the process variables can be reduced without losing any necessary information. Although automatic control is not applicable in a shutdown condition, data from an unscheduled shutdown have also been analyzed to generalize the obtained results. The main goal of this paper is to develop an applicable algorithm, based on well-known and powerful mathematical techniques, that decreases the data size in the control and monitoring systems of oil and gas plants. The results show that the data size can be reduced dramatically (by more than 99% for control and by more than 55% for monitoring purposes, compared with existing methods) without loss of vital information or performance quality.


Introduction
In order to respond to the increasing demand for safe and efficient oil and gas plant operation under environmental regulations, "process control" has become increasingly important in recent years [1]. In process control systems, activities such as supervisory control, data logging and performance monitoring, whose hierarchical relation is illustrated in Figure 1, are carried out based on the available process variables [2]. These variables include the pressure and flow rate of fluids, the temperature of flames or materials, the liquid level in tanks, and other quantitative items [3]. Some of these variables are measured by sensors, transferred over industrial networks, processed in distributed or central control systems, and monitored on control consoles [4]. Managing the huge amount of process variables in chemical process plants has recently been categorized as a "big data" problem [5]. Transmission and storage of these variables carry a considerable cost in many industrial plants, such as oil and gas refineries, so reducing these expenses is vital for managers [6]. The main motivation for this research is therefore the issue of "big data management" in large-scale chemical plants. Appropriate alarm management, efficient decision-making, and the simultaneous handling of a large amount of received data in control and monitoring systems all imply that an applicable algorithm for reducing the size of stored data is crucial. The methodology presented in this paper discards less important information and keeps only the vital data for control and monitoring purposes, as well as for future studies and analysis. In addition, in large-scale chemical plants such as oil and gas refineries, high sampling rates and the large number of field instruments and transmitters make data storage costs very high.
This also motivates field engineers to develop applicable algorithms that decrease the volume of online and stored data without adversely affecting the normal operating conditions of the plants. Horch and Isaksson [7] introduced a method for sampling rate assessment of a control system based on the Harris performance index [8]. Adaptive sampling rates have been applied in [9] and [10] to reduce data size in remote monitoring and control systems. Real-time data compression methods have been developed in [11–13] and offline methods in [14,15]. Another work on this subject is [6], in which the effect of compression on archived data in a control system has been studied. An effective method for enhancing industrial network performance has been proposed in [16]. Although much ongoing work addresses principal component analysis (PCA), which can reduce data size in control systems, PCA is not considered in this paper because it depends strongly on data correlation and cannot reduce data size when the variables are uncorrelated [17]. In addition, the reliability of PCA-based methods depends on the stability of process conditions [17]. Another limitation is that PCA transforms the original process values into new variables that process engineers cannot interpret for process analysis [17]. Furthermore, the effects of some variable characteristics, such as serial correlation, on the results of PCA-based methods are still unknown and should be investigated in future research [18]. As mentioned above, signal and data handling in oil and gas plants is a subset of big data management. According to [5,19–21], although the study of the online and recorded variables of chemical plants can be categorized as big data management, most existing big data methods are not fully applicable to chemical process controllers, for the following reasons.
First, the proposed techniques are mostly suitable only for monitoring, not for control. Second, cloud data processing requires outsourcing and handing over data to a third party, which is not routine practice in the chemical industries, especially in oil and gas plants, because of security and safety concerns. Moreover, most of the research cited here has focused on only one aspect of data size reduction in oil and gas plant control and monitoring systems.
The main goal of this article is to develop an applicable algorithm for reducing the data size in the control and monitoring systems of oil and gas plants. To this end, three practical procedures have been introduced in a pilot-scale hydrocarbon refinery to reduce communication, control and storage costs. One way to decrease the data size, and consequently the data transmitted from sensors or input cards to controllers, is to reduce the number of samples taken in a given time. In the first step, selected process variables have been analyzed in the frequency domain over all data-gathering periods using the Fourier transform and the fast Fourier transform (FFT), to obtain an overall view of their frequency components. Next, the real signals have been analyzed with the discrete wavelet transform (DWT) to study how the frequency components vary in time. A performance index analysis has also been carried out to verify the FFT and DWT results through an indirect frequency method for determining the sampling rate and selecting the data compression method [22]. The performance index technique is particularly useful for obtaining the sampling rate of process variables for which it cannot be calculated by frequency-domain methods such as FFT or DWT. Frequency-domain analysis of process variables is beneficial because the data of a slowly varying process variable may be compressed by a simple algorithm and stored in less memory than uncompressed data [22]. Frequency analysis results can also be used for control device selection [23].
Some efforts have been made to assess the sampling rate, such as the performance-index-based method in [7], which applies an ARMA (auto-regressive moving average) model of a control system. One advantage of the ARMA model for sampling rate assessment is its ability to predict the future behavior of a system [24], and there is ongoing work to improve this predictive property [25]. Prediction is essential for a control system to determine the best sampling rate based not only on past events but also on future conditions. By combining frequency and performance index analyses, a new and useful method for sampling rate selection is introduced in this paper. In the next step, the traffic model of a well-known industrial network (Foundation Fieldbus) has been studied to determine how useful the proposed approach is, compared with other methods, for reducing communication traffic in industrial networks, and the process variables have been studied from a statistical point of view. One of the most useful statistical criteria is correlation, which shows the relationship between process variables [26] and may be used for removing and compressing data without information loss. Finally, based on Harris performance index benchmarking [8] and other technical considerations, the negative effects of the proposed data size reduction methods on control system performance and safety have been studied in detail. These methods can also be used in other chemical process plants [27]; in this article, process variables from a dehydration unit of a gas refinery have been used.

Details of the Selected Process Unit
The main function of the dehydration unit considered in this case study is to reduce the moisture and heavy hydrocarbon content of the output gas of a gas refinery. The schematic of the unit and the selected process variables are shown in Figure 2, using the standard symbols recommended by ISA-5.1-1984 [28].
Gas enters Exchanger 1 at a nominal flow rate of 3.4 million cubic meters per day. The temperature of the input gas varies from +50 °C to +70 °C, and Exchanger 1 reduces it to +20 °C. Separator 1 separates condensed liquid from the gas. The gas then enters Exchanger 2, where its temperature decreases to about +5 °C, and finally Exchanger 3, where it drops to about −15 °C. After the remaining liquids are separated, the gas returns through the tube sides of Exchanger 2 and Exchanger 1 to cool the unit's input gas, and then leaves the dehydration unit as refinery output gas.
In normal operation of this unit, the water dew point of the input gas is about +25 °C to +30 °C in summer and +15 °C to +20 °C in winter, and that of the output gas is −14 °C in summer and −20 °C in winter. The heavy hydrocarbon dew point of the input gas is +40 °C in summer and +30 °C in winter, and that of the output gas is −6 °C in summer and −10 °C in winter. The present study considers key process variables that were accessible and play a major role in the functionality of the process.

Archived Data Specification
The data studied in this paper were taken over an interval of 18 months. The first selected period had the highest rate of change in the process values according to long-term observations, and the second batch of data covers the period immediately after an unscheduled shutdown. The fastest and largest changes in the variables occur when the output line pressure is low because of high gas consumption on the consumer side. The sampling rate for data gathering was set to 5 Hz, giving 27,391 samples (normal operation) plus 25,586 samples (shutdown condition) for each selected process variable. Each sample has been stored in the database as a double-precision floating-point value [29,30]. The ensemble length and sample intervals are larger than those recommended in [31].
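To put these archive sizes in perspective, the following Python sketch does the raw-storage arithmetic for one variable stored as 8-byte doubles, as stated above. The helper names are hypothetical, compression and database overhead are ignored, and the 16 s comparison interval is borrowed from the ARMA analysis later in the paper purely as an example.

```python
# Raw (uncompressed) storage arithmetic for one archived process variable.
# Assumes 8-byte IEEE 754 doubles per sample and ignores database overhead;
# all helper names are illustrative, not part of any plant historian API.

BYTES_PER_SAMPLE = 8
SECONDS_PER_DAY = 86_400


def raw_bytes(n_samples: int) -> int:
    """Bytes needed to store n_samples uncompressed double values."""
    return n_samples * BYTES_PER_SAMPLE


def bytes_per_day(sample_rate_hz: float) -> float:
    """Uncompressed bytes per variable per day at a fixed sampling rate."""
    return SECONDS_PER_DAY * sample_rate_hz * BYTES_PER_SAMPLE


def reduction_percent(old_rate_hz: float, new_rate_hz: float) -> float:
    """Share of raw storage saved by lowering the sampling rate."""
    return 100.0 * (1.0 - new_rate_hz / old_rate_hz)


if __name__ == "__main__":
    print(raw_bytes(27_391), raw_bytes(25_586))   # record sizes quoted in the text
    print(bytes_per_day(5.0))                      # 5 Hz archive, one variable
    print(reduction_percent(5.0, 1.0 / 16.0))      # 0.2 s -> 16 s interval
```

At 5 Hz each variable accumulates about 3.46 MB of raw doubles per day; sampling once every 16 s instead would remove 98.75% of that raw volume before any compression is applied.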

Frequency-Domain Method
Since the process variables used and recorded in modern control systems such as programmable logic controllers (PLCs), distributed control systems (DCSs) and Fieldbuses are generally digital [3], the discrete Fourier transform (DFT), computed with the fast Fourier transform (FFT), and the discrete wavelet transform (DWT) are appropriate discrete frequency analysis methods. The FFT gives an overall view of the frequency components within the studied time window of a signal [32], whereas the DWT is a time-frequency analysis localized in time [33,34].
In the FFT analysis of all selected process variables, the spectrum is concentrated at 0 Hz and decays rapidly in the neighborhood of 0 Hz. Since most process variables can be approximated by a second-order differential equation, whose frequency response decays rapidly after the second pole, components above the cut-off frequency need not be considered [1,32]. Table 1 shows the 3 dB cut-off frequency of each process variable and compares the absolute mean value of each process variable (its zero-frequency component) with the maximum value of its FFT. Figure 3 shows a sample FFT curve of a process variable, which resembles the transfer function of a low-pass filter [35]. For each process variable, the absolute mean value equals the maximum frequency-component magnitude of the FFT. Since the mean value of a signal is its zero-frequency component magnitude [36], the maximum magnitude occurs at zero frequency, which validates the FFT curves.
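The two checks described above, that the 0 Hz bin of a 1/N-scaled DFT equals the absolute signal mean and that the 3 dB cut-off is where the magnitude falls to 1/√2 of its maximum, can be sketched in Python with NumPy. This is an illustrative sketch, not the tool used to produce Table 1; the function names and the synthetic test signal are hypothetical, and the cut-off estimate is limited to the resolution of one DFT bin.

```python
import numpy as np


def one_sided_spectrum(x, fs):
    """One-sided DFT magnitudes scaled by 1/N, so bin 0 equals |mean(x)|."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x)) / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, mag


def cutoff_3db(freqs, mag):
    """Last frequency before the magnitude first drops below max/sqrt(2).

    For a low-pass-shaped spectrum peaking at 0 Hz this is the -3 dB
    bandwidth, quantized to the DFT bin spacing.
    """
    below = np.nonzero(mag < mag.max() / np.sqrt(2.0))[0]
    if below.size == 0:
        return freqs[-1]
    return freqs[max(below[0] - 1, 0)]


if __name__ == "__main__":
    fs = 5.0                                        # archive sampling rate (Hz)
    t = np.arange(4096) / fs
    x = 40.0 + 2.0 * np.sin(2 * np.pi * 0.01 * t)   # synthetic slow variable
    freqs, mag = one_sided_spectrum(x, fs)
    print(mag[0], abs(x.mean()), freqs[np.argmax(mag)], cutoff_3db(freqs, mag))
```

Scaling by 1/N is what makes the 0 Hz bin directly comparable with the mean value, which is the comparison reported in Table 1.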
Table 1 shows that the 3 dB bandwidth of the frequency components extends from 0 Hz to between 5 × 10⁻⁵ Hz and 6 × 10⁻⁵ Hz. These frequencies correspond to periods of roughly 4.6 h to 5.6 h, so the shortest period of change in the process values is about 5 h.
Sustainability 2020, 12, 639 6 of 22
For further study, the mean value of each process variable has been removed. Because the spectrum of the remaining signal has a band-pass shape, its low and high cut-off frequencies and its peak value in the frequency domain, expressed as a percentage of the mean value of the process variable, are listed in Table 2. Table 2 shows that 79% of the process variables have a bandwidth of about 0.0001 Hz; only DPT3, DPT4 and PT1 exceed 0.0001 Hz, and their peak magnitudes are small compared with the DC value (under 1%), so their bandwidth is not significant [3]. The worst case is LT1, which has a 0.0003 Hz cut-off frequency and a peak magnitude of 7.81% of its DC component. For LT1, the Nyquist rule [32] requires a sampling interval of 0.7 h. The bandwidth difference between an original signal and its DC-removed version arises because the bandwidth is obtained from the cut-off frequencies at which the spectrum drops by 3 dB relative to its maximum value. For signals in which the DC component is dominant, removing it leaves a signal with a completely different maximum amplitude and therefore different 3 dB frequencies. Based on the FFT analysis and the Nyquist theorem, the resulting sampling interval is 5 h for process variables including their DC component and 0.7 h for process variables without it. Both intervals are clearly too long for a typical oil and gas process control system to act against fluctuations [1].
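The DC-removal step and the Nyquist arithmetic can be sketched as follows. The helper names are hypothetical; the band edges are taken as the contiguous run of DFT bins within 3 dB of the spectral peak, a simplification that assumes one dominant band, and the peak is reported relative to the DC magnitude, as in Table 2.

```python
import numpy as np


def dc_removed_band(x, fs):
    """Return (f_low, f_high, peak_percent) for the mean-removed signal.

    f_low/f_high bound the contiguous run of bins within 3 dB of the spectral
    peak; peak_percent is that peak relative to the DC magnitude |sum(x)|.
    """
    x = np.asarray(x, dtype=float)
    dc = abs(x.sum())                         # DC bin magnitude of the raw DFT
    mag = np.abs(np.fft.rfft(x - x.mean()))   # spectrum after removing the mean
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    k = int(np.argmax(mag))
    inside = mag >= mag[k] / np.sqrt(2.0)     # within 3 dB of the peak
    lo = hi = k
    while lo > 0 and inside[lo - 1]:
        lo -= 1
    while hi < mag.size - 1 and inside[hi + 1]:
        hi += 1
    return freqs[lo], freqs[hi], 100.0 * mag[k] / dc


def nyquist_interval(f_max_hz):
    """Longest sampling interval (s) that still satisfies fs >= 2 * f_max."""
    return 1.0 / (2.0 * f_max_hz)


if __name__ == "__main__":
    t = np.arange(1000.0)                        # 1 Hz sampling
    x = 10.0 + np.sin(2 * np.pi * 0.05 * t)      # DC plus a 0.05 Hz ripple
    f_lo, f_hi, peak = dc_removed_band(x, fs=1.0)
    print(f_lo, f_hi, peak, nyquist_interval(f_hi))
```

Because the reference for the 3 dB test changes from the DC bin to the largest remaining component, the DC-removed bandwidth can differ completely from the bandwidth of the original signal, which is the effect described above.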
FFT analysis is well suited to overall frequency-domain analysis, and other frequency analysis methods cannot give as complete a view of the overall frequency-domain properties of a signal [33]. However, to detect frequency components that occur only briefly and cannot be resolved by FFT analysis, wavelet analysis may be performed. The Daubechies 3 mother wavelet is commonly selected for this purpose because of its fast vanishing, orthonormality and compact support, which shorten the wavelet transform computation time, together with its relative smoothness [33]. Figure 4a,b shows the time-domain and wavelet-analysis trends for one of the variables. As can be seen in Figure 4a,b, the background magnitudes (colored) in the 2D diagrams are mostly zero. In TT2, however, non-zero magnitudes start at step 13 (smallest frequency = 4.8828 × 10⁻⁴ Hz) and continue up to step 9 (0.0078 Hz), whereas in LT2 they span steps 13 to 1. Non-zero magnitude components are observed only at some moments, not over the entire duration of the signal. Therefore, LT2 has instantaneous frequency components at higher frequencies than TT2.
In the wavelet output, according to the 2D diagrams, no important frequency component is observed at high frequencies except for LT2, LT3 and TT1. In the next step, the time-varying frequency components obtained by wavelet analysis for each process variable have been studied in detail. For each step of the wavelet analysis of a process variable, the maximum absolute value of that step's output has been normalized by the absolute mean value of the process variable. This gives the ratio of the maximum value of a frequency component to the mean value of the process variable (the mean value being the magnitude of the zero-frequency component, which is generally dominant in the process industries). Figure 5 depicts this ratio, in percent, for each process variable. Except for the level transmitters (LTs), especially LT2, there are few dominant components above 0 Hz, even when the absolute maximum of the instantaneous frequency-component magnitudes is considered. Therefore, it can be claimed that the process variables have no significant frequency components above 0 Hz.
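The per-step normalization described above can be sketched with a self-contained, periodized db3 DWT in NumPy; PyWavelets would normally be used, and is avoided here only to keep the example dependency-light. The filter is the closed-form Daubechies 3 scaling filter, and the function names are hypothetical. Because the ratios depend on the coefficient scaling convention (orthonormal here), they should be read as indicative rather than as a reproduction of Figure 5.

```python
import numpy as np

# Daubechies-3 scaling (low-pass) filter in closed form; sum = sqrt(2), energy = 1.
_R = np.sqrt(10.0)
_S = np.sqrt(5.0 + 2.0 * _R)
DB3_LO = np.array([1 + _R + _S, 5 + _R + 3 * _S, 10 - 2 * _R + 2 * _S,
                   10 - 2 * _R - 2 * _S, 5 + _R - 3 * _S, 1 + _R - _S]) / (16.0 * np.sqrt(2.0))
# Quadrature-mirror wavelet (high-pass) filter: g[n] = (-1)^n h[L-1-n].
DB3_HI = np.array([(-1) ** n * DB3_LO[-1 - n] for n in range(DB3_LO.size)])


def dwt_step(x):
    """One level of an orthonormal db3 DWT with periodic extension (even length)."""
    n = x.size
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(DB3_LO.size)) % n
    windows = x[idx]                      # shape (n/2, 6): stride-2 sliding windows
    return windows @ DB3_LO, windows @ DB3_HI


def detail_ratios_percent(x, levels=None):
    """Max |detail coefficient| at levels 1 (finest) .. J, as % of |mean(x)|."""
    x = np.asarray(x, dtype=float)
    if levels is None:                    # keep every level's input >= 2 * filter length
        levels = int(np.floor(np.log2(x.size / DB3_LO.size)))
    x = x[: (x.size >> levels) << levels]  # trim to a multiple of 2**levels
    ref = abs(x.mean())
    approx, ratios = x, []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        ratios.append(100.0 * np.max(np.abs(detail)) / ref)
    return ratios


if __name__ == "__main__":
    x = np.full(1024, 100.0)
    x[500] += 20.0                        # one short transient on a DC level
    print([round(r, 2) for r in detail_ratios_percent(x)])
```

At sampling rate fs, detail level j covers roughly fs/2^(j+1) to fs/2^j, so a large ratio at a fine level flags a short, high-frequency transient of the kind seen in the LTs.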
Generally, measurements for control purposes rarely require a precision finer than 10% of the absolute value [3], so a 10% precision for the absolute value of each process variable has been considered in this paper. For each process variable, the highest frequency whose component has an absolute maximum greater than 10% of the process variable mean value is found (as shown in Figure 6); from it, the process variable bandwidth, and therefore the best sampling rate, is calculated according to the Nyquist rule.
Figure 6. Maximum frequency with components greater than 10% of the process variable mean value.
In Figure 6, the best sampling rate for each process variable is obtained by doubling the frequency written above its bar. These sampling rates are slower than the refresh rate for process variables recommended by the American Petroleum Institute in API RP 554 [37] for a proportional-integral-derivative (PID) loop controller, which is 50 Hz.
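The 10% criterion and the doubling step can be expressed compactly. In this hypothetical sketch, each frequency band is represented by its upper edge and by the largest component magnitude in that band as a percentage of |mean|; for a dyadic DWT at sampling rate fs, detail level j spans roughly fs/2^(j+1) to fs/2^j, so its upper edge is taken as fs/2^j. The ratio values in the example are invented for illustration.

```python
from typing import Dict, List, Optional


def dyadic_band_edges(fs: float, levels: int) -> List[float]:
    """Upper band edge (Hz) of DWT detail levels 1..levels: fs / 2**j."""
    return [fs / 2 ** j for j in range(1, levels + 1)]


def sampling_rate_from_bands(band_ratios: Dict[float, float],
                             threshold_percent: float = 10.0) -> Optional[float]:
    """Twice the highest band edge whose component exceeds the threshold.

    Returns None when no band exceeds it, i.e. only the 0 Hz (DC) component
    is significant at the chosen precision.
    """
    significant = [f for f, ratio in band_ratios.items() if ratio > threshold_percent]
    if not significant:
        return None
    return 2.0 * max(significant)   # Nyquist: sample at least twice the bandwidth


if __name__ == "__main__":
    edges = dyadic_band_edges(5.0, 4)                   # 2.5, 1.25, 0.625, 0.3125 Hz
    ratios = dict(zip(edges, [0.4, 3.0, 12.5, 35.0]))   # hypothetical % of |mean|
    print(sampling_rate_from_bands(ratios))             # 2 * 0.625 Hz
```

Returning None rather than zero keeps the "DC only" outcome explicit, so a caller can fall back to a plant-specific minimum refresh rate instead of silently using an invalid rate.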
To validate the obtained results, data from an unscheduled plant shutdown have been studied. As can be seen in Figures 7 and 8, for FT1, which measures the main process flow through the whole unit, the unscheduled shutdown data changed much more slowly than in normal operation. FFT analysis of the unscheduled shutdown trends shows that the 3 dB bandwidth of all process variables is about 6 × 10⁻⁵ Hz. In addition, the wavelet analysis results for the fastest process variable (LT1) in Figure 9 show that the largest frequency-component amplitude occurs at a frequency of 1 Hz. The magnitude of this component is about 80% of the DC component in normal operation and about 10% during the unscheduled shutdown.
In summary, FFT analysis shows that the frequency bandwidth of the process variables is about 6 × 10⁻⁵ Hz in both the normal operation and unscheduled shutdown cases. Moreover, wavelet time-frequency analysis of transient events shows that the unscheduled shutdown case has no process-variable transients (substantial frequency components above 0 Hz) faster than those of normal operation.
For a more precise study of the process variables in both the worst case of normal operation and an unscheduled shutdown, an analytical model of the measured process variables is presented to support the results of the frequency-component analysis. In the dehydration unit, the main process variable is the natural gas flow rate, so this variable (FT1) has been selected for modeling and for deriving an exact, closed-form frequency-domain expression. In normal operation, FT1 is modeled as a superposition of two Poisson trains of narrow triangular pulses with different heights, as depicted in Figure 7. In the unscheduled shutdown case, FT1 is a superposition of step functions, as can be seen in Figure 8. For the first case (worst case of normal operation), the occurrence times of the narrow triangular pulses are assumed to follow a Poisson distribution, because they satisfy the three conditions of Poisson processes stated by Chung and AitSahlia [38]. Following Oppenheim, Willsky and Nawab [32], the Poisson triangular pulse train is the convolution of a triangular pulse and a Poisson impulse train in the time domain. The frequency-domain expression of a triangular pulse is [32]:

$$\mathrm{Triangle}(f) = T\,\mathrm{sinc}^2(\pi f T) \qquad (1)$$

where sinc(x) = sin(x)/x and T is the pulse half-width, and the frequency-domain expression of the Poisson impulse train with mean rate β is [39]:

$$\mathrm{Poisson}(f) = \beta\left[1 + 2\pi\beta\,\delta(f)\right] \qquad (2)$$

Convolution in the time domain corresponds to multiplication in the frequency domain [32].
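So, the frequency-domain representation of the Poisson triangular pulse train is

$$\mathrm{Flow}_{\mathrm{normal\ operation}}(f) = T\,\mathrm{sinc}^2(\pi f T)\times\beta\left[1 + 2\pi\beta\,\delta(f)\right] = T\beta\,\mathrm{sinc}^2(\pi f T) + 2\pi\beta^2 T\,\delta(f) \qquad (3)$$

where sinc²(0) = 1 has been used in the impulse term. Assuming T is small enough, the first term of Equation (3) becomes negligible compared with the impulse at 0 Hz, and Equation (3) reduces to

$$\mathrm{Flow}_{\mathrm{normal\ operation}}(f) \approx 2\pi\beta^2 T\,\delta(f) \qquad (4)$$

which shows the 0 Hz property of the signal and agrees with the results of the FFT and DWT frequency analyses.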
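Equations (1) to (4) can be checked numerically with a Monte-Carlo sketch: Poisson-distributed impulses convolved with a sampled triangular pulse, then examined with the DFT. All parameters and names below are hypothetical, and the simulation uses a single pulse height rather than the two pulse families of the FT1 model. Two facts are illustrated: the signal mean approaches β times the pulse area, and, because the signal is non-negative, the triangle inequality guarantees that no DFT bin can exceed the 0 Hz bin.

```python
import numpy as np


def poisson_triangle_train(duration_s, fs, rate_hz, half_width_s, height, seed=0):
    """Hypothetical FT1-like signal: triangular pulses (half-width T, given height)
    whose arrival times form a Poisson process with mean rate beta = rate_hz."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * fs)
    impulses = np.zeros(n)
    t = rng.exponential(1.0 / rate_hz)          # exponential inter-arrival times
    while t < duration_s:
        idx = int(t * fs)
        if idx < n:
            impulses[idx] += 1.0                # Poisson impulse train
        t += rng.exponential(1.0 / rate_hz)
    w = int(round(half_width_s * fs))           # half-width in samples (>= 1)
    k = np.arange(-w, w + 1)
    triangle = height * (1.0 - np.abs(k) / w)   # sampled pulse, area = height * w
    # Time-domain convolution, i.e. the product of spectra behind Equation (3).
    return np.convolve(impulses, triangle, mode="same")


if __name__ == "__main__":
    fs, beta, T, h = 1.0, 0.01, 5.0, 2.0
    x = poisson_triangle_train(100_000, fs, beta, T, h)
    mag = np.abs(np.fft.rfft(x))
    print(x.mean(), beta * h * T)               # empirical vs expected mean
    print(mag[0] / np.median(mag[1:]))          # the 0 Hz bin dominates
```

The DC bin grows with the number of pulses, while the other bins grow only with its square root, mirroring the β² impulse term versus the β continuum in Equation (3).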
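For an unscheduled shutdown, the signal in the selected time window is a superposition of step functions, as shown in Equation (5):

$$\mathrm{Flow}_{\mathrm{unscheduled\ shutdown}}(t) = k_0 - \sum_{n=1}^{m} k_n\,u(t - t_n) \qquad (5)$$

where u(t) is the unit step function, with

$$k_0 \ge \sum_{n=1}^{m} k_n \qquad (6)$$

Hence, in the frequency domain we have [32]:

$$\mathrm{Flow}_{\mathrm{unscheduled\ shutdown}}(f) = 2\pi k_0\,\delta(f) - \sum_{n=1}^{m} k_n e^{-j2\pi f t_n}\left(\frac{1}{j2\pi f} + \pi\delta(f)\right) \approx \pi\left(2k_0 - \sum_{n=1}^{m} k_n\right)\delta(f) \qquad (7)$$

where only the impulsive terms at f = 0 have been retained in the approximation, using $e^{-j2\pi f t_n} = 1$ at f = 0. Taking condition (6) into account, the term in parentheses is a positive value k, and the result reduces to Equation (8):

$$\mathrm{Flow}_{\mathrm{unscheduled\ shutdown}}(f) = \pi k\,\delta(f) \qquad (8)$$

Equation (8) shows that the unscheduled shutdown signals also have a dominant 0 Hz component and behave as a DC signal. This result agrees with the FFT and DWT analyses of the signal.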
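A discrete check of Equations (5) to (8) follows: the sketch builds the step superposition of Equation (5) on a sample grid, verifies condition (6), and measures the share of signal energy carried by the 0 Hz bin, which by Parseval's theorem equals mean²/mean(x²). The step times and sizes are hypothetical, chosen only for illustration.

```python
import numpy as np


def shutdown_signal(n, k0, steps):
    """Sampled Equation (5): k0 minus unit steps of size k_n starting at t_n.

    `steps` is a list of (t_n, k_n) pairs, with t_n given as a sample index.
    """
    x = np.full(n, float(k0))
    for t_n, k_n in steps:
        x[t_n:] -= k_n
    return x


def satisfies_condition_6(k0, steps):
    """Condition (6): k0 >= sum of k_n, so the flow never becomes negative."""
    return k0 >= sum(k_n for _, k_n in steps)


def dc_energy_fraction(x):
    """Share of total DFT energy in the 0 Hz bin (Parseval): mean^2 / mean(x^2)."""
    x = np.asarray(x, dtype=float)
    return x.mean() ** 2 / np.mean(x ** 2)


if __name__ == "__main__":
    steps = [(2000, 30.0), (5000, 40.0), (8000, 20.0)]   # hypothetical flow drops
    x = shutdown_signal(10_000, 100.0, steps)
    print(satisfies_condition_6(100.0, steps), dc_energy_fraction(x))
```

With these example steps the 0 Hz bin carries about 72% of the signal energy, and the remainder sits at low frequencies through the 1/(j2πf) terms that are neglected in Equation (7).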

Adaptive Sampling Rate Method Investigation
In this method, the sampling rate of each process variable varies with its rate of change, that is, with the signal's frequency-domain content [10].
In existing control networks used in oil and gas plants, such as Foundation Fieldbus, measurements and commands for control purposes are categorized as scheduled transmissions, whose transmission interval cannot be changed [40]. Moreover, control networks in other types of plants may send measurements to, and receive commands from, remote supervisors at an adaptive sampling rate. Even in those plants, the control and monitoring systems include local main controllers that do not use an adaptive sampling rate. This policy is generally followed to increase the overall reliability of the plant [10].
This indicates that the adaptive sampling rate has not been regarded as a reliable solution for control purposes in critical plants.

Performance Index Based Analysis
Based on the performance index for control loops introduced by Harris [8], Alexander Horch and Alf J. Isaksson developed an approach for obtaining an almost optimal sampling rate for control purposes [7]. In this article, that performance-index calculation is used in a different way: the aim is to find the best sampling rate for both the control system and the data storage system of each process variable. The search starts by calculating the performance index at the fastest sampling rate available in a well-tuned control system. If this sampling rate is higher than needed, the process variable time series shows noisy behavior. The performance index then falls below one, because of oversampling and because many samples are taken during the system time delay [6]. In the next step, samples are picked from the original data set at a lower frequency. The process variable time series becomes more predictable, and the performance index rises. This step is repeated, each time picking samples from the original series at a lower frequency, and the performance index keeps increasing. The sampling period at which the performance index becomes equal to one is selected as the appropriate sampling rate. To apply the proposed method to the selected unit, an auto-regressive moving average (ARMA) model of the control loop was first built at 1/80 of the basic sampling rate (a 16 s interval). For more details on the ARMA structure, refer to [41]. The ARMA model parameters for the subsystems studied in this paper are summarized in Table 3.
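The iterative search described above can be sketched as follows. The decimation step, which picks every n-th sample of the original record, follows the text. The performance-index estimator itself (a Harris-type index, for example computed from an ARMA fit of the decimated data) is not reproduced here and is passed in as a placeholder callable. The function names and the toy index used in the test are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def decimate(series: Sequence[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample of the original record (no anti-alias filter)."""
    return list(series[::factor])


def find_sampling_factor(
    series: Sequence[float],
    performance_index: Callable[[Sequence[float]], float],
    max_factor: int,
    target: float = 1.0,
) -> Optional[int]:
    """Return the smallest decimation factor whose index reaches `target`.

    `performance_index` is a placeholder for a Harris-type index estimate;
    it is assumed to grow as the pick-up frequency is lowered.
    """
    # Try progressively slower pick-up rates of the same original record.
    for factor in range(1, max_factor + 1):
        if performance_index(decimate(series, factor)) >= target:
            return factor  # fastest pick-up rate at which the index reaches 1
    return None  # the index never reached the target within the allowed range
```

Capping `max_factor` at the loop's input-output delay, expressed in basic samples, reflects the constraint that the sampling interval of a control system cannot exceed that delay.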
The performance index at 1/80 of the basic sampling rate is equal to 2.06. The performance indexes for the other data-saving sampling rates are summarized in Figure 10. As depicted there, the performance index decreases as the sample pick-up interval decreases, and at a specific interval it equals 1. For each control system, the sampling interval cannot be greater than the delay between input and output. For LT2, frequency components above the basic sampling frequency cannot be tracked, because of the controller's sampling-rate limitation and the wide frequency band of this variable. The curve beyond the basic sampling frequency is therefore estimated by linear extrapolation. Figure 10 gives the sampling intervals of the selected process variables from the performance-index analysis. Compared with the intervals from wavelet analysis, the performance-index results have a smaller sampling period (a higher sampling rate). Hence, they capture the details of fast phenomena and changes more precisely. The obtained results are summarized in Table 4.
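Reading off the interval at which the index equals 1 from a curve such as Figure 10, including the linear extrapolation used for LT2, can be sketched as below. The function interpolates linearly between the two points that bracket the target. Otherwise, it extrapolates along the end segment nearer to the target. The function name and the (interval, index) pairs in the test are hypothetical.

```python
def _line_at(x0, y0, x1, y1, target):
    """x where the straight line through (x0, y0) and (x1, y1) reaches `target`."""
    if y1 == y0:
        raise ValueError("flat segment: index does not change with interval")
    return x0 + (target - y0) * (x1 - x0) / (y1 - y0)


def interval_at_unit_index(points, target=1.0):
    """Estimate the sampling interval at which the performance index equals `target`.

    `points` are (interval_s, index) pairs; at least two are required.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two (interval, index) points")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 == target:
            return x0
        if (y0 - target) * (y1 - target) < 0:  # this segment brackets the target
            return _line_at(x0, y0, x1, y1, target)
    if pts[-1][1] == target:
        return pts[-1][0]
    # No bracketing pair: extrapolate along the end segment nearer to the target.
    if abs(pts[0][1] - target) <= abs(pts[-1][1] - target):
        (x0, y0), (x1, y1) = pts[0], pts[1]
    else:
        (x0, y0), (x1, y1) = pts[-2], pts[-1]
    return _line_at(x0, y0, x1, y1, target)
```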
Finally, this paper has shown that the proposed method applies to both normal operation and unscheduled shutdown cases. However, during shutdown or startup all systems are in manual mode, so there is in fact no need to apply the calculated sampling rates to the control systems.

Combination of Frequency and Statistical Analysis Methods
By combining the wavelet and performance-index methods, an algorithm was developed to find a sampling interval for each process variable. Its flowchart is presented in Figure 11.

Sustainability 2020, 12, 639 14 of 23

Figure 11. Flowchart of the algorithm for obtaining the best sampling rate for control, monitoring and historian purposes.
Moreover, sampling rate plays a vital role in modern process control systems. These systems are completely digital, and what connects real analog process values to the digital world is the choice of sampling rate. Figure 12 illustrates the sampling mechanism and compares conventional and proposed sampling-rate determination methods.

Figure 12. A schematic of analog signal digitizing by hierarchical sampling rates, from the fastest (for control purposes) to the slowest (for historian purposes), comparing conventional methods with the proposed method.
To find the best sampling rate for process control and monitoring systems in oil and gas plants, Table 4 compares the sampling rates obtained by conventional methods with those of the proposed method. Figure 13 shows the size of the data generated in 1.5 h for all process values in a gas refinery dehydration unit, using the sampling rates obtained with the proposed method. It compares these with the conventional methods listed in Table 4. A numerical interpretation of Figure 13 is given in Table 5. For the outputs of the control system, such as control valve commands, the sampling rate of the control loop input may be used. The output spectrum of a control loop is the product of the input spectrum and the loop transfer function, which behaves like a low-pass filter [1], so the output contains no faster components than the input. As can be seen in Figure 13, for control purposes the presented method reduces the data communication traffic and data processing load of the controller by more than 99.11% in comparison with the API 554 method. For monitoring purposes, the presented technique also reduces the data size by 55.42% in comparison with the API 554 method. For historian purposes, however, the introduced method increases the data size approximately 33.4 times. In comparison with the Åström and Wittenmark method [1], the data size was reduced by about 99.71% for control purposes and by about 55.42% for both monitoring and historian purposes. The increase relative to the API 554 method for historian purposes arises because the API 554 method cannot represent the process dynamics in historian data. According to the API 554 standard, recorded data may be used for process behavior and failure analysis in studies carried out during the 30 days after recording.
Such data should therefore contain the process dynamics. Among all the mentioned methods, only the presented technique represents the process dynamics properly for historian purposes. For this reason, the larger data size of this method for historian purposes is not a drawback compared with the other approaches. To verify the previous studies, data gathered during an unscheduled shutdown have also been studied. The results show that for TT2, LT1, LT2 and DPT1 the frequency-band widths are almost zero in all cases. This indicates slower changes in the unscheduled shutdown case than in the normal operation worst-case signals. The obtained results verify the theoretical frequency-domain analysis of the process variables and their slow rate of change, presented in Section 4.1.1.
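The data-size comparisons behind Figure 13 and Table 5 come down to counting samples per tag over the 1.5 h window. The sketch below shows that arithmetic. It includes the rule that a control-loop output, for example a valve command, reuses the sampling interval of its loop input. Tag names such as FV1, the function names, and all intervals in the test are hypothetical, not the values of Table 4.

```python
WINDOW_S = 1.5 * 3600.0  # 1.5 h observation window, as in Figure 13


def samples_in_window(interval_s, window_s=WINDOW_S):
    """Samples one tag produces in the window at a fixed sampling interval."""
    return int(window_s // interval_s)


def total_samples(intervals_s, window_s=WINDOW_S):
    """Total samples over all tags; `intervals_s` maps tag -> interval in s."""
    return sum(samples_in_window(i, window_s) for i in intervals_s.values())


def with_loop_outputs(intervals_s, loops):
    """Give each loop output tag the sampling interval of its loop input tag.

    `loops` maps output tag -> input tag. This is justified because the loop
    behaves as a low-pass filter, so its output has no faster components
    than its input.
    """
    result = dict(intervals_s)
    for output_tag, input_tag in loops.items():
        result[output_tag] = intervals_s[input_tag]
    return result


def reduction_percent(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (1.0 - proposed / baseline)
```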

Traffic Model Analysis
Industrial networks can be modelled as queuing systems. Packets published on the network can be treated as arriving customers, and the transmission time of each packet as the service time of the server.
In Kendall's notation, Foundation Fieldbus, as an example of an industrial network, can be described as a D/D/1 traffic model. The model has deterministic ("D") arrivals, deterministic ("D") service times and a single server (Figure 14) [40,42]. As a result, the start time and duration of each packet transmission are known in advance and cannot be changed.
A macrocycle is the least common multiple of all loop times on a given link [43]. The macrocycle length for the control system studied in this paper can therefore be taken as 500 s. For existing fieldbus systems, such as the Delta-V system, the maximum macrocycle length is 5 s [44], and this value could be revised according to the results of this paper. This extension of the macrocycle is possible because the publishing period of variables DPT1, DPT2, DPT3, DPT4, FT1, LT1, LT3, PT1, TT1, TT2, TT3, TT4 and TT5 can be 500 s instead of 5 s. The capacity of the Foundation Fieldbus network can be increased by using the proposed macrocycle. This is combined with the methods introduced in [16] for acquiring data after applying the sample-rate changing method of Section 4.1.4. For example, for the studied dehydration unit, about 82% of the occupied communication time can be saved. This saving is calculated relative to the traffic after applying the enhanced sampling rate, not relative to the API 554 sampling-rate selection. As a result, more transmitters could be installed on a Fieldbus network segment than the 32 devices that are the maximum capacity of an H1 segment [45].
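Foundation Fieldbus traffic is deterministic (D/D/1), so both the macrocycle and the share of link time taken by scheduled publications can be computed directly. The sketch below takes the macrocycle as the least common multiple of the loop times [43], in integer milliseconds. It also sums packet duration divided by publishing period over all tags. The 30 ms packet duration, the 15-tag example and the function names are hypothetical. The resulting saving is therefore illustrative and is not the 82% reported for the studied unit.

```python
from functools import reduce
from math import gcd


def macrocycle_ms(loop_times_ms):
    """Macrocycle = least common multiple of all loop times on the link."""
    return reduce(lambda a, b: a * b // gcd(a, b), loop_times_ms)


def occupied_fraction(publish_periods_ms, packet_ms):
    """Fraction of link time used by scheduled (deterministic) publications.

    Each tag publishes one packet of fixed duration `packet_ms` per period.
    """
    return sum(packet_ms / period for period in publish_periods_ms)


def saving_percent(before, after):
    """Percentage of occupied communication time saved."""
    return 100.0 * (1.0 - after / before)


if __name__ == "__main__":
    # Hypothetical: 15 tags at 5 s, then 13 of them moved to a 500 s period.
    before = occupied_fraction([5_000] * 15, packet_ms=30)
    after = occupied_fraction([500_000] * 13 + [5_000] * 2, packet_ms=30)
    print(macrocycle_ms([5_000, 500_000]), round(saving_percent(before, after), 1))
```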

Correlation Analysis
In this section, a correlation analysis of the process variables is presented for two cases: normal operation and unscheduled shutdown. Linear correlation cannot reveal non-linear correlation. As a complementary test, scatter diagrams were checked visually for each pair of process values whose correlation factors lie between −0.8 and +0.8. No meaningful correlation was detected in that range.
These correlations and the coupling between process variables in an oil and gas plant are often due to the laws of fluid mechanics, which relate the variables to each other. For the control system considered in this paper, the normal operation case shows good linear correlation among DPT1, DPT2, DPT3, DPT4, PT1 and FT1, and also between TT3 and TT5. In the unscheduled shutdown case, there is linear correlation among DPT1, DPT2, DPT3, DPT4, PT1 and FT1, and also among TT3, TT4 and TT5. As a result, it is acceptable to remove 4 process variables in the normal operation case and 5 process variables in the unscheduled shutdown case.
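Screening for linearly correlated variables, as done in this section, can be sketched with a plain Pearson coefficient and single-linkage grouping. The 0.8 threshold is an assumption inferred from the ±0.8 band used for the scatter-diagram check. Single linkage may chain variables that are only indirectly correlated. The groups should therefore still be reviewed against the fluid-mechanics relations before any tag is removed. The function names are hypothetical, and the tag data in the test are synthetic.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (raises ZeroDivisionError on a constant series)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_tags(data, threshold=0.8):
    """Group tags with |r| >= threshold (single linkage) and keep one per group.

    `data` maps tag -> list of samples. Returns the tags that could be removed
    and recomputed from the kept representative (the first tag of each group).
    """
    tags = list(data)
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in combinations(tags, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            parent[find(b)] = find(a)  # merge the two groups
    groups = {}
    for t in tags:
        groups.setdefault(find(t), []).append(t)
    return {t for members in groups.values() for t in members[1:]}
```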
In the normal operation case, the method presented in this section saves 55.56% of the data resources in the control system in comparison with the API 554 technique. Combined with the sampling-rate changing method, it saves 99.19%. In the unscheduled shutdown case, removing redundant data is even more effective in reducing data size, because 6 variables are removed instead of 5.
Table 6 gives the results of using this method alone and in combination with the other methods. The results cover data monitoring and storage systems and traffic (networking) systems.

Effect of Proposed Data Size Reduction Methods on Control System Performance and Plant Safety
Before the data size reduction methods introduced in the previous sections are implemented in industrial oil and gas plants, it is essential to show that they have no negative effect on control system performance and plant safety. Three considerations related to "process safety and control" can be mentioned: (A) It must be ensured that no important change in the selected process variables is missed, so that the control/safety system can react on time and accurately. Fortunately, a time-frequency analysis method (DWT) makes it possible to detect drastic changes in the process variables. A sufficiently fast sampling rate can therefore be chosen from the presented methodology and the time-frequency analysis. This yields a scheme that guarantees important changes in the selected process variables are not missed. (B) Another issue that may cause important changes in the process variables to be missed is a high volume of calculations between two consecutive sampling times. The Harris performance index is a reliable benchmarking tool for assessing this effect [8]. Two essential factors that affect the Harris performance index are sampling rate and controller dead-time [7]. Therefore, the effects of each method on these two factors are evaluated to determine whether it has any side effect on control system performance.

Sampling Rate Changing Method
Because the sampling rate in this method is fixed, it requires minimal processing power and cannot cause significant delay through increased processing time. Hence, this method has no side effect on control system performance.

Traffic Enhancement Method
From a delay point of view, this method does not differ from the existing Foundation Fieldbus network, which is widely used in oil and gas plants. Hence, it has no negative effect on control system performance.

Correlation Analysis Method
Because the fastest sampling rate among the linearly correlated process variables is selected, this method has no negative effect on the sampling rate of the process variables.
Regarding delay, this method uses one real analog input as the main process variable. The other variables that are linearly correlated with it are then calculated with a linear equation such as Equation (9).
$$\text{Correlated Process Variable} = (\text{Coefficient} \times \text{Main Process Variable}) + \text{Y\_Intercept} \qquad (9)$$

In industrial PLCs, which are the main controllers in oil and gas plants, the program runs cyclically. One complete run of the program from start to end is called a scan, and its duration is the scan time. Scan time depends on the number and types of instructions in the program, the interrupts and their duration, and the CPU type. It is typically 5 ms to 20 ms or even more [46]. The delay introduced by the correlation analysis method comes from recomputing the removed analog inputs. To assess it, the time taken by the multiplication and addition in Equation (9) should be expressed as a percentage of a typical scan time. This has been examined for an S7-400 CPU in a SIEMENS SIMATIC PLC system [47]. Floating-point math instructions take 0.3 µs, which is 0.006% of a 5 ms scan time. This shows that the correlation-based data size reduction method has no considerable effect on control system delay.

(C) Finally, chemical plants generally include many safety-related systems and equipment, such as emergency shutdown (ESD) systems and safety and relief valves. These ensure safe operation even if failures occur in the regular "control and monitoring" system.
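For the correlation analysis method, the Coefficient and Y_Intercept of Equation (9) must be obtained from historical data. The paper does not state how they were fitted, so the sketch below assumes ordinary least squares. It also reproduces the scan-time arithmetic of 0.3 µs per floating-point instruction against a 5 ms scan. The function names are hypothetical.

```python
def fit_equation_9(main, correlated):
    """Least-squares Coefficient and Y_Intercept for Equation (9) (assumed fitting method)."""
    n = len(main)
    mx = sum(main) / n
    my = sum(correlated) / n
    sxx = sum((x - mx) ** 2 for x in main)
    sxy = sum((x - mx) * (y - my) for x, y in zip(main, correlated))
    coefficient = sxy / sxx
    return coefficient, my - coefficient * mx


def reconstruct(main_value, coefficient, y_intercept):
    """Equation (9): one multiplication and one addition per removed variable per scan."""
    return coefficient * main_value + y_intercept


def math_share_of_scan(n_instructions, instr_us=0.3, scan_ms=5.0):
    """Percent of a PLC scan spent on `n_instructions` floating-point operations."""
    return 100.0 * n_instructions * instr_us / (scan_ms * 1000.0)
```

With these defaults, one instruction takes 0.006% of the scan. The multiplication and addition of Equation (9) together take about 0.012%.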
Thus, it can be concluded that the method presented in this paper to reduce the data size in typical oil and gas plants does not have a negative impact on the total performance of the control system and also plant safety.

Conclusions
In this paper, several process variables in a gas refinery dehydration unit have been analyzed from frequency-domain, statistical, traffic and correlation points of view to find applicable methods of data size reduction. To check the generality of the presented method, a normal operation worst case and an unscheduled shutdown case have been studied. The adaptive sampling rate was assessed and found unsuitable for oil and gas plants. The reasons are its lack of reliability in industrial networks and the slow rate of change of many process variables.
Compared with common techniques, the introduced method reduced data size by more than 99% for control purposes and by more than 55% for monitoring purposes. For historian purposes, however, there was no data size reduction compared with the API 554 method, because only the proposed method preserves the process dynamics in historian data. Harris performance-index benchmarking demonstrated that the proposed data size reduction methods have no negative effect on control system performance and plant safety. Moreover, the proposed methods can dramatically reduce the cost and size of the control system's storage facilities, network hardware and input signal devices, without significant data loss. As future research, non-linear correlation analysis of the process variables could be considered to increase the efficiency of the presented methodology.