A Method for Structure Breaking Point Detection in Engine Oil Pressure Data

Aleksandra Grzesiek; Radosław Zimroz; Paweł Śliwiński; Norbert Gomolla; Agnieszka Wyłomańska

doi:10.3390/en14175496

¹ Faculty of Geoengineering, Mining and Geology, Wroclaw University of Science and Technology, Na Grobli 15, 50-421 Wroclaw, Poland

² KGHM Polska Miedź S.A., M. Skłodowskiej-Curie 48, 59-301 Lubin, Poland

³ DMT GmbH & Co. KG, Am Technologiepark 1, 45307 Essen, Germany

⁴ Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wrocław University of Science and Technology, Wyspiańskiego 27, 50-370 Wrocław, Poland

Energies 2021, 14(17), 5496; https://doi.org/10.3390/en14175496

This article belongs to the Special Issue Mining Technologies Innovative Development


Abstract

In this paper, a heavy-duty loader operated in an underground mine is discussed. Due to extremely harsh operational conditions, an important maintenance problem is related to engine oil pressure. We have found that when the degradation process appears, the nature of the engine oil pressure variation changes. Following this observation, we propose a data analysis procedure for structure break point detection. It is based on specific data pre-processing and further statistical analysis. The idea of the paper is to transform the data into a nearly monotonic function that describes the variation of the machine condition or, in statistical language, the change of regime inside the process. To achieve that goal, we propose an original data processing procedure. The dataset analyzed in the paper covers one month of observation. We received confirmation that maintenance service was performed during that period. The purpose of our research was to remove the ambiguity related to direct oil pressure analysis and to visualize oil pressure variation in the diagnostic context. As the fleet of machines in the considered company comprises more than 1000 loaders, trucks, and drilling machines, the approach is of considerable practical importance. We believe that it could also be an inspiration for other researchers working with industrial data.

Keywords:

machine diagnostics; LHD; engine oil pressure data; oil pump wear; statistical analysis; convergence functions

1. Introduction

Maintenance procedures for a fleet of load-haul-dump (LHD) mobile machines operated in an underground mine are critically important and challenging. To reach the required level of efficiency, modern maintenance is supported by on-board monitoring systems and advanced analytics. The monitoring system acquires data every 1 s (or at longer intervals, depending on the type of variable) and the measured values are stored in local memory. After the shift, when the machine returns to the so-called machine chamber, the data are automatically transferred via Wi-Fi to the server on the surface (the data storage and processing center). Specific analyses are performed automatically and some reports are available via a website. Unfortunately, the variables are very specific and exhibit different characteristics; thus, different algorithms are required for their analysis. Until now, a simple comparison with a threshold has been used, for example, for low-frequency temperature data. In this paper, we present an example related to the engine oil pressure measurement. The basis for all analyses is the need to detect the structure break point (also called the anomaly or the structure/regime change point), interpret the data automatically, and provide simple information to the appropriate staff. Engine oil pressure is very different from the temperature data mentioned above. It is a relatively high-frequency variable. It varies from zero to ca. 700 kPa, but we have found that the distribution of the process is more informative than individual sample values. It is hard to draw any conclusion from the raw signal. If the engine oil pressure is too high, an automatic safety valve releases excess oil from the system. A much worse case is when the oil pressure is too low, as such a situation may lead to engine damage. However, as we mentioned, a simple diagnosis via comparison with a threshold level is impossible.
Thus, in this paper, we propose a statistical procedure for the detection of the so-called structure break point in highly variable random data.

Our strategy is based on a statistical approach that minimizes the impact of local changes in the examined data. We consider the machine as a time-varying system, so we decided not to analyze the data sample by sample but to estimate its characteristics from the perspective of work shifts. Daily operation in the considered mine is divided into four shifts of 6 h each. This is due to mining law regulations, the specific processes required for mining production, and the extremely harsh conditions underground. Even if some experts may notice changes in the raw signal, it is very hard to provide objective rules to detect anomalies in practice. Because of this specific operation of the machine in the mine, we first segment the data according to the work shifts. We were advised that the operation of the LHD during the first and second work shifts is very different from that during the third and fourth work shifts (during the night). The first work shift of the day is 6 a.m.–12 p.m., the second is 12 p.m.–6 p.m., the third is 6 p.m.–12 a.m., and the fourth is 12 a.m.–6 a.m. The shifts are indeed different; however, from our perspective, this is not critical. From the statistical point of view, changes in the data related to a change of condition are much stronger than changes related to different work shifts. Our approach allows us to process the data into a form that contains information about the condition of the system, not about the variation of its daily operating parameters such as speed, load, etc. From a highly time-varying process related to oil pressure variation, we obtain a nearly monotonic function describing the change of condition in the system. Such a data transformation is readable for maintenance staff and may be used as a basis for maintenance decisions. It has great practical importance, as the LHD fleet exceeds 1000 machines.
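The segmentation by work shift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `shift_number` and `local_shift_time` are hypothetical, and the shift boundaries are taken directly from the hours given in the text.

```python
from datetime import datetime


def shift_number(timestamp: datetime) -> int:
    """Return the work shift (1-4) that a timestamp falls into.

    Boundaries follow the text: 1st 06:00-12:00, 2nd 12:00-18:00,
    3rd 18:00-24:00, 4th 00:00-06:00.
    """
    hour = timestamp.hour
    if 6 <= hour < 12:
        return 1
    if 12 <= hour < 18:
        return 2
    if hour >= 18:
        return 3
    return 4  # 00:00-06:00


def local_shift_time(timestamp: datetime) -> float:
    """Seconds elapsed since the start of the shift containing the timestamp."""
    seconds_of_day = (
        timestamp.hour * 3600 + timestamp.minute * 60 + timestamp.second
    )
    # Every shift starts at a multiple of 6 h, so a modulo is sufficient.
    return float(seconds_of_day % (6 * 3600))
```

Each measurement can then be grouped by its calendar date and shift number, which gives the shift-wise segments used in the rest of the procedure.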

A novelty of the approach is related to the appropriate processing of raw data. In the data analysis, we use the fact that the empirical distribution of the engine oil pressure time series may change in subsequent work shifts, which is strongly related to the degradation process of the machine. Thus, we examine the distance measures between the distributions of the data corresponding to the work shifts. Here we utilize divergence functions that measure the distance between two probability distributions expressed in terms of their probability density functions (pdfs). Based on this characteristic, we apply classical test statistics (i.e., Student’s t and Wilcoxon) used in the problem of testing whether two samples have equal means. The final result is the detection of the structure break point, which corresponds to the moment of the repair registered by the maintenance staff in the considered mine. From the signal processing perspective, the novelty of the approach is related to the statistical processing of long-term, historical industrial data to obtain unequivocal information about the change of condition. We propose analyzing the data in segments to cancel out local disturbances and to extract patterns in the form of the probability density function. Then, the difference between segments is estimated by the distance between distributions, and this new feature is used to establish the structural break point in the process.
We propose an original data processing procedure that covers a pre-processing step (data validation, missing values handling, re-sampling, and reshaping the data to associate them with the specific nature of the process), data analysis (the probability density function of the data instead of sample-by-sample analysis, measures of distance in multidimensional space and, finally, structural break point detection, i.e., the identification of the moment when the process changes its nature), and the presentation of the results (transforming the obtained information back to the domain of real samples to visualize the moment of change). The general concept of the procedure is demonstrated in Figure 1. Namely, the raw data are subjected to a three-stage procedure (successive blocks of the diagram), as a result of which we detect the structural break point. The steps highlighted in each of the three stages are described in detail in the following sections: the data pre-processing is presented in Section 2.4, the analysis is described in Section 3, and the visualization is presented in Section 4.

Figure 1. The general concept of the proposed diagnostic procedure. The steps highlighted in the schematic blocks are described in detail in the following sections of the paper.

1.1. Brief State of the Art

LHD machines are commonly used in the mining industry and are critically important for the production process. There are many papers related to various aspects of LHDs. Most of them are focused on reliability analysis [1,2,3,4,5,6], prediction and assessment of machines breakdowns [7], machine performance measures [8,9,10,11,12], utilisation analysis, optimisation, production analysis [11,13,14,15], risk evaluation [16], residual life estimation [17], etc. Just a few works have mentioned condition monitoring [18,19,20,21,22], technical and operational aspects related to LHD machines [23,24,25,26] and real measurement of output torque, identification of operation regimes, etc. [27,28,29,30,31,32]. LHD machines are equipped with on-board monitoring systems. They are used for control as well as for maintenance purposes. As one receives real data from the machine, it opens many opportunities for their analysis.

As mentioned, from the mathematical/statistical perspective, the considered problem is related to the segmentation of the data, which is strongly connected with structure break point detection. In recent years, a substantial body of work on segmentation methods for different applications has appeared in the literature. A few interesting applications include condition monitoring [33,34] (where a structural break detection method based on the adaptive regression splines technique has been proposed to recognize a change of operational regime in copper ore crusher vibration, and a local maxima method has been proposed in the time–frequency domain for spike detection in bearing vibration, respectively), biomedical signals (e.g., electrocardiogram) [35,36,37,38,39] (where hidden Markov models, moving average and Savitzky-Golay filters, cepstral analysis, wavelet transform, envelope-based segmentation, etc., have been used), speech analysis [40,41,42] (where nonlinear speech analysis based on the microcanonical multiscale formalism, an adaptation of the Appel and Brandt algorithm, and the innovation (Schur) adaptive filter have been discussed), econometrics [43,44] (where the regime switching model is applied), and seismic signals [45,46,47,48,49] (where, among others, the cumulative sum of Gaussian probability density functions and Markov regime-switching models as well as the empirical second moment of a given raw signal have been proposed for seismic signal segmentation in order to extract seismic events).

Many segmentation methods are based on simple statistics in the time domain, such as the cumulative squared data or the empirical second moment [50,51,52,53]. However, one can also find methods based on the representation of the data in different domains, such as time-frequency [34,54]. See also the effective segmentation methods used in the physical sciences, such as the methods based on the so-called recurrence statistic [55,56,57,58,59]. The method proposed in this paper is based on relatively simple statistics; however, they are applied to characteristics of the data describing the distribution within a work shift.

The problem discussed in the paper can be seen as process diagnosis (fault detection in the process). Various approaches to fault detection and diagnosis in process data have been developed over decades. They can be divided into three main categories: data-driven approaches, deep-knowledge-based approaches, and analytical-model-based approaches [60]. Analytical-model- and deep-knowledge-based methods rely heavily on fundamental knowledge of the process [60]. Unfortunately, for our data, it is very difficult to provide a model of the process, as the pressure variation depends on the behavior of the operator, the environment, the performed task, etc. This leads to, in data science language, highly non-Gaussian, nonlinear, non-stationary data. Data-driven methods specifically refer to methods that rely purely on operational data without using process knowledge. Data-driven approaches learn from history and place no requirements on models or expert knowledge [60].

Data-driven process monitoring, or statistical process monitoring, applies multivariate statistics and machine learning methods to fault detection and diagnosis for industrial processes [61]. Variants of PCA, such as recursive PCA (RPCA), dynamic PCA (DPCA), and kernel PCA (KPCA), have been used for process diagnosis. The main idea is that the structure of the PCA will change when an abnormal situation appears in the data. Among data-driven approaches, methods based on the hidden Markov model, dynamic neural network, kernel independent component analysis, Gaussian mixture model, and hidden semi-Markov model have been applied. An in-depth review has been provided in [60].

In the context of real-time analysis, one can consider two approaches: a decision made “sample by sample” or a decision made “segment by segment”. In the first case, we can compare the amplitude of the incoming sample with a threshold, or we can test whether the sample belongs to the same distribution as the previous samples. One may search for pre-defined events (zero value, flat line, min/max, outlier detection, etc.) [62]. It is very difficult to apply this approach in our case. In the second technique, one may use more reliable statistical approaches: instead of samples, statistics or distributions may be compared. This is what we do in our research. We test whether the mean value estimated from at least two new samples is different from the historical one.

It should also be mentioned that there is a class of data-mining-based solutions that could be adopted here, for example, time-series clustering (see the review [63]), anomaly/novelty detection in time series [64] (see the reviews [65,66]), and process mining [67]. We plan to check this in the near future; however, we believe that the proposed techniques are simple and quick and, unlike most data mining techniques, do not require advanced training algorithms.

An interesting approach can be found in [68], where historical SCADA data from wind turbines in normal condition were used to train a multi-layer network model layer-wise to extract the relationships between SCADA variables. Since the model is trained for the healthy case, it will not be able to reconstruct real data if the input is abnormal. Thus, the residuals will be higher and an anomaly will be easy to notice. As mentioned, that work used an advanced neural-network-based model rather than the simple mean testing that we propose.

1.2. Structure of the Paper

The paper is organized as follows. In Section 2, we describe the machine, the experiment, and the data, and we give the step-by-step pre-processing procedure used for the engine oil pressure. In Section 3, we present the statistical methodology applied for the data processing; it consists of the calculation of the distance measures for the empirical probability density functions corresponding to work shifts and, finally, the application of the Student’s t and Wilcoxon statistics for the structure break point detection. In Section 4, we demonstrate the results for the real engine oil pressure data. The last section contains the summary and concludes the paper.

2. Machine, OBDS, Experiment and Data Description

2.1. Machine Description

The machine used in the experiment is a Loader LKP-0903 produced by KGHM ZANAM. Basic parameters of the machine are: length 10,600 mm, width 3150 mm, height 1750 mm, total weight 28,700 kg, standard bucket 4.6 m³, tramming capacity 96 kN, power rating 181 kW, driving speed 4.5 km/h (1st gear) up to 19.0 km/h (4th gear). A photo of the machine is shown in Figure 2. An LHD loader is quite a complex machine that consists of several subsystems, such as the drive unit, the transmission system, the hydraulic system for lifting the loader bucket, etc. The heart of the machine is the combustion engine. One of the most critical parameters of engine operation is the engine oil pressure. For more details see [69]. The loader is used to transport copper ore from the so-called mining face (the mining front, where extraction of ore is performed) to the screen (the reloading point, from the loader to the continuous belt conveyor system that transports bulk material over long distances to the mining shaft and then to the surface). The mining process is complicated and consists of several steps. This affects how the LHD machines are used. Daily operation is divided into four shifts, but due to several factors (i.e., blasting procedures), the LHD cannot operate continuously. If the LHD is not in operation, the monitoring system produces NaNs (Not a Number values). It may happen that most of the data for a given shift are NaNs. The LHD is not used during the weekend, either. All these specific cases call for a dedicated data processing and analysis methodology for the particular data stream from SCADA.

Figure 2. Machine LK3 used in underground mine.

2.2. On Board Diagnostic System Description

To maintain such types of machines, a list of critical parameters to be measured on the machine has been defined. The list of parameters is case-dependent. In Table A1 (see Appendix A), we present the parameters that are most important for our case study. The parameters are associated with various components. Their dynamics, variability, etc., are different. Some of the data are sampled at 1 Hz; others, considered low-frequency variables, may be sampled every 5 s or less often. In this paper, we use only the oil pressure signal, sampled every 5 s.

The data considered in this paper describe one month of LHD operation. We received them from the data server as historical data.

2.3. Experiment Description

The machine considered in the paper is a regular example taken from the mine (LK3 419R machine). The data cover one month of operation. We selected this period because we received information that a repair action was performed during it. Our experiment is a passive one: we just observe the operation of the machine using OBD monitoring data. According to information received from the maintenance staff, the turbocharger was replaced on 14 May 2019. The diagnostic task is to identify the moment of replacement based on the proposed feature obtained through raw pressure data processing. Once again, we would like to highlight that this point cannot be noticed by observing the raw data.

2.4. Data Pre-Processing

Raw engine oil pressure data are presented in Figure 3 (top). As one can observe, the pressure values vary from 0 up to 700 kPa. The engine oil pressure was acquired every 5 s; however, for each day the data acquisition process could start at a slightly different time point (exactly at midnight or 1, 2, 3, … seconds after midnight). The empty spaces indicate the weekends; thus, only the working days were included in the analysis. To be able to compare the variability of the data during a single shift, we re-sampled the data to the same time basis using linear interpolation. As one may see, this is a cosmetic change: the shape of the signal has not changed, see Figure 3 (bottom).

Figure 3. Engine oil pressure data from the database (top panel) and re-sampled data (bottom panel).
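The re-sampling step can be illustrated with a minimal sketch, assuming NumPy and a hypothetical helper `resample_to_common_grid`. The 5 s step and the 6 h shift length come from the text; treating grid points outside the recorded range as NaN is an assumption of this sketch, and the handling of NaNs inside the record is deliberately left out.

```python
import numpy as np


def resample_to_common_grid(t_raw, p_raw, step=5.0, duration=6 * 3600.0):
    """Linearly interpolate samples onto the grid 0, step, 2*step, ... < duration.

    t_raw : increasing sample times in seconds (e.g. local shift time)
    p_raw : pressure values measured at t_raw
    Grid points outside the recorded time range are returned as NaN
    instead of being extrapolated (an assumption of this sketch).
    """
    t_raw = np.asarray(t_raw, dtype=float)
    p_raw = np.asarray(p_raw, dtype=float)
    grid = np.arange(0.0, duration, step)
    resampled = np.interp(grid, t_raw, p_raw, left=np.nan, right=np.nan)
    return grid, resampled
```

For example, a record that starts 2 s after the nominal start, `t_raw = [2, 7, 12]`, is mapped onto the grid `0, 5, 10, ...`, so that all days share the same time basis.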

As mentioned, we know that the turbocharger was replaced on 14 May, but it is still difficult to identify a significant change in the data on that date. Similar data were analyzed in [21], where a specific representation of the oil pressure was proposed: the authors reshaped the data into a two-dimensional array in which the x-axis describes the shift number or the date, and the y-axis shows the so-called local shift time (0–6 h) or the time within a single day (0–24 h). In this paper, we apply the local shift time representation to the one-month measurements; see Figure A1 in Appendix B. As one can see, it is difficult to clearly indicate the structure break point (14 May) on the map presented in Figure A1. Thus, we propose to also analyze the first, second, third, and fourth shifts separately. Moreover, to have the same time basis for each shift, we re-sampled the data according to the most frequent time basis using linear interpolation. The representation of the oil pressure measurements broken down into four subsets (corresponding to the four shifts) after re-sampling is shown in Figure A2 in Appendix B.

However, as was mentioned, the monitoring system produces NaNs. If in the analyzed sample a significant amount of data are NaNs, then the analysis and corresponding interpretation may not be reliable. Thus, we examined how much data (in percent) are NaNs and before the further analysis, we removed the work shifts with more than 40% of NaNs. The oil pressure after removing the days with more than 40% NaNs represented as shift by shift is demonstrated in Figure 4. The oil pressure broken down into four shifts after removing the shifts with more than 40% NaNs is presented in Figure 5. The data demonstrated in Figure 4 and Figure 5 are analyzed using the procedures described in the next section. The consecutive steps of the pre-processing scheme described above are demonstrated in Figure 6.

Figure 4. Map of re-sampled data (presented shift by shift) after removing the days with more than 40% of NaNs.

Figure 5. Maps of re-sampled data corresponding to four work shifts after removing the days with more than 40% of NaNs.

Figure 6. The concept of the pre-processing scheme.
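The NaN-based filtering can be sketched as follows, assuming the re-sampled data are stored as a two-dimensional NumPy array with one column per work shift. The helper name `drop_sparse_shifts` is hypothetical; the 40% threshold is the one stated in the text.

```python
import numpy as np


def drop_sparse_shifts(data, max_nan_fraction=0.4):
    """Remove work shifts (columns) in which more than 40% of samples are NaN.

    data : 2-D array, rows = local shift time, columns = work shifts
    Returns the filtered array and the boolean mask of retained columns.
    """
    data = np.asarray(data, dtype=float)
    nan_fraction = np.isnan(data).mean(axis=0)  # share of NaNs per shift
    keep = nan_fraction <= max_nan_fraction
    return data[:, keep], keep
```

Returning the mask alongside the filtered array makes it possible to map a detected break point back to the original shift index or date.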

3. Methodology

In this section, we present the methodology applied for the real data presented in Section 2. At first, we set the notation used further in the paper. Let $m_{ij}$ denote the i-th measurement during the j-th work shift, where $i = 1, \dots, N$ and $j = 1, \dots, M$, and consequently let $m_{j} = (m_{1j}, m_{2j}, \dots, m_{Nj})$ denote a vector of measurements corresponding to the j-th work shift. Moreover, let $p_{j}(x)$ indicate the theoretical probability density function of $m_{j}$. To detect the moment when the character of the data changes, we consider the pdfs corresponding to subsequent work shifts, namely $p_{1}(x), p_{2}(x), \dots, p_{M}(x)$.

In probability theory and statistics, to quantify the similarity of two distributions one can use the so-called divergence (or contrast) functions that measure the distance of one probability distribution to another. In general, the divergence is not a concept as strong as distance, because it does not have to be symmetric in its arguments or satisfy the triangle inequality. Among various contrast functions, we distinguish one very important class of divergence coefficients, namely the class of the so-called f-divergences of the following form [70,71,72]

$$I_{f,g}(p_{k^{*}}(x), p_{k^{**}}(x)) = g\left(\int p_{k^{**}}(x)\, f\left(\frac{p_{k^{*}}(x)}{p_{k^{**}}(x)}\right) dx\right), \tag{1}$$

where $p_{k^{*}}(x)$, $p_{k^{**}}(x)$ are the probability density functions corresponding to two variables, $f(t)$ is a continuous convex real function on $\mathbb{R}_{+}$ and $g(t)$ is an increasing function on $\mathbb{R}$. It is important to mention that the f-divergences are always non-negative and they are equal to zero if and only if the densities $p_{k^{*}}(x)$ and $p_{k^{**}}(x)$ coincide. For more properties of the divergence functions defined in Equation (1) we refer the readers to [72,73]. Depending on the choice of $f(t)$ and $g(t)$ one can obtain different forms of the contrast functions. In this paper, we consider three specific measures belonging to the class defined above.

The first measure, called the Hellinger distance, corresponds to the case of $g(t) = \sqrt{0.5 t}$ and $f(t) = (\sqrt{t} - 1)^{2}$ in Equation (1) and it is given by the following formula [72]

$$H(p_{k^{*}}(x), p_{k^{**}}(x)) = \sqrt{0.5 \int \left(\sqrt{p_{k^{*}}(x)} - \sqrt{p_{k^{**}}(x)}\right)^{2} dx}. \tag{2}$$

Let us notice that the Hellinger distance is symmetric with respect to the arguments and obeys the triangle inequality. It satisfies the property that $0 \leq H(p_{k^{*}}(x), p_{k^{**}}(x)) \leq 1$ with the minimum value corresponding to the case when $p_{k^{*}}(x) = p_{k^{**}}(x)$ for every $x \in \mathbb{R}$, and the maximum value achieved when $p_{k^{*}}(x)$ is equal to zero for every x for which $p_{k^{**}}(x)$ is nonzero and vice versa.

As the second divergence measure, we consider a modification of the Hellinger distance defined above. Namely, for $g(t)$ being an identity function and $f(t)$ defined as in the Hellinger case, we obtain the so-called Jeffreys distance of the following form [74]

$$J(p_{k^{*}}(x), p_{k^{**}}(x)) = 2 H^{2}(p_{k^{*}}(x), p_{k^{**}}(x)) = \int \left(\sqrt{p_{k^{*}}(x)} - \sqrt{p_{k^{**}}(x)}\right)^{2} dx. \tag{3}$$

Let us notice that similarly to the Hellinger distance, the Jeffreys measure is symmetric in the arguments. Moreover, it takes values between 0 and 2.

The Chernoff distance, the third example of the divergences considered in this paper, corresponds to the case when $g(t) = -\log(-t)$ and $f(t) = -t^{1-\alpha}$ with $0 < \alpha < 1$ in Equation (1) and takes the following form [73,74]

$$CH(p_{k^{*}}(x), p_{k^{**}}(x)) = -\log\left(ch(p_{k^{*}}(x), p_{k^{**}}(x))\right), \tag{4}$$

where

$$ch(p_{k^{*}}(x), p_{k^{**}}(x)) = \int p_{k^{*}}(x)^{\alpha}\, p_{k^{**}}(x)^{1-\alpha}\, dx \tag{5}$$

is called the Chernoff coefficient. In general, except for the case of $\alpha = 0.5$, the Chernoff distance given in Equation (4) is not symmetric in the arguments nor does it satisfy the triangle inequality. Moreover, let us notice that since $0 \leq ch(p_{k^{*}}(x), p_{k^{**}}(x)) \leq 1$ we have that $0 \leq CH(p_{k^{*}}(x), p_{k^{**}}(x)) \leq +\infty$ with the minimum value corresponding to the instance when $p_{k^{*}}(x) = p_{k^{**}}(x)$ for every $x \in \mathbb{R}$. The special case of the Chernoff coefficient given in Equation (5), i.e., the case of $\alpha = 0.5$, is called the Bhattacharyya coefficient and is related directly to the Hellinger distance given in Equation (2), namely

$$H(p_{k^{*}}(x), p_{k^{**}}(x)) = \sqrt{1 - ch(p_{k^{*}}(x), p_{k^{**}}(x))}.$$
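The three divergences in Equations (2)–(5) can be evaluated numerically once both densities are available on a common grid. The following is a minimal sketch, assuming NumPy, densities sampled at the same points `x`, and trapezoidal integration; the function names are hypothetical and the integration rule is a choice of this sketch, not something specified in the paper.

```python
import numpy as np


def _integrate(y, x):
    """Trapezoidal rule for samples y taken at grid points x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def chernoff_coefficient(p, q, x, alpha=0.5):
    """Equation (5); alpha = 0.5 gives the Bhattacharyya coefficient."""
    return _integrate(p**alpha * q ** (1.0 - alpha), x)


def hellinger(p, q, x):
    """Equation (2): Hellinger distance, bounded between 0 and 1."""
    return float(np.sqrt(0.5 * _integrate((np.sqrt(p) - np.sqrt(q)) ** 2, x)))


def jeffreys(p, q, x):
    """Equation (3): equal to twice the squared Hellinger distance."""
    return _integrate((np.sqrt(p) - np.sqrt(q)) ** 2, x)


def chernoff(p, q, x, alpha=0.5):
    """Equation (4): minus the logarithm of the Chernoff coefficient."""
    return float(-np.log(chernoff_coefficient(p, q, x, alpha)))
```

For two normal densities with unit variance and means 0 and 1, the Bhattacharyya coefficient is known in closed form, $\exp(-1/8)$, which gives a convenient check of both the coefficient and the relation $H = \sqrt{1 - ch}$.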

Let us note that from the practical point of view to calculate the empirical counterparts of the divergences defined in Equations (2)–(4) there is a need to estimate the probability density functions and use them instead of the theoretical pdfs in the above definitions. In our case we will apply the empirical divergences to the measurements corresponding to the work shifts, namely to the vectors $m_{j} = (m_{1j}, m_{2j}, \dots, m_{Nj})$ where $j = 1, \dots, M$. For this purpose, one can use the kernel density estimator of $p_{j}(x)$ defined as follows [75,76]

$$\hat{p}_{j}(x) = \frac{1}{N h} \sum_{i=1}^{N} K\left(\frac{x - m_{ij}}{h}\right) \tag{6}$$

for any $j = 1, \dots, M$ and $x \in \mathbb{R}$, where $K(\cdot)$ is the non-negative kernel smoothing function, and h is the bandwidth. The choice of kernel smoothing function determines the shape of the curve used to estimate the probability density function. In our case, we take the normal kernel that is simply the standard normal pdf of the form

$$K(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{x^{2}}{2}\right\} \quad \text{for } x \in \mathbb{R}, \tag{7}$$

and the bandwidth is chosen using Silverman’s rule of thumb to be optimal for estimating normal densities [77]. The procedure is implemented in many programming languages and it is available in numerous mathematical packages, e.g., the function “ksdensity” in Matlab.
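Equations (6) and (7) translate directly into a few lines of NumPy. The sketch below is an illustration, not the authors' implementation: the function names are hypothetical, and the bandwidth uses the textbook normal-reference form $h = \hat{\sigma}\,(4/(3N))^{1/5}$ with the sample standard deviation, whereas software packages may use a more robust scale estimate.

```python
import numpy as np


def silverman_bandwidth(sample):
    """Normal-reference rule of thumb: h = sigma * (4 / (3 N))**(1/5).

    Assumption of this sketch: sigma is the sample standard deviation.
    """
    sample = np.asarray(sample, dtype=float)
    return sample.std(ddof=1) * (4.0 / (3.0 * sample.size)) ** 0.2


def kernel_density(sample, grid, h=None):
    """Gaussian kernel density estimate, Equation (6), evaluated on `grid`."""
    sample = np.asarray(sample, dtype=float)
    sample = sample[~np.isnan(sample)]  # ignore missing measurements
    grid = np.asarray(grid, dtype=float)
    if h is None:
        h = silverman_bandwidth(sample)
    u = (grid[:, None] - sample[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Equation (7)
    return kernel.sum(axis=1) / (sample.size * h)
```

Evaluating all shifts on the same grid (for instance, an evenly spaced grid covering 0 to 700 kPa) keeps the estimated densities directly comparable.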

Since the distribution of the subsequent samples $m_{1}, m_{2}, \dots, m_{M}$ changes for a certain $j^{*} \in \{1, 2, \dots, M\}$, we can identify the moment of change by examining the empirical contrast functions describing the similarity of the pdf corresponding to the first work shift and the pdfs corresponding to all the other work shifts in the sample, namely we analyze the following vector

$$\hat{I}_{f,g}(p_{1}(x), p_{1}(x)),\ \hat{I}_{f,g}(p_{1}(x), p_{2}(x)),\ \dots,\ \hat{I}_{f,g}(p_{1}(x), p_{M}(x)),$$

where the f-divergences are specified in Equations (2)–(4). We expect that the values taken by the similarity measures in the sub-samples

$$\hat{I}_{f,g}(p_{1}(x), p_{1}(x)),\ \dots,\ \hat{I}_{f,g}(p_{1}(x), p_{j^{*}}(x))$$

and

$$\hat{I}_{f,g}(p_{1}(x), p_{j^{*}+1}(x)),\ \dots,\ \hat{I}_{f,g}(p_{1}(x), p_{M}(x))$$

differ significantly. The preliminary analysis of the vectors of measurements corresponding to the work shifts indicates that the corresponding distributions are different; more precisely, we observe that some characteristics responsible for the location change with respect to time. We refer the reader to the next section for more details. Thus, to determine the structure break point in the data, we decided to apply the test statistics commonly used in the problem of testing whether two samples have equal means. However, we apply this methodology not to the raw vector of observations, but to the empirical divergences described above. The procedure used to detect the structure break point $j^{*}$ in the divergences corresponding to the work shifts is described below.
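To make the construction of this vector concrete, the following sketch computes the Hellinger distance between the estimated density of the first shift and that of every shift. It assumes NumPy, a two-dimensional array with one column per work shift, a Gaussian kernel with a normal-reference bandwidth, and trapezoidal integration; the function names and the synthetic data in the usage note are illustrative only.

```python
import numpy as np


def kde_on_grid(sample, grid):
    """Gaussian kernel density estimate with a normal-reference bandwidth."""
    sample = sample[~np.isnan(sample)]
    h = sample.std(ddof=1) * (4.0 / (3.0 * sample.size)) ** 0.2
    u = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (sample.size * h * np.sqrt(2.0 * np.pi))


def hellinger_to_first_shift(shifts, grid):
    """Hellinger distance between the pdf of shift 1 and the pdf of each shift.

    shifts : 2-D array, one column per work shift
    grid   : common evaluation points for all density estimates
    """
    shifts = np.asarray(shifts, dtype=float)
    grid = np.asarray(grid, dtype=float)
    densities = [kde_on_grid(shifts[:, j], grid) for j in range(shifts.shape[1])]
    dx = np.diff(grid)

    def distance(p, q):
        y = (np.sqrt(p) - np.sqrt(q)) ** 2
        return np.sqrt(0.5 * np.sum(0.5 * (y[1:] + y[:-1]) * dx))

    return np.array([distance(densities[0], d) for d in densities])
```

On synthetic data in which the first three shifts are drawn from one distribution and the last three from a shifted one, the resulting sequence stays low for the first group and jumps for the second, which is the step-like pattern the tests below are designed to locate.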

Let us consider M independent random variables $I_{f,g}(p_{1}(x), p_{1}(x))$, $I_{f,g}(p_{1}(x), p_{2}(x))$, …, $I_{f,g}(p_{1}(x), p_{M}(x))$ with the cumulative distribution functions denoted as $F_{I,1}$, $F_{I,2}$, …, $F_{I,M}$ and the following means $\mu_{I,1}, \mu_{I,2}, \dots, \mu_{I,M}$. Now, let us examine the location testing problem based on the expected value with the null and alternative hypotheses defined as follows

$$\begin{aligned} H_{0} &: \mu_{I,1} = \mu_{I,2} = \dots = \mu_{I,M} \\ H_{1} &: \exists\, M^{*} \in \{2, \dots, M-1\} \text{ such that } \mu_{I,1} = \dots = \mu_{I,M^{*}} \neq \mu_{I,M^{*}+1} = \dots = \mu_{I,M}. \end{aligned}$$

Let us notice that the alternative hypothesis can be also presented as

$$H_{1} = \bigcup_{M^{**}=2}^{M-1} H_{1,M^{**}},$$

where $M^{**}$ is a fixed value and $H_{1,M^{**}}$ is the alternative hypothesis corresponding to the two-sample test comparing means in two populations, i.e.,

$$H_{1,M^{**}} : \mu_{1} = \mu_{I,1} = \dots = \mu_{I,M^{**}} \text{ and } \mu_{2} = \mu_{I,M^{**}+1} = \dots = \mu_{I,M}.$$

Now, to verify $H_{0}$ against $H_{1}$ we can use the maximum-based statistic of the following form

$$S_{M}(\epsilon) = \max_{\lfloor \epsilon M \rfloor \leq M^{**} \leq \lfloor (1-\epsilon) M \rfloor} |T(M^{**})|, \tag{8}$$

where $T(M^{**})$ can be any statistic for testing $H_{0}$ against $H_{1,M^{**}}$ and $\epsilon \in (0, 0.5)$ is used to guarantee that there are at least $\lfloor \epsilon M \rfloor$ elements in both sub-samples. Let us notice that the choice of $\epsilon$ is crucial, since a small value allows us to detect a very early or very late change in the distribution, but at the same time it involves considering a small sample, which hinders proper statistical inference. Moreover, it is important to mention that the distribution of the max-type statistic given in Equation (8) under $H_{0}$ does not depend on the analyzed random sample distribution. In our method, we want to locate the moment of the most significant change in the data, i.e., we want to identify the value of $M^{**}$ corresponding to the max-type statistic given in Equation (8).
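The scan over candidate split points in Equation (8) can be sketched independently of the particular two-sample statistic. In the following illustration, `max_type_scan` and `mean_difference` are hypothetical names, and the difference of means is used only as a simple stand-in for $T(M^{**})$; any callable taking two sub-samples can be passed instead.

```python
import math
from statistics import fmean


def mean_difference(first, second):
    """Simple stand-in for a two-sample statistic T(M**)."""
    return fmean(first) - fmean(second)


def max_type_scan(values, statistic, eps=0.1):
    """Evaluate |T(M**)| for every admissible split and return the maximiser.

    values    : sequence of divergence values, one per work shift
    statistic : callable taking the two sub-samples (before, after the split)
    eps       : trimming parameter from Equation (8), 0 < eps < 0.5
    Returns (best_split, best_value); best_split is None if no split is admissible.
    """
    m = len(values)
    first = max(1, math.floor(eps * m))           # keep both parts non-empty
    last = min(m - 1, math.floor((1.0 - eps) * m))
    best_split, best_value = None, float("-inf")
    for split in range(first, last + 1):
        value = abs(statistic(values[:split], values[split:]))
        if value > best_value:
            best_split, best_value = split, value
    return best_split, best_value
```

For a pure step sequence such as ten zeros followed by ten ones, the scan returns the split after the tenth element, where the absolute difference of means equals one.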

For our purpose, we use two exemplary statistics from two-sample tests for equal means in independent groups, namely the one corresponding to the parametric Student’s t-test and the one corresponding to the non-parametric Wilcoxon test. For our data, the Student’s t statistic $T_S(M^{**})$ takes the following form

$$T_S(M^{**}) = \frac{\bar{I}_1 - \bar{I}_2}{I_{\mathrm{std}} \sqrt{1/M^{**} + 1/(M - M^{**})}}, \tag{9}$$

where

$$I_{\mathrm{std}}^2 = \frac{(M^{**} - 1) I_{1,\mathrm{std}}^2 + (M - M^{**} - 1) I_{2,\mathrm{std}}^2}{M - 2},$$

and $\bar{I}_1$, $\bar{I}_2$ and $I_{1,\mathrm{std}}^2$, $I_{2,\mathrm{std}}^2$ are the empirical means and empirical variances corresponding to the sub-samples

$$I_1 = (\hat{I}_{f,g}(p_1(x), p_1(x)), \dots, \hat{I}_{f,g}(p_1(x), p_{M^{**}}(x)))$$

and

$$I_2 = (\hat{I}_{f,g}(p_1(x), p_{M^{**}+1}(x)), \dots, \hat{I}_{f,g}(p_1(x), p_M(x))),$$

respectively. In turn, the Wilcoxon statistic denoted by $T_W(M^{**})$ is given by the following formula

$$T_W(M^{**}) = \frac{W(M^{**}) - E[W(M^{**})]}{\sqrt{Var[W(M^{**})]}}, \tag{10}$$

where $W(M^{**}) = \sum_{i=1}^{M^{**}} R_i$, $E[W(M^{**})] = M^{**}(M+1)/2$, $Var[W(M^{**})] = M^{**}(M - M^{**})(M+1)/12$ and $R_i$ denotes the number of elements in the combined vector

$$I = (\hat{I}_{f,g}(p_1(x), p_1(x)), \dots, \hat{I}_{f,g}(p_1(x), p_M(x)))$$

which are smaller than or equal to the $i$-th element of $I_1$, i.e., the rank of that element in $I$. For more information regarding the Student’s t-test and the Wilcoxon test we refer the readers to [78,79,80,81,82,83]. The successive stages used in the data analysis are presented as a diagram in Figure 7. A similar methodology, also based on the above-mentioned tests, is applied by the authors in [84].

Figure 7. The diagram showing the successive stages of the methodology described.
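Equations (9) and (10) translate directly into code. The sketch below is illustrative only: the function names `student_t` and `wilcoxon_z` are hypothetical, the inputs are synthetic numbers rather than distance values from the paper, and, as in Equation (10), no tie correction is applied.

```python
import math
from typing import Sequence


def student_t(first: Sequence[float], second: Sequence[float]) -> float:
    """Pooled-variance two-sample t statistic, following Equation (9).

    Requires at least two values per sample and a non-zero pooled variance.
    """
    n1, n2 = len(first), len(second)
    mean1, mean2 = sum(first) / n1, sum(second) / n2
    var1 = sum((v - mean1) ** 2 for v in first) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in second) / (n2 - 1)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def wilcoxon_z(first: Sequence[float], second: Sequence[float]) -> float:
    """Standardised rank-sum statistic, following Equation (10).

    R_i counts the elements of the combined sample that are smaller than or
    equal to the i-th element of `first`.
    """
    combined = list(first) + list(second)
    n1, n = len(first), len(combined)
    w = sum(sum(1 for u in combined if u <= v) for v in first)
    mean_w = n1 * (n + 1) / 2.0
    var_w = n1 * (n - n1) * (n + 1) / 12.0
    return (w - mean_w) / math.sqrt(var_w)


# Synthetic, clearly separated samples.
first, second = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]
t_value = student_t(first, second)
z_value = wilcoxon_z(first, second)
print(t_value, z_value)
```

Both statistics are negative here because the first sub-sample lies below the second one, which is the same sign pattern one expects when a distance to the base pdf grows after the break.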

4. Real Data Analysis

In this section, we analyze the real data introduced in Section 2 using the methodology described above. We consider both the data presented shift by shift in Figure 4 and the data divided into separate classes corresponding to the four work shifts presented in Figure 5. To indicate the difference in the distributions, we calculate the empirical probability density functions for the subsequent columns of the data matrices given in Figure 4 and Figure 5. The corresponding plots are presented in Figure 8 and Figure 9 as two-dimensional graphs, where the probability density functions are evaluated using the kernel density estimation described in the previous section. As one can notice, the shape of the empirical pdfs changes at a certain point. As a consequence, the empirical pdfs take smaller values with greater probability.

Figure 8. The empirical probability density functions (two-dimensional plot) for the pre-processed data presented shift by shift.

Figure 9. The empirical probability density functions (two-dimensional plot) for the pre-processed data corresponding to four work shifts.
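The per-shift density estimation step can be illustrated with a small kernel density estimator. This is a sketch under stated assumptions: the Gaussian kernel and the normal-reference bandwidth $h = 1.06\,\hat{\sigma}\,n^{-1/5}$ are choices made here for illustration and need not coincide with the settings used in the paper, the name `gaussian_kde` is hypothetical, and the two samples are synthetic stand-ins for two columns of the data matrix.

```python
import math
import statistics
from typing import List, Sequence


def gaussian_kde(sample: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `sample` on `grid`."""
    n = len(sample)
    # Normal-reference bandwidth; an assumption of this sketch.
    h = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
        for x in grid
    ]


# Synthetic stand-ins for two work shifts (not real oil pressure data).
shift_a = [4.0 + 0.1 * i for i in range(20)]   # higher values
shift_b = [2.0 + 0.1 * i for i in range(20)]   # lower values

step = 0.05
grid = [step * k for k in range(201)]          # common grid on [0, 10]

pdf_a = gaussian_kde(shift_a, grid)
pdf_b = gaussian_kde(shift_b, grid)

# Rectangle-rule check that each estimate integrates to about one.
area_a = sum(pdf_a) * step
area_b = sum(pdf_b) * step

# Location of the highest density value for each shift.
mode_a = grid[pdf_a.index(max(pdf_a))]
mode_b = grid[pdf_b.index(max(pdf_b))]
print(area_a, area_b, mode_a, mode_b)
```

Evaluating every shift on the same grid is a deliberate design choice: it lets the pdfs be compared point by point when a distance between them is computed afterwards.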

Since in Figure 8 and Figure 9 one can see the difference in the shape of the probability density function, to identify the structure break point we use the characteristics based on the distance between the empirical pdfs, namely the Hellinger distance, the Jeffreys distance and the Chernoff distance with $\alpha = 0.5$ introduced in Section 3. At first, we plot the maps presenting the matrices of the values taken by the above characteristics, where each pair of empirical pdfs is compared. For the Hellinger distance, the matrices are given in Figure 10 and Figure 11. As one can see, they are symmetric with respect to the diagonal. Besides, the calculated Hellinger distance divides the work shifts into two groups, between which the values taken by the distance measure differ significantly. The analogous plots corresponding to the Jeffreys and Chernoff distances are given in Appendix B, see Figure A3, Figure A4, Figure A7 and Figure A8.

Figure 10. The matrix presenting the values of the Hellinger distance calculated based on the empirical probability density functions corresponding to the data presented shift by shift.

Figure 11. The matrices presenting the values of the Hellinger distance calculated based on the empirical probability density functions corresponding to the data representing four work shifts.
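A pairwise distance matrix of the kind shown in Figure 10 and Figure 11 can be sketched as follows. The normalisation $H(p, q) = \sqrt{1 - \int \sqrt{p(x)\,q(x)}\,dx}$ is one common convention and is an assumption here; the definition in Section 3 may differ by a constant factor. The function names are hypothetical and the densities are synthetic normal pdfs forming two regimes, not densities estimated from the loader data.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hellinger(p: Sequence[float], q: Sequence[float], dx: float) -> float:
    """Hellinger distance for two densities sampled on the same uniform grid."""
    # Rectangle-rule approximation of the integral of sqrt(p * q).
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q)) * dx
    return math.sqrt(max(0.0, 1.0 - bc))  # clip rounding noise below zero


def distance_matrix(pdfs: List[Sequence[float]], dx: float) -> List[List[float]]:
    """Compare every pdf with every other pdf."""
    return [[hellinger(p, q, dx) for q in pdfs] for p in pdfs]


dx = 0.01
grid = [-10.0 + dx * k for k in range(2101)]     # uniform grid on [-10, 11]
means = [0.0, 0.0, 1.0, 1.0]                     # two synthetic regimes
pdfs = [[normal_pdf(x, mu, 1.0) for x in grid] for mu in means]

matrix = distance_matrix(pdfs, dx)
first_row = matrix[0]   # distances between each pdf and the base (first) pdf
print([round(v, 4) for v in first_row])
```

The first row of the matrix is the sequence that the break point procedures operate on: it stays near zero within the first regime and jumps once the densities move away from the base pdf.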

Now, we apply the Student’s t-based and Wilcoxon-based procedures for identifying the structure break point to one row of each of the Hellinger distance matrices presented in Figure 10 and Figure 11. The selected row contains the Hellinger distances between the subsequent pdfs and the base pdf, chosen here as the one corresponding to the first shift in a sample. In other words, we identify the structure break point based on the comparison of each empirical pdf with the empirical pdf of the first shift. The values taken by the distance measure and the identified structure break points are presented in Figure 12 (data presented shift by shift) and Figure 13 (each work shift treated separately). As one can see, in all cases except the fourth work shift in Figure 13, the procedures based on the Student’s t-test and the Wilcoxon test are consistent and indicate the same structure break point. We mention here that the shift marked on the plots is the last shift before the change in distribution occurs. This means that the nature of the data changes in the work shifts that follow the one marked on the plot. Similar graphs corresponding to the Jeffreys and Chernoff distances are presented in Appendix B, see Figure A5, Figure A6, Figure A9 and Figure A10.

In Table A2, Table A3, Table A4 and Table A5 we present the summary of the results together with the values taken by the test statistics corresponding to both procedures (Student’s t-based and Wilcoxon-based), for the data presented shift by shift and for each work shift treated separately. As one can see, for the data presented shift by shift, both procedures applied to all three considered measures of distance between pdfs indicate that the structure break point occurs after the 1st work shift on 14 May. On this day, the character of the data changes from the 3rd work shift. The structure break point falls between the 1st and 3rd work shifts on 14 May because the LHD machine was not operating during the 2nd work shift. The same conclusions can be drawn from the results corresponding to the four work shifts treated separately, except for the Wilcoxon-based procedure applied to the data from the fourth work shift, where the structure break point is indicated only after 15 May.

It is important to notice that the choice of divergence measure does not cause any change in the final result for the analyzed data. Although the values taken by the calculated distances differ, the overall behaviour of the functions is similar. As a consequence, each procedure (based on Student’s t-test and on the Wilcoxon test) returns exactly the same output for the Hellinger, Jeffreys and Chernoff distances.

Figure 12. The Hellinger distance between the subsequent pdfs and the base pdf chosen as the first one in a sample and the structure break point identified by the methods based on the Student’s t-test and Wilcoxon test for the data presented shift by shift.

Figure 13. The Hellinger distance between the subsequent pdfs and the base pdf chosen as the first one in a sample and the structure break points identified by the methods based on the Student’s t-test and Wilcoxon test corresponding to the data representing four work shifts.
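Because the marked shift is the last one before the change, translating a split index into calendar terms requires the list of shifts in which the machine actually operated. The following is a minimal sketch with a hypothetical helper `describe_break` and a hypothetical label format (date, work-shift number); only the 14 May entries mirror the text, while the 13 May entries are illustrative placeholders.

```python
from typing import List, Tuple

ShiftLabel = Tuple[str, int]  # (date, work-shift number); hypothetical format


def describe_break(
    labels: List[ShiftLabel], split: int
) -> Tuple[ShiftLabel, ShiftLabel]:
    """Return (last shift before the change, first shift after it).

    `split` is the number of shifts in the first sub-sample, so it must
    satisfy 1 <= split < len(labels).
    """
    if not 1 <= split < len(labels):
        raise ValueError("split must leave at least one shift on each side")
    return labels[split - 1], labels[split]


# Only operated shifts are listed. The 2nd shift on 14 May is absent because
# the machine was not operating then; the 13 May entries are illustrative.
operated = [
    ("13 May", 3), ("13 May", 4),
    ("14 May", 1), ("14 May", 3), ("14 May", 4),
]
last_before, first_after = describe_break(operated, split=3)
print(last_before, first_after)
```

Keeping non-operated shifts out of the label list is what makes the break fall between the 1st and 3rd work shifts of the same day in this toy setting.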

5. Discussion

The developed procedure allows us to transform the one-month data from a highly variable signal into a nearly monotonic feature. It should be said that the behaviour/shape of our feature is different from the bathtub curve known from reliability theory. Here, there is a “switch” from good to bad condition. Therefore, using this data we are only able to detect the moment of change, without the possibility of detecting the early stage of damage or tracking its development. This limitation is related not to the proposed method, but to the specific data.

The a priori knowledge that a repair action has been performed is necessary to validate the results. The detected date of the structure break point is the same as the date of the maintenance action carried out in the company. The extracted feature, i.e., the distance between the distributions estimated for each shift, is convincing and easy to interpret, which is essential from an industrial perspective. We believe that it also has appropriate quality from a scientific point of view, as we proposed a new method for structure break point detection, i.e., a method for identifying the moment when our feature changes (which corresponds to the change in the process). This method is based on the Student’s t and Wilcoxon statistics that are commonly applied in testing whether two samples have the same means. The basis of the procedure is very intuitive: we expect the distribution of the time series to change along with the degradation process. Thus, we proposed analyzing the characteristics of the empirical distributions of the data corresponding to the work shifts, i.e., the measures of the distance between empirical probability distribution functions. Using these characteristics, we could obtain the monotonic feature of the data that was further segmented.

The problem considered here is essential from a practical point of view. Even though various SCADA data are increasingly available, their practical informative value for an end-user is still doubtful. This is especially the case for highly variable data acquired from time-varying systems such as mining machines, wind turbines, etc. [85,86,87]. To develop novel diagnostic procedures, historical data are used for the “training” process. Often the data are normalized, as they depend on operating conditions [85,87]. Unfortunately, we cannot use normalization, because engine oil pressure cannot be higher if the machine is more loaded. However, we have noticed that the statistical properties of the process (engine oil pressure variation) change when the condition of the machine changes. This observation was the basis of our approach.

6. Conclusions

In conclusion, we would like to highlight a few important issues:

We have proposed a novel multistep procedure that covers pre-processing, statistical analysis, and visualization.
The innovation of the proposed procedure lies in the combination of the crucial steps of the methodology, namely, the initial segmentation and the representation of the data as a matrix in the work shift perspective, as well as the analysis of the characteristics of the data (probability density functions) and the distance measures based on them. Finally, the last step (segmentation) is performed not on the real data, but on the distance measures between the pdfs of the time series. To the best of our knowledge, this approach is rarely used in real applications.
Because we use distance measures between the pdfs of the time series, we are not restricted to the case in which a single parameter of the data (like the mean or the variance) changes. The examined issue is much more general: analyzing changes of the pdfs makes the algorithm sensitive to the dynamics of various characteristics of the data, not only a single one.
The proposed approach is universal. It can be used for any cycle that corresponds to the considered phenomena (in our case it is a work shift), for any characteristic of the data (in our case it is the probability density function), and for any distance measure applied to that characteristic (in our case these are distance measures based on the probability density functions).
The whole procedure is automatic, thus we believe it could be implemented in the monitoring systems used in the company. When new data corresponding to the next day (four work shifts) arrive, we can use the introduced procedure to test whether they belong to the current regime. In our case, a new sample means the data corresponding to the next day. This approach is often used in monitoring systems. Thus, in some sense, the methodology can be used in a continuous manner.
Historical data with precise knowledge about the replacement have been used for training and validation. Implementing the proposed method as an automatic data processing procedure should not be a problem for any new machine. A small dataset from a couple of shifts of a new machine will be enough to establish the averaged picture of the signature of good condition. If damage appears (change of regime), the method will be able to detect it after a few work shifts (min. 2), which is much better than the current situation. Note that a machine with such damage was able to operate for two weeks because there was no tool to detect the problem.
It should be highlighted that the proposed methodology also has some limitations. One of them is related to the special requirements on the data. More precisely, the data must be arranged as work shifts (or any other cycles). As a result, the identified structure break point corresponds to a work shift, not to an exact time point (such as an hour).

Author Contributions

Conceptualization, A.G., A.W., R.Z., N.G.; investigation, A.G., A.W., R.Z.; methodology, A.G., A.W., R.Z.; resources, P.Ś., R.Z., N.G.; software, A.G.; supervision, A.W., R.Z., P.Ś., N.G.; validation, A.G., A.W., R.Z.; visualization, A.G.; writing—original draft, A.G., A.W., R.Z.; writing—Review and editing, A.G., A.W., R.Z., N.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are not available due to non-disclosure agreements.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Tables

Table A1. A list of monitored parameters for LHD machines.

Variable Name	Description
	Temperatures
’ENGCOOLT’	temperature of the cooling liquid of the internal combustion engine
’GROILT’	oil temperature of transmission and torque
’HYDOILT’	hydraulic oil temperature
	Pressures
’ENGOILP’	oil pressure of the internal combustion engine
’BREAKP’	braking pressure
’GROILP’	transmission oil pressure
’HYDOILP’	pressure in the hydraulic system
	Others
’ENGRPM’	engine speed
’FUELUS’	instant fuel consumption
’SELGEAR’	direction and current gear
’SPEED’	average speed every 1s

Table A2. The identified date corresponding to the last shift before the change in data and the values of test statistics for the data presented shift by shift.

Distance	Student’s t: last shift before change	Student’s t: $T(m^{*})$	Wilcoxon: last shift before change	Wilcoxon: $T(m^{*})$
Hellinger	1st shift on 14th of May	$-19.76$	1st shift on 14th of May	$-7.54$
Jeffreys	1st shift on 14th of May	$-19.72$	1st shift on 14th of May	$-7.54$
Chernoff	1st shift on 14th of May	$-19.76$	1st shift on 14th of May	$-7.53$

Table A3. The identified dates corresponding to the last shifts before the change in data and the values of test statistics corresponding to the data representing four work shifts—Hellinger distance.

Work shift	Student’s t: date of last shift before change	Student’s t: $T(m^{*})$	Wilcoxon: date of last shift before change	Wilcoxon: $T(m^{*})$
First	14th of May	$-8.39$	14th of May	$-3.83$
Second	13th of May	$-8.53$	13th of May	$-3.20$
Third	13th of May	$-10.73$	13th of May	$-3.97$
Fourth	14th of May	$-7.83$	15th of May	$-3.77$

Table A4. The identified dates corresponding to the last shifts before the change in data and the values of test statistics corresponding to the data representing four work shifts—Jeffreys distance.

Work shift	Student’s t: date of last shift before change	Student’s t: $T(m^{*})$	Wilcoxon: date of last shift before change	Wilcoxon: $T(m^{*})$
First	14th of May	$-11.21$	14th of May	$-3.84$
Second	13th of May	$-11.93$	13th of May	$-3.20$
Third	13th of May	$-14.76$	13th of May	$-3.97$
Fourth	14th of May	$-9.68$	15th of May	$-3.77$

Table A5. The identified dates corresponding to the last shifts before the change in data and the values of test statistics corresponding to the data representing four work shifts - Chernoff distance.

Work shift	Student’s t: date of last shift before change	Student’s t: $T(m^{*})$	Wilcoxon: date of last shift before change	Wilcoxon: $T(m^{*})$
First	14th of May	$-13.94$	14th of May	$-3.84$
Second	13th of May	$-11.55$	13th of May	$-3.20$
Third	13th of May	$-10.52$	13th of May	$-3.97$
Fourth	14th of May	$-9.68$	15th of May	$3.77$

Appendix B. Additional Graphs

Figure A1. Re-sampled data presented as a shift-by-shift map of engine oil pressure values for subsequent days.

Figure A2. Re-sampled data presented as the maps of engine oil pressure values corresponding to four work shifts during a day.

Figure A3. The matrix presenting the values of the Chernoff distance calculated based on the empirical probability density functions corresponding to the data presented shift by shift.

Figure A4. The matrices presenting the values of the Chernoff distance calculated based on the empirical probability density functions corresponding to the data representing four work shifts.

Figure A5. The Chernoff distance between the subsequent pdfs and the base pdf chosen as the first one in a sample and the structure break point identified by the methods based on the Student’s t-test and Wilcoxon test for the data presented shift by shift.

Figure A6. The Chernoff distance between the subsequent pdfs and the base pdf chosen as the first one in a sample and the structure break points identified by the methods based on the Student’s t-test and Wilcoxon test corresponding to the data representing four work shifts.

Figure A7. The matrix presenting the values of the Jeffreys distance calculated based on the empirical probability density functions corresponding to the data presented shift by shift.

Figure A8. The matrices presenting the values of the Jeffreys distance calculated based on the empirical probability density functions corresponding to the data representing four work shifts.

Figure A9. The Jeffreys distance between the subsequent pdfs and the base pdf chosen as the first one in a sample and the structure break point identified by the methods based on the Student’s t-test and Wilcoxon test for the data presented shift by shift.

Figure A10. The Jeffreys distance between the subsequent pdfs and the base pdf chosen as the first one in a sample and the structure break points identified by the methods based on the Student’s t-test and Wilcoxon test corresponding to the data representing four work shifts.

References

Vashistha, S.; Kumar Agrawal, A.; Siddiqui, M.; Chattopadhyaya, S. Reliability and Maintainability Analysis of LHD Loader at Saoner Mines, Nagpur, India. IOP Conf. Ser. Mater. Sci. Eng. 2019, 691, 012013.
Jakkula, B.; Govinda Raj, M.; Murthy, C. Maintenance management of load haul dumper using reliability analysis. J. Qual. Maint. Eng. 2019, 26, 290–310.
Chatterjee, S.; Bandopadhyay, S. Reliability estimation using a genetic algorithm-based artificial neural network: An application to a load-haul-dump machine. Expert Syst. Appl. 2012, 39, 10943–10951.
Dindarloo, S. Reliability forecasting of a load-haul-dump machine: A comparative study of ARIMA and neural networks. Qual. Reliab. Eng. Int. 2016, 32, 1545–1552.
Bala, R.; Govinda, R.; Murthy, C. Reliability analysis and failure rate evaluation of load haul dump machines using Weibull distribution analysis. Math. Model. Eng. Probl. 2018, 5, 116–122.
Paithankar, A.; Chatterjee, S. Forecasting time-to-failure of machine using hybrid Neuro-genetic algorithm–a case study in mining machinery. Int. J. Mining Reclam. Environ. 2018, 32, 182–195.
Balaraju, J.; Govinda Raj, M.; Murthy, C.S.N. Prediction and Assessment of LHD Machine Breakdowns Using Failure Mode Effect Analysis (FMEA). In Reliability, Safety and Hazard Assessment for Risk-Based Technologies; Varde, P.V., Prakash, R.V., Vinod, G., Eds.; Springer: Singapore, 2020; pp. 833–850.
Jakkula, B.; Mandela, G.; Chivukula, S. Application ANN Tool for Validation of LHD Machine Performance Characteristics. J. Inst. Eng. (India) Ser. D 2020, 101, 27–38.
Jakkula, B.; Mandela, G.; Chivukula, M. Improvement of overall equipment performance of underground mining machines- a case study. Adv. Model. Anal. A 2018, 79, 6–11.
Jakobs, A. The Sandvik LH621, from hardrock loader to high-performance machine in German salt and potash mining. World Min. Surf. Undergr. 2018, 70, 276–279.
Mkhwanazi, D. Optimizing LHD utilization. J. South Afr. Inst. Min. Metall. 2011, 111, 273–280.
Krot, P.; Śliwiński, P.; Zimroz, R.; Gomolla, N. The identification of operational cycles in the monitoring systems of underground vehicles. Measurement 2020, 151, 107111.
Mbhalati, W. LHD optimization at an underground chromite mine. J. South Afr. Inst. Min. Metall. 2015, 115, 313–320.
Fukui, R.; Kusaka, K.; Nakao, M.; Kodama, Y.; Uetake, M.; Kawai, K. Production analysis of functionally distributed machines for underground mining. Int. J. Min. Sci. Technol. 2016, 26, 477–485.
Stefaniak, P.; Zimroz, R.; Obuchowski, J.; Śliwiński, P.; Andrzejewski, M. An Effectiveness Indicator for a Mining Loader Based on the Pressure Signal Measured at a Bucket’s Hydraulic Cylinder. Procedia Earth Planet. Sci. 2015, 15, 797–805.
Balaraju, J.; Govinda Raj, M.; Murthy, C. Fuzzy-FMEA risk evaluation approach for LHD machine—A case study. J. Sustain. Min. 2019, 18, 257–268.
Ghodrati, B.; Hoseinie, S.; Kumar, U. Context-driven mean residual life estimation of mining machinery. Int. J. Mining Reclam. Environ. 2018, 32, 486–494.
Laukka, A.; Saari, J.; Ruuska, J.; Juuso, E.; Lahdelma, S. Condition-based monitoring for underground mobile machines. Int. J. Ind. Syst. Eng. 2016, 23, 74–89.
Zimroz, R.; Wodecki, J.; Król, R.; Andrzejewski, M.; Śliwiński, P.; Stefaniak, P. Self-propelled Mining Machine Monitoring System—Data Validation, Processing and Analysis. In Mine Planning and Equipment Selection; Drebenstedt, C., Singhal, R., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 1285–1294.
Wodecki, J.; Stefaniak, P.; Michalak, A.; Wyłomańska, A.; Zimroz, R. Technical condition change detection using Anderson-Darling statistic approach for LHD machines—Engine overheating problem. Int. J. Mining Reclam. Environ. 2018, 32, 392–400.
Michalak, A.; Śliwiński, P.; Kaniewski, T.; Wodecki, J.; Stefaniak, P.; Wyłomańska, A.; Zimroz, R. Condition Monitoring for LHD Machines Operating in Underground Mine—Analysis of Long-Term Diagnostic Data. In Proceedings of the 27th International Symposium on Mine Planning and Equipment Selection—MPES 2018; Widzyk-Capehart, E., Hekmat, A., Singhal, R., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 471–480.
Stefaniak, P.; Śliwiński, P.; Poczynek, P.; Wyłomańska, A.; Zimroz, R. The Automatic Method of Technical Condition Change Detection for LHD Machines—Engine Coolant Temperature Analysis. In Advances in Condition Monitoring of Machinery in Non-Stationary Operations; Fernandez Del Rincon, A., Viadero Rueda, F., Chaari, F., Zimroz, R., Haddar, M., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 54–63.
Paraszczak, J.; Gustafson, A.; Schunnesson, H. Technical and operational aspects of autonomous LHD application in metal mines. Int. J. Mining Reclam. Environ. 2015, 29, 391–403.
Gustafson, A.; Paraszczak, J.; Tuleau, J.; Schunnesson, H. Impact of technical and operational factors on effectiveness of automatic load-haul-dump machines. Trans. Institutions Min. Metall. Sect. A Min. Technol. 2017, 126, 185–190.
Kaniewski, T.; Śliwiński, P.; Hebda-Sobkowicz, J.; Zimroz, R. Comprehensive, experimental verification of the effects of the lock-up function implementation in LHD haul trucks in the deep underground mine. In Proceedings of the Mining Goes Digital: Proceedings of the 39th International Symposium on Application of Computers and Operations Research in the Mineral Industry (APCOM 2019), Wrocław, Poland, 4–6 June 2019; pp. 506–514.
Śliwiński, P.; Kaniewski, T.; Hebda-Sobkowicz, J.; Zimroz, R.; Wyłomańska, A. Analysis of dynamic external loads to haul truck machine subsystems during operation in a deep underground mine. In Proceedings of the Mining Goes Digital: Proceedings of the 39th International Symposium on Application of Computers and Operations Research in the Mineral Industry (APCOM 2019), Wrocław, Poland, 4–6 June 2019; pp. 515–524.
Wang, Y.; Jin, T.; Liu, L. Output torque prediction of hybrid underground LHD motor based on least square support vector machine. Meitan Xuebao J. China Coal Soc. 2017, 42, 619–625.
Saari, J.; Odelius, J. Detecting operation regimes using unsupervised clustering with infected group labelling to improve machine diagnostics and prognostics. Oper. Res. Perspect. 2018, 5, 232–244.
Wyłomańska, A.; Zimroz, R. Signal segmentation for operational regimes detection of heavy duty mining mobile machines-a statistical approach. Diagnostyka 2014, 15, 33–42.
Wodecki, J.; Michalak, A.; Stefaniak, P. Review of smoothing methods for enhancement of noisy data from heavy-duty LHD mining machines. E3S Web Conf. 2018, 29, 00011.
Stefaniak, P.K.; Zimroz, R.; Śliwiński, P.; Andrzejewski, M.; Wyłomańska, A. Multidimensional Signal Analysis for Technical Condition, Operation and Performance Understanding of Heavy Duty Mining Machines. In Advances in Condition Monitoring of Machinery in Non-Stationary Operations; Chaari, F., Zimroz, R., Bartelmus, W., Haddar, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 197–210.
Śliwiński, P.; Andrzejewski, M.; Kaniewski, T.; Hebda-Sobkowicz, J.; Zimroz, R. Selection of variables acquired by the on-board monitoring system to determine operational cycles for haul truck vehicle. In Proceedings of the Mining Goes Digital: Proceedings of the 39th International Symposium on Application of Computers and Operations Research in the Mineral Industry (APCOM 2019), Wrocław, Poland, 4–6 June 2019; pp. 525–533.
Kucharczyk, D.; Wyłomańska, A.; Zimroz, R. Structural break detection method based on the Adaptive Regression Splines technique. Physica A 2017, 471, 499–511.
Obuchowski, J.; Wyłomańska, A.; Zimroz, R. The local maxima method for enhancement of time-frequency map and its application to local damage detection in rotating machines. Mech. Syst. Signal Process. 2014, 46, 389–405.
Andreao, R.V.; Dorizzi, B.; Boudy, J. ECG signal analysis through hidden Markov models. IEEE Trans. Biomed. Eng. 2006, 53, 1541–1549.
Azami, H.; Mohammadi, K.; Bozorgtabar, B. An Improved Signal Segmentation Using Moving Average and Savitzky-Golay Filter. J. Signal Inf. Process. 2012, 3, 39–44.
Bhagavatula, C.; Jaech, A.; Savvides, M.; Bhagavatula, V.; Friedman, R.; Blue, R.; O Griofa, M. Automatic segmentation of cardiosynchronous waveforms using cepstral analysis and continuous wavelet transforms. In Proceedings of the 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 2045–2048.
Choi, S.; Jiang, Z. Comparison of envelope extraction algorithms for cardiac sound signal segmentation. Expert Syst. Appl. 2008, 34, 1056–1069.
Micó, P.; Mora, M.; Cuesta-Frau, D.; Aboy, M. Automatic segmentation of long-term ECG signals corrupted with broadband noise based on sample entropy. Comput. Methods Programs Biomed. 2010, 98, 118–129.
Khanagha, V.; Daoudi, K.; Pont, O.; Yahia, H. Phonetic segmentation of speech signal using local singularity analysis. Digit. Signal Process. 2014, 35, 86–94.
Lovell, B.; Boashash, B. Segmentation of non-stationary signals with applications. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, New York, NY, USA, 11–14 April 1988; Volume 5, pp. 2685–2688.
Makowski, R.; Hossa, R. Automatic speech signal segmentation based on the innovation adaptive filter. Int. J. Appl. Math. Comput. Sci. 2014, 24, 259–270.
Janczura, J.; Weron, R. Goodness-of-fit testing for the marginal distribution of regime-switching models with an application to electricity spot prices. AStA Adv. Stat. Anal. 2013, 97, 239–270.
Janczura, J. Pricing electricity derivatives within a Markov regime-switching model: A risk premium approach. Math. Methods Oper. Res. 2014, 79, 1–30.
Chen, C. On a segmentation algorithm for seismic signal analysis. Geoexploration 1984, 23, 35–40.
Gaby, J.E.; Anderson, K.R. Hierarchical segmentation of seismic waveforms using affinity. Geoexploration 1984, 23, 1–16.
Kucharczyk, D.; Wyłomańska, A.; Obuchowski, J.; Zimroz, R.; Madziarz, M. Stochastic Modelling as a Tool for Seismic Signals Segmentation. Shock Vib. 2016, 2016, 1–13.
Popescu, T.D. Signal segmentation using changing regression models with application in seismic engineering. Digit. Signal Process. 2014, 24, 14–26.
Sokołowski, J.; Obuchowski, J.; Zimroz, R.; Wyłomańska, A.; Koziarz, E. Algorithm Indicating Moment of P-Wave Arrival Based on Second-Moment Characteristic. Shock Vib. 2016, 2016, 1–6.
Gajda, J.; Sikora, G.; Wyłomańska, A. Regime Variance Testing—A Quantile Approach. Acta Phys. Pol. B Proc. Suppl. 2013, 44, 1015–1035.
Makowski, R.; Zimroz, R. New techniques of local damage detection in machinery based on stochastic modelling using adaptive Schur filter. Appl. Acoust. 2014, 77, 130–137.
Makowski, R.; Zimroz, R. A procedure for weighted summation of the derivatives of reflection coefficients in adaptive Schur filter with application to fault detection in rolling element bearings. Mech. Syst. Signal Process. 2013, 38, 65–77. [Google Scholar] [CrossRef]
Tsay, R.S. Outliers, level shifts, and variance changes in time series. J. Forecast. 1988, 7, 1–20. [Google Scholar] [CrossRef]
Urbanek, J.; Barszcz, T.; Zimroz, R.; Antoni, J. Application of averaged instantaneous power spectrum for diagnostics of machinery operating under non-stationary operational conditions. Measurement 2012, 45, 1782–1791. [Google Scholar] [CrossRef]
Lanoiselée, Y.; Grebenkov, D. Unraveling intermittent features in single-particle trajectories by a local convex hull method. Phys. Rev. E 2017, 96, 022144.
Wagner, T.; Kroll, A.; Haramagatti, C.R.; Lipinski, H.G.; Wiemann, M. Classification and Segmentation of Nanoparticle Diffusion Trajectories in Cellular Micro Environments. PLoS ONE 2017, 12, 1–20.
Akimoto, T.; Yamamoto, E. Detection of transition times from single-particle-tracking trajectories. Phys. Rev. E 2017, 96, 052138.
Sikora, G.; Wyłomańska, A.; Krapf, D. Recurrence statistics for anomalous diffusion regime change detection. Comput. Stat. Data Anal. 2018, 128, 380–394.
Sikora, G.; Wyłomańska, A.; Gajda, J.; Solé, L.; Akin, E.; Tamkun, M.; Krapf, D. Elucidating distinct ion channel populations on the surface of hippocampal neurons via single-particle tracking recurrence analysis. Phys. Rev. E 2017, 96, 062404.
Li, W.; Li, H.; Gu, S.; Chen, T. Process fault diagnosis with model- and knowledge-based approaches: Advances and opportunities. Control. Eng. Pract. 2020, 105, 104637.
Qin, S. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control. 2012, 36, 220–234.
Yan, Y.; Li, J.; Gao, D. Condition parameter modeling for anomaly detection in wind turbines. Energies 2014, 7, 3104–3120.
Aghabozorgi, S.; Seyed Shirkhorshidi, A.; Ying Wah, T. Time-series clustering—A decade review. Inf. Syst. 2015, 53, 16–38.
Branisavljević, N.; Kapelan, Z.; Prodanović, D. Improved real-time data anomaly detection using context classification. J. Hydroinform. 2011, 13, 307–323.
Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58.
Markou, M.; Singh, S. Novelty detection: A review—Part 1: Statistical approaches. Signal Process. 2003, 83, 2481–2497.
Myers, D.; Suriadi, S.; Radke, K.; Foo, E. Anomaly detection for industrial control systems using process mining. Comput. Secur. 2018, 78, 103–125.
Zhao, H.; Liu, H.; Hu, W.; Yan, X. Anomaly detection and fault analysis of wind turbine components based on deep learning network. Renew. Energy 2018, 127, 825–834.
Available online: https://www.kghmzanam.com/en/kategoria/mining-machinery/loaders/ (accessed on 27 July 2021).
Csiszár, I. Information-Type Measures of Difference of Probability Distributions and Indirect Observations. Stud. Sci. Math. Hung. 1967, 2, 299–318.
Csiszár, I. I-Divergence Geometry of Probability Distributions and Minimization Problem. Ann. Probab. 1975, 3, 146–158.
Basseville, M. Distance measures for signal processing and pattern recognition. Signal Process. 1989, 18, 349–369.
Basseville, M. Divergence measures for statistical data processing - An annotated bibliography. Signal Process. 2013, 93, 621–633.
Chung, J.; Kannappan, P.; Ng, C.; Sahoo, P. Measures of distance between probability distributions. J. Math. Anal. Appl. 1989, 138, 280–292.
Hill, P.D. Kernel estimation of a distribution function. Commun. Stat. Theory Methods 1985, 14, 605–620.
Bowman, A.W.; Azzalini, A. Applied Smoothing Techniques for Data Analysis; Oxford University Press Inc.: New York, NY, USA, 1997.
Silverman, B. Density Estimation: For Statistics and Data Analysis; Chapman & Hall: London, UK, 1986.
Blair, R.C.; Higgins, J.J. A Comparison of the Power of Wilcoxon’s Rank-Sum Statistic to that of Student’s t Statistic Under Various Nonnormal Distributions. J. Educ. Stat. 1980, 5, 309–335.
Rice, J.A. Mathematical Statistics and Data Analysis, 3rd ed.; Duxbury Press: Belmont, CA, USA, 2006.
Hogg, R.; Craig, A. Introduction to Mathematical Statistics, 4th ed.; Macmillan: New York, NY, USA, 1978.
Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2010.
Fay, M.P.; Proschan, M.A. Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Stat. Surv. 2010, 4, 1–39.
Conover, W.J. Practical Nonparametric Statistics, 2nd ed.; Wiley: New York, NY, USA, 1980; pp. 225–226.
Grzesiek, A.; Zimroz, R.; Śliwiński, P.; Gomolla, N.; Wyłomańska, A. Long term belt conveyor gearbox temperature data analysis—Statistical tests for anomaly detection. Measurement 2020, 165, 108124.
Urbanek, J.; Barszcz, T.; Straczkiewicz, M.; Jablonski, A. Normalization of vibration signals generated under highly varying speed and load with application to signal separation. Mech. Syst. Signal Process. 2017, 82, 13–31.
Schmidt, S.; Heyns, P.; Gryllias, K. A methodology using the spectral coherence and healthy historical data to perform gearbox fault diagnosis under varying operating conditions. Appl. Acoust. 2020, 158, 107038.
Schmidt, S.; Heyns, P. Normalisation of the amplitude modulation caused by time-varying operating conditions for condition monitoring. Measurement 2020, 149, 106964.

Figure 1. The general concept of the proposed diagnostic procedure. The steps highlighted in the schematic blocks are described in detail in the following sections of the paper.

Figure 2. The LK3 machine used in an underground mine.

Figure 3. Engine oil pressure data from the database (top panel) and re-sampled data (bottom panel).

Figure 4. Map of re-sampled data (presented shift by shift) after removing the days with more than 40% of NaNs.

Figure 5. Maps of re-sampled data corresponding to four work shifts after removing the days with more than 40% of NaNs.

Figure 6. The concept of the pre-processing scheme.

Figure 7. Diagram showing the successive stages of the described methodology.

Figure 8. The empirical probability density functions (two-dimensional plot) for the pre-processed data presented shift by shift.

Figure 9. The empirical probability density functions (two-dimensional plot) for the pre-processed data corresponding to four work shifts.

Figure 10. The matrix presenting the values of the Hellinger distance calculated based on the empirical probability density functions corresponding to the data presented shift by shift.

Figure 11. The matrices presenting the values of the Hellinger distance calculated based on the empirical probability density functions corresponding to the data representing four work shifts.

Figure 12. The Hellinger distance between each subsequent pdf and the base pdf (chosen as the first one in the sample), together with the structure break point identified by the methods based on the Student’s t-test and the Wilcoxon test, for the data presented shift by shift.

Figure 13. The Hellinger distance between each subsequent pdf and the base pdf (chosen as the first one in the sample), together with the structure break points identified by the methods based on the Student’s t-test and the Wilcoxon test, for the data representing four work shifts.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
