A Bayesian Failure Prediction Network Based on Text Sequence Mining and Clustering
Abstract
1. Introduction
2. Data Processing
3. Methodology
3.1. Word2vec Moving Distance Model
3.2. Clustering Algorithm for Failure Type
- A cluster center is surrounded by points of lower density than the center itself.
- These points are closer to that cluster center than to any other cluster center.
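The two conditions above correspond to the density-peak idea: a cluster center has a higher local density than its neighbors and lies far from any point of higher density. A minimal pure-Python sketch follows, assuming a precomputed symmetric distance matrix `dist` and a cutoff distance `d_c`. The toy points, the cutoff value, and the function names are illustrative and not taken from the paper.

```python
def density_peaks(dist, d_c):
    """Return (rho, delta) for each point given a symmetric distance matrix."""
    n = len(dist)
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest point of strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points of maximal density get their largest distance by convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, k):
    """Choose the k points with the largest rho * delta as cluster centers."""
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return sorted(order[:k])


# Toy one-dimensional data: two well-separated groups of three points.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(p - q) for q in points] for p in points]
rho, delta = density_peaks(dist, d_c=0.15)
centers = pick_centers(rho, delta, k=2)
print(rho, centers)
```

Ranking by the product of density and distance is one common way to pick centers; choosing them by inspecting a decision graph is an equally valid alternative.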
3.3. Failure Sequence Mining Algorithm—PrefixSpan
- (1) Obtain the sequential patterns of length 1. Scan S once to find all sequential patterns of length 1. They are <a>: 4, <b>: 4, <c>: 4, <d>: 3, <e>: 3, <f>: 3, where “<pattern>: count” denotes a pattern and its support count.
- (2) Divide the search space. The complete set of sequential patterns can be divided into six subsets according to the six prefixes <a>, <b>, <c>, <d>, <e>, and <f>.
- (3) Find the subsets of sequential patterns. Each subset from step 2 is mined by constructing the corresponding projected database and mining it recursively.
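The three steps can be sketched as a recursive routine. The version below is a simplified sketch that assumes every sequence element is a single item (no itemsets such as (abc)), which is a restriction relative to the general algorithm. The names `prefixspan`, `min_support`, and `toy_db`, as well as the toy sequences themselves, are illustrative.

```python
def prefixspan(db, min_support, prefix=()):
    """Yield (pattern, support) pairs for sequences of single items."""
    # Step 1: count each item once per (projected) sequence.
    counts = {}
    for seq in db:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    for item in sorted(counts):
        support = counts[item]
        if support < min_support:
            continue
        pattern = prefix + (item,)
        yield pattern, support
        # Steps 2 and 3: project on the extended prefix and recurse.
        projected = []
        for seq in db:
            if item in seq:
                suffix = seq[seq.index(item) + 1:]
                if suffix:
                    projected.append(suffix)
        yield from prefixspan(projected, min_support, pattern)


# Hypothetical failure sequences, one tuple per unit, in time order.
toy_db = [("e", "a", "c"), ("e", "a", "b"), ("f", "b"), ("e", "c", "b")]
patterns = dict(prefixspan(toy_db, min_support=2))
for pattern, support in patterns.items():
    print("".join(pattern), support)
```

Because each recursive call only sees suffixes that follow the current prefix, the search space shrinks at every level, which is the main reason for projecting instead of rescanning the full database.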
3.4. Bayesian Failure Network Model
4. Results and Discussion
- 1062nd: Transmitter failure;
- 1108th: The station received no signal;
- 1743rd: The ground speed indicator is extremely small (0021) and does not move;
- 3128th: One starter generator starts overloaded and the signal light is on;
- 3145th: Engine internal grease;
- 3693rd: “Land” position cannot be achieved due to high pressure.
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Segmentation System | User-Defined Dictionary | POS Tagging | Keyword Extraction | Traditional Chinese Support | UTF-8 Support | New Word Recognition |
---|---|---|---|---|---|---|
jieba | √ | √ | √ | √ | √ | √ |
Chinese Academy of Sciences | √ | √ | × | √ | √ | × |
smallseg | √ | × | × | √ | × | √ |
snailseg | × | × | × | √ | × | × |
Failure Text |
---|
motor/gasket/clutch/failure |
switch/cargo hold/gate/chrome plating/aluminum layer/phase grinding/seepage/piston rod/deviation/center |
temperature/seepage/unknown |
radio station/sound/line/short circuit |
Sequence ID | Sequence |
---|---|
1 | <a(abc)(ac)d(cf)> |
2 | <(ad)c(bc)(ae)> |
3 | <(ef)(ab)(df)cb> |
4 | <e(af)cbc> |
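As a check on the example database above, the support of each length-1 pattern can be counted directly. The sketch below encodes each sequence as a list of elements, each element being a tuple of items. This encoding and the names `S` and `item_support` are assumptions made for illustration.

```python
from collections import Counter

# Each sequence is a list of elements; each element is a tuple of items.
S = [
    [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],  # <a(abc)(ac)d(cf)>
    [("a", "d"), ("c",), ("b", "c"), ("a", "e")],               # <(ad)c(bc)(ae)>
    [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],       # <(ef)(ab)(df)cb>
    [("e",), ("a", "f"), ("c",), ("b",), ("c",)],               # <e(af)cbc>
]


def item_support(db):
    """Count, for each item, the number of sequences that contain it."""
    support = Counter()
    for seq in db:
        # A set ensures an item is counted once per sequence.
        items = {item for element in seq for item in element}
        support.update(items)
    return dict(sorted(support.items()))


supports = item_support(S)
print(supports)
# → {'a': 4, 'b': 4, 'c': 4, 'd': 3, 'e': 3, 'f': 3}
```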
Prefix | Projected Database | Sequential Patterns |
---|---|---|
<a> | <(abc)(ac)d(cf)> <(_d)c(bc)(ae)> <(_b)(df)cb> <(_f)cbc> | <a> <aa> <ab> <a(bc)> <a(bc)a> <aba> <abc> <(ab)> <(ab)c> <(ab)d> <(ab)f> <(ab)dc> <ac> <aca> <acb> <acc> <ad> <adc> <af> |
<b> | <(_c)(ac)d(cf)> <(_c)(ae)> <(df)cb> <c> | <b> <ba> <bc> <(bc)> <(bc)a> <bd> <bdc> <bf> |
<c> | <(ac)d(cf)> <(bc)(ae)> <b> <bc> | <c> <ca> <cb> <cc> |
<d> | <(cf)> <c(bc)(ae)> <(_f)cb> | <d> <db> <dc> <dcb> |
<e> | <(_f)(ab)(df)cb> <(af)cbc> | <e> <ea> <eab> <eac> <eacb> <eb> <ebc> <ec> <ecb> <ef> <efb> <efc> <efcb> |
<f> | <(ab)(df)cb> <cbc> | <f> <fb> <fbc> <fc> <fcb> |
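A projected database for a single-item prefix can be built by cutting each sequence just after the first occurrence of that item. The sketch below assumes that items inside an element are kept in alphabetical order and writes the remainder of a partially consumed element as `(_x)`. The helper names `project`, `fmt`, and `projected_db` and the tuple encoding of the sequences are illustrative.

```python
# The four example sequences, one list of elements per sequence.
S = [
    [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],
    [("a", "d"), ("c",), ("b", "c"), ("a", "e")],
    [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],
    [("e",), ("a", "f"), ("c",), ("b",), ("c",)],
]


def project(seq, item):
    """Return the suffix of seq after the first occurrence of item, or None."""
    for i, element in enumerate(seq):
        if item in element:
            # Items of the same element that come after `item`.
            rest = tuple(x for x in element if x > item)
            head = [("_",) + rest] if rest else []
            return head + seq[i + 1:]
    return None


def fmt(seq):
    """Render a suffix in the <...> notation used in the table."""
    parts = ["".join(e) if len(e) == 1 else "(" + "".join(e) + ")" for e in seq]
    return "<" + "".join(parts) + ">"


def projected_db(db, item):
    """Collect the non-empty projections of every sequence in db."""
    suffixes = (project(seq, item) for seq in db)
    return [fmt(s) for s in suffixes if s]


print(projected_db(S, "b"))
# → ['<(_c)(ac)d(cf)>', '<(_c)(ae)>', '<(df)cb>', '<c>']
print(projected_db(S, "f"))
# → ['<(ab)(df)cb>', '<cbc>']
```

Sequences in which the prefix item is the very last item produce an empty suffix and are dropped, which is why the projected database for <f> holds only two entries.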
Number | Distance | Number | Distance |
---|---|---|---|
R11 | 0.000 | R31 | 2.044 |
R12 | 16.713 | R32 | 1.897 |
R13 | 2.044 | R33 | 0.000 |
R14 | 1.720 | R34 | 2.003 |
R15 | 1.449 | R35 | 1.843 |
R16 | 4.003 | R36 | 12.083 |
R21 | 1.616 | R37 | 1.973 |
R22 | 0.000 | R38 | 17.443 |
R23 | 2.660 | R39 | 3.243 |
R24 | 2.188 | R310 | 10.657 |
R25 | 8.403 | R41 | 1.720 |
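The tabulated distances are zero on the diagonal (R11, R22, R33) and symmetric (R13 = R31 = 2.044), as expected of a document distance. An exact word mover's distance requires solving a transport problem; the sketch below instead computes the relaxed lower bound, in which each word sends all of its weight to the nearest word of the other document and the larger of the two directions is kept. This is a swapped-in simplification, not the computation behind the table, and the two-dimensional vectors in `vectors` are made-up stand-ins for trained word2vec embeddings.

```python
import math
from collections import Counter

# Hypothetical 2-D embeddings; real word2vec vectors have many more dimensions.
vectors = {
    "motor": (1.0, 0.0),
    "clutch": (0.9, 0.2),
    "radio": (0.0, 1.0),
    "short_circuit": (0.1, 0.9),
    "failure": (0.5, 0.5),
}


def one_sided(doc_a, doc_b):
    """Cost of sending each word of doc_a to its nearest word in doc_b."""
    weights = Counter(doc_a)
    total = sum(weights.values())
    cost = 0.0
    for word, count in weights.items():
        nearest = min(math.dist(vectors[word], vectors[other]) for other in set(doc_b))
        cost += (count / total) * nearest
    return cost


def relaxed_wmd(doc_a, doc_b):
    """Relaxed word mover's distance: the larger of the two one-sided bounds."""
    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


d1 = ["motor", "clutch", "failure"]
d2 = ["radio", "short_circuit", "failure"]
print(relaxed_wmd(d1, d1), relaxed_wmd(d1, d2), relaxed_wmd(d2, d1))
```

Taking the maximum of both directions keeps the measure symmetric, so it can fill a distance matrix like the one above and be passed to a clustering step.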
Number | Cluster Center | Total |
---|---|---|
1 | 1062 | 95 |
2 | 1108 | 1464 |
3 | 1743 | 60 |
4 | 3128 | 31 |
5 | 3145 | 35 |
6 | 3693 | 2042 |
Raw Data | 1062 | 1108 | 1743 | 3128 | 3145 | 3693 |
---|---|---|---|---|---|---|
Representation | a | b | c | d | e | f |
Number | Preceding Failure | Subsequent Failure | Frequency |
---|---|---|---|
1 | d | c | 16 |
2 | f | d | 8 |
3 | f | b | 26 |
4 | a | c | 25 |
5 | a | b | 10 |
6 | c | b | 10 |
7 | e | a | 15 |
8 | e | c | 4 |
9 | e | b | 12 |
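Reading each row as a directed edge from the first failure type to the second (an interpretation assumed here), the parent sets of the network follow directly. The sketch below builds them and checks that the graph is acyclic with the standard-library `graphlib` module (Python 3.9 or later). The variable names are illustrative.

```python
from graphlib import TopologicalSorter

# (first failure, second failure, frequency), copied from the table.
edges = [
    ("d", "c", 16), ("f", "d", 8), ("f", "b", 26),
    ("a", "c", 25), ("a", "b", 10), ("c", "b", 10),
    ("e", "a", 15), ("e", "c", 4), ("e", "b", 12),
]

# Map every node to the set of its parents.
parents = {}
for src, dst, _freq in edges:
    parents.setdefault(src, set())
    parents.setdefault(dst, set()).add(src)

# static_order() raises CycleError if the edges do not form a DAG.
order = list(TopologicalSorter(parents).static_order())

for node in sorted(parents):
    print(node, sorted(parents[node]))
```

The resulting parent sets (a from e; d from f; c from d, a, e; b from e, a, c, f) agree with the conditioning variables used in the probability tables.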
Item | Probability |
---|---|
P(e) | 0.00939 |
P(f) | 0.54789 |
P(a = T|e = T) | 0.42857 |
P(a = T|e = F) | 0.02146 |
P(d = T|f = T) | 0.00392 |
P(d = T|f = F) | 0.00617 |
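Several entries above appear consistent with simple count ratios from the earlier tables: the cluster sizes sum to 3727, and 35/3727, 2042/3727, 15/35, and 8/2042 round to the listed values of P(e), P(f), P(a = T|e = T), and P(d = T|f = T). Whether the parameters were actually estimated this way is an assumption. The sketch also applies the law of total probability and Bayes' rule to the listed values; all variable names are illustrative.

```python
# Cluster sizes and pattern frequencies copied from the tables.
totals = {"a": 95, "b": 1464, "c": 60, "d": 31, "e": 35, "f": 2042}
n = sum(totals.values())          # total number of records

p_e = totals["e"] / n             # compare with the listed 0.00939
p_f = totals["f"] / n             # compare with the listed 0.54789
p_a_given_e = 15 / totals["e"]    # compare with the listed 0.42857
p_d_given_f = 8 / totals["f"]     # compare with the listed 0.00392

# Law of total probability: P(a) = P(a|e) P(e) + P(a|not e) P(not e).
p_a = 0.42857 * 0.00939 + 0.02146 * (1 - 0.00939)

# Bayes' rule: P(e = T | a = T).
p_e_given_a = 0.42857 * 0.00939 / p_a

print(n, round(p_e, 5), round(p_f, 5), round(p_a_given_e, 5), round(p_d_given_f, 5))
print(round(p_a, 5), round(p_e_given_a, 3))
```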
d | a | e | P(c|d,a,e) |
---|---|---|---|
T | T | T | 0.89357 |
F | T | T | 0.37744 |
F | F | T | 0.11429 |
F | T | F | 0.26316 |
T | F | T | 0.63041 |
T | F | F | 0.51613 |
T | T | F | 0.77929 |
F | F | F | 0.00107 |
e | a | c | f | P(b|e,a,c,f) |
---|---|---|---|---|
T | T | T | T | 0.62752 |
F | T | T | T | 0.28466 |
T | F | T | T | 0.52226 |
F | F | T | T | 0.17940 |
T | T | F | T | 0.46085 |
F | T | F | T | 0.11800 |
T | F | F | T | 0.35559 |
F | F | F | T | 0.01273 |
T | T | T | F | 0.61479 |
F | T | T | F | 0.27193 |
T | F | T | F | 0.50952 |
T | F | F | F | 0.34286 |
T | T | F | F | 0.44812 |
F | T | F | F | 0.10526 |
F | F | T | F | 0.16667 |
F | F | F | F | 0.37725 |
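With the conditional probability tables in hand, a query can be answered by summing the joint distribution over all assignments. The sketch below assumes the structure implied by the conditioning sets (e and f as independent roots, e → a, f → d, {d, a, e} → c, {e, a, c, f} → b) and reads each table entry as the probability that the child is true. The function names `bern`, `joint`, and `query` are illustrative.

```python
from itertools import product

T, F = True, False
P_E, P_F = 0.00939, 0.54789
P_A = {T: 0.42857, F: 0.02146}  # P(a = T | e)
P_D = {T: 0.00392, F: 0.00617}  # P(d = T | f)
P_C = {  # P(c = T | d, a, e)
    (T, T, T): 0.89357, (F, T, T): 0.37744, (F, F, T): 0.11429, (F, T, F): 0.26316,
    (T, F, T): 0.63041, (T, F, F): 0.51613, (T, T, F): 0.77929, (F, F, F): 0.00107,
}
P_B = {  # P(b = T | e, a, c, f)
    (T, T, T, T): 0.62752, (F, T, T, T): 0.28466, (T, F, T, T): 0.52226, (F, F, T, T): 0.17940,
    (T, T, F, T): 0.46085, (F, T, F, T): 0.11800, (T, F, F, T): 0.35559, (F, F, F, T): 0.01273,
    (T, T, T, F): 0.61479, (F, T, T, F): 0.27193, (T, F, T, F): 0.50952, (T, F, F, F): 0.34286,
    (T, T, F, F): 0.44812, (F, T, F, F): 0.10526, (F, F, T, F): 0.16667, (F, F, F, F): 0.37725,
}
NAMES = ("e", "f", "a", "d", "c", "b")


def bern(p_true, value):
    """Probability of a Boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(e, f, a, d, c, b):
    """Joint probability as the product of each node given its parents."""
    return (bern(P_E, e) * bern(P_F, f) * bern(P_A[e], a) * bern(P_D[f], d)
            * bern(P_C[(d, a, e)], c) * bern(P_B[(e, a, c, f)], b))


def query(target, evidence):
    """P(target = T | evidence) by enumeration over all 64 assignments."""
    num = den = 0.0
    for values in product((T, F), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[target]:
            num += p
    return num / den


print(query("b", {}), query("b", {"e": T}), query("c", {"d": T, "a": T}))
```

Enumeration is exponential in the number of variables, which is acceptable for six Boolean nodes; a larger network would call for variable elimination or a dedicated library.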
Item | Actual Value | Predicted Value |
---|---|---|
a = T|e = T | 8 | 6 |
a = T|e = F | 27 | 28 |
d = T|f = T | 4 | 3 |
d = T|f = F | 16 | 10 |
c|d = T, a = T, e = T | 22 | 21 |
c|d = F, a = T, e = T | 14 | 8 |
c|d = F, a = F, e = T | 2 | 1 |
c|d = F, a = T, e = F | 12 | 2 |
c|d = T, a = F, e = T | 10 | 11 |
c|d = T, a = F, e = F | 8 | 2 |
c|d = T, a = T, e = F | 20 | 7 |
c|d = F, a = F, e = F | 4 | 2 |
b|e = T, a = T, c = T, f = T | 29 | 33 |
b|e = F, a = T, c = T, f = T | 23 | 26 |
b|e = T, a = F, c = T, f = T | 11 | 7 |
b|e = F, a = F, c = T, f = T | 18 | 16 |
b|e = T, a = T, c = F, f = T | 24 | 22 |
b|e = F, a = T, c = F, f = T | 18 | 16 |
b|e = T, a = F, c = F, f = T | 19 | 12 |
b|e = F, a = F, c = F, f = T | 13 | 16 |
b|e = T, a = T, c = T, f = F | 16 | 18 |
b|e = F, a = T, c = T, f = F | 10 | 9 |
b|e = T, a = F, c = T, f = F | 11 | 12 |
b|e = T, a = F, c = F, f = F | 6 | 9 |
b|e = T, a = T, c = F, f = F | 11 | 16 |
b|e = F, a = T, c = F, f = F | 5 | 5 |
b|e = F, a = F, c = T, f = F | 5 | 6 |
b|e = F, a = F, c = F, f = F | 582 | 622 |
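One way to summarize the agreement between the two columns is the mean absolute error over all 28 rows. The choice of metric is an assumption made here for illustration and is not stated in the table; the pairs below are copied row by row as (actual, predicted).

```python
# (actual, predicted) pairs in table order: rows for a, d, c, then b.
pairs = [
    (8, 6), (27, 28), (4, 3), (16, 10),
    (22, 21), (14, 8), (2, 1), (12, 2), (10, 11), (8, 2), (20, 7), (4, 2),
    (29, 33), (23, 26), (11, 7), (18, 16), (24, 22), (18, 16), (19, 12), (13, 16),
    (16, 18), (10, 9), (11, 12), (6, 9), (11, 16), (5, 5), (5, 6), (582, 622),
]

# Sum of absolute differences and its mean over all rows.
total_abs_error = sum(abs(actual - predicted) for actual, predicted in pairs)
mae = total_abs_error / len(pairs)

# Row with the largest absolute deviation.
worst = max(pairs, key=lambda p: abs(p[0] - p[1]))

print(len(pairs), total_abs_error, round(mae, 2), worst)
# → 28 130 4.64 (582, 622)
```

The largest absolute deviation comes from the last row, which also has by far the largest counts, so a relative error measure would rank the rows differently.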
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chang, W.; Xu, Z.; You, M.; Zhou, S.; Xiao, Y.; Cheng, Y. A Bayesian Failure Prediction Network Based on Text Sequence Mining and Clustering. Entropy 2018, 20, 923. https://doi.org/10.3390/e20120923