Applying Fuzzy Inference and Machine Learning Methods for Prediction with a Small Dataset: A Case Study for Predicting the Consequences of Oil Spills on a Ground Environment
Abstract
1. Introduction
- we propose applying ANFIS to predict the consequences of oil spills on ground environments when small datasets are present;
- we propose using synthetic data generated from the small datasets of real oil spills by applying the mathematical formalism (described in [24]) of oil spill penetration into groundwater;
- we propose an approach that combines several ML techniques and FIS for predicting the consequences of oil spills on ground environments when small datasets are present;
- we propose a conceptual architecture for the implementation of the introduced approach for predicting the consequences of oil spills on ground environments when small datasets are present;
- we use the practical problem of predicting the consequences of oil spills to compare the performance of various ML techniques and FIS, and we demonstrate that the proposed ANFIS-based approach achieves the best prediction performance.
2. Background
3. Related Works
4. Method
4.1. The Mathematical Model of Oil Spills and Their Penetration into the Ground
4.2. Generating the Synthetic Data Using the Mathematical Model
4.3. Selecting the Main Initial Variables for Prediction
4.4. ML Methods Used for Prediction
4.5. Predicting Oil Spill Consequences Using ANFIS
- Fuzzification, during which crisp inputs are fuzzified using a Gaussian MF [78] (Equation (19)).
- Evaluating the strength of rules, where the firing strength of each node is obtained by multiplying the incoming membership degrees (Equation (20)).
- Normalisation, during which the rule strengths are normalised (Equation (21)).
- Obtaining an output, f_i, for each rule by applying the rules R_i (Equation (18)).
- Obtaining the global model response, f, using Equation (22).
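
The five steps above can be illustrated with a minimal sketch of one forward pass. It assumes a first-order Sugeno-type ANFIS as in Jang's original formulation [77], with linear rule consequents; the function names, the two toy rules and all parameter values are hypothetical and are not taken from the model trained in this study.

```python
import math


def gaussian_mf(x, c, sigma):
    """Gaussian membership degree of crisp input x (centre c, width sigma)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(inputs, rules):
    """One forward pass through a first-order Sugeno-type ANFIS (assumed form).

    inputs: list of crisp input values.
    rules:  list of dicts with
            "mfs"    - one (centre, sigma) pair per input,
            "coeffs" - linear consequent [p_1, ..., p_n, r].
    """
    # Steps 1-2: fuzzify each input and multiply the memberships per rule.
    strengths = []
    for rule in rules:
        w = 1.0
        for x, (c, sigma) in zip(inputs, rule["mfs"]):
            w *= gaussian_mf(x, c, sigma)
        strengths.append(w)

    # Step 3: normalise the firing strengths so that they sum to 1.
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    normalised = [w / total for w in strengths]

    # Step 4: rule outputs f_i as linear functions of the inputs.
    outputs = []
    for rule in rules:
        *p, r = rule["coeffs"]
        outputs.append(sum(pi * x for pi, x in zip(p, inputs)) + r)

    # Step 5: global response f as the weighted sum of the rule outputs.
    return sum(wn * f for wn, f in zip(normalised, outputs))


# Hypothetical two-input, two-rule system (ground thickness, spilled volume).
toy_rules = [
    {"mfs": [(3.5, 0.5), (2000.0, 1500.0)], "coeffs": [0.0, 0.0, 1.0]},
    {"mfs": [(4.5, 0.5), (8000.0, 1500.0)], "coeffs": [0.0, 0.0, 3.0]},
]
print(anfis_forward([4.0, 5000.0], toy_rules))
```

In this toy case the input lies midway between the two rule centres, so both rules fire equally and the response is the average of the two constant consequents. During training, the centres, widths and consequent coefficients would be the adjustable parameters.
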
4.6. The Conceptual Architecture for the Implementation of the Proposed Method
5. Experimenting and Results
5.1. Application of ML Methods
5.2. ANFIS Results
5.3. Evaluation of Results
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, Z.; Yao, H.; Ma, F. Learning with Small Data. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; Association for Computing Machinery: New York, NY, USA, 2020. [Google Scholar]
- Papageorgiou, E.I.; Aggelopoulou, K.; Gemtos, T.A.; Nanos, G.D. Development and evaluation of a fuzzy inference system and a neuro-fuzzy inference system for grading apple quality. Appl. Artif. Intell. 2018, 32, 253–280. [Google Scholar] [CrossRef]
- Azmy, S.B.; Sneineh, R.A.; Zorba, N.; Hassanein, H.S. Small data in IoT: An MCS perspective. In Performability in Internet of Things; Springer: Cham, Switzerland, 2019; pp. 209–229. [Google Scholar]
- Sabay, A.; Harris, L.; Bejugama, V.; Jaceldo-Siegl, K. Overcoming Small Data Limitations in Heart Disease Prediction by Using Surrogate Data. SMU Data Sci. Rev. 2018, 1, 12. [Google Scholar]
- Chen, R.J.; Lu, M.Y.; Chen, T.Y.; Williamson, D.F.; Mahmood, F. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 2021, 5, 493–497. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Chen, J.; Lu, H. Predicting Future Event via Small Data (e.g., 4 Data) by ASF and Curve Fitting Methods. In Proceedings of the 11th International Conference on ICICIP IEEE 2021, Dali, China, 3–7 December 2021. [Google Scholar]
- Suwa, M.; Watanabe, Y.; Ikeda, H.; Matsuoka, H.; Suzuki, T.A. Model for Predicting River Flooding Using Relatively Small Data Sets. AGU Fall Meet. Abstr. 2018, 17, H43J-2603. [Google Scholar]
- Burmakova, A.; Kalibatienė, D. Machine learning vs fuzzy inference methods for predicting the oil spill consequences with small data sets. In Proceedings of the Data Analysis Methods for Software Systems, Druskininkai, Lithuania, 2–4 December 2021. [Google Scholar]
- Mohammadiun, S.; Hu, G.; Gharahbagh, A.A.; Li, J.; Hewage, K.; Sadiq, R. Evaluation of machine learning techniques to select marine oil spill response methods under small-sized dataset conditions. J. Hazard. Mater. 2022, 436, 129282. [Google Scholar] [CrossRef] [PubMed]
- Kamath, C.; Fan, Y.J. Regression with small data sets: A case study using code surrogates in additive manufacturing. Knowl. Inf. Syst. 2018, 57, 475–493. [Google Scholar] [CrossRef]
- Sakizadeh, M.; Rahmatinia, H. Statistical learning methods for classification and prediction of groundwater quality using a small data record. Int. J. Agric. Environ. Inf. Syst. (IJAEIS) 2017, 8, 37–53. [Google Scholar] [CrossRef]
- Zhao, L.; Shang, Z.; Qin, A.; Tang, Y.Y. Siamese Dense Neural Network for Software Defect Prediction with Small Data. IEEE Access 2019, 7, 7663–7677. [Google Scholar] [CrossRef]
- Kalibatiene, D.; Burmakova, A. Fuzzy Model for Predicting Contamination of the Geological Environment during an Accidental Oil Spill. IJFS Int. J. Fuzzy Syst. 2021, 24, 425–439. [Google Scholar] [CrossRef]
- Jiao, Z.; Jia, G.; Cai, Y. A new approach to oil spill detection that combines deep learning with unmanned aerial vehicles. Comput. Ind. Eng. 2019, 135, 1300–1311. [Google Scholar] [CrossRef]
- Mohammadiun, S.; Hu, G.; Gharahbagh, A.A.; Mirshahi, R.; Li, J.; Hewage, K.; Sadiq, R. Optimization of integrated fuzzy decision tree and regression models for selection of oil spill response method in the Arctic. Knowl.-Based Syst. 2021, 213, 106676. [Google Scholar] [CrossRef]
- Sajid, Z.; Khan, F.; Veitch, B. Dynamic ecological risk modelling of hydrocarbon release scenarios in Arctic waters. Mar. Pollut. Bull. 2020, 153, 111001. [Google Scholar] [CrossRef] [PubMed]
- Cherednichenko, O.; Yanholenko, O.; Vovk, M.; Tkachenko, V. Formal Modeling of Decision-Making Processes under Transboundary Emergency Conditions. Data-Cent. Bus. Appl. 2020, 42, 141–162. [Google Scholar]
- Lourenzutti, R.; Krohling, R.A. A generalized TOPSIS method for group decision making with heterogeneous information in a dynamic environment. Inf. Sci. 2016, 330, 1–18. [Google Scholar] [CrossRef]
- Akyuz, E.; Celik, E. A quantitative risk analysis by using interval type-2 fuzzy FMEA approach: The case of oil spill. Marit. Policy Manag. 2018, 45, 979–994. [Google Scholar] [CrossRef]
- Yu, Z.; Ye, S.; Sun, Y.; Zhao, H.; Feng, X.Q. Deep learning method for predicting the mechanical properties of aluminum alloys with small data sets. Mater. Today Commun. 2021, 28, 102570. [Google Scholar] [CrossRef]
- Karaboga, D.; Kaya, E. Adaptive network based fuzzy inference system (ANFIS) training approaches: A comprehensive survey. Artif. Intell. Rev. 2019, 52, 2263–2293. [Google Scholar] [CrossRef]
- Al-Mahasneh, M.; Aljarrah, M.; Rababah, T.; Alu’datt, M. Application of hybrid neural fuzzy system (ANFIS) in food processing and technology. Food Eng. Rev. 2016, 8, 351–366. [Google Scholar] [CrossRef]
- Elsisi, M.; Tran, M.Q.; Mahmoud, K.; Lehtonen, M.; Darwish, M.M. Robust design of ANFIS-based blade pitch controller for wind energy conversion systems against wind speed fluctuations. IEEE Access 2021, 9, 37894–37904. [Google Scholar] [CrossRef]
- Kalibatiene, D.; Burmakova, A.; Smelov, V. On Knowledge-Based Forecasting Approach for Predicting the Effects of Oil Spills on the Ground. Digit. Transform. 2020, 4, 44–56. [Google Scholar] [CrossRef]
- Hssina, B.; Merbouha, A.; Ezzikouri, H.; Erritali, M. A comparative study of decision tree ID3 and C4.5. Int. J. Adv. Comput. Sci. Appl. 2014, 4, 13–19. [Google Scholar]
- Pandya, R.; Pandya, J. C5.0 algorithm to improved decision tree with feature selection and reduced error pruning. Int. J. Comput. Appl. 2015, 117, 18–21. [Google Scholar]
- Lewis, R.J. An introduction to classification and regression tree (CART) analysis. In Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine, San Francisco, CA, USA, 22–25 May 2000. [Google Scholar]
- Hu, G.; Mohammadiun, S.; Gharahbagh, A.A.; Li, J.; Hewage, K.; Sadiq, R. Selection of oil spill response method in Arctic offshore waters: A fuzzy decision tree-based framework. Mar. Pollut. Bull. 2020, 161, 111705. [Google Scholar] [CrossRef] [PubMed]
- Zhao, Q.; Wang, J. Disaster Chain Scenarios Evolutionary Analysis and Simulation Based on Fuzzy Petri Net: A Case Study on Marine Oil Spill Disaster. IEEE Access 2019, 7, 183010–183023. [Google Scholar] [CrossRef]
- Feng, D.; Passalacqua, P.; Hodges, B.R. Innovative Approaches for Geometric Uncertainty Quantification in an Operational Oil Spill Modeling System. JMSE J. Mar. Sci. Eng. 2019, 7, 259. [Google Scholar] [CrossRef]
- Hoblitzell, A.; Babbar-Sebens, M.; Mukhopadhyay, S. Machine Learning with Small Data for User Modeling of Watershed Stakeholders Engaged in Interactive Optimization. In Proceedings of the 2nd International Conference on Computer Science and Artificial Intelligence, Shenzhen, China, 8 December 2018; Association for Computing Machinery: New York, NY, USA, 2018. [Google Scholar]
- Russell, S.; Norvig, P. Artificial Intelligence. A Modern Approach, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2012; pp. 30–86. [Google Scholar]
- Kumar, R.; Verma, R. Classification algorithms for data mining: A survey. Int. J. Inf. Educ. Technol. 2012, 1, 7–14. [Google Scholar]
- Yuan, G.X.; Ho, C.H.; Lin, C.J. Recent advances of large-scale linear classification. Proc. IEEE 2012, 100, 2584–2603. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V.; Saitta, L. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Fix, E.; Hodges, J.L. Nonparametric Discrimination. Consistency Properties; International Statistical Institute (ISI): Randolph Field, TX, USA, 1951; Volume 1, pp. 21–49. [Google Scholar]
- Wu, X.; Kumar, V.; Ross, Q.J.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Yu, P.S.; et al. Top 10 algorithms in data mining. Knowl. Inf. Syst. 2007, 14, 1–37. [Google Scholar] [CrossRef]
- Freedman, D.A. Statistical Models: Theory and Practice; Cambridge University Press: Cambridge, UK, 2009; pp. 41–72. [Google Scholar]
- Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; pp. 7–30. [Google Scholar]
- Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: New York, NY, USA, 1999; pp. 69–99. [Google Scholar]
- Tolles, J.; Meurer, W.J. Logistic Regression: Relating Patient Characteristics to Outcomes. JAMA 2016, 316, 533–534. [Google Scholar] [CrossRef]
- Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Detrano, R. The UCI machine Learning Repository Online. Available online: http://archive.ics.uci.edu/ml/datasets/Heart+Disease (accessed on 20 June 2022).
- Piryonesi, S.M.; El-Diraby, T.E. Data Analytics in Asset Management: Cost-Effective Prediction of the Pavement Condition Index. J. Infrastruct. Syst. 2019, 26, 04019036. [Google Scholar] [CrossRef]
- Gong, H.F.; Chen, Z.S.; Zhu, Q.X.; He, Y.L. A Monte Carlo and PSO based virtual sample generation method for enhancing the energy prediction and energy optimization on small data problem: An empirical study of petrochemical industries. Appl. Energy 2017, 197, 405–415. [Google Scholar] [CrossRef]
- Cubuk, E.D.; Sendek, A.D.; Reed, E.J. Screening billions of candidates for solid lithium-ion conductors: A transfer learning approach for small data. J. Chem. Phys. 2019, 150, 214701. [Google Scholar] [CrossRef] [PubMed]
- He, Y.L.; Wang, P.J.; Zhang, M.Q.; Zhu, Q.X.; Xu, Y. A novel and effective nonlinear interpolation virtual sample generation method for enhancing energy prediction and analysis on small data problem: A case study of Ethylene industry. Energy 2018, 147, 418–427. [Google Scholar] [CrossRef]
- Drechsler, R.; Huhn, S.; Plump, C. Combining Machine Learning and Formal Techniques for Small Data Applications-A Framework to Explore New Structural Materials. In Proceedings of the 2020 23rd Euromicro Conference on Digital System Design (DSD), Kranj, Slovenia, 26–28 August 2020; Volume 1, pp. 518–525. [Google Scholar]
- Baldominos, A.; Ogul, H.; Colomo-Palacios, R. Infection diagnosis using biomedical signals in small data scenarios. In Proceedings of the 32nd International Symposium on Computer-Based Medical Systems, Cordoba, Spain, 5–7 June 2019. [Google Scholar]
- Micallef, L.; Sundin, I.; Marttinen, P.; Ammad-ud-din, M.; Peltola, T.; Soare, M.; Jacucci, G.; Kaski, S. Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, Limassol, Cyprus, 13–16 March 2017; Association for Computing Machinery: New York, NY, USA, 2017. [Google Scholar]
- Shaikhina, T.; Khovanova, N.A. Handling limited datasets with neural networks in medical applications: A small-data approach. Artif. Intell. Med. 2017, 75, 51–63. [Google Scholar] [CrossRef]
- Li, Y.; Yang, X.; Ye, Y.; Cui, L.; Jia, B.; Jiang, Z.; Wang, S. Detection of oil spill through fully convolutional network. In Proceedings of the International Conference on Geo-Spatial Knowledge and Intelligence, Chiang Mai, Thailand, 8–10 December 2017. [Google Scholar]
- Li, Y.; Lyu, X.; Frery, A.C.; Ren, P. Oil Spill Detection with Multiscale Conditional Adversarial Networks with Small-Data Training. Remote Sens. 2021, 13, 2378. [Google Scholar] [CrossRef]
- Qin Ouyang, X.; Chen, Y.P.; Wei, B.H.; Mosic, D. Experimental Study on Class Imbalance Problem Using an Oil Spill Training Data Set. Br. J. Math. Comput. Sci. 2017, 2, 1–9. [Google Scholar] [CrossRef]
- Mills, P. Efficient statistical classification of satellite measurements. Int. J. Remote Sens. 2012, 32, 6109–6132. [Google Scholar] [CrossRef]
- Powers, D.M.; Ailab, W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. IJMLT 2020, 2, 37–63. [Google Scholar]
- Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623. [Google Scholar] [CrossRef]
- Teke, A.; Yildirim, H.B.; Çelik, Ö. Evaluation and performance comparison of different models for the estimation of solar radiation. Renew. Sustain. Energy Rev. 2015, 50, 1097–1107. [Google Scholar] [CrossRef]
- Jadhav, A.; Pramod, D. Comparison of performance of data imputation methods for numeric dataset. Appl. Artif. Intell. 2019, 33, 913–933. [Google Scholar] [CrossRef]
- Ranković, V.; Radulović, J.; Radojević, I.; Ostojić, A.; Čomić, L. Neural network modeling of dissolved oxygen in the Gruža reservoir. Ecol. Model. 2010, 221, 1239–1244. [Google Scholar] [CrossRef]
- Sammut, C.; Webb, G. Encyclopedia of Machine Learning; Springer: Boston, MA, USA, 2011; pp. 150–207. [Google Scholar]
- Putka, D.J.; Beatty, A.S.; Reeder, M.C. Modern prediction methods: New perspectives on a common problem. Organ. Res. Methods 2018, 21, 689–732. [Google Scholar] [CrossRef]
- Duch, W. Filter Methods; Springer: Heidelberg/Berlin, Germany, 2006; pp. 89–117. [Google Scholar]
- Cherrington, M.; Thabtah, F.; Lu, J.; Xu, Q. Feature selection: Filter methods performance challenges. In Proceedings of the International Conference on Computer and Information Sciences, Jouf University, Aljouf, Saudi Arabia, 3–4 April 2019. [Google Scholar]
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A review of feature selection methods on synthetic data. Knowl. Inf. Syst. 2012, 1, 483–519. [Google Scholar] [CrossRef]
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Feature selection for high-dimensional data. PAI 2016, 5, 65–75. [Google Scholar] [CrossRef]
- Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson Correlation Coefficient. Springer Topics in Signal Processing, 2nd ed.; Springer: Heidelberg/Berlin, Germany, 2009; pp. 1–4. [Google Scholar]
- Ardil, C.; Pashaev, A.M.; Sadiqov, R.A.; Abdullayev, P. Multiple Criteria Decision-Making Analysis for Selecting and Evaluating Fighter Aircraft. Int. J. Transp. Veh. Eng. 2021, 13, 683–694. [Google Scholar]
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; pp. 1–175. [Google Scholar]
- Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines; Apress: Berkeley, CA, USA, 2015; pp. 67–80. [Google Scholar]
- Karimi, K.; Hamilton, H.J. Generation and interpretation of temporal decision rules. arXiv 2010, arXiv:1004.3334. [Google Scholar]
- Jadhav, S.; He, H.; Jenkins, K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Appl. Soft Comput. 2018, 69, 541–553. [Google Scholar] [CrossRef]
- Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms; CRC Press: Cambridge, UK, 2012; pp. 69–85. [Google Scholar]
- Khatri, C. Classical statistical analysis based on a certain multivariate complex Gaussian distribution. Ann. Math. Stat. 1965, 36, 98–114. [Google Scholar] [CrossRef]
- MacKay, D.J.C. Information Theory, Inference and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003; pp. 54–75. [Google Scholar]
- Koizumi, D. On the Prediction of a Nonstationary Bernoulli Distribution based on Bayes Decision Theory. ICAART 2021, 2, 957–965. [Google Scholar]
- Bloch, D.A.; Watson, G.S. A Bayesian study of the multinomial distribution. Ann. Math. Stat. 1967, 38, 1423–1435. [Google Scholar] [CrossRef]
- Jang, J.-S. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685. [Google Scholar] [CrossRef]
- Choi, B.I.; Rhee, F.C.H. Interval type-2 fuzzy membership function generation methods for pattern recognition. Inf. Sci. 2009, 179, 2102–2122. [Google Scholar] [CrossRef]
- Willmott, C.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
- Burmakova, A.; Kalibatiene, D. An ANFIS-based Model to Predict the Oil Spill Consequences on the Ground. In Proceedings of the IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 22 April 2021. [Google Scholar]
- Fathipour-Azar, H. Machine learning-assisted distinct element model calibration: ANFIS, SVM, GPR, and MARS approaches. Acta Geotech. 2021, 17, 1207–1217. [Google Scholar] [CrossRef]
References | Domain of Interest | Solution for Small Data Issue | Classification/Regression | Methods/Techniques Used | Tools Used | Dataset for Experiment |
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | (7) |
[4] | Prediction of heart disease | Surrogate data generation from the characteristics of original observations | Regression | Neural network (NN), Logistic Regression, Decision Tree and Random Forest | Synthpop package in R | Heart disease data from the UCI Repository [42], 14 dataset variables, 297 records |
[10] | Additive manufacturing | Surrogate data generation from the two physical models | Regression | Regression Trees, SVR, kernel regression, multivariate adaptive regression, GPR. GPR performed the best | Analysis of variance (ANOVA) | Three datasets: for Eagar–Tsai-100, four input parameters and 100 samples; for Eagar–Tsai-462, four input parameters and random sampling; for Verhaeghe-41, 41 data points consisting of the four inputs |
[11] | Pollution of groundwater | Cluster analysis | Classification | Clusters, SVM and NN | SVM with polynomial and RBF kernel methods | 14 groundwater quality variables gathered from 27 groundwater samples |
[12] | Software defect prediction (SDP) | DNN model with integrated similarity feature learning and distance metric learning | Classification | Siamese Dense neural networks (SDNN) | TensorFlow, Keras and MATLAB | 10 datasets from the NASA repository, ranging from 87 to 2032 instances |
[31] | Modelling of watershed stakeholders engaged | Integrating user models with limited data | Regression | NN | ANFIS, MATLAB | Known data on the physical and chemical properties of soil and aquatic environment (datasets of 25, 50, 100 and 360 samples) |
[43] | Material design | ML model training using only elementary descriptors on the same dataset | Classification | Linear SVM with leave-one-out cross-validation | NA | Materials Project database |
[44] | The energy prediction and optimisation of petrochemical systems | Virtual sample generation from the underlying information of the small dataset | Regression | The Monte Carlo, Particle Swarm Optimisation, and extreme learning machine (ELM) algorithms | NA | Two real-world cases of the petroleum production process |
[45] | Prediction of the mechanical properties of aluminium alloys | DNN model pre-training and tuning its parameters | Regression | DNN with gradient descent optimisation | TensorFlow | Data of mechanical properties of aluminium alloys |
[46] | Enhancing energy prediction | Virtual sample generation through nonlinear interpolation | Regression | Nonlinear interpolation, ELM, nonlinear interpolation-based virtual sample generation | NA | 50 production data items from Chinese plants between 2011 and 2013; five input variables |
[47] | Exploring new structural materials | Extracting relevant properties from a high dimensional small training dataset | Regression | Kernel Regression-based Learning, Kernel Recursive Least Squares, expert knowledge | C++, dlib, MongoDB | Tensile test specimen and spherical micro samples; the training data of grid points, i.e., fully classified structural materials of 6500 data points; 15 kernel functions |
[48] | Infection diagnosis | Aggregating existing measurements to generate required features | Classification | Decision support system based on ensemble of Decision Trees, k-nearest neighbours, logistic regression, multi-layer perceptron, SVM | NA | Real dataset of 60 patients for infections; three variables |
[49] | Interactive knowledge extraction based on feature relevance | Interactive visualisation for extracting tacit prior knowledge | Classification, Regression | Interactive visualisation (IVis), user model | Python Natural Language Toolkit, Python Rake and KPMiner | The user’s knowledge about feature relevance from 162 scientific documents; 457 unique keywords that were used as features |
[50] | Medical applications | Surrogate data generated using statistical features of the original dataset | Regression | NN | NA | 56 samples; 5 input parameters |
[51] | Marine oil spill detection from images | The ResNet model for extracting the feature maps with the input data | Classification | Convolutional NN (CNN) | FCN-GoogLeNet and FCN-ResNet models | 20 oil spill images |
[52] | Oil spill detection | Generating an oil spill detection map from the observed image characteristics | Classification | Multiscale conditional adversarial network | NA | Four oil spill image pairs (size of 256 × 256 pixels) |
[53] | Oil spill detection | Generating new examples applying the Mega-Trend Diffusion function, intelligent over-sampling methods | Classification | Unsupervised algorithm SOM | MATLAB, Libsvm | Pictures of real oil spills compared to pictures of fake spills |
Rank | Variables, Units | Min. | Max. | Importance (Gini Impurity) |
---|---|---|---|---|
1 | Spilled oil volume, m³ | 10 | 1000 | 10.69 |
2 | Surface spreading coefficient, m⁻¹ | 5 | 30 | 6.51 |
3 | Ground type (4 possible values: sand, sandy loam, loam, clay) | – | – | 6.17 |
4 | Ground thickness, m | 3 | 5 | 5.41 |
5 | Time after the spill, days | 1 | 10 | 4.63 |
6 | Oil density, kg/m³ | 750 | 930 | 4.54 |
7 | Depth of groundwater, m | 3.15 | 5.35 | 4.32 |
8 | Ground temperature, °C | 3 | 11 | 4.15 |
9 | Soil moisture | 0.08 | 0.2 | 3.92 |
10 | Terrain relief, m | 130 | 170 | 3.9 |
11 | Air temperature, °C | 5 | 25 | 3.68 |
12 | Ground moisture | 0.18 | 0.46 | 3.49 |
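
The importance scores in the table above are impurity-based. As a hedged illustration of the underlying quantity only, the sketch below computes the Gini impurity of a set of class labels and the impurity decrease produced by one candidate split. The data values, the labels and the function names are hypothetical; the sketch does not reproduce the scores in the table, and it does not claim to be the procedure used to obtain them.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(values, labels, threshold):
    """Drop in Gini impurity when splitting on `values <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical spilled oil volumes (m^3) and consequence classes.
volumes = [10, 50, 200, 400, 800, 1000]
classes = ["low", "low", "low", "high", "high", "high"]

print(impurity_decrease(volumes, classes, 300))  # split separates the classes
print(impurity_decrease(volumes, classes, 30))   # split leaves a mixed branch
```

A tree-based importance score for a variable is typically obtained by accumulating such decreases over all splits that use that variable, so a variable that separates the outcomes well receives a larger score.
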
 | Surface Spreading Coefficient | Ground Thickness | Oil Density | Spilled Oil Volume |
---|---|---|---|---|
Surface spreading coefficient | 1 | 0 | 0 | 0 |
Ground thickness | 0 | 1 | 0 | 0 |
Oil density | 0 | 0 | 1 | 0 |
Spilled oil volume | 0 | 0 | 0 | 1 |
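
A correlation matrix of this kind can be checked with a few lines of code. The sketch below computes Pearson correlation coefficients for the four selected inputs. The samples are hypothetical: they are drawn independently and uniformly from the value ranges listed for these variables in this article, which is an assumption made only for illustration and is the reason the off-diagonal values come out close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


random.seed(0)  # fixed seed so the illustration is repeatable
n_samples = 2000

# Hypothetical, independently sampled inputs (assumed uniform ranges).
data = {
    "Surface spreading coefficient": [random.uniform(5, 30) for _ in range(n_samples)],
    "Ground thickness": [random.uniform(3, 5) for _ in range(n_samples)],
    "Oil density": [random.uniform(750, 850) for _ in range(n_samples)],
    "Spilled oil volume": [random.uniform(100, 10000) for _ in range(n_samples)],
}

names = list(data)
matrix = [[pearson(data[a], data[b]) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, [round(r, 2) for r in row])
```

Uncorrelated inputs are convenient for a rule-based model because each input then contributes information that the others do not already carry.
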
Input Variable | Value Range, Measurement Units | Input No. | Term: Very Low | Term: Low | Term: Moderate | Term: High | Term: Very High |
---|---|---|---|---|---|---|---|
Ground thickness | [3; 5], m | Input 1 | [3; 3.65) | (3; 3.9) | (3.4; 4.6) | (4.1; 5) | (4.3; 5] |
Spilled oil volume | [100; 10,000], m³ | Input 2 | [100; 3500) | (100; 6000) | (1700; 8300) | (4000; 10,000) | (6000; 10,000] |
Surface spreading coefficient | [5; 30], m⁻¹ | Input 3 | [5; 14) | (5; 20) | (9; 26) | (16; 30) | (23; 30] |
Oil density | [750; 850], kg/m³ | Input 4 | [750; 785) | (750; 810) | (765; 835) | (790; 850) | (815; 850] |
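
Because the term intervals overlap, one crisp value usually belongs to several linguistic terms at once. The following sketch looks up the active terms for a ground thickness value using the interval bounds from the table above, including which ends are open or closed. The data structure and the function name are hypothetical, and the sketch only tests interval membership; it does not compute the Gaussian membership degrees.

```python
# Term intervals for ground thickness (m), copied from the table above as
# (lower, upper, lower_closed, upper_closed).
GROUND_THICKNESS_TERMS = {
    "Very Low": (3.0, 3.65, True, False),
    "Low": (3.0, 3.9, False, False),
    "Moderate": (3.4, 4.6, False, False),
    "High": (4.1, 5.0, False, False),
    "Very High": (4.3, 5.0, False, True),
}


def active_terms(x, terms):
    """Return the names of all terms whose interval contains x."""
    result = []
    for name, (lower, upper, lower_closed, upper_closed) in terms.items():
        above = x >= lower if lower_closed else x > lower
        below = x <= upper if upper_closed else x < upper
        if above and below:
            result.append(name)
    return result


print(active_terms(3.5, GROUND_THICKNESS_TERMS))  # ['Very Low', 'Low', 'Moderate']
print(active_terms(5.0, GROUND_THICKNESS_TERMS))  # ['Very High']
```

A thickness of 3.5 m falls into three overlapping terms, which is what allows several rules to fire simultaneously with different strengths.
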
Rule # | IF Input 1 | AND Input 2 | AND Input 3 | AND Input 4 | THEN Output |
---|---|---|---|---|---|
1 | VH | VH | VH | VH | VH |
2 | L | M | M | M | M |
3 | M | L | VH | VH | VH |
4 | H | VH | VH | VH | VH |
5 | H | L | L | L | M |
6 | H | M | M | M | H |
7 | M | L | L | L | M |
8 | M | H | H | VL | L |
9 | M | VH | VH | VH | VH |
10 | L | M | VH | M | M |
11 | L | H | H | VL | M |
12 | L | VH | VH | VL | M |
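
The rule base above can be encoded as a simple lookup structure. The sketch below stores the twelve rules exactly as listed and returns every rule whose four antecedent terms are all active for the given inputs. The function name and the input representation (one set of active terms per input) are hypothetical; a full inference step would additionally weight each matching rule by its firing strength.

```python
# Rule number -> ((Input 1, Input 2, Input 3, Input 4), Output),
# with VL/L/M/H/VH denoting Very Low ... Very High, as in the table above.
RULES = {
    1: (("VH", "VH", "VH", "VH"), "VH"),
    2: (("L", "M", "M", "M"), "M"),
    3: (("M", "L", "VH", "VH"), "VH"),
    4: (("H", "VH", "VH", "VH"), "VH"),
    5: (("H", "L", "L", "L"), "M"),
    6: (("H", "M", "M", "M"), "H"),
    7: (("M", "L", "L", "L"), "M"),
    8: (("M", "H", "H", "VL"), "L"),
    9: (("M", "VH", "VH", "VH"), "VH"),
    10: (("L", "M", "VH", "M"), "M"),
    11: (("L", "H", "H", "VL"), "M"),
    12: (("L", "VH", "VH", "VL"), "M"),
}


def matching_rules(active_terms_per_input):
    """Return (rule number, output term) for every rule whose antecedent
    terms are all among the active terms of the corresponding input."""
    matches = []
    for number, (antecedent, output) in RULES.items():
        if all(term in active
               for term, active in zip(antecedent, active_terms_per_input)):
            matches.append((number, output))
    return matches


# Input 1 lies in the overlap of Low and Moderate; the others are Moderate.
print(matching_rules([{"L", "M"}, {"M"}, {"M"}, {"M"}]))  # [(2, 'M')]
```
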
Method | MSE | R² | NRMSE |
---|---|---|---|
Linear Regression | 0.43 | 0.45 | 14.2% |
Decision Trees | 0.33 | 0.32 | 15.8% |
SVR | 0.52 | 0.54 | 19.1% |
Ensembles | 0.30 | 0.14 | 17.7% |
GPR | 0.12 | 0.95 | 4.3% |
The proposed ANFIS-based approach | 0.01 | 0.99 | 1.0% |
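
For reference, the three metrics reported in the table above can be computed as in the following sketch. The observed and predicted values are hypothetical, and normalising the RMSE by the range of the observed values is an assumption; other normalisations (for example, by the mean) are also in common use and would give different percentages.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(y_true, y_pred):
    """RMSE divided by the range of the observed values, in percent
    (range normalisation is an assumption of this sketch)."""
    rmse = math.sqrt(mse(y_true, y_pred))
    return 100.0 * rmse / (max(y_true) - min(y_true))


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(round(mse(observed, predicted), 3))
print(round(r_squared(observed, predicted), 2))
print(round(nrmse_percent(observed, predicted), 1))
```

Lower MSE and NRMSE and an R² closer to 1 indicate a better fit, which is the sense in which the methods in the table are compared.
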
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Burmakova, A.; Kalibatienė, D. Applying Fuzzy Inference and Machine Learning Methods for Prediction with a Small Dataset: A Case Study for Predicting the Consequences of Oil Spills on a Ground Environment. Appl. Sci. 2022, 12, 8252. https://doi.org/10.3390/app12168252