# A Two-Stage Big Data Analytics Framework with Real World Applications Using Spark Machine Learning and Long Short-Term Memory Network


## Abstract


## 1. Introduction

- How to obtain meaningful information from previously collected data features so as to provide the highest accuracy;
- How to handle the class imbalance that commonly occurs in huge datasets;
- How to integrate recent progress in artificial intelligence (AI) while retaining the computing power of the Spark framework.
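The class-imbalance concern above can be made concrete with a minimal sketch (the labels here are hypothetical, purely for illustration): imbalance is often quantified as the ratio between the majority and minority class counts.

```python
# Minimal sketch: quantifying class imbalance as the ratio between the
# majority- and minority-class counts (hypothetical labels for illustration).
from collections import Counter

labels = [0] * 950 + [1] * 50  # hypothetical binary labels, a 95%/5% split
counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())
print(imbalance_ratio)  # 19.0
```

A ratio this large typically calls for resampling or class weighting before training.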

## 2. Background and Related Work

## 3. Proposed Framework Architecture

- (i) Phase 1: big data analysis using Spark MLlib
- (ii) Phase 2: cascading

- (i) Retrain the model using the MLP and LSTM
- (ii) Output

#### 3.1. Overview of the Architecture

#### 3.2. Phase 1. Big Data Processing Using Spark MLlib

Given a set of training instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, l$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$, where $\mathbb{R}^n$ is the input space, $x_i$ are the feature vectors, and $y_i$ is the class label of $x_i$ [45]. The linear separating function can be expressed as:

$f(x) = w^{T}x + b$

- The key Spark abstraction is the RDD, so the preprocessed input data are loaded in the form of an RDD.
- Convert the resilient distributed dataset (RDD) into a DataFrame.
- Read the labels and features from the DataFrame.
- Apply string indexing to each non-numeric feature.
- Apply one-hot encoding to each string-indexed feature.
- Create a vector assembler that combines the numeric features and the one-hot-encoded features into a single feature vector.
- Add the assembler stage to the pipeline.
- Fit the pipeline to transform the data into the format Spark expects.
- Train the model on the training data using Spark MLlib.
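The indexing, encoding, and assembly steps above can be sketched with plain-Python stand-ins (these functions are illustrative substitutes mirroring the behavior of Spark's `StringIndexer`, `OneHotEncoder`, and `VectorAssembler`, not Spark itself):

```python
# Plain-Python stand-ins for Spark's StringIndexer, OneHotEncoder, and
# VectorAssembler, illustrating the feature-preparation steps listed above.
from collections import Counter

def string_index(values):
    """Map each distinct string to an integer index, most frequent first,
    mirroring StringIndexer's default frequency ordering."""
    order = [v for v, _ in Counter(values).most_common()]
    mapping = {v: i for i, v in enumerate(order)}
    return [mapping[v] for v in values], mapping

def one_hot(index, size):
    """Encode an integer index as a one-hot vector of the given size."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def assemble(*parts):
    """Concatenate feature fragments into one feature vector,
    as VectorAssembler does."""
    out = []
    for part in parts:
        out.extend(part)
    return out

colors = ["red", "blue", "red", "green"]       # a non-numeric feature column
idx, mapping = string_index(colors)            # string indexing
rows = [assemble(one_hot(i, len(mapping)), [1.0])  # one-hot + a numeric feature
        for i in idx]
print(rows[0])  # [1.0, 0.0, 0.0, 1.0]
```

In actual Spark code, the same chain would be expressed as pipeline stages fitted and applied over the DataFrame.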

#### 3.3. Phase 2. Cascading

#### 3.4. Stage 2. Deep Learning

- This is the second and final learning stage of the framework before the output. The MLP and LSTM are trained using the modified dataset produced by cascading.
- The MLP and LSTM can be built either by reusing steps 2–8 of stage 1, substituting the stage 1 ML model with the MLP from the Spark library, or by constructing the MLP directly as an artificial neural network (ANN).
- The MLP can be trained with the back-propagation algorithm to minimize the prediction error.
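As a minimal sketch of back-propagation training (a toy NumPy network on hypothetical data, not the paper's actual MLP), gradients of the loss are propagated backwards through the sigmoid layers to update the weights:

```python
import numpy as np

# Toy back-propagation sketch: a 2-8-1 sigmoid MLP trained on XOR
# (hypothetical data; illustrates the training loop, not the paper's model).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

losses, lr = [], 1.0
for _ in range(2000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # backward pass: propagate the MSE gradient through both sigmoid layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);  b1 -= lr * d_h.mean(axis=0)
```

The prediction error should shrink over the iterations, which is the behavior the framework relies on in stage 2.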

#### 3.5. Framework Underlying Logic

#### 3.5.1. Computation Time

#### 3.5.2. Feature Set Enhancement

#### 3.5.3. Continuous Learning Improvement

## 4. Proposed Framework Implementation

#### 4.1. Description of the Dataset

#### 4.2. Cardiac Arrhythmia Classification

- All 279 cardiac arrhythmia attributes were organized as numerical values using numerical columns.
- The vector assembler was used to join all 279 attributes into a single vector for all Spark MLlib-based algorithms. This feature vector was stored under the features column of the DataFrame.
- As discussed above, the dataset included 16 arrhythmia classes, with presence or absence represented as 1 and 0, respectively.

For each instance $x_i$, we have a set $Y(x_i)$ of actual arrhythmia types and a set $G(x_i)$ of predicted arrhythmia types generated by the classifier. Therefore, if the set of arrhythmia type labels or classes is $L = \{l_0, l_1, \ldots, l_{M-1}\}$, then the true output vector $y$ will have $N$ elements such that $y_0, y_1, \ldots, y_{N-1} \in L$.
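Given such a true vector and the classifier's predictions, the simplest evaluation is element-wise accuracy; a minimal sketch (with hypothetical labels) is:

```python
# Minimal sketch: element-wise accuracy between the true output vector y
# and the predicted vector g (hypothetical labels for illustration).
def accuracy(y, g):
    """Fraction of positions where the predicted label matches the true one."""
    return sum(yt == gt for yt, gt in zip(y, g)) / len(y)

print(accuracy([1, 2, 3, 3], [1, 2, 0, 3]))  # 0.75
```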

- the input layer contained 281 units,
- two hidden layers contained 64 units each, with a sigmoid activation function,
- the output layer contained 2 units, with a Softmax activation function.
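The size of this architecture can be checked with a short helper (an illustrative calculation assuming fully connected layers with biases, which the text implies but does not state explicitly):

```python
# Illustrative sketch: trainable-parameter count of the described
# 281-64-64-2 MLP, assuming fully connected layers with bias terms.
def mlp_params(layer_sizes):
    """Weights (a*b) plus biases (b) between each pair of consecutive layers."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

total = mlp_params([281, 64, 64, 2])
print(total)  # 22338
```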

The LSTM network received the input $x_t$ at time step $t$ and computed the hidden state $h_t$.
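For reference, the hidden-state computation follows the standard LSTM cell updates (the cell depicted in Figure 2 [43]); this is a sketch of the common formulation, and the implementation may use a minor variant:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $c_t$ is the cell state, and $\odot$ denotes element-wise multiplication.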

#### 4.3. Identifying Malicious URLs

- All 3,231,961 lexical and host-based features were organized as numerical values using numerical columns.
- The vector assembler was used to join all 3,231,961 attributes into a single vector for all Spark MLlib-based algorithms. This feature vector was stored under the features column of the DataFrame.
- As discussed above, the dataset included 2 classes, malicious and benign URLs, represented as 1 and 0, respectively.
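To make "lexical features" concrete, here is a minimal sketch of extracting a few simple lexical features from a URL (these particular features are illustrative assumptions, not the exact feature set of the dataset used in the paper):

```python
# Illustrative sketch: a few simple lexical features from a URL string.
# These example features are assumptions, not the dataset's actual feature set.
from urllib.parse import urlparse

def lexical_features(url):
    parts = urlparse(url)
    return {
        "url_length": len(url),                              # total characters
        "num_dots": parts.netloc.count("."),                 # dots in the host
        "has_ip": parts.netloc.replace(".", "").isdigit(),   # bare-IP host?
        "path_depth": parts.path.count("/"),                 # path segments
    }

feats = lexical_features("http://192.168.0.1/a/b")
print(feats)  # {'url_length': 22, 'num_dots': 3, 'has_ip': True, 'path_depth': 2}
```

Host-based features (e.g., WHOIS or DNS properties) would be gathered by external lookups and appended to the same vector.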

- the input layer contained 3,231,960 units,
- three hidden layers contained 128, 256, and 512 units, respectively, each with a sigmoid activation function,
- the output layer contained 2 units, with a Softmax activation function.

## 5. Experimental Results

#### 5.1. Experimental Setup

#### 5.2. Stage 1 Classification Analysis

#### 5.3. Stage 2 Classification Analysis

## 6. Conclusions and Outlook

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Nair, L.R.; Shetty, S.D. Applying spark based machine learning model on streaming big data for health status prediction. Comput. Electr. Eng.
**2018**, 65, 393–399. [Google Scholar] [CrossRef] - Hbibi, L.; Barka, H. Big data: Framework and issues. In Proceedings of the 2016 International Conference on Electrical and Information Technologies (ICEIT 2016), Tangier, Morocco, 4–7 May 2016. [Google Scholar]
- Assefi, M.; Behravesh, E.; Liu, G.; Tafti, A.P. Big data machine learning using apache spark MLlib. In Proceedings of the 2017 IEEE International Conference on Big Data, Boston, MA, USA, 11–14 December 2017. [Google Scholar]
- Abbasi, A.; Sarker, S.; Chiang, R.H. Big data research in information systems: Toward an inclusive research agenda. J. Assoc. Inf. Syst.
**2016**, 17, 1–33. [Google Scholar] [CrossRef] - Fu, J.; Sun, J.; Wang, K. Spark—A big data processing platform for machine learning. In Proceedings of the 2016 Conference on Industrial Informatics-Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII 2016), Wuhan, China, 3–4 December 2016. [Google Scholar]
- Richter, A.N.; Khoshgoftaar, T.M.; Landset, S.; Hasanin, T. A multi-dimensional comparison of toolkits for machine learning with big data. In Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration (IRI 2015), San Francisco, CA, USA, 13–15 August 2015. [Google Scholar]
- Karim, M.R.; Alla, S. Scala and Spark for Big Data Analytics: Explore the Concepts of Functional Programming, Data Streaming, and Machine Learning; Packt Publishing Ltd.: Birmingham, UK, 2017. [Google Scholar]
- Longadge, R.; Dongre, S. Class imbalance problem in data mining review. arXiv
**2013**, arXiv:1305.1707. [Google Scholar] - Rahman, F.; Slepian, M.; Mitra, A. A novel big-data processing framework for healthcare applications: Big-data-healthcare-in-a-box. In Proceedings of the 2016 IEEE International Conference on Big Data, Washington, DC, USA, 5–8 December 2016. [Google Scholar]
- Archenaa, J.; Anita, E.M. Interactive big data management in healthcare using spark. In Proceedings of the 3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC 2016), Chennai, India, 10–11 March 2016. [Google Scholar]
- Tafti, A.P.; LaRose, E.; Badger, J.C.; Kleiman, R.; Peissig, P. Machine learning-as-a-service and its application to medical informatics. In Proceedings of the 2017 International Conference on Machine Learning and Data Mining in Pattern Recognition, New York, NY, USA, 15–20 July 2017. [Google Scholar]
- Van Horn, J.D. Opinion: Big data biomedicine offers big higher education opportunities. Proc. Natl. Acad. Sci. USA
**2016**, 113, 6322–6324. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Anisetti, M.; Ardagna, C.; Bellandi, V.; Cremonini, M.; Frati, F.; Damiani, E. Privacy-aware big data analytics as a service for public health policies in smart cities. Sustain. Cities Soc.
**2018**, 39, 68–77. [Google Scholar] [CrossRef] - Rios, E.; Prünster, B.; Suzic, B.; Carnehult, T.; Prieto, E.; Notario, N.; Suciu, G.; Ruiz, J.F.; Orue-Echevarria, L.; Rak, M.; et al. Cloud technology options towards Free Flow Of Data. DPSP Cluster
**2017**. Available online: http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1232492&dswid=-865 (accessed on 15 June 2018). [CrossRef] - Lau, B.P.L.; Wijerathne, N.; Ng, B.K.K.; Yuen, C. Sensor fusion for public space utilization monitoring in a smart city. IEEE Internet Things J.
**2018**, 5, 473–481. [Google Scholar] [CrossRef] - Apache Spark Lightning-Fast Unified Analytics Engine. Available online: http://spark.apache.org/ (accessed on 7 July 2018).
- Barquero, J.B. Getting Started with Spark. Available online: http://malsolo.com/blog4java/?p=679 (accessed on 15 June 2018).
- Zaharia, M.; Chowdhury, M.; Das, T.; Dave, A.; Ma, J.; McCauley, M.; Franklin, M.J.; Shenker, S.; Stoica, I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, San Jose, CA, USA, 25–27 April 2012. [Google Scholar]
- Apache Spark Mllib. Available online: http://spark.apache.org/mllib (accessed on 15 May 2018).
- Soomro, T.R.; Shoro, A.G. Big Data Analysis: Apache Spark Perspective. Glob. J. Comput. Sci. Technol.
**2015**, 15, 7–14. [Google Scholar] - Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D.; Freeman, J.; Tsai, D.B.; Amde, M.; Owen, S.; et al. Mllib: Machine learning in apache spark. J. Mach. Learn. Res.
**2016**, 17, 1235–1241. [Google Scholar] - Community Effort Driving Standardization of Apache Spark through Expanded Role in Hadoop Project, Cloudera, Databricks, IBM, Intel, and MapR, Open Source Standards. Available online: https://www.cloudera.com/more/news-and-blogs/press-releases/2014-07-01-community-effort-driving-standardization-of-apache-spark-through.html (accessed on 15 June 2018).
- Zaharia, M.; Xin, R.S.; Wendell, P.; Das, T.; Armbrust, M.; Dave, A.; Meng, X.; Rosen, J.; Venkataraman, S.; Franklin, M.J.; et al. Apache spark: A unified engine for big data processing. Commun. ACM
**2016**, 59, 56–65. [Google Scholar] [CrossRef] - Github-Apache Spark. Available online: https://github.com/apache/spark/ (accessed on 15 July 2018).
- Nair, L.R.; Shetty, S.D. Streaming twitter data analysis using spark for effective job search. J. Theor. Appl. Inf. Technol.
**2015**, 80, 349. [Google Scholar] - Nodarakis, N.; Sioutas, S.; Tsakalidis, A.; Tzimas, G. Large scale sentiment analysis on Twitter with Spark. In Proceedings of the Workshop EDBT/ICDT Joint Conference, Bordeaux, France, 15 March 2016. [Google Scholar]
- Shyam, R.; Ganesh, H.B.; Kumar, S.; Prabaharan, P.; Soman, K.P. Apache Spark a big data analytics platform for smart grid. Procedia Technol.
**2015**, 21, 171–178. [Google Scholar] [CrossRef] - Yousefi, N.; Georgiopoulos, M.; Anagnostopoulos, G.C. Multi-task learning with group-specific feature space sharing. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Porto, Portugal, 7–11 September 2015; pp. 120–136. [Google Scholar]
- Fazli, M.S.; Vella, S.A.; Moreno, S.N.J.; Quinn, S. Computational motility tracking of calcium dynamics in toxoplasma gondii. arXiv
**2017**, arXiv:1305.1707. [Google Scholar] - Allahyari, M.; Pouriyeh, S.; Assefi, M.; Safaei, S.; Trippe, E.D.; Gutierrez, J.B.; Kochut, K. A brief survey of text mining: Classification, clustering and extraction techniques. arXiv
**2017**, arXiv:1707.02919. [Google Scholar] - Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag.
**2015**, 35, 137–144. [Google Scholar] [CrossRef] - Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell.
**2016**, 5, 221–232. [Google Scholar] [CrossRef] - Prati, R.C.; Batista, G.E.; Silva, D.F. Class imbalance revisited: A new experimental setup to assess the performance of treatment methods. Knowl. Inf. Syst.
**2015**, 45, 247–270. [Google Scholar] [CrossRef] - Kotsiantis, S.; Kanellopoulos, D.; Pintelas, P. Handling imbalanced datasets: A review. GESTS Int. Trans. Comput. Sci. Eng.
**2006**, 30, 25–36. [Google Scholar] - Sonak, A.; Patankar, R.; Pise, N. A new approach for handling imbalanced dataset using ANN and genetic algorithm. In Proceedings of the 2016 International Conference on Communication and Signal Processing (ICCSP 2016), Chennai, India, 6–8 April 2016. [Google Scholar]
- Popescu, M.C.; Sasu, L.M. Feature extraction, feature selection and machine learning for image classification: A case study. In Proceedings of the 2014 International on Optimization of Electrical and Electronic Equipment (OPTIM 2014), Brasov, Romania, 22–24 May 2014. [Google Scholar]
- Silva, L.M.J.; Marques de Sá, J.; Alexandre, L.A. Data classification with multilayer perceptrons using a generalized error function. Neural Netw.
**2008**, 21, 1302–1310. [Google Scholar] [CrossRef] [PubMed] - Zanaty, E. Support vector machines (SVMs) versus multilayer perception (MLP) in data classification. Egypt. Inf. J.
**2012**, 13, 177–183. [Google Scholar] [CrossRef] - Sharma, C. Big Data Analytics Using Neural Networks. Master’s Thesis, San José State University, San José, CA, USA, May 2014. [Google Scholar]
- Sarwar, S.M.; Hasan, M.; Ignatov, D.I. Two-stage cascaded classifier for purchase prediction. arXiv
**2015**, arXiv:1508.03856. [Google Scholar] - Simonovsky, M.; Komodakis, N. Onionnet: Sharing features in cascaded deep classifiers. arXiv
**2016**, arXiv:1608.02728. [Google Scholar] - Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Karim, M.; Cochez, M.; Beyan, O.D.; Zappa, A.; Sahay, R.; Decker, S.; Schuhmann, D.-R. Recurrent deep embedding networks for genotype clustering and ethnicity prediction. arXiv
**2018**, arXiv:1805.12218. [Google Scholar] - Kang, D.; Lv, Y.; Chen, Y.-Y. Short-term traffic flow prediction with LSTM recurrent neural network. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017. [Google Scholar]
- Priyadarshini, A. A map reduce based support vector machine for big data classification. Int. J. Database Theory Appl.
**2015**, 8, 77–98. [Google Scholar] [CrossRef] - Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin, Germany, 2013. [Google Scholar]
- Gunn, S.R. Support vector machines for classification and regression. ISIS Tech. Rep.
**1998**, 14, 5–16. [Google Scholar] - Tomar, D.; Agarwal, S. A comparison on multi-class classification methods based on least squares twin support vector machine. Knowl.-Based Syst.
**2015**, 81, 131–147. [Google Scholar] [CrossRef] - Jakkula, V. Tutorial on support vector machine (SVM). School EECS
**2006**, 37, 1–13. [Google Scholar] - Singh, V.; Gupta, R.; Sevakula, R.K.; Verma, N.K. Comparative analysis of Gaussian mixture model, logistic regression and random forest for big data classification using map reduce. In Proceedings of the 2016 11th International Conference on Industrial and Information Systems (ICIIS 2016), Roorkee, India, 3–4 December 2016. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn.
**2001**, 45, 5–32. [Google Scholar] [CrossRef] - Biau, G. Analysis of a random forests model. J. Mach. Learn. Res.
**2012**, 13, 1063–1095. [Google Scholar] - Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984. [Google Scholar]
- Farris, F.A. The Gini index and measures of inequality. Am. Math. Mon.
**2010**, 117, 851–864. [Google Scholar] [CrossRef] - Giannakopoulos, I.; Tsoumakos, D.; Koziris, N. A decision tree based approach towards adaptive modeling of big data applications. In Proceedings of the 2017 IEEE International Conference on Big Data, Boston, MA, USA, 11–14 December 2017. [Google Scholar]
- Quinlan, J.R. Induction of decision trees. Mach. Learn.
**1986**, 1, 81–106. [Google Scholar] [CrossRef] [Green Version] - Wisesa, H.A.; Ma’sum, M.A.; Mursanto, P.; Febrian, A. Processing big data with decision trees: A case study in large traffic data. In Proceedings of the 2016 International Workshop on Big Data and Information Security (IWBIS 2016), Jakarta, Indonesia, 18 October 2016. [Google Scholar]
- Jiang, Y.; Hamer, J.; Wang, C.; Jiang, X.; Kim, M.; Song, Y.; Xia, Y.; Mohamed, N.; Sadat, M.N.; Wang, S. SecureLR: Secure logistic regression model via a hybrid cryptographic protocol. IEEE/ACM Trans. Comput. Biol. Bioinform.
**2018**, 1. [Google Scholar] [CrossRef] [PubMed] - Sharma, M.; Shukla, S. Relative object localization using logistic regression. In Proceedings of the 2017 3rd International Conference on Advances in Computing, Communication & Automation (ICACCA), Dehradun, India, 15–16 September 2017. [Google Scholar]
- Kobayashi, F.; Eram, A.; Talburt, J. Entity resolution using logistic regression as an extension to the rule-based oyster system. In Proceedings of the 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Miami, FL, USA, 10–12 April 2018. [Google Scholar]
- Fazel, A.; Algharbi, F.; Haider, B. Classification of Cardiac Arrhythmias Patients. CS229 Final Project Report. 2014. Available online: http://cs229.stanford.edu/proj2014/AlGharbi%20Fatema,%20Fazel%20Azar,%20Haider%20Batool,%20Cardiac%20Arrhythmias%20Patients.pdf (accessed on 15 July 2018).
- Guvenir, H.A.; Acar, B.; Demiroz, G.; Cekin, A. Supervised machine learning algorithm for arrhythmia analysis. IEEE Comput. Cardiol.
**1997**, 24, 433–436. [Google Scholar] - Ma, J.; Saul, L.K.; Savage, S.; Voelker, G.M. Identifying suspicious URLs: An application of large-scale online learning. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009. [Google Scholar]
- Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta (BBA) Protein Struct.
**1975**, 405, 442–451. [Google Scholar] [CrossRef] - Niazi, K.A.; Khan, S.A.; Shaukat, A.; Akhtar, M. Identifying best feature subset for cardiac arrhythmia classification. In Proceedings of the Science and Information Conference (SAI 2015), London, UK, 28–30 July 2015; pp. 494–499. [Google Scholar]
- Mustaqeem, A.; Anwar, S.M.; Majid, M.; Khan, R.K. Wrapper method for feature selection to classify cardiac arrhythmia. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2017), Jeju Island, Korea, 11–15 July 2017. [Google Scholar]
- Samad, S.; Khan, S.A.; Haq, A.; Riaz, A. Classification of arrhythmia. Int. J. Electr. Energy
**2014**, 2, 57–61. [Google Scholar] [CrossRef] - Soman, T.; Bobbie, P.O. Classification of arrhythmia using machine learning techniques. WSEAS Trans. Comput.
**2005**, 4, 548–552. [Google Scholar] - Persada, A.G.; Setiawan, N.A.; Nugroho, H.A. Comparative study of attribute reduction on arrhythmia classification dataset. In Proceedings of the 2013 International Conference on Information Technology and Electrical Engineering (ICITEE 2013), Yogyakarta, Indonesia, 7–8 October 2013. [Google Scholar]

**Figure 2.** A typical long short-term memory cell [43].

**Figure 4.** Support vectors and margins [45].

**Table 1.** Common support vector machine kernels [49].

Kernel Function | Expression |
---|---|
Linear | $K(x_i, x_j) = 1 + x_i^T x_j$ |
Polynomial | $K(x_i, x_j) = (1 + x_i^T x_j)^p$ |
Radial basis | $K(x_i, x_j) = \exp(-\gamma {\parallel x_i - x_j \parallel}^2)$ |
Exponential radial basis | $K(x_i, x_j) = \exp\left(-\frac{\parallel x_i - x_j \parallel}{2\sigma^2}\right)$ |
Gaussian radial basis | $K(x_i, x_j) = \exp\left(-\frac{{\parallel x_i - x_j \parallel}^2}{2\sigma^2}\right)$ |
Sigmoid | $K(x_i, x_j) = \tanh(k x_i^T x_j - \delta)$ |
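As a minimal sketch, the radial basis kernel from the table, $K(x_i, x_j) = \exp(-\gamma \parallel x_i - x_j \parallel^2)$, can be evaluated directly for plain Python vectors:

```python
import math

# Minimal sketch: evaluating the radial basis kernel from the table,
# K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), for plain Python lists.
def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 0.0], [1.0, 0.0]))  # 1.0 for identical vectors
```

Identical vectors give a kernel value of 1, and the value decays towards 0 as the vectors move apart, with $\gamma$ controlling the decay rate.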

Classifier | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
LR | 83.0% | 83.0% | 82.3% | 83.0% |
DT | 85.4% | 86.2% | 87.5% | 86.5% |
RF | 87.5% | 86.0% | 87.0% | 87.0% |

Classifier | Accuracy | auPR | auROC | MCC |
---|---|---|---|---|
LR | 85% | 88% | 85% | 0.73 |
DT | 92% | 92% | 92% | 0.85 |
RF | 93% | 93.4% | 93.5% | 0.87 |
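The MCC column reports the Matthews correlation coefficient; a minimal sketch of its computation from a binary confusion matrix (with a hypothetical perfect classifier as the example) is:

```python
import math

# Minimal sketch: Matthews correlation coefficient (MCC) from a binary
# confusion matrix (tp, tn, fp, fn); hypothetical counts for illustration.
def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return ((tp * tn) - (fp * fn)) / denom if denom else 0.0

print(mcc(50, 50, 0, 0))  # 1.0 for a perfect classifier
```

MCC ranges from −1 to +1 and, unlike plain accuracy, remains informative under class imbalance.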

Classifier | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
MLP | 89.0% | 88.0% | 89.0% | 88.3% |
LSTM | 94.8% | 96.0% | 96.7% | 96.2% |

Classifier | Accuracy | auPR | auROC | MCC |
---|---|---|---|---|
MLP | 95% | 96% | 95% | 0.91 |
LSTM | 96% | 97% | 96% | 0.94 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Khan, M.A.; Karim, M.R.; Kim, Y.
A Two-Stage Big Data Analytics Framework with Real World Applications Using Spark Machine Learning and Long Short-Term Memory Network. *Symmetry* **2018**, *10*, 485.
https://doi.org/10.3390/sym10100485
