Article

Uncovering Active Communities from Directed Graphs on Distributed Spark Frameworks, Case Study: Twitter Data

Department of Informatics, Parahyangan Catholic University, Bandung 40141, Indonesia
*
Author to whom correspondence should be addressed.
Academic Editor: Min Chen
Big Data Cogn. Comput. 2021, 5(4), 46; https://doi.org/10.3390/bdcc5040046
Received: 17 July 2021 / Revised: 15 September 2021 / Accepted: 16 September 2021 / Published: 22 September 2021
Directed graphs can be prepared from big data containing people's interaction information. In these graphs, the vertices represent people, while the directed edges denote the interactions among them. The number of interactions within a given interval can be included as an edge attribute; the larger the count, the more frequently the people (vertices) interact with each other. Subgraphs whose edges have a count larger than a threshold value can be created from these graphs, and temporal active communities can then be mined from each of these subgraphs. Apache Spark is recognized as a fast and scalable framework for processing big data. It provides the DataFrames, GraphFrames, and GraphX APIs, which can be employed for analyzing big graphs. We propose three kinds of active communities, namely, Similar interest communities (SIC), Strong-interacting communities (SC), and Strong-interacting communities with their "inner circle" neighbors (SCIC), along with the algorithms needed to uncover them. The algorithm design and implementation are based on these APIs. We conducted experiments on a Spark cluster of ten machines. The results show that our proposed algorithms are able to uncover active communities from public big graphs as well as from Twitter data collected using Spark Structured Streaming. In some cases, the algorithms based on GraphFrames' motif finding run faster.
Keywords: directed graphs analysis; social network analysis; scalable communities detection; graph data mining on Spark; off-line data stream analysis
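
The thresholding step described in the abstract can be illustrated with a minimal, single-machine sketch in plain Python. It is not the authors' Spark implementation and does not reproduce their SIC, SC, or SCIC algorithms: the function names, the sample edges, and the use of direction-ignoring connected groups as a stand-in for a "community" are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample data: (source, destination, interaction count in one interval).
SAMPLE_EDGES = [
    ("ann", "bob", 12),
    ("bob", "ann", 9),
    ("bob", "cat", 2),
    ("dan", "eve", 7),
    ("eve", "fay", 1),
]


def threshold_subgraph(edges, min_count):
    """Keep only the edges whose interaction count is larger than min_count."""
    return [(src, dst, count) for src, dst, count in edges if count > min_count]


def connected_groups(edges):
    """Group vertices joined by any kept edge, ignoring direction (union-find).

    This is only a stand-in for community mining; the paper's own
    community definitions are not reproduced here.
    """
    parent = {}

    def find(vertex):
        parent.setdefault(vertex, vertex)
        while parent[vertex] != vertex:
            parent[vertex] = parent[parent[vertex]]  # path halving
            vertex = parent[vertex]
        return vertex

    for src, dst, _count in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    groups = defaultdict(set)
    for vertex in list(parent):
        groups[find(vertex)].add(vertex)
    # Sort members and groups so the result is deterministic.
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


if __name__ == "__main__":
    active_edges = threshold_subgraph(SAMPLE_EDGES, min_count=5)
    print(connected_groups(active_edges))
```

With a threshold of 5, only the three edges with counts 12, 9, and 7 survive, so the weakly interacting vertices drop out before any grouping is attempted. On a cluster, the same filter would be expressed over a distributed edge table rather than a Python list.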

MDPI and ACS Style

Moertini, V.S.; Adithia, M.T. Uncovering Active Communities from Directed Graphs on Distributed Spark Frameworks, Case Study: Twitter Data. Big Data Cogn. Comput. 2021, 5, 46. https://doi.org/10.3390/bdcc5040046

AMA Style

Moertini VS, Adithia MT. Uncovering Active Communities from Directed Graphs on Distributed Spark Frameworks, Case Study: Twitter Data. Big Data and Cognitive Computing. 2021; 5(4):46. https://doi.org/10.3390/bdcc5040046

Chicago/Turabian Style

Moertini, Veronica S., and Mariskha T. Adithia 2021. "Uncovering Active Communities from Directed Graphs on Distributed Spark Frameworks, Case Study: Twitter Data" Big Data and Cognitive Computing 5, no. 4: 46. https://doi.org/10.3390/bdcc5040046
