A Jaccard Similarity-Based Model to Match Stakeholders for Collaboration in an Industry-Driven Portal †
Abstract
1. Introduction
2. Materials and Methods
2.1. Jaccard Similarity Coefficient
2.2. Dataset
- Total budget;
- Start and end dates;
- Relevant programs in Horizon 2020;
- Project acronym;
- Project identifier;
- Project coordinator;
- List of participants;
- Coordinator country;
- Title;
- Objective;
- URL for project website.
2.3. Obtaining Tokens
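    library(tokenizers)                                       # provides tokenize_words()
    filename <- "d:/projects.txt"                             # input file with the project texts
    my_data <- readChar(filename, file.info(filename)$size)   # read the whole file as one string
    words <- tokenize_words(my_data)                          # list with one vector of lowercase word tokens
    listt <- words[[1]]                                       # token vector for the single input string
    sink(file = "d:/output.txt")                              # redirect console output to a file
    listt_u <- unique(listt)                                  # keep each distinct token once
    listt_u[1:1000]                                           # print the unique tokens in chunks
    listt_u[1001:2000]
    listt_u[2001:2132]                                        # 2132 is the number of unique tokens in this run
    sink(file = NULL)                                         # restore output to the console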
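Once tokens are available, a Jaccard similarity between two texts can be computed from their sets of unique tokens. The following is a minimal sketch rather than the authors' implementation: the two sample sentences and the helper names `to_token_set` and `jaccard` are hypothetical, and a simple base-R split stands in for `tokenize_words()` so that the example runs without extra packages. No stop-word removal or other preprocessing is applied.

```r
# Hypothetical project descriptions (not taken from the dataset).
text_a <- "Stakeholder network for disease data and medicine"
text_b <- "A data infrastructure connecting medicine and industry"

# Base-R stand-in for tokenize_words(): lowercase the text, split on
# anything that is not a letter, drop empty strings, keep each token once.
to_token_set <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  unique(tokens[nzchar(tokens)])
}

# Jaccard similarity: number of shared tokens divided by the number of
# distinct tokens in either set.
jaccard <- function(a, b) {
  union_size <- length(union(a, b))
  if (union_size == 0) return(0)  # convention assumed here for two empty sets
  length(intersect(a, b)) / union_size
}

set_a <- to_token_set(text_a)
set_b <- to_token_set(text_b)
similarity <- jaccard(set_a, set_b)
print(similarity)
```

Because common words such as "and" are counted as shared tokens in this sketch, filtering the token sets before the comparison would change the score.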
2.4. Data Preprocessing
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
Project No. | Tokens | Project No. | Tokens | Project No. | Tokens |
---|---|---|---|---|---|
1 | 26 | 16 | 18 | 7 | 13 |
18 | 25 | 23 | 18 | 31 | 13 |
20 | 25 | 32 | 18 | 33 | 13 |
19 | 24 | 4 | 17 | 49 | 13 |
22 | 24 | 9 | 17 | 12 | 12 |
13 | 23 | 11 | 17 | 36 | 12 |
2 | 22 | 15 | 17 | 44 | 11 |
21 | 22 | 17 | 17 | 50 | 11 |
29 | 22 | 25 | 17 | 37 | 10 |
39 | 21 | 26 | 17 | 38 | 10 |
6 | 20 | 35 | 17 | 46 | 10 |
24 | 20 | 30 | 16 | 48 | 10 |
28 | 20 | 3 | 15 | 10 | 8 |
40 | 20 | 34 | 15 | 42 | 7 |
14 | 19 | 41 | 15 | 43 | 7 |
27 | 19 | 45 | 15 | 47 | 7 |
5 | 18 | 8 | 14 | | |
Token | Frequency | Token | Frequency | Token | Frequency |
---|---|---|---|---|---|
disease | 13 | sector | 7 | initiative | 4 |
stakeholder | 11 | data | 6 | observation | 4 |
process | 10 | infrastructure | 6 | structure | 4 |
system | 10 | medicine | 6 | academia | 3 |
team | 10 | patient | 6 | action | 3 |
material | 9 | programme | 6 | cell | 3 |
mechanism | 9 | europe | 5 | chair | 3 |
application | 8 | expert | 5 | collaboration | 3 |
environment | 8 | innovation | 5 | communication | 3 |
institution | 8 | institute | 5 | community | 3 |
leader | 8 | management | 5 | competitive | 3 |
society | 8 | network | 5 | complex | 3 |
excellence | 7 | partnership | 5 | deep | 3 |
goal | 7 | region | 5 | ecosystem | 3 |
group | 7 | relation | 5 | effort | 3 |
position | 7 | economic | 4 | gene | 3 |
disease | 13 | industry | 4 | | |
Project A | Project B | Intersection | Union | Similarity | Project A | Project B | Intersection | Union | Similarity | Project A | Project B | Intersection | Union | Similarity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
29 | 35 | 6 | 33 | 0.1818 | 8 | 19 | 4 | 34 | 0.1176 | 44 | 48 | 2 | 19 | 0.1053 |
34 | 40 | 5 | 30 | 0.1667 | 8 | 22 | 4 | 34 | 0.1176 | 5 | 8 | 3 | 29 | 0.1034 |
6 | 11 | 5 | 32 | 0.1563 | 23 | 24 | 4 | 34 | 0.1176 | 8 | 32 | 3 | 29 | 0.1034 |
31 | 37 | 3 | 20 | 0.1500 | 10 | 44 | 2 | 17 | 0.1176 | 11 | 34 | 3 | 29 | 0.1034 |
24 | 50 | 4 | 27 | 0.1481 | 16 | 44 | 3 | 26 | 0.1154 | 9 | 45 | 3 | 29 | 0.1034 |
20 | 34 | 5 | 35 | 0.1429 | 32 | 44 | 3 | 26 | 0.1154 | 2 | 29 | 4 | 40 | 0.1000 |
29 | 44 | 4 | 29 | 0.1379 | 32 | 50 | 3 | 26 | 0.1154 | 16 | 41 | 3 | 30 | 0.1000 |
11 | 26 | 4 | 30 | 0.1333 | 8 | 18 | 4 | 35 | 0.1143 | 41 | 43 | 2 | 20 | 0.1000 |
46 | 47 | 2 | 15 | 0.1333 | 18 | 41 | 4 | 36 | 0.1111 | 4 | 9 | 3 | 31 | 0.0968 |
27 | 43 | 3 | 23 | 0.1304 | 27 | 50 | 3 | 27 | 0.1111 | 9 | 35 | 3 | 31 | 0.0968 |
18 | 50 | 4 | 32 | 0.1250 | 37 | 38 | 2 | 18 | 0.1111 | 14 | 41 | 3 | 31 | 0.0968 |
12 | 45 | 3 | 24 | 0.1250 | 38 | 46 | 2 | 18 | 0.1111 | 31 | 38 | 2 | 21 | 0.0952 |
2 | 22 | 5 | 41 | 0.1220 | 43 | 49 | 2 | 18 | 0.1111 | 31 | 46 | 2 | 21 | 0.0952 |
21 | 22 | 5 | 41 | 0.1220 | 38 | 39 | 3 | 28 | 0.1071 | 6 | 34 | 3 | 32 | 0.0938 |
29 | 41 | 4 | 33 | 0.1212 | 19 | 43 | 3 | 28 | 0.1071 | 16 | 35 | 3 | 32 | 0.0938 |
7 | 34 | 3 | 25 | 0.1200 | 6 | 44 | 3 | 28 | 0.1071 | 20 | 38 | 3 | 32 | 0.0938 |
35 | 44 | 3 | 25 | 0.1200 | 8 | 43 | 2 | 19 | 0.1053 | | | | | |
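The values in each row of the table above are consistent with the Jaccard coefficient computed as the size of the intersection divided by the size of the union. As a worked check, take the first row (projects 29 and 35): the token-count table lists 22 tokens for project 29 and 17 for project 35, and the row reports 6 shared tokens and a union of 33. The symbols $P_{29}$ and $P_{35}$ are notation introduced here for the two token sets.

```latex
\begin{align*}
J(A, B) &= \frac{|A \cap B|}{|A \cup B|}
         = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \\[4pt]
J(P_{29}, P_{35}) &= \frac{6}{22 + 17 - 6} = \frac{6}{33} \approx 0.1818
\end{align*}
```

The same check holds for the second row (projects 34 and 40, with 15 and 20 tokens): $5 / (15 + 20 - 5) = 5/30 \approx 0.1667$.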
Project | Intersection | Union | Similarity | Project | Intersection | Union | Similarity |
---|---|---|---|---|---|---|---|
5 | 6 | 30 | 0.200 | 31 | 1 | 30 | 0.033 |
41 | 4 | 29 | 0.138 | 8 | 1 | 31 | 0.032 |
18 | 5 | 38 | 0.132 | 45 | 1 | 32 | 0.031 |
17 | 3 | 32 | 0.094 | 30 | 1 | 33 | 0.030 |
16 | 3 | 33 | 0.091 | 9 | 1 | 34 | 0.029 |
10 | 2 | 24 | 0.083 | 32 | 1 | 35 | 0.029 |
46 | 2 | 26 | 0.077 | 14 | 1 | 36 | 0.028 |
22 | 2 | 40 | 0.050 | 24 | 1 | 37 | 0.027 |
42 | 1 | 24 | 0.042 | 40 | 1 | 37 | 0.027 |
38 | 1 | 27 | 0.037 | 29 | 1 | 39 | 0.026 |
44 | 1 | 28 | 0.036 | 20 | 1 | 42 | 0.024 |
7 | 1 | 30 | 0.033 | | | | |
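The table above lists projects in decreasing order of similarity, and its union sizes are consistent with every project being compared against one fixed set of 18 tokens. A ranking of this kind can be sketched as follows. This is an illustration under stated assumptions, not the portal's actual code: the token sets, the names `projects`, `query`, `scores`, and `ranking`, and the helper `jaccard` are all hypothetical.

```r
# Jaccard similarity between two token sets.
jaccard <- function(a, b) {
  union_size <- length(union(a, b))
  if (union_size == 0) return(0)  # convention assumed here for two empty sets
  length(intersect(a, b)) / union_size
}

# Hypothetical token sets; names and contents are illustrative only.
projects <- list(
  p1 = c("disease", "patient", "medicine", "data"),
  p2 = c("material", "process", "industry"),
  p3 = c("network", "stakeholder", "innovation")
)

# Hypothetical token set describing the party looking for collaborators.
query <- c("data", "medicine", "stakeholder")

# Score every project against the query, then sort best match first.
scores <- vapply(projects, jaccard, numeric(1), b = query)
ranking <- sort(scores, decreasing = TRUE)
print(round(ranking, 3))
```

Projects that share no token with the query receive a score of zero and fall to the bottom of the ranking.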
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kabasakal, İ.; Soyuer, H. A Jaccard Similarity-Based Model to Match Stakeholders for Collaboration in an Industry-Driven Portal. Proceedings 2021, 74, 15. https://doi.org/10.3390/proceedings2021074015