The Expansion of Data Science: Dataset Standardization
Abstract
1. Introduction
2. Dataset Standardization
- Findability, Accessibility, Interoperability, and Reuse (FAIR) [42]: An initiative that defines a set of principles applicable to research data to promote interoperability between data sources.
- Data Documentation Initiative (DDI) [43]: An initiative that defines metadata standards for social science research data. It provides a framework for describing and documenting research data and for promoting data reuse in this field of science.
- Clinical Data Interchange Standards Consortium (CDISC) [44]: An initiative that defines principles applicable to clinical research data. It provides a framework for describing, acquiring, and documenting data used in this field of science.
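To make the idea of a metadata standard more concrete, the following is a minimal sketch of checking a dataset description against a list of required fields. The field names in `REQUIRED_FIELDS`, the function `missing_metadata`, and the example record are hypothetical illustrations only; they are not taken from the FAIR, DDI, or CDISC specifications, which define far richer schemas.

```python
from typing import Any

# Hypothetical minimal field set, assumed for illustration only.
REQUIRED_FIELDS = ("identifier", "title", "license", "access_url", "format")


def missing_metadata(record: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


example_record = {
    "identifier": "doi:10.0000/example",  # placeholder identifier, not a real DOI
    "title": "Example sensor readings",
    "license": "CC-BY-4.0",
    "access_url": "",                     # empty on purpose: reported as missing
}

print(missing_metadata(example_record))  # ['access_url', 'format']
```

A check of this kind could run before a dataset is published, so that incomplete descriptions are caught early rather than discovered by the people trying to reuse the data.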
3. Strategy Analysis
Strengths:
- Increased consistency and accuracy of the data and datasets;
- Data that are easier to interpret and analyze, which also allows faster technological innovation;
- Savings in time and resources during data pre-processing and analysis;
- Easier data sharing between systems, since interoperability is ensured;
- The ability to develop internal knowledge that directly increases productivity levels;
- Easier collaboration between teams;
- The ability to provide data-related services easily, e.g., data analysis or sharing data that complies with a recognized standard.
Weaknesses:
- The implementation and development of a data standard require a considerable amount of time;
- The implementation and development of a data standard require a significant amount of resources, e.g., workers and hardware;
- If the data standard is not flexible enough to accommodate the necessary data types or their content, it can be impossible to implement;
- The developed standards may not be compatible with existing tools and data sources;
- Current teams may require training time to adopt the standards effectively;
- An initial investment is needed to implement a proper structure that makes data standardization easier to perform.
Opportunities:
- The generalized adoption of a data standard makes it possible to increase external collaboration and interoperability;
- Applications and software developed for the standard interoperate directly with any dataset that follows it;
- Well-known standards allow a fast response, since applications and services use consistent and accurate data;
- A boost in research and development in the data science field;
- Economic growth, since many companies can benefit from the advantages of data standardization;
- An easier recruitment process for both the company and the worker: a worker who already knows the dataset standard can become productive earlier, without the usual adaptation time.
Threats:
- The existence of several redundant or competing standards with proprietary formats that are not open to everyone;
- Data standards that are not updated periodically can rapidly become obsolete;
- Even with a standard, the data can be misinterpreted or manipulated;
- Some companies can develop standards to decrease interoperability and maintain service or application exclusivity;
- It is important to standardize data to ensure privacy and security;
- The uncertain cost-effectiveness of the investment in data standardization, since the data standards may not be accepted globally.
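The consistency and pre-processing points in the list above can be illustrated with a small sketch of validating records against a shared schema. The schema, the column names, and the function `validate_row` are assumptions made for this example; they do not correspond to any standard discussed in this paper.

```python
from datetime import datetime

# Hypothetical standard: column name -> parser that raises ValueError on bad input.
SCHEMA = {
    "sensor_id": str,
    "timestamp": datetime.fromisoformat,
    "temperature_c": float,
}


def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of human-readable violations for one record."""
    problems = [f"missing column: {name}" for name in SCHEMA if name not in row]
    problems += [f"unexpected column: {name}" for name in row if name not in SCHEMA]
    for name, parse in SCHEMA.items():
        if name in row:
            try:
                parse(row[name])
            except ValueError:
                problems.append(f"invalid value for {name}: {row[name]!r}")
    return problems


rows = [
    {"sensor_id": "A1", "timestamp": "2023-04-21T10:00:00", "temperature_c": "21.5"},
    {"sensor_id": "A2", "timestamp": "21/04/2023", "temperature_c": "warm"},
]
for row in rows:
    print(validate_row(row))
```

The first record passes with no violations, while the second is reported for its non-ISO timestamp and non-numeric temperature. When every producer validates against the same schema, consumers can skip most format-specific cleaning. The sketch also hints at the flexibility concern: a column the schema does not anticipate is rejected outright.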
4. Conclusions
References
1. Kim, M.; Zimmermann, T.; DeLine, R.; Begel, A. Data scientists in software teams: State of the art and challenges. IEEE Trans. Softw. Eng. 2017, 44, 1024–1038.
2. Davenport, T.H.; Patil, D. Data scientist. Harv. Bus. Rev. 2012, 90, 70–76.
3. Gibert, K.; Horsburgh, J.S.; Athanasiadis, I.N.; Holmes, G. Environmental data science. Environ. Model. Softw. 2018, 106, 4–12.
4. Nasution, M.K.; Sitompul, O.S.; Nababan, E.B. Data science. J. Phys. Conf. Ser. 2020, 1566, 012034.
5. Coenen, F. Data mining: Past, present and future. Knowl. Eng. Rev. 2011, 26, 25–29.
6. Sarker, I.H. Data science and analytics: An overview from data-driven smart computing, decision-making and applications perspective. SN Comput. Sci. 2021, 2, 377.
7. Inmon, W.H. The data warehouse and data mining. Commun. ACM 1996, 39, 49–51.
8. Mikut, R.; Reischl, M. Data mining tools. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 431–443.
9. Sterne, J. Artificial Intelligence for Marketing: Practical Applications; John Wiley & Sons: Hoboken, NJ, USA, 2017.
10. Obenshain, M.K. Application of data mining techniques to healthcare data. Infect. Control Hosp. Epidemiol. 2004, 25, 690–695.
11. Kohavi, R.; Provost, F. Applications of Data Mining to Electronic Commerce; Springer: Berlin/Heidelberg, Germany, 2001.
12. Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. From data mining to knowledge discovery in databases. AI Mag. 1996, 17, 37.
13. Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P. Knowledge Discovery and Data Mining: Towards a Unifying Framework; AAAI Press: Portland, OR, USA, 1996; Volume 96, pp. 82–88.
14. Sismanoglu, G.; Onde, M.A.; Kocer, F.; Sahingoz, O.K. Deep learning based forecasting in stock market with big data analytics. In Proceedings of the 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), Istanbul, Turkey, 24–26 April 2019; pp. 1–4.
15. Mohamed, N.; Al-Jaroodi, J. Real-time big data analytics: Applications and challenges. In Proceedings of the 2014 International Conference on High Performance Computing & Simulation (HPCS), Bologna, Italy, 21–25 July 2014; pp. 305–310.
16. Stach, C. Data Is the New Oil–Sort of: A View on Why This Comparison Is Misleading and Its Implications for Modern Data Administration. Future Internet 2023, 15, 71.
17. Loi, M.; Dehaye, P.O. If data is the new oil, when is the extraction of value from data unjust? Filos. Quest. Pubbliche 2017, 7, 137–178.
18. Possler, D.; Bruns, S.; Niemann-Lenz, J. Data Is the New Oil—But How Do We Drill It? Pathways to Access and Acquire Large Data Sets in Communication Science. Int. J. Commun. 2019, 13, 3894–3911.
19. Bojer, C.S.; Meldgaard, J.P. Kaggle forecasting competitions: An overlooked learning opportunity. Int. J. Forecast. 2021, 37, 587–603.
20. Asuncion, A.; Newman, D. UCI Machine Learning Repository. 2007. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 21 April 2023).
21. Yang, X.; Zeng, Z.; Teo, S.G.; Wang, L.; Chandrasekhar, V.; Hoi, S. Deep learning for practical image recognition: Case study on kaggle competitions. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 923–931.
22. Iglovikov, V.; Mushinskiy, S.; Osin, V. Satellite imagery feature detection using deep convolutional neural network: A kaggle competition. arXiv 2017, arXiv:1706.06169.
23. Taieb, S.B.; Hyndman, R.J. A gradient boosting approach to the Kaggle load forecasting competition. Int. J. Forecast. 2014, 30, 382–394.
24. Kasunic, M. Measuring Systems Interoperability: Challenges and Opportunities. Defense Technical Information Center. 2001. Available online: https://apps.dtic.mil/sti/pdfs/ADA400176.pdf (accessed on 21 April 2023).
25. Tolk, A.; Muguira, J.A. The levels of conceptual interoperability model. In Proceedings of the 2003 Fall Simulation Interoperability Workshop, Orlando, FL, USA, 14–19 September 2003; Volume 7, pp. 1–11.
26. Engel, J.; Gerretzen, J.; Szymańska, E.; Jansen, J.J.; Downey, G.; Blanchet, L.; Buydens, L.M. Breaking with trends in pre-processing? TrAC Trends Anal. Chem. 2013, 50, 96–106.
27. Rinnan, Å. Pre-processing in vibrational spectroscopy–when, why and how. Anal. Methods 2014, 6, 7124–7129.
28. Foley, J.D.; Van Dam, A.; Feiner, S.K.; Hughes, J.F. Computer Graphics: Principles and Practice; Addison-Wesley Professional: Boston, MA, USA, 1996; Volume 12110.
29. Geraci, A. IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries; IEEE Press: Piscataway, NJ, USA, 1991.
30. Mora, A.; Riera, D.; Gonzalez, C.; Arnedo-Moreno, J. A literature review of gamification design frameworks. In Proceedings of the 2015 7th International Conference on Games and Virtual Worlds for Serious Applications (VS-Games), Skövde, Sweden, 16–18 September 2015; pp. 1–8.
31. Wegner, P. Interoperability. ACM Comput. Surv. (CSUR) 1996, 28, 285–287.
32. Mellal, M.A. Obsolescence—A review of the literature. Technol. Soc. 2020, 63, 101347.
33. Mihaescu, M.C.; Popescu, P.S. Review on publicly available datasets for educational data mining. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2021, 11, e1403.
34. Sarsby, A. SWOT Analysis—A Guide to SWOT for Business Studies Students; Spectaris Ltd - Leadership Library: Suffolk, UK, 2016.
35. Benzaghta, M.A.; Elwalda, A.; Mousa, M.M.; Erkan, I.; Rahman, M. SWOT analysis applications: An integrative literature review. J. Glob. Bus. Insights 2021, 6, 55–73.
36. Leigh, D. SWOT Analysis. In Handbook of Improving Performance in the Workplace: Volumes 1–3; Pfeiffer: San Francisco, CA, USA, 2009.
37. Larson, D.; Chang, V. A review and future direction of agile, business intelligence, analytics and data science. Int. J. Inf. Manag. 2016, 36, 700–710.
38. Kumari, A.; Tanwar, S.; Tyagi, S.; Kumar, N. Verification and validation techniques for streaming big data analytics in internet of things environment. IET Netw. 2019, 8, 155–163.
39. Acharjya, D.P.; Ahmed, K. A survey on big data analytics: Challenges, open research issues and tools. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 511–518.
40. Anuradha, J. A brief introduction on Big Data 5Vs characteristics and Hadoop technology. Procedia Comput. Sci. 2015, 48, 319–324.
41. Sagiroglu, S.; Sinanc, D. Big data: A review. In Proceedings of the 2013 International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, USA, 20–24 May 2013; pp. 42–47.
42. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 1–9.
43. Vardigan, M.; Heus, P.; Thomas, W. Data documentation initiative: Toward a standard for the social sciences. Int. J. Digit. Curation 2008, 3.
44. Sato, I.; Kawasaki, Y.; Ide, K.; Sakakibara, I.; Konomura, K.; Yamada, H.; Tanaka, Y. Clinical data interchange standards consortium standardization of biobank data: A feasibility study. Biopreserv. Biobank. 2016, 14, 45–50.
45. Maharana, K.; Mondal, S.; Nemade, B. A review: Data pre-processing and data augmentation techniques. Glob. Transit. Proc. 2022, 3, 91–99.
46. Akhigbe, A.; Stevenson, B.A. Profit efficiency in US BHCs: Effects of increasing non-traditional revenue sources. Q. Rev. Econ. Financ. 2010, 50, 132–140.
47. Schüritz, R.; Seebacher, S.; Dorner, R. Capturing value from data: Revenue models for data-driven services. In Proceedings of the 50th Hawaii International Conference on System Sciences, San Diego, CA, USA, 4–7 January 2017.
48. Byun, J.W.; Sohn, Y.; Bertino, E.; Li, N. Secure anonymization for incremental datasets. In Proceedings of the Secure Data Management: Third VLDB Workshop, SDM 2006, Seoul, Republic of Korea, 10–11 September 2006; Proceedings 3. Springer: Berlin/Heidelberg, Germany, 2006; pp. 48–63.
49. Bayardo, R.J.; Agrawal, R. Data privacy through optimal k-anonymization. In Proceedings of the 21st International Conference on Data Engineering (ICDE’05), IEEE, Tokyo, Japan, 5–8 April 2005; pp. 217–228.
50. Murthy, S.; Bakar, A.A.; Rahim, F.A.; Ramli, R. A comparative study of data anonymization techniques. In Proceedings of the 2019 IEEE 5th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS), Washington, DC, USA, 5–8 April 2019.
51. Ghinita, G.; Karras, P.; Kalnis, P.; Mamoulis, N. Fast data anonymization with low information loss. In Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, 23–27 September 2007; pp. 758–769.
52. Loukides, G.; Gkoulalas-Divanis, A. Utility-preserving transaction data anonymization with low information loss. Expert Syst. Appl. 2012, 39, 9764–9777.
53. Raghunathan, T.E. Synthetic data. Annu. Rev. Stat. Appl. 2021, 8, 129–140.
54. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A review of feature selection methods on synthetic data. Knowl. Inf. Syst. 2013, 34, 483–519.
55. Pessanha Santos, N.; Lobo, V.; Bernardino, A. Two-stage 3D model-based UAV pose estimation: A comparison of methods for optimization. J. Field Robot. 2020, 37, 580–605.
56. Pessanha Santos, N.; Melício, F.; Lobo, V.; Bernardino, A. A ground-based vision system for UAV pose estimation. Int. J. Robot. Mechatron. 2014, 1, 138–144.
57. Tremblay, J.; Prakash, A.; Acuna, D.; Brophy, M.; Jampani, V.; Anil, C.; To, T.; Cameracci, E.; Boochoon, S.; Birchfield, S. Training deep networks with synthetic data: Bridging the reality gap by domain randomization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 969–977.
58. Handa, A.; Patraucean, V.; Badrinarayanan, V.; Stent, S.; Cipolla, R. Understanding real world indoor scenes with synthetic data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4077–4085.
59. Cong, G.; Fan, W.; Geerts, F.; Jia, X.; Ma, S. Improving Data Quality: Consistency and Accuracy; VLDB: Marousi, Athens, Greece, 2007.
60. Han, J. Data Mining: Concepts and Techniques; Han, J., Kamber, M., Pei, J., Eds.; Elsevier/Morgan Kaufmann Publishers: Burlington, MA, USA, 2011.
61. Kirianaki, N.V.; Yurish, S.Y.; Shpak, N.O.; Deynega, V.P. Data Acquisition and Signal Processing for Smart Sensors; Wiley: New York, NY, USA, 2002.
62. Römer, K.; Blum, P.; Meier, L. Time synchronization and calibration in wireless sensor networks. Handb. Sens. Netw. Algorithms Archit. 2005, 199–237.
63. Kaggle. Available online: https://www.kaggle.com/ (accessed on 21 April 2023).
64. Kang, W.X.; Yang, Q.Q.; Liang, R.P. The comparative research on image segmentation algorithms. In Proceedings of the 2009 First International Workshop on Education Technology and Computer Science, Wuhan, China, 7–8 March 2009; Volume 2, pp. 703–707.
65. Zhang, X.; Dahu, W. Application of artificial intelligence algorithms in image processing. J. Vis. Commun. Image Represent. 2019, 61, 42–49.
66. Yang, R.; Yu, Y. Artificial convolutional neural network in object detection and semantic segmentation for medical imaging analysis. Front. Oncol. 2021, 11, 638182.
67. Dosovitskiy, A.; Tobias Springenberg, J.; Brox, T. Learning to generate chairs with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1538–1546.
68. Mahmudur Rahman Khan, M.; Bente Arif, R.; Abu Bakr Siddique, M.; Rahman Oishe, M. Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository. arXiv 2018, arXiv:1809.06186.
69. Kotsiantis, S.B.; Zaharakis, I.; Pintelas, P. Supervised machine learning: A review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 2007, 160, 3–24.
70. Ezugwu, A.E.; Ikotun, A.M.; Oyelade, O.O.; Abualigah, L.; Agushaka, J.O.; Eke, C.I.; Akinyelu, A.A. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Eng. Appl. Artif. Intell. 2022, 110, 104743.
71. Data.gov. Available online: https://www.data.gov/ (accessed on 21 April 2023).
72. Ding, L.; DiFranzo, D.; Graves, A.; Michaelis, J.R.; Li, X.; McGuinness, D.L.; Hendler, J. Data-gov wiki: Towards linking government data. In Proceedings of the 2010 AAAI Spring Symposium Series, Stanford, CA, USA, 22–24 March 2010.
73. Krishnamurthy, R.; Awazu, Y. Liberating data for public value: The case of Data.gov. Int. J. Inf. Manag. 2016, 36, 668–672.
74. Stevens, H. Open data, closed government: Unpacking data.gov.sg. First Monday 2019, 24.
75. Google Dataset Search. Available online: https://datasetsearch.research.google.com/ (accessed on 21 April 2023).
76. Grimmer, J.; Roberts, M.E.; Stewart, B.M. Machine learning for social science: An agnostic approach. Annu. Rev. Political Sci. 2021, 24, 395–419.
77. Dixon, M.F.; Halperin, I.; Bilokon, P. Machine Learning in Finance; Springer: Berlin/Heidelberg, Germany, 2020; Volume 1170.
78. Amazon Web Services Open Data. Available online: https://registry.opendata.aws/ (accessed on 21 April 2023).
79. Kashinath, K.; Mustafa, M.; Albert, A.; Wu, J.; Jiang, C.; Esmaeilzadeh, S.; Azizzadenesheli, K.; Wang, R.; Chattopadhyay, A.; Singh, A.; et al. Physics-informed machine learning: Case studies for weather and climate modelling. Philos. Trans. R. Soc. A 2021, 379, 20200093.
80. Kiwelekar, A.W.; Mahamunkar, G.S.; Netak, L.D.; Nikam, V.B. Deep learning techniques for geospatial data analysis. In Machine Learning Paradigms: Advances in Deep Learning-Based Technological Applications; Springer: Berlin/Heidelberg, Germany, 2020; pp. 63–81.
81. Microsoft Research Open Data. Available online: https://msropendata.com/ (accessed on 21 April 2023).
82. Khan, A.I.; Al-Habsi, S. Machine learning in computer vision. Procedia Comput. Sci. 2020, 167, 1444–1451.
83. Sebe, N.; Cohen, I.; Garg, A.; Huang, T.S. Machine Learning in Computer Vision; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2005; Volume 29.
84. Otter, D.W.; Medina, J.R.; Kalita, J.K. A survey of the usages of deep learning for natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 604–624.
85. Li, H. Deep learning for natural language processing: Advantages and challenges. Natl. Sci. Rev. 2018, 5, 24–26.
86. World Bank Open Data. Available online: https://data.worldbank.org/ (accessed on 21 April 2023).
87. Adams, R.H. Economic Growth, Inequality and Poverty: Findings from a New Data Set; World Bank Publications: Washington, DC, USA, 2003; Volume 2972.
88. Altinok, N.; Angrist, N.; Patrinos, H.A. Global data set on education quality (1965–2015). World Bank Policy Res. Work. Pap. 2018. Available online: http://hdl.handle.net/10986/29281 (accessed on 21 April 2023).
89. Rolnick, D.; Donti, P.L.; Kaack, L.H.; Kochanski, K.; Lacoste, A.; Sankaran, K.; Ross, A.S.; Milojevic-Dupont, N.; Jaques, N.; Waldman-Brown, A.; et al. Tackling climate change with machine learning. ACM Comput. Surv. (CSUR) 2022, 55, 1–96.
90. Ardabili, S.; Mosavi, A.; Dehghani, M.; Várkonyi-Kóczy, A.R. Deep learning and machine learning in hydrological processes climate change and earth systems a systematic review. In Engineering for Sustainable Future: Selected Papers of the 18th International Conference on Global Research and Education Inter-Academia—2019, Budapest & Balatonfüred, Hungary, 4–7 September 2019; Springer: Cham, Switzerland, 2019; pp. 52–62.
91. Javornik, M.; Nadoh, N.; Lange, D. Data is the new oil: How data will fuel the transportation industry—The airline industry as an example. In Towards User-Centric Transport in Europe: Challenges, Solutions and Collaborations; Springer: Cham, Switzerland, 2019.
92. Larose, D.T.; Larose, C.D. Discovering Knowledge in Data: An Introduction to Data Mining; John Wiley & Sons: Hoboken, NJ, USA, 2014; Volume 4.
93. Nickols, F. Strategy, strategic management, strategic planning and strategic thinking. Manag. J. 2016, 1, 4–7.
94. Olson, E.M.; Slater, S.F.; Hult, G.T.M. The importance of structure and process to strategy implementation. Bus. Horizons 2005, 48, 47–54.
95. Okumus, F. Towards a strategy implementation framework. Int. J. Contemp. Hosp. Manag. 2001, 13, 327–338.
96. Teece, D.J. SWOT Analysis. In The Palgrave Encyclopedia of Strategic Management; Augier, M., Teece, D.J., Eds.; Palgrave Macmillan: London, UK, 2018; pp. 1689–1690.
97. Weihrich, H. The TOWS matrix—A tool for situational analysis. Long Range Plan. 1982, 15, 54–66.
98. Mintzberg, H.; Ahlstrand, B.; Lampel, J.B. Strategy Safari: A Guided Tour through the Wilds of Strategic Management; Simon & Schuster Inc.: New York, NY, USA, 1998.
99. Hill, C.W.; Jones, G.R.; Schilling, M.A. Strategic Management: Theory: An Integrated Approach; Cengage Learning: Boston, MA, USA, 2014.
100. Doz, Y.L.; Prahalad, C.K. Managing DMNCs: A search for a new paradigm. Strateg. Manag. J. 1991, 12, 145–164.
101. Ghemawat, P. Distance still matters—The hard reality of global expansion. Harvard Bus. Rev. 2001, 79, 137.
102. Kaplan, R.S.; Norton, D.P. The Balanced Scorecard: Translating Strategy into Action; Harvard Business Press: Cambridge, MA, USA, 1996.
103. Lynch, R.L.; Cross, K.F. Measure Up!: The Essential Guide to Measuring Business Performance; Mandarin: London, UK, 1991.
104. Austin, R.D. Business Performance Measurement: Theory and Practice; Cambridge University Press: Cambridge, UK, 2002.
105. Mello, R.; Martins, R.A. Can big data analytics enhance performance measurement systems? IEEE Eng. Manag. Rev. 2019, 47, 52–57.
106. Armstrong, M.; Baron, A. Performance Management; Kogan Page Limited: London, UK, 2000.
107. Ledolter, J. Data Mining and Business Analytics with R; John Wiley & Sons: Hoboken, NJ, USA, 2013.
Repository | Open Data | Access | Update Frequency | User Interface | API Access |
---|---|---|---|---|---|
Kaggle [63] | No | Free/Paid | Daily | User friendly (interactive) | Yes |
UCI Machine Learning Repository (UCIMLR) [20] | Yes | Free | Irregular | User friendly (simple) | No |
Data.gov [71] | Yes | Free | Irregular | User friendly (simple) | Yes |
Google Dataset Search (GDS) [75] | Yes | Free | Irregular | Simple search interface | No |
Amazon Web Services Open Data (AWSODR) [78] | Yes | Free | Irregular | User friendly (simple) | Yes |
Microsoft Research Open Data (MROD) [81] | Yes | Free | Irregular | Simple search interface | Yes |
World Bank Open Data (WBOD) [86] | Yes | Free | Irregular | User friendly (simple) | Yes |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pessanha Santos, N. The Expansion of Data Science: Dataset Standardization. Standards 2023, 3, 400-410. https://doi.org/10.3390/standards3040028