Open Access Article
Informatics 2019, 6(1), 10; https://doi.org/10.3390/informatics6010010

ETL Best Practices for Data Quality Checks in RIS Databases

1 German Center for Higher Education Research and Science Studies (DZHW), Schützenstraße 6a, 10117 Berlin, Germany
2 Institute for Technical and Business Information Systems—Database Research Group, Otto-von-Guericke-University Magdeburg, Universitätsplatz 2, 39106 Magdeburg, Germany
3 Department of Computer Science and Engineering, University of Applied Sciences—HTW Berlin, Wilhelminenhofstraße 75 A, 12459 Berlin, Germany
* Author to whom correspondence should be addressed.
Received: 25 December 2018 / Revised: 24 February 2019 / Accepted: 27 February 2019 / Published: 5 March 2019

Abstract

Data integration from external data sources or independent IT systems has recently received increasing attention in IT departments as well as at management level, particularly with regard to data integration in federated database systems. One example of such systems is the commercial research information system (RIS), which regularly imports an institution's research information from a variety of databases and then cleanses, transforms, and prepares it for analysis. Each of these steps must also be carried out with assured quality. Because several internal and external data sources are loaded into the RIS for integration, ensuring information quality is becoming increasingly challenging for research institutions. Before research information is transferred to a RIS, it must be checked and cleaned. Data quality is therefore always a key factor in successful data integration. Removing data errors (such as duplicates, inconsistent data, and outdated data) and harmonizing the data structure are essential tasks of data integration using extract, transform, and load (ETL) processes, in which data is extracted from the source systems, transformed, and loaded into the RIS. During this process, conflicts between different data sources are managed and resolved, and data quality issues are eliminated. Against this background, this paper presents the data transformation process in the context of RIS, providing an overview of the quality of research information in an institution's internal and external data sources during their integration into a RIS. It also addresses how quality issues can be controlled and improved during the integration process.
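The ETL steps and quality checks named in the abstract can be illustrated with a minimal, hypothetical sketch; it is not the authors' implementation. It harmonizes DOI notation and whitespace, rejects records with implausible values, removes duplicates across sources (resolving conflicts in favor of the most recently updated record), flags outdated records, and loads the result into a stand-in RIS table. The record fields, the DOI-or-title match key, the 1900 lower bound on publication years, the 365-day staleness threshold, the sample records, and the in-memory SQLite table are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

DOI_PREFIX = "https://doi.org/"


@dataclass(frozen=True)
class Publication:
    # Hypothetical research-output record; the fields are illustrative only.
    title: str
    doi: Optional[str]
    year: Optional[int]
    source: str  # originating system, e.g. "internal" or "external"
    last_updated: date


def harmonize(p: Publication) -> Publication:
    """Transform: normalize whitespace and DOI notation across sources."""
    doi = p.doi.strip().lower() if p.doi else None
    if doi and doi.startswith(DOI_PREFIX):
        doi = doi[len(DOI_PREFIX):]
    return replace(p, title=" ".join(p.title.split()), doi=doi)


def validate(p: Publication, today: date) -> list:
    """Return the quality issues found; an empty list means the record passes."""
    issues = []
    if not p.title:
        issues.append("missing title")
    if p.year is None or not 1900 <= p.year <= today.year:
        issues.append("implausible year")
    return issues


def match_key(p: Publication) -> str:
    """Duplicate-detection key: the DOI if present, else the case-folded title."""
    return p.doi if p.doi else p.title.casefold()


def deduplicate(pubs: list) -> list:
    """Keep one record per key; on conflict, prefer the most recently updated."""
    best = {}
    for p in pubs:
        key = match_key(p)
        if key not in best or p.last_updated > best[key].last_updated:
            best[key] = p
    return list(best.values())


def load(pubs: list, conn: sqlite3.Connection) -> None:
    """Load: write cleaned records into a stand-in RIS table."""
    conn.execute("CREATE TABLE IF NOT EXISTS publication (match_key TEXT PRIMARY KEY, "
                 "title TEXT, doi TEXT, year INTEGER, source TEXT, last_updated TEXT)")
    rows = [(match_key(p), p.title, p.doi, p.year, p.source, p.last_updated.isoformat())
            for p in pubs]
    conn.executemany("INSERT OR REPLACE INTO publication VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(raw: list, conn: sqlite3.Connection, today: date,
            max_age_days: int = 365) -> dict:
    """Harmonize, validate, deduplicate, flag outdated records, then load."""
    clean, rejected = [], []
    for p in map(harmonize, raw):
        issues = validate(p, today)
        if issues:
            rejected.append((p.title, issues))
        else:
            clean.append(p)
    unique = deduplicate(clean)
    stale = [match_key(p) for p in unique
             if (today - p.last_updated).days > max_age_days]
    load(unique, conn)
    return {"loaded": len(unique), "rejected": rejected, "stale": stale}


# Hypothetical extract step: records as they might arrive from two source systems.
SAMPLE_RECORDS = [
    Publication("ETL Best Practices for Data Quality Checks in RIS Databases",
                "10.3390/informatics6010010", 2019, "internal", date(2019, 3, 5)),
    Publication("ETL  Best Practices for Data Quality Checks in RIS Databases ",
                "https://doi.org/10.3390/INFORMATICS6010010", 2019, "external",
                date(2019, 2, 1)),
    Publication("Untitled draft", None, 2999, "internal", date(2019, 1, 1)),
    Publication("An older conference paper", None, 2015, "external", date(2016, 6, 1)),
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_etl(SAMPLE_RECORDS, conn, today=date(2019, 3, 5)))
    # -> {'loaded': 2, 'rejected': [('Untitled draft', ['implausible year'])],
    #     'stale': ['an older conference paper']}
```

Preferring the most recently updated record is only one possible conflict-resolution policy; an institution might instead rank sources by trust or merge fields individually, and the thresholds would need to be tuned to its own data.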
Keywords: research information systems (RIS); heterogeneous information sources; metadata; data integration; data transformation; extraction transformation load (ETL) technology; data quality
[Figure 1]

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
Cite This Article

MDPI and ACS Style

Azeroual, O.; Saake, G.; Abuosba, M. ETL Best Practices for Data Quality Checks in RIS Databases. Informatics 2019, 6, 10.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Informatics EISSN 2227-9709. Published by MDPI AG, Basel, Switzerland.