Data Quality as a Critical Success Factor for User Acceptance of Research Information Systems

In this paper, we investigate the influence of data quality on the success of user acceptance of research information systems (RIS). Little research has been done on this topic to date, and no dedicated studies have been carried out. Previous work has focused only on the importance of data quality in RIS, on its dimensions, and on techniques for measuring, improving, and increasing data quality in RIS (such as data profiling, data cleansing, data wrangling, and text data mining). With this work, we seek an answer to the question of how data quality affects the success of RIS user acceptance. User acceptance is at stake when research institutions decide whether to keep their RIS or replace it with a new one. The result is a statement about the extent to which data quality influences the success of users' acceptance of RIS.


Introduction
Research information is often spread across different systems and tables within a research institution. The term research information covers all metadata that arise in connection with research activities, for example, information about persons, publications, projects, and patents. Since this information is often stored in different systems, research information systems (RIS) are needed to structure it, to simplify the preparation of reports, and to enable value-added services. RIS is viewed as a holistic task to support research processes and as a central database for collecting, managing, and providing information about research activities and research results [1]. A special feature of RIS is that it can combine the institution's internal systems, such as personnel, student administration, finance, and price management systems, with a variety of external data sources (e.g., Scopus, Web of Science, PubMed, arXiv, CrossRef, and Mendeley). Researchers and administrators enter research information only once, and staff across the institution use it in the RIS for a variety of purposes. RIS provides the institution with a comprehensive overview of the activities, specialist areas, and services of its researchers and saves researchers time and effort, because it makes it easier to create, update, and correct research profiles by automatically retrieving publication lists from relevant internal and external databases.
RIS solutions are available on the market both as proprietary products (such as Pure by Elsevier, Converis by Clarivate Analytics, and Symplectic Elements by Symplectic) and as open source solutions (e.g., VIVO and FACTScience by QLEO Science GmbH), and they provide an important basis for the analysis of research information. They supply a timely basis for decision-making and thus relieve decision-makers of having to decide intuitively. Apart from its strategic importance to a research institution, a RIS project is usually time-consuming and costly. One of the key reasons for the failure of many RIS implementations is a lack of data quality: incomplete, incorrect, redundant, and obsolete research information is collected and stored in the RIS, errors occur in data transmission and integration, and RIS administrators often neglect data maintenance. Analyses based on poor data may then lead to wrong decisions in research institutions, and all of this can directly or indirectly damage the reputation of institutions. Because data quality plays an important role in the success of a RIS application, data quality in RIS has been evaluated in previous papers [1][2][3][4][5][6][7][8], which proposed comprehensive solutions that allow universities and research institutions to check and ensure the data quality in their RIS on an ongoing basis.
Critical to the acceptance of a RIS in a research institution is that users have access to relevant research information and can be sure that it is accurate, complete, and consistent, so that they can trust it. Acceptance of a RIS is particularly dependent on data quality. As stated in Reference [7], "the higher the quality of research information in RIS, the easier it is to make reporting and rendered data more accessible and reliable, the greater the acceptance of users".
For this reason, this paper determines the influence of data quality on the success of user acceptance of RIS. To this end, the technology acceptance model (TAM) of Davis (1989) is used, and the paper analyzes to what extent this model is suitable for predicting the acceptance of RIS. The analysis draws on the results of our survey of universities and non-university research institutions in Germany using RIS, conducted between February 2018 and September 2018 and already published [7]. The data are analyzed using descriptive analysis, Cronbach's Alpha, Factor Analysis, Principal Component Analysis (PCA), and regression analyses. Correlations between four important data quality criteria (completeness, correctness, consistency, and timeliness) [3] [6] and the acceptance success criteria are evaluated. To clarify the results, a structural equation model (SEM) is created. The results are used to answer the following research question: To what extent does data quality represent a critical success factor for RIS user acceptance?
The paper is divided into five sections. Section (1) introduces the topic. Section (2) examines the impact of data quality within the Technology Acceptance Model (TAM) and the relationship between data quality and user acceptance of institutional RIS. Section (3) presents the methodology and results of the survey on the acceptance of RIS among the institutions that use them. Section (4) answers the research question using a framework developed as a structural equation model (SEM), which can be provided to all academic institutions, especially those currently implementing a RIS. Finally, the paper ends with a summary of the most important results and an outlook.
The results of this study can offer RIS managers in academic institutions insights into the development of successful strategies and plans for user acceptance and dissemination of RIS applications.

State-of-the-Art Data Quality and User Acceptance
High-quality research information is a prerequisite for research institutions today. In research management, the importance of high data quality has grown steadily in recent years. Data quality contributes to increasing customer satisfaction and acceptance. It can be understood as the match between the claims and expectations placed on an information system and the system's actual characteristics, and it can be perceived as negative or positive. According to References [9], [10], and [11], the notion of quality can be divided into five approaches:
• The transcendent approach describes quality as a person's subjective experience of the particular characteristics of an information system.
• The product-related approach describes the quality of an information system in terms of the fulfillment of specified requirements. Quality can be determined without subjective perception.
• The customer-centric approach describes the quality of an information system from the perspective of the product user. The customer decides whether the product meets the required quality.
• The process-oriented approach assumes that the product is of high quality if the process is optimal and on schedule and all product specifications are complied with.
• The value-based approach sees quality as the fulfillment of a service at an acceptable cost. An information system is of high quality if its costs and its performance are in an acceptable relationship to each other.
These approaches show that quality can be perceived differently depending on the perspective taken, so evaluations of the quality of the same information system can differ. In the context of RIS, data quality is defined as the fitness of research information for its required uses; research information must be accurate, complete, up-to-date, and consistent [6]. Poor research information has a negative impact on decision-making and on the implementation and modification of RIS, leading to growing distrust in research organizations.
Persistently poor data quality generally creates the risk of incorrect research management decisions and of low acceptance due to data users' loss of confidence.
User acceptance is a common measure of RIS success. Analyzing acceptance requires a definition of the term. The scientific study of the acceptance of information systems has been an integral part of social science, economic, and political research since the mid-1980s [12], [13]. There is no uniform, generally accepted definition of acceptance in the scientific literature [14], [15], [16], [12], where the acceptance of an information system or an IT application is usually treated as synonymous with adoption [17], [13]. Since this paper focuses on acceptance at the level of the RIS user, we adopt the definition of Reference [13]: "The acceptance of a user with respect to an information system is a condition and is expressed through the acceptance and use of it. In the process, this condition can take on different forms over time and be motivated both intrinsically and extrinsically." In this sense, user acceptance may be understood as the end user's (e.g., a research institution's) overall affective and cognitive evaluation of the pleasurable fulfillment experienced with a RIS. "User acceptance is an important determinant of the planned, actual, and sustainable use of information systems, as it is highly related to the motivation and ability of users to use information systems. User acceptance can be described as a verifiable readiness of a user group to use information technology for the tasks for which it was developed" [18].
Since the introduction of business information systems, numerous theories and models dealing with user acceptance of these systems have emerged. The literature contains a large number of acceptance models (e.g., the Acceptance Model by Degenhardt, 1986 [19], the Technology Acceptance Model (TAM) by Davis, 1989 [20], and the Task-Technology Fit (TTF) Model by Goodhue, 1995 [21]), which describe the subjective, object-related, and context-related factors influencing the acceptance of information systems and relate them to each other. To study the acceptance of RIS, we select the best-known and most influential of these models, TAM, and adapt it to the context of RIS, since its popularity and good applicability make it a widely used theoretical basis for empirical research. Among acceptance models, TAM is considered the best operationalized and most extensively empirically tested model for explaining the acceptance of technical systems [22]. Davis' goal in developing TAM was to formulate a theory explaining user behavior across a wide variety of information systems that is theoretically justified on the one hand and requires only a small number of constructs and assumptions on the other [20]. Davis describes acceptance as the actual use of a technology by its (potential) users on the basis of perceived usefulness (PU) and perceived ease of use (PEOU). As the adjective "perceived" indicates, these are mental constructs that take into account the subjectively different attitudes of individuals. Perceived usefulness is defined as the perceived probability that using a specific information system increases job performance in an organizational context [20].
Perceived ease of use is the degree to which prospective users expect "that using the target system would be free of effort", i.e., the degree to which a person believes that using a particular system would be free of effort [20]. In TAM, these two factors (PU and PEOU) together shape the user's intention to use the technology in question (behavioral intention to use, BI). The use of a system, defined as the actual, direct use of a system by an individual in the context of his or her work [20], is influenced by this intention. The intention, in turn, is shaped by the attitude toward using (ATU), defined as an individual's positive or negative feelings about performing the behavior [20].
Several empirical studies have shown that Davis' perceived usefulness and perceived ease of use are valid indicators of the acceptance and actual use of technical systems [17]. The intention to use is influenced by perceived usefulness and perceived ease of use, which are in turn influenced by external variables. These relations form the core of TAM and are adopted here unchanged. Besides ease of use and other variables, data quality is one of the most important conditions for user acceptance of RIS [7]. To identify the external variables in the context of RIS, an empirical study highlighted the important drivers of data quality for increasing user acceptance of RIS. Data quality in this context comprises four aspects [3], [6]:
• Completeness: the degree to which the system contains all the necessary information
• Correctness: the perceived correctness of the information in the system
• Consistency: the perceived extent to which the information is represented consistently in the system
• Timeliness: the perceived extent to which the information is up to date.
These four aspects have been explored in the context of RIS in References [3] and [6], where their reliability and validity were assessed with several items in an empirical study of 51 German institutions using RIS (see [6] for details). It was also found that this breakdown of the data quality construct is useful for the RIS acceptance model, as it provides more information about the structure of the data quality perceived by RIS users than a single aggregate construct would, and thus enables targeted data quality management. A RIS that does not reliably identify or correct data quality issues, or is itself a source of such issues, cannot (and will not) be trusted. Perceived quality problems reduce users' subjective performance expectations of the system. In the case of RIS, poor data quality is all the more problematic because RIS involve strategic and sometimes highly sensitive information and decision-making tools, such as personal or financial data. Perceived data quality has a direct impact on the expected benefit and, through it, an indirect effect on the intended and actual use of the system [6]. Figure 1 illustrates this relationship as a RIS acceptance model.

Methodology and Results
To increase the acceptance and use of RIS, it is important to understand how users make decisions about selecting and using RIS. This section evaluates the results of the empirical study. The survey was sent to 240 German universities and non-university research institutions and evaluated anonymously; 160 of them took part, a response rate of 67%. Of the 160 participants, 51 institutions used a RIS, 61 were in the implementation phase, and the remaining 48 did not use a RIS but relied on their own developments or other databases.
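The participation figures above can be cross-checked with a few lines of Python. This is a minimal sketch; the constant and function names are hypothetical, and the only inputs are the counts reported in the text.

```python
# Counts as reported in the text; the names are illustrative.
SURVEYED = 240
PARTICIPANTS = {"using RIS": 51, "implementing RIS": 61, "no RIS": 48}


def response_rate(respondents: int, surveyed: int) -> float:
    """Return the share of surveyed institutions that responded, in percent."""
    return 100.0 * respondents / surveyed


total = sum(PARTICIPANTS.values())  # 51 + 61 + 48
print(f"Participants: {total} of {SURVEYED}")
print(f"Response rate: {response_rate(total, SURVEYED):.1f}%")  # reported as 67%
for group, n in PARTICIPANTS.items():
    print(f"{group}: {n} ({100 * n / total:.1f}% of participants)")
```

The three groups add up to exactly 160, and 160 / 240 is 66.7%, which the text rounds to 67%.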
Our addressees were administrative staff (in controlling, third-party funded projects, research funding and transfer, patents, and libraries) acting as spokespersons or project managers at universities and non-university research institutions, who are responsible for introducing RIS or who work with their RIS for up to six hours per day, and whose institutions have operated the RIS for between 24 and 100 months.
The main purpose of our study was to determine how RIS are accepted: whether users consider RIS meaningful, how useful they find it, and how user-friendly they perceive it to be. The survey consisted of concrete questions on RIS and its use (see Figure 2), all answered on a five-point scale (1 = "strongly disagree" to 5 = "strongly agree"). For this part, the survey focused on the 51 German universities and research institutions that confirmed using a RIS. The respondents were the persons responsible for RIS in their institutions as project managers or research spokespersons. Figure 3 shows the responses of the 51 RIS users to the questions in Figure 2. The results showed that most RIS users view the system positively. Most respondents describe the handling of RIS as clear and understandable, which suggests that working with the RIS is not complicated and that no time is wasted unnecessarily, as would be the case, e.g., with multiple or incorrect entries. It also means that switching to RIS is not a big challenge for many employees, who can therefore quickly work with it. Furthermore, most respondents feel that RIS improves their work performance: employees benefit from the RIS because it eases workloads and shortens working hours while increasing efficiency and productivity. At the same time, employee satisfaction increases, and increased satisfaction in turn raises productivity and effectiveness, as the survey confirms: a majority of the respondents believe that the RIS increases their productivity and effectiveness. Although the majority believe that using the RIS requires mental effort, many respondents confirm that it is useful for their work.
Regarding the operation of RIS, some respondents find it easy to use and others find the opposite, which suggests that RIS training or seminars should be considered to help employees and users operate the RIS easily. This is supported by respondents' view that it takes considerable time to learn to use the RIS as effectively as possible. Nevertheless, learning the RIS is relatively easy, as most respondents could learn it without outside help or a manual. Although most respondents report that the RIS error messages are comprehensible, the evaluation indicates that the error messages need to be improved. The same is true for bug fixes, which should be refined to give users better support. Respondents are only moderately satisfied with the effort needed to correct errors in the RIS, indicating that remedying certain errors requires considerable effort. The survey also confirmed that the RIS is considered good and useful for universities and research institutes and that it meets users' needs. Since the vast majority of participants would recommend RIS, this reflects a high level of satisfaction with it. Furthermore, users enjoy working with the RIS and are convinced of its value.
In summary, our empirical study makes it clear that RIS is meaningful and useful for universities and research institutions. Working with the RIS turns out to be effective, efficient, and enjoyable, and the overall performance of the work is improved by the RIS.
To characterize the acceptance factors in our sample and to analyze how the questions on RIS acceptance relate to each other, we analyzed the respondents' answers in our empirical investigation using the R software. Table 1 shows the results of the descriptive analysis. The item means range from 2.76 to 4.29, indicating that on average the respondents agreed with the statements. Figure 4 shows the result of the correlation analysis. A correlation coefficient measures the strength of the linear relationship between two variables and takes values between −1 and +1: +1 indicates a perfect positive linear relationship, −1 a perfect negative one, and 0 no linear relationship. The calculated correlations are close to +1, indicating a strong positive linear relationship; that is, the variables for RIS user acceptance correlate strongly with each other.
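The descriptive and correlation analysis was carried out in R; the following Python sketch only illustrates the two computations involved (item mean and standard deviation, and the Pearson correlation coefficient) on a small hypothetical data set. The item names mirror the paper's labels, but the answers are invented for illustration and do not reproduce Table 1 or Figure 4.

```python
import math
from statistics import mean, stdev

# Hypothetical five-point answers (1 = strongly disagree ... 5 = strongly agree)
# from six fictitious respondents; NOT the survey data behind Table 1.
responses = {
    "PU1": [4, 5, 4, 3, 5, 4],
    "PEOU1": [4, 4, 3, 3, 5, 4],
    "ATU1": [5, 5, 4, 3, 5, 4],
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r in [-1, +1] (undefined if either
    variable has zero variance; this sketch does not guard against that)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Descriptive analysis: mean and sample standard deviation per item.
for item, values in responses.items():
    print(f"{item}: mean={mean(values):.2f}, sd={stdev(values):.2f}")

# Correlation matrix over all item pairs.
names = list(responses)
for a in names:
    row = [f"{pearson(responses[a], responses[b]):+.2f}" for b in names]
    print(a, row)
```

A value near +1 in an off-diagonal cell is what the text describes as a strong positive linear relationship between two acceptance items.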

Dependence between Data Quality and User Acceptance
Ensuring data quality is essential so that the undoubtedly valuable processes of the RIS are also accepted in practice. This section examines the data quality dimensions (completeness, correctness, consistency, and timeliness) and user acceptance, measured with the TAM variables. To make a meaningful statement about the dependency between data quality and user acceptance, both must be considered together. A framework in the form of a structural equation model (SEM) is suitable for evaluating this relationship on the basis of our empirical results, since it shows to what extent data quality influences the success of RIS acceptance. SEM is a multivariate technique that combines elements of factor analysis and regression analysis to simultaneously estimate the relationships between observable variables (indicators or manifest variables) and unobservable variables (constructs, latent variables, or factors) [23]. A SEM consists of two elements: first, a structural model that describes the relationships between the endogenous and exogenous latent variables and allows the researcher to evaluate the direction and strength of the causal effects between them; and second, a measurement model that describes the relationships between the latent variables and the observed variables [23]. Our model uses a reflective measurement model, because the latent variables are assumed to cause their respective indicators. The number next to each arrow describes the relationship between the latent variable and the corresponding indicator; it is interpreted as a factor loading and indicates how reliably the indicator measures the latent variable [24].
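In common SEM notation (an assumption; the symbols below are not taken from Figure 5), the reflective measurement models and the structural model described above can be sketched as follows, with ξ standing for data quality, η for user acceptance, λ for the factor loadings shown next to the arrows, δ and ε for measurement errors, γ for the path coefficient, and ζ for the structural disturbance:

```latex
\begin{aligned}
x_i  &= \lambda^{x}_{i}\,\xi + \delta_i,      && i = 1,\dots,4  && \text{(data quality indicators)}\\
y_j  &= \lambda^{y}_{j}\,\eta + \varepsilon_j, && j = 1,\dots,22 && \text{(user acceptance indicators)}\\
\eta &= \gamma\,\xi + \zeta                    &&                && \text{(structural model)}
\end{aligned}
```

Here the four x_i would be completeness, correctness, consistency, and timeliness, and the 22 y_j the PU, PEOU, and ATU items; under this notation the R² of user acceptance is 1 − Var(ζ)/Var(η). SmartPLS estimates such a model with partial least squares rather than as a covariance-based model, so the equations describe the model's structure, not its estimation procedure.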
The SEM enables the measurement of two unobservable variables: data quality and user acceptance. Data quality was measured by four key indicators (completeness, correctness, consistency, and timeliness). To examine their dependency relationship, a reliable and valid SEM was developed in [6], which showed that all four dimensions have high loadings. This means that they influence the improvement process in the RIS and are crucial for measuring quality in RIS [6]. A well-managed RIS with correct, complete, consistent, and up-to-date research information is a prerequisite for everything to work smoothly. If the research information is incorrect, outdated, incomplete, or duplicated, even the best RIS processes are ineffective. A well-designed and well-maintained RIS is, therefore, the basis of any RIS project in institutions.
For the measurement of user acceptance, the variables PU1, PU2, PU3, PU4, PU5, PU6, PU7, PU8, PU9, PEOU1, PEOU2, PEOU3, PEOU4, PEOU5, PEOU6, PEOU7, PEOU8, PEOU9, PEOU10, PEOU11, PEOU12, and ATU1 are used as indicators. Figure 5 shows the framework as a SEM, created with the SmartPLS software from the survey data to assess the dependency between data quality and user acceptance of RIS. In the framework, data quality is the first latent variable; its measurement model and calculation are taken from [6]. The second latent variable, user acceptance, is described by 22 indicators. To assess the reliability and validity of the RIS user acceptance indicators, Cronbach's Alpha, Factor Analysis, and Principal Component Analysis (PCA) were calculated. Table 2 shows the results.
• Cronbach's Alpha determines the reliability of the variables, that is, the extent to which repeated measurements of the same matter with a measuring instrument give the same results [25]. Reliability at the indicator level allows statements about the extent to which an indicator variable is suitable as a measure of a latent variable [24]. Cronbach's Alpha takes values between 0 and 1, and values above 0.7 are generally considered acceptable. The values for all user acceptance indicators are above this threshold of 0.7, which means that all indicators are reliable. The overall Cronbach's Alpha of 0.9162 is considered very good. The Cronbach's Alpha reliability coefficients therefore show that all indicators are appropriate measures of the latent variable (user acceptance) and are internally consistent.
• Factor Analysis and Principal Component Analysis (PCA) were performed to determine the content and construct validity of the user acceptance indicators. In Factor Analysis, the factors should fully explain the relationships between the observed variables. When interpreting the results of a factor analysis of acceptance scales or their items, the number of factors, the size of the communalities, and the size of the loadings are taken into account. PCA reduces the data and extracts the factors on the basis of a covariance or correlation matrix. Each principal component is a linear combination of all observed variables; the components were rotated with Varimax, an orthogonal rotation that minimizes the number of indicators with high loadings on each factor and thus simplifies interpretation. The Kaiser-Meyer-Olkin (KMO) criterion is used to check whether the data are adequate for PCA. To extract the factors, a loading threshold of 0.5 was chosen to make the factor matrix more reliable, together with an eigenvalue (variance) greater than 1 and a KMO value greater than 0.5 for sampling adequacy [6]. The correlations between the indicators and the factors are called loadings and explain the relationship between each indicator and the factor [6]. Factor loadings can be used to identify which indicators are highly correlated with which factor and can therefore be assigned to it [6].
• The results show that the overall KMO value for all user acceptance indicators is 1.000. The KMO values for all indicators were above 0.5, indicating that the sample was adequate and that there were sufficient indicators for each factor (PU, PEOU, and ATU). All indicators had a factor loading of more than 0.5, which means that each indicator can be assigned to its factor. For the first factor, PU, the extracted variance was 32.69%, and for the second factor, PEOU, it was 40.13%; the eigenvalues of both factors are greater than 1. The eigenvalue of the factor ATU is also greater than 1, with an extracted variance of 27.17%. For all factors, the items correlate with their related indicators and can be grouped into one factor.
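The Cronbach's Alpha values in Table 2 follow the standard formula α = k/(k − 1) · (1 − Σ var(item_i) / var(total)), where k is the number of items and total is each respondent's summed score. The Python sketch below implements that formula; the function name and the three-item example data are hypothetical and are not the survey responses.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items, each a list of scores from the same
    respondents (population variances are used throughout; the ratio is the
    same with sample variances)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items must have the same number of respondents")
    item_var_sum = sum(pvariance(col) for col in items)
    totals = [sum(col[r] for col in items) for r in range(n)]  # per-respondent sum
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


# Hypothetical answers of five respondents to three PU items (illustrative only).
pu_items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(pu_items)
print(f"alpha = {alpha:.3f}, above 0.7: {alpha > 0.7}")
```

For these invented answers alpha is about 0.886, so the items would pass the 0.7 threshold used in the text.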
In assessing the SEM in Figure 5, it can be summarized that the TAM factors and their indicators influence the acceptance of RIS users. The results indicate that the 22 items (indicators) for the acceptance of RIS are reliable and valid. The PCA result shows that the indicators have high construct validity and are suitable for measuring the developed constructs.
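The eigenvalue-greater-than-1 rule (Kaiser criterion) and the share of variance per component can be illustrated without a statistics package. The sketch below computes the eigenvalues of a symmetric correlation matrix with the cyclic Jacobi rotation method and keeps the components whose eigenvalue exceeds 1; for a p × p correlation matrix the eigenvalues sum to p, so eigenvalue / p is the share of total variance a component explains. It is unrotated (no Varimax step), does not compute KMO, and uses a hypothetical 4 × 4 matrix, so its output is not comparable to the percentages reported above.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new m[p][q] becomes zero.
                theta = (m[q][q] - m[p][p]) / (2 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # M <- M J (columns p and q)
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # M <- J^T M (rows p and q)
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted((m[i][i] for i in range(n)), reverse=True)


def kaiser_retained(corr):
    """Return (eigenvalue, % of total variance) for components with eigenvalue > 1."""
    p = len(corr)
    return [(ev, 100 * ev / p) for ev in jacobi_eigenvalues(corr) if ev > 1]


# Hypothetical correlation matrix of four indicators forming two blocks.
corr = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
]
for ev, share in kaiser_retained(corr):
    print(f"eigenvalue {ev:.3f} explains {share:.1f}% of total variance")
```

In practice a statistics package would be used for the rotated solution; the point here is only how the "eigenvalue greater than 1" rule and the extracted-variance percentages relate to the correlation matrix.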
The dimensions of data quality and the success criteria of user acceptance of RIS are juxtaposed to identify strong dependencies and thus to investigate directional or causal effects. For this purpose, we performed a regression analysis for each combination of a data quality dimension and a user acceptance success criterion. As illustrated in Figure 6, this yields 88 regressions (4 dimensions × 22 criteria), each with its own coefficient of determination. Color coding locates the combinations with the strongest relationship: the closer the coefficient of determination is to 1, the darker the cell, and cells without correlation are white. The largest coefficient of determination is found for the data quality dimension consistency and the success criterion PU3: the consistency of the research information explains 47% of the variance in this perceived usefulness criterion. The second-largest values are found for the dimension correctness and the success criteria PU6, PU2, PU7, and PEOU6, with coefficients of determination between 42% and 46%. The third-largest are found for the dimension completeness and the success criteria PU2, PEOU8, and PEOU10, with values between 40% and 42%. A coefficient of determination below 20% means that the success criterion is not meaningfully explained by the corresponding data quality dimension.
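Each cell of Figure 6 is the coefficient of determination of one simple regression. The following Python sketch fits y = a + b·x by ordinary least squares for every pair of a data quality dimension and a success criterion and reports R² = 1 − SS_res / SS_tot, flagging values below the 20% mark used above. It assumes per-respondent scores for each dimension, which the text does not specify, and the dictionaries and scores are hypothetical and cover only 2 × 2 pairs instead of the paper's 4 × 22 = 88.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of the simple OLS regression y = a + b*x (x and y must not be constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical per-respondent scores (illustrative, not the survey data).
dq = {
    "completeness": [3, 4, 4, 2, 5, 3],
    "consistency": [4, 4, 5, 2, 5, 3],
}
ua = {
    "PU3": [4, 5, 5, 2, 5, 3],
    "PEOU6": [3, 3, 4, 4, 4, 3],
}

# One regression per (dimension, criterion) pair; with 4 dimensions and
# 22 criteria this would produce 4 * 22 = 88 cells.
grid = {(d, u): r_squared(dq[d], ua[u]) for d in dq for u in ua}
for (d, u), r2 in sorted(grid.items(), key=lambda kv: kv[1], reverse=True):
    flag = "" if r2 >= 0.20 else "  (below 20%: not meaningfully explained)"
    print(f"{d:>12} -> {u:<6} R^2 = {r2:.2f}{flag}")
```

For a simple regression with an intercept, this R² equals the squared Pearson correlation between the two variables, which is why the heat map can be read as a map of correlation strength.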
In the SEM, the overall coefficient of determination (R²) between the data quality dimensions and the success criteria for user acceptance is at least 0.67, the threshold commonly used in PLS-SEM for a substantial R². Specifically, R² = 0.7761, i.e., 77.61% of the variance in user acceptance of RIS is explained by data quality and its dimensions. This coefficient of determination indicates that the dependency between the two latent variables, data quality and user acceptance, is substantial. All four data quality factors (completeness, correctness, consistency, and timeliness) have high loadings, but RIS acceptance is best explained by three of them (completeness, correctness, and consistency); timeliness explains it less strongly.
Our results indicate that data quality has a strong impact on the acceptance of RIS. Successful acceptance of RIS can only be achieved if data quality in RIS is continuously measured, analyzed, improved, and controlled. Our research in [4], [6], and [7] has further shown that RIS projects at scientific institutions are trying to ensure and increase data quality, which indicates that data quality plays an important role in the acceptance of RIS. Furthermore, this study has shown that data quality is a critical success factor for the acceptance of RIS by users.
It is advisable to address data quality intensively early in the RIS project, since a lot of effort may be required depending on the amount of data and the number of integrated data sources. One of the decisive reasons for the failure of many databases is poor data quality: incorrect, redundant, incomplete, and outdated data are recorded and stored, errors occur in data transmission and data integration, and data maintenance is often insufficient. High-quality databases are very important for achieving business goals, so it is recommended to check data quality and improve it where necessary. Once research information is in the RIS, isolated clean-up actions have no lasting effect without data governance. Ensuring the quality of RIS data is a team effort: if everyone participates, recognizes the advantages of good data quality, and feels responsible, this will have a positive impact on user acceptance and customer satisfaction.

Conclusion
Our paper has answered the question "To what extent does data quality represent a critical success factor for the user acceptance of RIS?". For this purpose, a quantitative survey was carried out, which allowed us to take a closer look at data quality in RIS and at the acceptance of RIS by the universities and research institutions using them. Our goal was to measure the dependence of acceptance success on data quality. The results showed that data quality in RIS is a critical success factor for user acceptance. The analysis demonstrates a clear and unambiguous dependency of acceptance success on data quality, measured by the four data quality dimensions examined and 22 success criteria of user acceptance of RIS: data quality explains 77.61% of the variance in user acceptance.
Data quality is an important issue today, not only in the context of RIS and its acceptance. The causes and effects of poor data quality are manifold. Most institutions today depend on data, and good data quality is an essential requirement for them. To meet this requirement, institutions must establish a data governance function or a data quality team.
In summary, poor data quality in RIS can cause user acceptance of RIS to fail. Continuous assurance of data quality in RIS is therefore a prerequisite for successful user acceptance.