A Comprehensive Framework to Reinforce Evidence Synthesis Features in Cloud-Based Systematic Review Tools

Abstract: Systematic reviews are powerful methods used to determine the state-of-the-art in a given field from existing studies and literature. They are critical but time-consuming in research and decision making for various disciplines. When conducting a review, a large volume of data is usually generated from relevant studies. Computer-based tools are often used to manage such data and to support the systematic review process. This paper describes a comprehensive analysis to gather the required features of a systematic review tool, in order to support the complete evidence synthesis process. We propose a framework, elaborated by consulting experts in different knowledge areas, to evaluate significant features and thus reinforce existing tool capabilities. The framework will be used to enhance the currently available functionality of CloudSERA, a cloud-based systematic review tool focused on Computer Science, to implement evidence-based systematic review processes in other disciplines.


Introduction
Research and development activity and decision making usually require a preliminary study of related literature to understand the up-to-date, state-of-the-art issues, techniques and methods in a given research field. For instance, Health Science researchers need to find out the scientific evidence that supports their clinical decisions. Analyses in bibliometrics [1], science mapping [2,3] and logology [4] need to operate on data records that are usually retrieved from queries to a bibliographic database, such as Clarivate's Web of Science or Elsevier's SCOPUS, or a patent registry. A huge volume of data is published and stored in digital bibliographic repositories, which are often manually reviewed in order to select those related to the field and research purpose. Thus, it is important to be acquainted with the quality of the evidence provided in these studies. In this vein, tools such as GRADEpro [5] have emerged, to synthesize and evaluate the quality of evidence found in health science-related studies.
Rooted in the Health Sciences, Evidence Synthesis (ES) methods are used to aggregate the global message of a set of studies [6]. The main goal of ES is to evaluate the included studies and select appropriate methods for integrating their information [7]. ES methods can be used to synthesize both qualitative and quantitative evidence [8], according to the type of research questions and forms of evidence analyzed. These methods are often specific or adapted to a given field. For example, scoping, thematic analysis, narrative synthesis, comparative analysis, meta-analysis, case survey and meta-ethnography are ES methods in the Software Engineering field [9].
Evidence synthesis approaches are seamlessly linked to Systematic Reviews (SR) methods, which enable researchers to identify, evaluate and interpret the existing research that is relevant for a particular Research Question (RQ) or phenomenon of interest [10]. Some reasons for performing SRs include: to synthesize the existing evidence concerning a given topic, to identify gaps in the current research and to suggest areas for further investigation; and to provide a background for positioning new research lines [11].
Focused on the disciplines of the Health Sciences, the ES methods' steps are defined as the following [12]: aggregate information; explain or interpret processes, perceptions, beliefs and values; develop theory; identify gaps in the literature or the need for future research; explore methodological aspects of a method or topic; and develop or describe frameworks, guidelines, models, measures, scales or programmes. SR methods have also been used in domains such as Environmental Sciences [13] and Computer Science [14], which have benefited from the ES approach. In the latter field, a set of guidelines for performing Systematic Literature Reviews (SLR) has been published [14]. The guidelines define an SLR as a process consisting of three stages, namely, planning, conducting and reporting. The SLR method has become a popular research methodology for conducting literature reviews and evidence aggregation in Software Engineering. Similarly, Systematic Mapping Studies (SMS) and scope studies enable researchers to obtain a wide overview of a research area, providing them with a quantitative indication of the evidence found [15].
It is important to note that, regardless of the discipline in which ES is applied, a considerable number of studies must be processed and, eventually, selected as primary. Consequently, the information provided for those studies should be methodically synthesized. This process is a time-consuming task and is difficult to conduct manually. For this reason, using computer tools to support the process is essential in research and decision-making.
The goal of this paper is to analyse and collect the essential features of ES methods in order to present a framework that can be used to improve cloud-based SR support tools, thus fostering comprehensive evidence-based systematic review processes. The main contribution is a framework that aggregates cloud-based ES features as proposed and used in existing SR tools. To accomplish this goal, a design and creation research strategy [16] has been followed and applied around the CloudSERA software artifact. CloudSERA [17] is a cloud-based web application that supports systematic reviews of scientific literature. Its current version is focused on SLR processes applied in Computer Science. The tool has been previously evaluated in the Computer Science discipline under the scope of the SLR methods. For the sake of generality, the features of future versions of CloudSERA have to be proposed and assessed within other research domains beyond Computer Science.
The research output of the design and creation strategy is a framework or construct that covers the concepts and vocabulary [16] used in the ES and SR domains. This construct is the basis of instantiations or working systems, such as the future version of CloudSERA. As is common in computing and information systems research, the methodology involves analysing, designing and developing a computer-based product to explore and exhibit the possibilities of software technologies applied to the SR domain. This work is not intended as an illustration of technical prowess in the development of software artifacts, but rather as an analysis, argument and critical evaluation [16] of the features to be considered for augmenting CloudSERA. Therefore, a survey of experts from Computer Science and other disciplines was performed to collect their opinions about the features included in the proposed framework and to gather new necessary features. The objective is to apply them in the next versions of CloudSERA, reinforcing its functionality to support the ES process and enabling its use in a multidisciplinary way.

Evidence-Based Systematic Review Analysis Framework
With the aim of defining a framework that provides the set of features needed in a tool to cover an evidence-based SR process, we analyzed the existing tools that support such processes. First, we based our framework on the prior analysis of Kohl [18], who carried out an evaluation of existing SR tools according to a set of features selected from previous studies. Second, we considered the work of Hassler [19], who conducted a community workshop with Software Engineering researchers in order to identify and prioritize the necessary features of an SLR tool. Finally, the work of Manterola [20] provided us with a suite of ES features as the relevant steps to be carried out in ES methods. Based on these previous studies, we defined an analysis framework, shown in Figure 1 using the BPMN notation. It is important to note that, because of the complexity of the complete process, two participant profiles have been defined in the framework, namely researcher and data scientist. The same individuals can perform both roles as long as they have learned specific data analysis techniques and tools. For instance, Cheng [21] notes the importance of using machine learning techniques for applying ES methods in conservation and environmental studies.
Based on the proposed framework, each main feature required for an integrated evidence-based SR tool is classified in one of the following categories (see Table 1): non-functional, overall functionality, information management and evidence synthesis. Table 2 shows the results of an analysis made on a set of SR tools that are not focused on specific research fields [18]. Thus, the selected tools do not depend on the particular features required in a specific discipline. These tools are: CADIMA [18], Colandr [22], DistillerSR [23], EPPI-Reviewer [24], METAGEAR R package [25], Rayyan [26], ReviewER [27], SESRA [28], SLR-Tool [29] and CloudSERA [17]. Subsequently, the results obtained are discussed for each tool and feature category.
From the feature analysis shown in Table 3, we can see that some tools, such as ReviewER or CloudSERA, focus on managing the information related to the selected studies. Others, such as the METAGEAR R package, focus on information synthesis, whereas CADIMA, SLR-Tool and Colandr are balanced across both categories. No tool fully covers all requirement categories. In addition, the evidence synthesis category has an average coverage score of 38.54%, and the maximum obtained by any tool is 57.1%. This might indicate that more work is needed to research and improve this type of feature in SR support tools. Finally, the analysis of the current version of CloudSERA yields the following scores:
•
These results guide us towards the new functionalities that must be incorporated into CloudSERA and identify those that are essential for covering the complete ES process, which is the main goal of this work.
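To make the notion of a coverage score concrete, the following minimal sketch computes, for one category, the percentage of features each tool supports, together with the average and maximum across tools. It assumes that coverage is simply the share of supported features in a category; the tool names and feature flags are invented for illustration and are not the data behind the tables.

```python
from statistics import mean

# Hypothetical feature matrix: tool -> category -> one flag per feature
# (True = supported). These values are invented, not taken from the study.
FEATURES = {
    "ToolA": {"evidence synthesis": [True, False, False, False, True, True, True]},
    "ToolB": {"evidence synthesis": [True, False, False, False, False, False, True]},
}


def coverage(flags):
    """Percentage of the features in a category that a tool supports."""
    return 100.0 * sum(flags) / len(flags)


def category_summary(features, category):
    """Return (average coverage, maximum coverage) across all tools."""
    scores = [coverage(cats[category]) for cats in features.values()]
    return mean(scores), max(scores)


average, best = category_summary(FEATURES, "evidence synthesis")
print(f"average {average:.2f}%, maximum {best:.1f}%")
```

Under this assumption, a low category average signals features that few tools implement, which is how the evidence synthesis category stands out in the analysis.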

CloudSERA Features and Implementation
From the previous results, none of the analyzed SR tools cover the complete SR process. For this reason, we plan to expand CloudSERA to incorporate the complete set of evidence synthesis features. The current state of CloudSERA functionality is described below.

CloudSERA Features
Since it is a cloud-based web application, CloudSERA does not require installation or configuration. It is available online free of charge [30]. In addition, CloudSERA is an open-source tool, with its source code openly released on GitHub [31].
Concerning the features of the non-functional category, the Grails framework has been used to develop the application. Figure 2 shows the conceptual information model managed by the tool. The user interface has been built using the Bootstrap toolkit, which provides a responsive and rich user experience. The tool is provided with development documentation and end-user tutorials. To summarize, the tool has been developed covering the complete set of non-functional features, namely, it is cloud-based, open source and free, updated, not focused on a specific discipline, and delivered with user guides.
Considering the overall category features, the tool has been implemented with a role management module, thus enabling users to work collaboratively on a review. Two main roles are defined, namely performer and supervisor. SR data can be shared among all the SR team members. With a user's consent, the SR data can be accessed through the web interface for preservation and reproducibility (see Figure 3b). Thus, users can follow other users' activities. Finally, CloudSERA provides a logging system to trace the actions performed by users in an SR project. To summarize, CloudSERA has been designed and built to cover the main features of the overall functionality category, such as collaboration, user role management, data maintenance and traceability.
Regarding the information management category features, users can create SRs and define research questions and related issues. CloudSERA can be used to automate several tasks of the SR process and includes a step-wise wizard (see Figure 3a) to guide users through the creation and configuration of an SR process. In addition, the tool automates the search tasks by launching the configured queries to an integrated set of databases and digital libraries. Currently, the supported sources are the following: ACM Digital Library, IEEE Computer Society, SpringerLink and ScienceDirect. CloudSERA enables the inclusion of new sources easily through configuration. With the integrated search engines, the user does not have to run separate queries for each library. Every query runs asynchronously in the background and the user is notified when the search finishes. CloudSERA enables the definition of inclusion and exclusion criteria, which can be used to filter the bibliographic references found. References can be visualized and refined by means of a set of facets, according to the automatically retrieved metadata of the studies and the manually entered values for the attributes (see Figure 3c). These results are also used to show the statistics of included and excluded studies, with their exclusion reasons.
Considering the features of the evidence synthesis category, CloudSERA uses charts to visualize the data according to aspects such as document type, language and inclusion or exclusion criteria, among others. This enables users to report data results from the study inclusion/exclusion criteria and specific attribute tagging. Figure 3d shows an example of the main screen of a specific project.
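The search and screening behaviour described above can be illustrated with a short sketch. It is not CloudSERA's implementation (the tool is built with Grails); it only shows, under stated assumptions, how one query might be sent to several sources concurrently and how exclusion criteria might then partition the results by reason. The `search_source` function and its canned results are hypothetical stand-ins for real library connectors.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    title: str
    year: int
    language: str
    source: str


# Hypothetical canned results; a real connector would query the library's API.
CANNED = {
    "ACM Digital Library": [
        Reference("A cloud tool for reviews", 2019, "en", "ACM Digital Library"),
        Reference("Herramienta de revision", 2012, "es", "ACM Digital Library"),
    ],
    "SpringerLink": [
        Reference("Evidence synthesis methods", 2008, "en", "SpringerLink"),
    ],
}


async def search_source(source, query):
    """Stand-in for one remote search; yields control like a network call."""
    await asyncio.sleep(0)
    return CANNED.get(source, [])


async def search_all(sources, query):
    """Run the same query against every source concurrently and merge results."""
    batches = await asyncio.gather(*(search_source(s, query) for s in sources))
    return [ref for batch in batches for ref in batch]


def screen(refs, exclusion_criteria):
    """Split references into included ones and excluded ones grouped by reason.

    exclusion_criteria is a list of (reason, predicate) pairs; the first
    predicate that matches a reference determines its exclusion reason.
    """
    included, excluded = [], {}
    for ref in refs:
        reason = next((r for r, pred in exclusion_criteria if pred(ref)), None)
        if reason is None:
            included.append(ref)
        else:
            excluded.setdefault(reason, []).append(ref)
    return included, excluded


CRITERIA = [
    ("not in English", lambda r: r.language != "en"),
    ("published before 2010", lambda r: r.year < 2010),
]
```

Calling `asyncio.run(search_all(list(CANNED), "systematic review tool"))` and passing the result to `screen` with `CRITERIA` gives the included references plus a per-reason breakdown, which is the kind of data needed for inclusion and exclusion statistics.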
Finally, Elsevier's Mendeley is also integrated with CloudSERA to authenticate users using Mendeley credentials and to import and store bibliographic references found. Common metadata is used to automatically annotate the imported references. The tool also adds specific attributes for designing additional data extraction forms and quality assessment instruments. In this way, users can collect all the information needed from the primary studies to address the review questions by using textual or nominal attributes in a range of predefined values. In addition, users can evaluate the quality of each compiled study by means of a numeric attribute-based scale. These attributes will be used to categorize the studies and export the results, including the statistics of the studies tagged by each category. Additionally, the tool enables the user to export the bibliography data in different formats, for example, BibTeX, Word and Excel. The two latter formats also provide pages or sheets that include the resulting data of the work, such as research questions, attributes, search history, primary studies and charts (see the available export options in Figure 3d).
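As a rough illustration of the data extraction and quality assessment attributes described above, the sketch below models a nominal attribute restricted to predefined values, a numeric quality score, and the per-category statistics that could accompany an export. All class names, function names and the 0 to 10 quality range are assumptions made for this example, not CloudSERA's actual data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class NominalAttribute:
    """An extraction attribute whose value must come from a predefined set."""
    name: str
    allowed: tuple


@dataclass
class Study:
    title: str
    tags: dict = field(default_factory=dict)  # attribute name -> chosen value
    quality: Optional[float] = None           # numeric quality score, if rated


def tag(study, attribute, value):
    """Assign a nominal value to a study, rejecting values outside the set."""
    if value not in attribute.allowed:
        raise ValueError(f"{value!r} is not allowed for {attribute.name}")
    study.tags[attribute.name] = value


def rate_quality(study, score, low=0.0, high=10.0):
    """Store a quality score on an assumed numeric scale from low to high."""
    if not low <= score <= high:
        raise ValueError(f"score must be between {low} and {high}")
    study.quality = score


def category_counts(studies, attribute):
    """Count how many studies were tagged with each value of an attribute."""
    return Counter(
        s.tags[attribute.name] for s in studies if attribute.name in s.tags
    )


# Hypothetical attribute used only for illustration.
METHOD = NominalAttribute("research method", ("experiment", "case study", "survey"))
```

Restricting nominal values at tagging time keeps the category statistics consistent, since every counted value is guaranteed to belong to the predefined range.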
Figure 3. CloudSERA screens: (a) wizard to create a new systematic review; (b) home screen for a specific user; (c) selection screen for a specific study; (d) main screen for a specific project.

Technical Quality and Utility Evaluation
The technical quality and utility of software must be rigorously checked using accepted evaluation methods [32]. CloudSERA has been evaluated, first, by means of software quality testing techniques. With that aim, an exhaustive test battery was developed and run to ensure the fulfillment of the requirements. Sets of unit tests and functional tests were coded by using Spock and Selenium frameworks. Then, several non-functional tests were also conducted. JMeter was used first to stress-test the system and check its behavior with a great number of requests and long reference searches; then TAWDIS was used to check the web accessibility. Finally, a structural inspection of code quality to detect bugs, code smells and security vulnerabilities was performed with SonarQube. More details about the testing plan can be found in the developer portal of the tool [31].
Once the application was developed and deployed, a general heuristic evaluation was performed. The test was designed by following the heuristics proposed by Nielsen [33]. This test was conducted by several members of the authors' department who assessed the application by completing a checklist. This questionnaire focused on aspects such as identity and information, language, structure and navigation, layout, help elements, user feedback, and so forth. In general, the results provided us with valuable tips for improving the finally delivered version of the application. The results of this evaluation are also available on the developer's website [31].

Experts' Survey on Evidence-Based Systematic Review
The proposed framework collects the main desirable features of a tool that supports the complete SR process. In this section, an expert survey for assessing the features included in the framework is presented. The results provide us with valuable insights for the improvement of the CloudSERA tool. The survey involved the following steps: expert screening, definition and implementation of the survey questions and, finally, data collection and analysis of the experts' opinions. In this study, the survey was completed by 11 experts from different academic disciplines, namely Humanities, Applied Sciences, Formal Sciences, Social Sciences and Natural Sciences.

Expert Screening and Survey Questions
First, a purposive sampling technique was applied to select, from the sampling frame, the experts who completed the survey. We considered researchers from different academic disciplines who had performed and supervised at least one SR. The expert screening, based on the previous characteristics, was carried out with researchers from the University of Cádiz INDESS Research Institute, a multidisciplinary research institute whose members' areas match the goal of the study. In addition, researchers from other universities also participated.
Second, the expert survey [34] was designed by following the recommendations provided by Oates [16] and was published using Google Forms. The survey content was organized into six sections. The first section included a question related to data protection issues. Users needed to provide consent to allow for the analysis of their data. We also provided users with a consent revocation form [35] if required. The second section included questions related to the user's profile, whereas the third was devoted to obtaining data on their level of expertise with systematic reviews. The goal of the fourth and fifth sections was to validate the features contemplated in the information management and evidence synthesis categories. Finally, the sixth section aimed to capture the users' interest level in the SR support tools and validate the most relevant features contemplated in the nonfunctional and overall functionality categories.
The survey included several types of questions such as scale questions, multiple selection questions, and open questions to answer with free text. In the multiple selection and scale questions, respondents were entitled to include alternative responses, which was useful for indicating, for example, additional features to consider in an SR tool besides those already considered by the framework. The questions included are listed in Appendix A.
Once the data form was designed, an e-mail with detailed instructions was sent to the screened experts to complete the survey. Table 4 summarizes the experts' opinions about the significance of each feature included in the analysis framework. In order to properly analyze the results, some dimensions of interest, namely academic discipline and expertise level, were considered.

Data Collection and Analysis
For each dimension considered, the average score assigned to the questions by the experts pertaining to the indicated dimension has been included in the table. Additionally, the total average of the scores of the entire sample of experts has also been provided.

• The complete sample: a total of 11 experts were involved in the survey.
• Research academic discipline: Humanities (2), Applied Sciences (2), Formal Sciences (2), Social Sciences (3) and Natural Sciences (2).
• Expertise level: researchers who have supervised more than one SR (8) and those who have only supervised one (3).
The data collected for each section of the survey is analyzed in detail below. In order to make the study reproducible, a spreadsheet with data collection and analysis is available online [36].
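The per-dimension averages described above can be sketched as a simple grouping operation. The snippet assumes each response records the expert's discipline, expertise level and a numeric score per feature; the field names and the three sample responses are invented for illustration and do not reproduce the survey data, which is available in the published spreadsheet [36].

```python
from collections import defaultdict
from statistics import mean

# Invented sample responses; the real data is in the online spreadsheet [36].
RESPONSES = [
    {"discipline": "Humanities", "expertise": "more than one SR",
     "scores": {"duplicate deletion": 4}},
    {"discipline": "Social Sciences", "expertise": "more than one SR",
     "scores": {"duplicate deletion": 3}},
    {"discipline": "Social Sciences", "expertise": "one SR",
     "scores": {"duplicate deletion": 5}},
]


def average_by(responses, dimension, feature):
    """Average score of one feature for each value of a dimension."""
    groups = defaultdict(list)
    for response in responses:
        groups[response[dimension]].append(response["scores"][feature])
    return {value: mean(scores) for value, scores in groups.items()}


def overall_average(responses, feature):
    """Average score of one feature over the entire sample."""
    return mean(response["scores"][feature] for response in responses)


print(average_by(RESPONSES, "discipline", "duplicate deletion"))
print(average_by(RESPONSES, "expertise", "duplicate deletion"))
print(overall_average(RESPONSES, "duplicate deletion"))
```

Keeping the dimension as a parameter means the same function serves both the academic discipline and the expertise level breakdowns.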

Information Management Features
The responses collected for the questions belonging to the information management category are discussed below:
• First, this section of the survey asked the experts about the reference management systems they use. In this case, the experts' responses enumerated the following: Mendeley, Zotero, JabRef, EndNote, and RefWorks.
• Third, the experts were asked about the guides used to measure the quality of the methodology used in their selected studies. In this case, only the experts from the Applied Sciences, Natural Sciences and Social Sciences use PRISMA [37], ENTREQ [38] and Cochrane Collaboration [39].
Then, the experts were asked about the significance score they assigned to the features included in the information management category. Table 4 presents the results of the experts' opinions, classified by their research disciplines and expertise level. From these results, we can observe the following:
• All features included in this category were considered relevant by the experts.
• The duplicate deletion feature has a lower than average score; this might indicate that integrating the remaining features into the SR tool to cover the experts' requirements should be a requisite. In addition, this can indicate that the features needed for information management in the SR process are applied in the same way in all disciplines.
• The experts suggested including the following features: stakeholder engagement, inter-rater reliability in the screening process, detection of future lines, new questions and challenges, and the possibility of setting the sample size.
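Inter-rater reliability in the screening process, one of the features suggested by the experts, is commonly quantified with an agreement statistic. The experts did not name a specific measure, so the sketch below assumes Cohen's kappa for two reviewers who each label the same studies as included or excluded; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions by two reviewers on four studies.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))
```

In this example the reviewers agree on three of four studies, while chance agreement is 0.5, so kappa is 0.5. Other statistics, such as Fleiss' kappa for more than two reviewers, would be equally plausible choices.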

Evidence Synthesis Features
The responses collected for the questions belonging to the evidence synthesis category are discussed below:
• First, this section asked the experts about the type of study most frequently conducted or supervised. In this case, four researchers indicated quantitative studies, two researchers indicated qualitative studies and five researchers indicated mixed studies.
• Second, they were asked about the techniques used to synthesize collected evidence. In this case, the researchers mentioned the following techniques: grounded theory, content analysis, case survey, meta-study, meta-ethnography, thematic analysis, narrative summary and Bayesian meta-analysis. Additionally, they contributed the qualitative comparative analysis method and meta-synthesis to the previous list.
• Third, the experts were asked about the method used to collect data from primary studies. In this case, the experts responded with: manually, using Mendeley, using online survey data, and using face-to-face or phone surveys. Other experts indicated that, in certain disciplines, it is complicated to gather data because "some studies do not provide data, for example, patient data are not commonly available".
• Fourth, they were asked about the tools used to analyze data. In this case, the experts from the Humanities mentioned Atlas.ti or SPSS, experts from the Social Sciences mentioned R, meta-regression and forest plots, and experts from the Natural Sciences named Microsoft Excel. Other researchers indicated that they perform the analysis manually, without using any support tool.
• Fifth, the experts were asked about the techniques used to assess the risk of bias. In this case, only the experts from the Applied Sciences and Social Sciences indicated that they used this type of technique. In the case of the Applied Sciences, experts performed a revision using a risk of bias table, whereas they used GRADE and AHRQ guidelines in the case of the Social Sciences.
• Sixth, they were asked about the methods used for representing results. In this case, seven of them indicated that they used visual representations for synthesizing results; five of them indicated that they used a flowchart to depict the selection process of each study; and three of them indicated that they used visual representations for indicating the included and excluded studies.
• Seventh, the experts were asked whether quantitative reports should be different from qualitative ones. In this case, seven of them responded affirmatively and two of them negatively.
• Eighth, they were asked about including extra information in the reports. In this case, one expert from the Social Sciences indicated that the addition of summary tables and additional files with the complete information is needed, and one expert from the Applied Sciences indicated that it is relevant to include personal opinions.
Then, the experts were asked about the relevance score they assigned to the features included in the evidence synthesis category. Table 4 shows the results of the experts' opinions. From the previous results, we can observe: • All features included in this category were considered relevant by experts. • The experts from the Humanities and Formal Sciences gave a higher score to the ES features than experts from the Applied Sciences, Social Sciences and Natural Sciences. This might indicate that the experts of the former disciplines invest a greater effort into the application of evidence synthesis techniques in their SR processes.

Overall Features
The responses collected for the questions belonging to the overall category are discussed below:
• First, the experts were asked about their interest level in the availability of tools to support the SR process. In this case, the average level of interest is 4.18, indicating that the inclusion of this kind of tool is very relevant.
• Second, they were asked whether they have used SR tools previously. In this case, only one of the participants had used this type of tool and mentioned EPPI-Reviewer, CADIMA, Rayyan, SysRev and Colandr.
Then, the experts were asked about the relevance score they assigned to the features included in the overall functionality category. Table 4 shows the results of the experts' opinions. From the previous results, we can also observe that all the features belonging to the overall functionality category were considered to be relevant by the experts. We can observe that the experts in each domain assigned quite different scores for each feature. However, they were all rated above 3, meaning that these features are relevant or very relevant for them in SR support tools.

Results
Following the opinions and analysis previously discussed, we can draw a road map for enriching CloudSERA with the most valuable features required to provide more complete support to the SR process:
• Include computational support to partially automate some of the evidence data synthesis techniques, such as: grounded theory, content analysis, case survey, meta-ethnography, meta-study, narrative summary, qualitative comparative analysis methods, meta-synthesis, and thematic analysis.
• Provide the following features from the information management category: inter-rater reliability in the screening process, detection of future lines, new questions and challenges, and the ability to set the sample size.
Finally, an analysis of threats to validity is required. In this case, we followed a detailed protocol to define the more desirable features in SR support tools and the expert survey, in accordance with internal validity and construct validity. In addition, we screened the experts ensuring that they have experience in the SR process by having supervised at least one SR. The sample size is not large, but it helped us to validate the features proposed in the evidence-based SR framework and to propose new ones, thus fulfilling the goal of the survey.

Conclusions and Future Work
Undertaking systematic reviews is an essential task before starting any research activity. Researchers need to perform a preliminary study of the literature in order to know the current state-of-the-art in a specific research topic. Manually performing this task is very time-consuming for a user. For this reason, several tools have been developed to support systematic reviews. CloudSERA is a cloud-based systematic review tool focused on the realization of systematic literature reviews within the domain of Computer Science. In this work, we have carried out an analysis aimed at gathering the essential evidence synthesis features to provide for a more complete evidence synthesis process in a general-purpose systematic review tool. We have defined a framework that incorporates these features to evaluate and compare existing tools in other domains. With the aim of evaluating the framework and feeding it with more relevant features, an expert survey was carried out. That study has provided us with valuable insights to help improve the CloudSERA tool in supporting the complete set of evidence synthesis functionalities. Moreover, after evaluating existing tools that perform SR processes, we conclude that none of them incorporate all the necessary functionalities for domain experts. For this reason, enhancing CloudSERA by integrating these functionalities can be a promising and viable option.
We plan to extend CloudSERA with an On-Line Analytical Processing (OLAP) viewer to carry out multi-dimensional analysis easily. In addition, data mining algorithms will be explored and included to discover and cluster data extracted from the primary studies. Additionally, we will incorporate a workflow engine to orchestrate the execution of the tasks to complete an SR process. Finally, we also plan to conduct a heuristic evaluation with the next CloudSERA version using potential end-users from different knowledge areas to measure usability attributes such as learnability, efficiency and user satisfaction.

Data Availability Statement:
The data used in this paper are available online: the experts' survey and consent form [34]; consent revocation form [35]; and data collection and analysis spreadsheet [36].
• Followed methodological guides: Cochrane Collaboration, Kitchenham's guidelines, etc.
• Personal rating of significance of each feature collected in the Information management category of the framework (on a scale from non-relevant to essential).