Business Data Sharing through Data Marketplaces: A Systematic Literature Review

Abstract: Data marketplaces are expected to play a crucial role in tomorrow's data economy, but such marketplaces are seldom commercially viable. Currently, there is no clear understanding of the knowledge gaps in data marketplace research, especially of neglected research topics that may advance such marketplaces toward commercialization. This study provides an overview of the state of the art of data marketplace research. We employ a Systematic Literature Review (SLR) approach to examine 133 academic articles and structure our analysis using the Service-Technology-Organization-Finance (STOF) model. We find that the extant data marketplace literature is primarily dominated by technical research, such as discussions about computational pricing and architecture. To move past the first stage of the platform's lifecycle (i.e., platform design) to the second stage (i.e., platform adoption), we call for empirical research in non-technological areas, such as customer expected value and market segmentation.


Introduction
Data marketplaces are expected to play a crucial role in tomorrow's data economy [1]. A data marketplace can be broadly defined as a multi-sided platform that matches data providers and buyers. It facilitates business data sharing among enterprises. Key actors providing data marketplace functionalities include owners, operators, and third-party providers [2][3][4]. Business data sharing via data marketplaces may contribute to overall economic growth by stimulating data-driven innovation, improving the competitiveness of small and medium-sized enterprises (SMEs), and opening up job markets [5]. Despite their potential, data marketplaces have only been commercialized in a few cases (such as Dawex, Data Intelligence Hub, and Advaneo) [4]. Commercialization of such marketplaces enables the creation of new products and services. It is especially beneficial for organizations that do not have proprietary access to required data [6]. Moreover, commercialization can foster the integration of third-party providers into data marketplaces, enabling them to enhance marketplace offerings by providing complementary products and services.
This paper considers all data marketplace archetypes revealed by Fruhwirth, Rachinger, and Prlja [2]: centralized, decentralized, and personal data trading. In centralized data trading, data marketplaces mediate data exchange from diverse domains and origins, incorporating different data types and pricing mechanisms. Advanced data marketplaces in this archetype employ smart contracts to execute transactions. Decentralized data trading, on the other hand, relies on a decentralized architecture to operate data marketplaces. Finally, personal data trading refers to a Customer-to-Business (C2B) relationship where individuals can sell their personal information to companies.
From an academic perspective, recent trends in the European Union policy-making agendas have led to increased studies on business data sharing via data marketplaces, resulting in a constantly expanding yet fragmented body of literature. Recent research provides an understanding of the state of the art in practice via business model studies (e.g., Fruhwirth, Rachinger, and Prlja [2], van de Ven et al. [7]), but it does not provide a comprehensive overview of data marketplace research in academia. Consequently, knowledge gaps in data marketplace research remain unclear. Specifically, we lack understanding of whether research is scarce on topics that would advance data marketplaces toward commercialization. As it stands, it might well be that academic research is focusing on topics that do not help resolve the standstill in data marketplace commercialization.
Adopting the Systematic Literature Review (SLR) guideline provided by Okoli [8], this paper provides a systematic review of research on data marketplaces. To cover the broad range of issues that play a role in technology commercialization, we also use the business model construct as a literature review framework (cf. Solaimani et al. [9]). To the best of our knowledge, our study is the first to provide a comprehensive overview of data marketplace research, which will be beneficial in steering future research toward commercializing data marketplaces.
We describe our approach in conducting a systematic literature review in Section 2, followed by the article categorization based on the Service-Technology-Organization-Finance (STOF) model in Section 3. Then, Section 4 discusses the domination of technical research in the data marketplace literature; Section 4 also highlights the future research agendas. Finally, we close this paper by presenting the main conclusions and limitations of our study in Section 5.

Research Approach
This research employs a Systematic Literature Review (SLR) approach [8], summarized in Figure 1. Okoli [8] suggests that an SLR study can be divided into four primary steps: (1) planning, (2) selection, (3) extraction, and (4) execution.
The planning step comprises the activities of determining the objective and research protocol. Whereas the objective is presented in Section 1, the research protocol, including the guidelines to synthesize the articles, will be discussed in this section. Next, the selection step is conducted by identifying the screening criteria and conducting a literature search. We selected articles based on three criteria: articles should be (1) written in English; (2) published in a peer-reviewed journal or conference proceedings; and (3) focused on data marketplaces. We employed the search terms ("data marketplace*") OR ("data market*"). Our primary database is Scopus, which indexes a comprehensive body of scientific research papers, including the area we examine in this study. The literature search was conducted on 6 July 2020 and resulted in 496 articles. We complemented these articles with nine additional papers that we consider key literature. These nine articles did not appear in the initial search because, for instance, they do not use the term "data marketplace" explicitly in either the title or the abstract.
In the extraction step, we retrieved the articles' meta-data and saved it in an Excel spreadsheet (File S1, available here: https://doi.org/10.4121/14673813.v2, accessed on 22 November 2021). Next, we analyzed the quality of the identified articles by employing a two-step screening approach. First, we looked into the title and abstract of the selected papers to assess their relevance. We discussed our assessment internally to reach a consensus, resulting in an exclusion of 225 papers. We excluded the articles because the studies (1) do not focus on data marketplaces as the core of the research, (2) are published as a workshop or proceedings description rather than as a peer-reviewed research paper, (3) are not written in English, or (4) have no abstract.
Second, we used traditional metrics (i.e., citation numbers, journal ranks, and journal percentiles). We calculated the average number of citations across the remaining 280 articles and used the resulting average (7.3, rounded down to 7) as a threshold to quantitatively assess each paper. We included any articles that were cited more than seven times. We further assessed those below the threshold in terms of the publication outlet. If a journal or conference proceedings was ranked above the 50th percentile in its respective domain, we considered that outlet high-quality. As a result, we also included any articles that meet this criterion. Using both citation numbers and publication rank ensured the inclusion of the most prominent and relevant articles.
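The traditional-metric screening described above can be illustrated with a minimal sketch. The `Article` record, its `outlet_percentile` field, and the sample values are hypothetical and serve only to show the rule (citations above the rounded-down average, or an outlet ranked above the 50th percentile); they are not taken from the review's data set.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Article:
    title: str
    citations: int
    outlet_percentile: float  # hypothetical field: rank percentile of the outlet


def screen_traditional(articles):
    """Split articles into (included, remaining) using citations, then outlet rank."""
    # Threshold is the average citation count, rounded down (7.3 -> 7 in the review).
    threshold = math.floor(mean(a.citations for a in articles))
    included, remaining = [], []
    for a in articles:
        # Keep an article if it is cited more than the threshold,
        # or if its outlet is ranked above the 50th percentile.
        if a.citations > threshold or a.outlet_percentile > 50:
            included.append(a)
        else:
            remaining.append(a)
    return threshold, included, remaining


# Made-up sample records for illustration only.
sample = [
    Article("A", 20, 30.0),
    Article("B", 1, 80.0),
    Article("C", 1, 10.0),
    Article("D", 8, 40.0),
]
threshold, included, remaining = screen_traditional(sample)
print(threshold, [a.title for a in included], [a.title for a in remaining])
```

In this sketch, articles left in `remaining` are the ones that would be passed on to the alternative-metric check.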
We also considered alternative metrics (i.e., social media, usage, captures, and mentions) provided by the Scopus database, namely the PlumX Metrics [10], for the remaining articles that did not meet either criterion. The rationale is that the novelty of data marketplaces and the growing interest in them in the non-scientific community might lead to more discussions in (among others) social media. As a result, the impact of such articles might not be captured by traditional metrics. Using these alternative metrics allows the inclusion of articles that create an impact beyond the scientific community. Furthermore, attention to such metrics is increasingly used for scientific evaluation to complement traditional metrics [11]. We calculated the average values of those alternative metrics based on the existing 280 articles, resulting in the following thresholds: social media = 2.1, usage = 44.8, captures = 43.2, and mentions = 0.2. We included any remaining articles that have scores above these numbers, resulting in 158 papers. By combining both traditional and alternative metrics, we ensure both scientific reliability and relevance to practice.
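A minimal sketch of the alternative-metric check follows. The thresholds are those reported above; the text does not state whether an article had to exceed one threshold or all of them, so the "any metric" rule used here is an assumption, as are the dictionary keys and the sample scores.

```python
# Thresholds reported in the review (averages over the 280 articles).
THRESHOLDS = {
    "social_media": 2.1,
    "usage": 44.8,
    "captures": 43.2,
    "mentions": 0.2,
}


def passes_altmetrics(scores, thresholds=THRESHOLDS):
    """Return True if at least one metric exceeds its threshold.

    Assumption: exceeding any single threshold is sufficient; the source
    text does not specify whether 'any' or 'all' was applied.
    """
    return any(scores.get(name, 0) > limit for name, limit in thresholds.items())


# Hypothetical PlumX-style scores for two articles.
article_x = {"social_media": 0, "usage": 50, "captures": 10, "mentions": 0}
article_y = {"social_media": 2, "usage": 44, "captures": 43, "mentions": 0}

print(passes_altmetrics(article_x), passes_altmetrics(article_y))
```

Replacing `any` with `all` gives the stricter reading of the same rule.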
In the execution step, we synthesized the included papers and wrote the review (see Section 4). Following Solaimani, Keijzer-Broers, and Bouwman [9], we applied the Service-Technology-Organization-Finance (STOF) model to synthesize the included papers. The STOF model is a generic framework to reconstruct the logic of a business and its ecosystem [12]. Thus, it enables a high-level representation of the service domain (S), technology domain (T), organization domain (O), and finance domain (F). The service domain describes the service offering that the business and its ecosystem intend to deliver to create value for a target group of customers. The technology domain describes the technical architecture needed by the business ecosystem to deliver the proposed services. The organization domain describes how the actors in the business ecosystem are organized to deliver the service offering, to explicate how the ecosystem intends to create value for the customer. Finally, the finance domain describes how the business and its ecosystem intend to capture value from the service offering, including how costs, revenues, and risks are divided among the different actors in the ecosystem.
The STOF model is suitable for our purpose since it is explicitly designed for ICT-enabled services such as data marketplaces. Unlike frameworks such as the Business Model Canvas [13], the STOF model explicitly captures the role of technology in commercialization. Moreover, the STOF model helps to understand the dynamics involved in developing successful business models (i.e., market adoption and sustainable profitability of the designed services). Due to the lack of commercialized data marketplaces, it is crucial to understand what we (do not) know about the breadth of the business models of data marketplaces, ranging from their value to how they deliver and capture value. Hence, the STOF model is highly appropriate to structure our review and discussion.
We then read the full text of the 158 remaining articles and classified each article into a STOF model domain. Each article was then further classified into a category. To classify an article, we identified its main research objective while paying attention to the primary unit of analysis of the research. We employed the following guideline to categorize the articles (see Table 1). The guideline is inspired by the STOF model [12]. In addition, we considered the well-known ACM Computing Classification System (https://dl.acm.org/ccs, accessed on 9 August 2021) to identify suitable keywords for our categorization.
Table 1. The guideline to categorize the articles.

| Domain | Description | Keywords |
|---|---|---|
| Service | Discussing possible services for end-users (data providers and buyers); services uniqueness and differentiators compared to competitors' offered services; potential customers who will use and pay for the developed services. | Customer, previous experience, expected value, market segment, context, effort (ease of use), tariff, bundling, perceived value, delivered value, intended value, value proposition. |
| Technology | Discussing technology needs to deliver the services. | Technical architecture, applications, devices, service platforms, billing platform, customer data platform, technical functionality. |
| Organization | Discussing actors and resources to run the services. Use organization domain to categorize "other" topics, e.g., demographic aspects, social implications. | Resources and capabilities, strategies and goals, value activities, value network, actors, organizational arrangements, relations, interactions, roles. |
| Finance | Discussing financial schemas to run the services. | Investment sources, capital cost sources, costs, revenue sources, revenues, risk sources, risk performance indicators, financial arrangement. |

For example, Munoz-Arcentales et al. [14] propose an architecture for data usage and access control. Since the discussion emphasizes technology needs, we classified this paper into the architecture category in the STOF technology domain. Another example is a study conducted by Virkar, Viale Pereira, and Vignoli [5]. The study discusses the political, economic, and societal impacts of data trading via a data marketplace. After carefully examining the paper, we classified it into the social implication category in the STOF organization domain. Although some articles can have multiple overlapping topics, we still attempted to assign each article to a single category. We justified this by analyzing the central theme of the discussion. Various articles were independently categorized by multiple authors to assess inter-rater reliability. In general, there was a high level of agreement between the authors. We also further excluded some irrelevant articles, including those that did not discuss business data sharing via data marketplaces. Our final sample consisted of 133 articles.
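The text above does not name the statistic used to assess agreement between raters. As an illustration only, the sketch below computes Cohen's kappa, a common choice for two raters assigning one nominal category per item. The choice of kappa, the single-letter STOF labels, and the sample ratings are all assumptions and do not reproduce the authors' procedure or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical STOF domain labels (S, T, O, F) from two raters for eight articles.
rater_a = ["T", "T", "S", "O", "F", "T", "O", "F"]
rater_b = ["T", "T", "S", "O", "F", "S", "O", "F"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

With these made-up ratings the two raters disagree on one of eight articles; values of kappa close to 1 indicate agreement well beyond what chance alone would produce.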

Results: STOF Model Categorization
This section describes the results of our STOF model categorization. In total, we identify 17 categories (refer to Figure 2). The description for each category is provided in the following subsections.

The Service Domain
We identify three categories within the STOF service domain (see Table 2). The first one concerns the data-related aspects. This category explores data properties as a unit of analysis, such as data characteristics as economic goods [15] and approaches to identify data quality problems [16]. The second category in the service domain is user preferences. It discusses data providers' willingness to share data via data marketplaces considering aspects such as anonymity [17] and data ownership [18]. In addition, a value theory for personal data is also proposed [19].
Finally, the most dominant category in the service domain is the value proposition. The studies in this category generally concern identifying value for data marketplace actors. For example, Perera et al. [20] and Anderson et al. [21] explore the value of trading Internet of Things (IoT) platforms and healthcare data, respectively. An additional example is the value exploration of data marketplaces that trade anonymous personal data [22]. Additionally, Mamoshina et al. [23] discuss the possibility of implementing blockchain and artificial intelligence to address concerns from regulators and data providers, specifically related to the issue of control over data. Match-making services in data marketplaces are also discussed; these help data providers advertise their data products and enable data buyers to request the data they need [24,25]. Finally, another surprising example is the discussion of services provided by "stolen data markets," which refer to marketplaces that trade illegal data such as personal and credit card information [26]. To sum up, the discussion in the service domain primarily focuses on the services provided by data marketplace operators and third-party providers to fulfill the needs of data marketplace actors.

Table 2. Categories identified in the STOF service domain.

| Category | Description | Article Reference |
|---|---|---|
| Data-related aspects | Discussing data properties as a unit of analysis. | [15,16] |
| User preferences | Discussing willingness to share data due to certain aspects. | [17-19] |
| Value proposition | Identifying value for data marketplace actors. | [20-26] |

The Technology Domain
Most publications fall within the STOF technology domain. This domain is divided into six categories (refer to Table 3). In our sample, the first identified category is architecture. The architecture of data marketplaces can be loosely described as building blocks of technical components. The discussion in the architecture category is primarily dominated by blockchain-based systems, which relate to the development of peer-to-peer and decentralized data marketplaces [27,28]. Specifically, the blockchain systems are applied to specific contexts such as the automotive domain [29,30], private data sharing [31], Internet of Things (IoT) [32][33][34][35], or smart cities [36]. In other cases, blockchain-based systems are employed to propose an auditing schema [37], credit scoring [38], data transaction integrity [39], and a Proof of Usage (PoU) algorithm [40]. Beyond blockchain-based systems, one proposed architecture specifically highlights data access and control based on the International Data Space (IDS) reference architecture [14]. In addition, Matzutt et al. [41] discuss a conceptual architecture for personal data marketplaces, focusing on protecting data privacy, while Mišura and Žagar [42] focus on IoT devices. Sánchez et al. [43] propose a data marketplace architecture to federate multiple-domain IoT; Pillmann et al. [44] propose an information model to provide a single point of access for vehicle data. Finally, Li et al. [45] propose a cost-efficient middleware for data acquisition services; Ren et al. [46] introduce an infrastructure architecture for data placement.
The second category, which is the most discussed category in this domain, is computational pricing. It focuses on technical discussions for data pricing. Computational pricing emphasizes algorithms as price determination mechanisms [47], such as machine learning-based algorithms to price training data or pre-trained models [48,49]. Advanced techniques are proposed, such as a smart pricing algorithm based on Stackelberg game theory. This algorithm is applied in blockchain-based data marketplaces [50].
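To make the idea of Stackelberg-based pricing concrete, the following toy sketch models a data seller as the leader who posts a unit price and a buyer as the follower who responds with a purchase quantity. The logarithmic buyer utility, the parameter names `a` and `cost`, and the grid search are illustrative assumptions; this is not the algorithm of [50]. Under these assumptions the buyer's best response is q = a/p - 1, and the seller's profit (p - cost)(a/p - 1) is maximized at p = sqrt(a * cost).

```python
import math


def buyer_best_response(price, a):
    """Follower: quantity q >= 0 maximizing a*ln(1 + q) - price*q."""
    return max(a / price - 1.0, 0.0)


def seller_profit(price, a, cost):
    """Leader's profit when the buyer plays the best response."""
    return (price - cost) * buyer_best_response(price, a)


def stackelberg_price(a, cost, steps=12000):
    """Leader: grid-search the price in (cost, a] that maximizes profit."""
    best_price, best_profit = cost, 0.0
    for i in range(1, steps + 1):
        price = cost + (a - cost) * i / steps
        profit = seller_profit(price, a, cost)
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price, best_profit


# Hypothetical parameters: buyer valuation scale a and seller unit cost.
a, cost = 16.0, 4.0
price, profit = stackelberg_price(a, cost)
closed_form = math.sqrt(a * cost)  # analytic optimum for this toy model
print(round(price, 3), round(profit, 3), closed_form)
```

The grid search is kept deliberately simple so that it can be checked against the closed-form price; the leader moves first and anticipates the follower's reaction, which is the defining feature of a Stackelberg game.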
Next, publications in this category primarily propose query-based pricing mechanisms, referring to the capability to allow "the price of any query to be derived automatically" ([51], p. 43). The studies discuss many aspects, for instance, the implementation of query pricing [52] and dynamic pricing considering a "reserve price constraint" that helps data brokers maximize their revenue [53]. Another algorithm allows the data price to be derived from privacy losses [54]. Studies on query-based pricing mechanisms consider many cases, such as query interfaces for mobile crowd-sensed data [55,56], cloud-based data marketplaces with possibilities to share cloud resources [57], spatial data [58], aggregated data from multiple distributed systems [59], and data acquired from Application Programming Interfaces (APIs) [60]. Moreover, Tang et al. [61] introduce query-based data provenance, while Wang et al. [62] create efficient query-based auctions by considering both the value of data and the resource consumption of queries.
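As a simplified illustration of deriving a query price automatically, the sketch below assumes that a seller attaches explicit prices to a few attribute sets and that a query is charged the cheapest combination of priced sets covering the attributes it needs. The attribute names, the prices, and the brute-force covering rule are hypothetical and greatly simplify the mechanisms discussed in the cited studies.

```python
from itertools import combinations

# Hypothetical seller-defined price points on sets of attributes.
PRICE_POINTS = {
    frozenset({"location"}): 5.0,
    frozenset({"speed"}): 4.0,
    frozenset({"location", "speed"}): 7.0,
    frozenset({"location", "speed", "fuel"}): 12.0,
}


def query_price(required, price_points=PRICE_POINTS):
    """Cheapest total price of price points whose union covers `required`.

    Returns None when no combination of price points covers the query.
    """
    required = frozenset(required)
    if not required:
        return 0.0  # a query that needs no attributes costs nothing
    views = list(price_points.items())
    best = None
    # Brute force over all subsets; fine for a handful of price points.
    for size in range(1, len(views) + 1):
        for combo in combinations(views, size):
            covered = frozenset().union(*(attrs for attrs, _ in combo))
            if required <= covered:
                total = sum(price for _, price in combo)
                if best is None or total < best:
                    best = total
    return best


print(query_price({"location", "speed"}))
print(query_price({"fuel"}))
print(query_price({"humidity"}))
```

Because the derived price is a minimum over all covering combinations, a buyer cannot obtain the same attributes more cheaply by splitting one query into several, which mirrors the intuition behind arbitrage-free query pricing.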
Many other articles also propose data quality-based pricing models by considering a bi-level mathematical programming model [63], Fair Knapsack Pricing [64,65], or an optimal distributing algorithm [66]. Other works on data quality-based pricing specifically focus on XML dataset properties [67,68]. Moreover, another topic in this category discusses an iterative auction-based algorithm with an additional focus on data protection throughout the auction processes [69,70]. Still concerning auctions, Zheng et al. [71] introduce an auction algorithm for data brokers, aiming for profit maximization in mobile crowdsourcing data marketplaces.
The rest of the pricing topics are relatively diverse, depending on their specific focus. Zeng and Ohsawa [72] propose a new method to price data based on a clustering technique. Oh et al. [73,74] develop data trading models that consider privacy valuation. Likewise, another example explores algorithms for dynamic privacy pricing [75]. Hu et al. [76] develop a blockchain-based incentive structure that incorporates privacy and security aspects. Still on blockchain-based data trading, Liu et al. [77] design a debt-credit system to solve efficiency issues. Finally, Yang et al. [78] develop a pricing algorithm from a data science perspective to examine the effect of data quality on machine learning.
Next, the category of data-as-a-service primarily explores the topic of Application Programming Interfaces (APIs) to enable data providers and buyers to use the services of data marketplaces. Vu et al. [79] aim to ease API implementation by providing a structure description model. In addition, Truong et al. [80] develop a RESTful service specifically for exchanging data agreements. The following category is data contracts, which generally refer to formal arrangements between data providers and data buyers to specify data usage. In this category, abstract models for data contracts are proposed to develop various data contracts that consider different data types. The studies also propose techniques to evaluate data contracts [81,82].
The information retrieval category covers techniques that support data discovery in data marketplaces, such as information schemas [83], semantics [84], and ontologies [85,86]. A review of data search techniques in data marketplaces is also conducted [87]. Moreover, Rekatsinas et al. [88] introduce a data source management system, which allows users to identify the most useful data sources for their applications. Finally, the security and privacy category has also gained much attention in the literature. The topics covered in this category are related to privacy-preserving technology [89][90][91][92][93], property rights enforcement [94], and secure information models [95].

| Category | Description | Article Reference |
|---|---|---|
| Data contracts | Discovering the models to develop formal arrangements between data providers and data buyers to specify data usage. | [81,82] |
| Information retrieval | Discussing data discovery techniques in data marketplaces. | [83-88] |
| Security and privacy | Proposing technical enforcements to guarantee security and privacy. | [89-95] |

The Organization Domain
We identify five categories in the STOF organization domain (refer to Table 4). The first category is the classification frameworks, which describe data marketplace business models via a taxonomy [2,96,97]. Next, the category of data ecosystems is also discussed. A data ecosystem is "a set of networks composed by autonomous actors that directly or indirectly consume, produce or provide data and other related resources (e.g., software, services, and infrastructure)" [98] (p. 4). Data marketplaces are often categorized as an instance of a data ecosystem [99]. Therefore, the topics in this category examine ecosystem structures that are relevant to data marketplaces. For instance, Hayashi and Ohsawa [100] investigate the structural characteristics (i.e., how data interacts) in networks. Koutroumpis, Leiponen, and Thomas [3] examine data sharing using a conceptual market design perspective. They identify the requirements for data sharing, specifically comparing small markets with greater control vs. large markets with less control over data. Another topic is the exploration of stolen data markets, which specifically discusses the processes and market forces that shape the relationship between involved actors and available products [101]. Finally, Thomas and Leiponen [6] and Oliveira, Lima, and Lóscio [99] review data ecosystems in the literature and propose a research agenda. Subsequently, the category of demographic aspects can be broadly defined as the description of the distribution of specific actor properties, such as population. The topics discussed in this category cover the geographical distribution of victims [102], actor populations [103], and community network structures in stolen data markets [104].
Next, governance, the most-discussed category in this domain, broadly refers to governing processes by certain actors (e.g., data marketplace operators) via several mechanisms, such as norms or power [105]. Examples of governance topics include discussion about policies and strategies in data marketplaces [106], a reference model for data protection for policymakers [107], and trust-creating mechanisms to enhance perceived market trustworthiness [108]. Other topics analyze social structures [109] and facilitating factors of data trading in stolen data markets [110]. Subsequently, the intervention and distributing approaches to crime prevention in stolen data markets are also discussed [111]. Furthermore, more topics like tax instruments [112], a manifesto from data providers to retain control over their data [113], and an elaboration on how multi-party computation (MPC) can be attributed as a control mechanism [114] are also studied. The last topics in this category are governance mechanisms in the data sharing platform design process [115], self-regulation for fairness and transparency for data sharing [116], as well as discussion about legal and technical measures for dealing with privacy issues [117].
Finally, the category of social implications refers to the exploration of data marketplace impacts for society, such as the rise of ethical challenges in genomic health data sharing [118]. Likewise, Van Dijck and Poell [119] critically examine the claim of the benefits of health data sharing in platforms. This category also discusses the implications of data trading for social, political, economic, and cultural contexts [5]. Finally, many articles discuss the topic of exploitation of individual data in personal data marketplaces [120][121][122][123].

The Finance Domain
We identify three categories in the STOF finance domain (see Table 5). The first category is economic feasibility, which examines the possibility of implementing data marketplaces from an economic perspective. It explores the competition between actors using Nash equilibrium characterization [124]. Another category is market analysis. In general, it examines the market size and value. For instance, Holt et al. [125] and Shulman [126] analyze the economic value of stolen data markets. In addition, Soley et al. [127] develop a model for calculating and estimating the monetary value of connected car data.
The remaining category concerns pricing mechanisms. Its topics include data trading models that consider contract theory [128], an information design perspective [129], and an equilibrium pricing mechanism based on a Stackelberg game approach [130]. Moreover, pricing mechanisms specifically for personal data are also discussed. For instance, Niu et al. [131] propose pricing functions for aggregated personal data; Parra-Arnau [132] mathematically examines the tradeoff between privacy and money in the personal data market; Yuncheng et al. [133] identify the properties that contribute to the price of personal data, such as data cost, value weight, information entropy, credit rating, and data reference index; Li et al. [134,135] discuss an economic theory of pricing personal data.
Empirical research is also conducted in the finance category. Hayashi and Ohsawa [136] explore the utility value of data using a workshop and behavioral economic theory. Subsequently, Muschalle et al. [137] outline critical inhibitors of data pricing based on interview results. Beyond empirical research, systematic literature reviews are also conducted to study data pricing opportunities and challenges in data marketplaces [138]. This approach is also employed to explore the different data pricing models in the data marketplace literature [47,139].
Other topics are auction-based pricing using a Bayesian mathematical model [140,141], a negotiation-based pricing mechanism grounded in game theory for the energy domain [142], and a generic pricing mechanism based on non-cooperative game theory in mobile crowdsensing [143]. Finally, Stahl and Vossen [144] discuss data quality criteria (such as accuracy and completeness) that can be used to price data relative to its quality, while Jang et al. [145] propose a three-level hierarchical model of data trading and create a pricing function to achieve Nash Equilibrium (NE).
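To make the game-theoretic pricing approaches surveyed above more concrete, the following minimal sketch illustrates a Stackelberg-style pricing game between one data seller (the leader) and one data buyer (the follower). It does not reproduce the model of any cited article. The logarithmic buyer utility, the parameter names `a` and `cost`, and all function names are illustrative assumptions.

```python
import math


def buyer_best_response(price: float, a: float) -> float:
    """Follower step: quantity q >= 0 that maximizes a*log(1+q) - price*q.

    The utility form is an assumption; its first-order condition gives
    q = a/price - 1, truncated at zero when the price is too high.
    """
    if price >= a:
        return 0.0
    return a / price - 1.0


def seller_profit(price: float, a: float, cost: float) -> float:
    """Leader's profit, anticipating the buyer's best response."""
    return (price - cost) * buyer_best_response(price, a)


def stackelberg_price(a: float, cost: float, steps: int = 10_000) -> float:
    """Leader step: grid search over prices strictly between cost and a."""
    best_price, best_profit = cost, 0.0
    for i in range(1, steps):
        price = cost + (a - cost) * i / steps
        profit = seller_profit(price, a, cost)
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price


if __name__ == "__main__":
    a, cost = 9.0, 1.0
    # Under these assumptions the analytical optimum is sqrt(a * cost).
    print(stackelberg_price(a, cost), math.sqrt(a * cost))
```

Under the assumed utility, the seller's profit is maximized at a price of sqrt(a * cost), so the grid search can be checked against a closed form. Real data pricing models would add features that this sketch omits, such as multiple buyers, data quality, or privacy costs.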

Discussion
This paper aims to investigate the current state-of-the-art of data marketplace research. Specifically, we want to know whether research lacks topics that would advance data marketplaces toward commercialization. As indicated in the introduction section, data marketplaces are hardly commercially exploited, even though the concept has existed for years. It appears that existing data marketplaces struggle to move from the initial stage into the second stage of the platform's lifecycle (i.e., platform adoption). One possible reason for the lack of data marketplace commercialization could be that previous studies have not dealt extensively with non-technical topics (refer to the findings elaborated in the previous section). Hence, contributions from the academic perspective toward data marketplace commercialization are still scant. Therefore, this section discusses various possible explanations for the dominance of technical research on data marketplaces and connects these explanations to recommendations for future research.

Domination of Technical Research in the Data Marketplace Literature
As shown in Figure 2, data marketplace research is still primarily dominated by technical literature. Based on this finding, the evolution of data marketplace research tends to follow a technology push pattern (i.e., technological advancement drives innovation). We suggest three explanations for the dominance of technical research in the data marketplace literature.
First, funding and project availability are intensely focused on the technological development of data marketplaces; see the description of EU-funded projects on data markets (https://cordis.europa.eu/programme/id/H2020_ICT-13-2018-2019, accessed on 9 August 2021). The European data strategy [1] provides a clear example of this, as it intends to "invest €2 billion in a European High Impact Project to develop data processing infrastructures, data sharing tools, [and] architectures." Second, with recent increases in funding, many of these projects are still in the initial design phase. As suggested by Henfridsson and Bygstad [146], the goals in this phase typically focus on foundational work, such as architectural design. This may explain why the debate in the data marketplace literature focuses on technical rather than non-technical aspects.
Finally, policymakers and other key stakeholders have already defined the overall aim of EU-funded projects (e.g., trust and sovereignty) as reflected in regulations and standards like the European Data Governance Act (https://digital-strategy.ec.europa.eu/en/policies/data-governance-act, accessed on 17 November 2021) and Gaia-X (https://www.gaia-x.eu/what-is-gaia-x, accessed on 17 November 2021). In this regard, scholars might take these aims for granted and immediately focus on designing technical components of data marketplaces to achieve those pre-determined goals.
As a result of the three above-mentioned developments, extant research on data marketplaces has so far primarily been published in technical conference proceedings and in more technology-oriented journals, such as IEEE Access and the IEEE Internet of Things Journal.

Service Domain Aspects
The findings indicate that little attention has been paid to the topics categorized in the service domain (this domain was the least covered in the papers we studied). Based on business model knowledge, this domain is essential and should be the starting point for data marketplaces to be commercially exploited [12]. The topics in the service domain are essential to design services that fulfill customers' needs. Although a few attempts have been made to discuss relevant topics such as value proposition, many other topics such as customer expected value and market segmentation have barely been discussed in the selected articles.
Regarding the value proposition, we recommend studies that go beyond the mere value proposition of facilitating data exchange and that include data analytics, data products, and advice. Studies can also distinguish value derived from different data types, such as real-time versus aggregated data, business versus personal data, and sensitive versus non-sensitive data. Segmentation is especially promising to study given that data marketplaces are in principle applicable to any business sector and any business type, but the desired value proposition likely differs drastically between segments of businesses. For instance, digitally native firms may be looking merely for access to data for running their own algorithms, whereas firms without data processing capabilities may look for additional value propositions of analytics features or even data products that are directly usable in daily business practice. Empirical methods such as cluster analysis or class analysis could help to distinguish segments of data marketplace users, although methods that combine qualitative and quantitative research, such as Q-methodology, may also help to distinguish different perspectives on the value that data marketplaces offer. Given the expected proliferation of data marketplaces in heterogeneous business sectors, we also call for situated research, such as case studies, that considers how contextual characteristics of business sectors affect the value propositions desired by data traders.
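As an illustration of how cluster analysis could separate user segments, the following minimal sketch applies a plain k-means procedure to a handful of hypothetical firms. The two features (a score for in-house data processing capability and a score for demand for analytics features), the sample values, and the function names are assumptions made for this example only; they are not data from the reviewed articles.

```python
from math import dist


def farthest_point_init(points, k):
    """Deterministic start: first point, then the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical firms: (data processing capability, demand for analytics features).
firms = [(0.90, 0.10), (0.85, 0.20), (0.80, 0.15),
         (0.20, 0.90), (0.10, 0.80), (0.15, 0.85)]

if __name__ == "__main__":
    labels, centroids = kmeans(firms, k=2)
    print(labels)
    print(centroids)
```

In this toy data, the first three firms resemble digitally native firms that mainly want raw data access, while the last three resemble firms that look for analytics features. An actual segmentation study would need survey or usage data, feature scaling, and a justified choice of the number of segments.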
Besides studies on the value proposition per se, we also recommend studies that interlink technical and pricing model choices with the value delivered to user segments. For instance, decentralized technology paradigms such as blockchain-based data marketplaces may affect the value that users receive. Similarly, data collaboration algorithms such as multi-party computation also affect the value proposition, as these enable deriving and sharing business-relevant insights rather than disclosing the raw data. These decentralized and collaborative technologies may also resolve the negative impacts of using data marketplaces, as they afford control over data without a trusted third party. We recommend design science research (DSR) and (controlled) experiments to derive the impact of these new technology paradigms on value delivery to data marketplace users.
Moreover, data marketplace projects are often conducted in a consortium based on academia-practitioner collaborations (e.g., the EU-funded projects). Academic publications may also reflect the work conducted by practitioners, for instance, by investigating the challenges and success factors of the few data marketplaces that exist in the market so far. This is important because, besides an imbalance in the current state of data marketplace research, we might also lack a clear understanding of the problems faced by data marketplaces.
As a result, scholars and practitioners may try to solve the wrong problem or even problems that do not exist. Hence, comparative case studies and quantitative surveys among data marketplaces could yield meaningful insights to identify problems faced by such platforms and suggestions for future development. Given that data sharing and trading is a complex socio-technical process, investigating non-technical aspects may open opportunities to speed up the platform adoption process in practice.

Organizational Domain Aspects
Considering the organizational domain, one crucial overlooked aspect in the current literature is value networks (or ecosystems) that describe actors and their interactions. It is essential to understand these dynamics so that actors can align their visions and develop organizational arrangements to achieve a common goal. In the area of data marketplaces, data governance and data provenance are especially important, so that data sellers retain a sense of being in control of their own data. Possible future research directions include efforts to transfer ideas from data stewardship and data governance to the area of data marketplaces. Such studies should not only provide technical or legal means to exert governance over data sales, but also empirically study the impact of such governance means on the willingness of data owners to sell their data. The issue of organizational arrangements will likely become even more important as data marketplaces are emerging in a fragmented manner across many different industries, thus leading to an ecology of data marketplaces with incompatible data governance regimes (see Abbas [147]). The cross-over between organizational arrangements and the service domain is a fertile study ground too, for instance, in choice experiments that contrast data marketplaces operated by big tech providers with those of a more decentralized ownership structure.
Other topics such as the meaning of openness in data marketplaces are also worth investigating. Typically, scholars have emphasized data as the object of openness by identifying approaches to incentivize data sharing. However, openness in data marketplaces can go beyond access to data, such as access to analytics modules (cf. Mucha and Seppala [148]) provided by third-party complements. In this regard, literature on digital platforms (e.g., De Reuver et al. [149]) might explain why openness matters (or not) in the context of data marketplaces. On the one hand, openness could attract more service complementors [150] and boost third-party innovation by analytics providers [151], ultimately attracting more users [152] and attaining critical mass [153]. On the other hand, openness could also lead to increased costs and effort to control complementors [154], especially complements that could harm the platform's integrity [155]. Hence, it would be interesting to see if current understandings of platform openness could simply be applied to the new context of data marketplaces.
Considering actors and their interactions, the value on a data marketplace is not provided by a single stakeholder alone but is jointly created in an ecosystem setting. Typically, data marketplace owners rely on third-party providers to realize their value offerings, such as data suppliers, data aggregators, application developers, and service providers [6,99]. To successfully design and commercialize data marketplaces, it is crucial to identify the different players in data marketplaces and understand the economic value exchanges between them. Therefore, future research can focus on studying the roles and value flows of stakeholders in and around data marketplaces. We recommend using existing value modeling techniques, such as e3-value [156], to connect relevant stakeholders to their respective value flows. Doing so is likely to reveal the partnerships through which data marketplaces and third-party providers co-create value.
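To illustrate what connecting stakeholders to their value flows could look like in practice, the following sketch records monetary transfers between actors and computes each actor's net position. It is loosely inspired by value modeling and does not implement the e3-value notation or its tooling. The actor names, amounts, and the `ValueTransfer` structure are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueTransfer:
    """One monetary flow between two actors (all fields are illustrative)."""
    payer: str
    payee: str
    description: str
    amount: float  # hypothetical monetary units per period


def net_value_per_actor(transfers):
    """Sum incoming minus outgoing amounts for every actor in the network."""
    net = defaultdict(float)
    for t in transfers:
        net[t.payer] -= t.amount
        net[t.payee] += t.amount
    return dict(net)


# Hypothetical value network around a single dataset purchase.
transfers = [
    ValueTransfer("data buyer", "marketplace operator", "payment for dataset", 100.0),
    ValueTransfer("marketplace operator", "data supplier", "revenue share", 70.0),
    ValueTransfer("marketplace operator", "analytics provider", "service fee", 10.0),
]

if __name__ == "__main__":
    for actor, net in net_value_per_actor(transfers).items():
        print(f"{actor}: {net:+.1f}")
```

A fuller value model would also capture the non-monetary value objects exchanged in return (for example, the dataset or the analytics service), which this sketch leaves out for brevity.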

Finance Domain Aspects
The finance domain is essential to create viable business models [12]. Nevertheless, the current literature merely emphasizes data pricing. Future research should cover other essential topics in the finance domain, such as cost sources and investments, because they are essential to building operating models of data marketplaces. For example, operators need to hire internal developers to maintain a stable core system of data marketplaces. Another example is the need for primary and supporting activities (e.g., marketing or human resources, respectively) to deliver value to end customers [157], which requires careful cost calculation. Therefore, future research could develop a framework to identify cost sources and calculate them appropriately. Cost sources are also inseparably linked with investments because marketplace owners need to calculate the capital required to sustain marketplaces in the medium and long term [12]. Thus, future work can also examine possible funding sources for data marketplaces, including the transition strategies (or roadmaps) to connect new funding to the creation of additional services or technology developments (see De Reuver et al. [158]).

Research Approaches
Our additional impressions after reading and analyzing the articles are as follows. We only found a few studies, e.g., Schomakers, Lidynia and Ziefle [17] and Spiekermann and Korunovska [19], that conduct empirical investigations in non-technical literature. Case studies on data marketplaces that did reach the next phase of platform adoption would yield valuable insights into what business model choices lead to viability. Moreover, the many technology-focused studies hardly consider the link between practical problems, theories, and evaluations, as is common in Design Science Research (DSR) approaches [159,160]. DSR is further helpful in examining data marketplace business model configurations that do not yet exist, which is essential given the absence of highly successful data marketplace businesses in practice. Stronger links between technical solutions and value-related problems would help focus data marketplace research on resolving practical problems.
The literature also hardly discusses solutions to some core non-technical challenges of data marketplaces, such as defining data ownership [3], assessing data quality [3], the lack of legal frameworks [116], the lack of technical expertise and resources to operate the ecosystem [99], and unclear organizational structures [99]. Thus, we generally suggest applying various empirical research approaches such as case studies and grounded theory (see Sekaran and Bougie [161]) to understand those challenges in non-technical domains.

Conclusions
This study provides an overview of the state-of-the-art of data marketplace research. Specifically, we want to know whether research is scarce on topics that would advance data marketplaces toward commercialization. We find that the existing literature on data marketplaces is dominated by technical research, such as the discussion related to computational pricing and architecture. We highlight possible explanations for the dominance of technical research: the recent availability of project financing that has pre-determined goals such as trust and sovereignty. Moreover, most current works and research are still in their infancy; therefore, they focus on the technological advancement of data marketplaces. We also suggest future research agendas in the service, organizational, and finance domains, equipped with potential research approaches to advance marketplaces for data toward commercialization.
A limitation of this study is that the topic identification process is subject to the researchers' knowledge and interpretations about the topic, i.e., different readers may have different judgments. However, independent categorization of the papers by different authors showed overall alignment. Moreover, as indicated in Section 2, some articles may have many overlapping topics. Because we attempted to classify an article into a specific category, we analyzed the central theme of the discussion by examining the research objectives, questions, and methods of articles. The study is also limited by its scope and the number of publications included in the analysis due to our criteria, e.g., a single database, the timeframe selection, and a paper quality check. Nonetheless, we argue that we have reached a sufficient level of saturation, i.e., analyzing more articles

Table 2. The service domain.

Table 3. The technical domain.

Table 4. The organization domain.

Table 5. The finance domain.