1. Introduction
Big data and artificial intelligence have remained a global trend [1, 2], and types of data that existing analytical technologies cannot handle are frequently encountered. Social movements have also increased attention toward areas not centered on these technologies. Expectations have arisen that problems can be solved and new value found through the distribution, exchange, and linking of data across various fields, rather than by relying on a single data source [3, 4, 5]. Thus, data has become a transferable and interchangeable resource in the digital economy [6, 7].
Recently, various forms of data marketplaces have been launched as platforms [5, 7, 8, 9, 10, 11]. Moreover, linked data [12], ontology matching [13], and the development of data standardization and data catalogs such as the Data Catalog Vocabulary (https://www.w3.org/TR/vocab-dcat/) and the Semantic Sensor Network Ontology (https://www.w3.org/TR/vocab-ssn/) have enabled searches across databases that were previously closed within each domain. These services and technologies have made it easier to link and exchange data from different fields, and they provide a collaborative environment for data, including the discovery of related knowledge through structured knowledge for data utilization [14], a communication environment whose reliability is guaranteed by consortia [9, 15], and an integrated environment for data analysis [16, 17].
Although there are numerous data exchange services and platforms on the Web, they only provide unilateral information from the data providers. Most of the technologies, such as data catalogs and data merging methods, support data providers in making their data available in the marketplace. However, the means for data providers to learn what kinds of data the data users (the data buyers) desire, and for what purpose, are insufficient. Likewise, methods for data users to request data have been inadequately discussed. As a result, the distribution and trading of valuable data have been hindered. To solve this problem, the following two approaches are necessary in the data marketplace:
In response to the above issues, we propose description items for data requests and present a matching platform, named “treasuring every encounter of data affairs” (TEEDA), for externalizing, sharing, and matching data requests with providable data. In this paper, data requests are the needs of data users, and providable data are the information on data that data providers can offer to the marketplace. TEEDA facilitates connections between data providers and users who seek data to suit their needs; thus, data providers can learn the needs of users and provide suitable sets of data. In addition to describing the basic features of the matching platform, we analyze the two types of data information collected by TEEDA. We then discuss their structural differences in terms of variables, as well as the possibilities of data matching.
Previous papers have reported research on data marketplaces and data exchange platforms where data are traded as exchangeable economic goods [18, 19]. These studies have explored game theory [20], privacy models [21, 22], a market model for innovative collaboration [5], a secure model using blockchain [10], pricing mechanisms of data [23, 24], and the complex network approach to conceptualize data exchange platforms [25]. Therefore, considering data matching in the data marketplace is a natural extension of these previous works. Furthermore, data platforms to encourage cross-disciplinary data-driven innovation have been proposed in the literature; examples include DataHub [16], Labbook [17], and the DJ site (https://datajacket.org/?lang=english) or DJ store [14]. These platforms not only function as data portal sites, but also enable the discovery of data and the sharing of analytical knowledge between various stakeholders in academia and industry. However, as discussed earlier, these platforms primarily provide unilateral information on data from the data providers themselves, which does not adequately accommodate users’ calls for data. In contrast, Virtuora DX [9], D-Ocean [11], and Web-based IMDJ [15] have functionalities that enable the sharing of users’ opinions via chat. However, data provided in response to these calls are in the form of free text, with no defined description items for appropriately expressing data attributes, including variables. Moreover, data matching between different types of data sources, such as schema matching or record linkage, has also been previously discussed [13, 26]. However, the primary focus of the current study is not only to explore the linkages between the data themselves, but also to match data providers and users via metadata. Thus, the novelty of this study lies not only in proposing a description framework for users’ needs and providing a platform with functions to convey these needs to data providers, but also in exploring data matching approaches.
Matching problems were first addressed via Gale and Shapley’s stable matching [27], which is famously referred to as the marriage problem and has since been developed in various ways. In the data marketplace, a market in which stakeholders are matched through data can be regarded as a natural extension of market design. As mentioned above, the data market is an emerging market, and to the best of our knowledge, the current study is the first to explore data matching in the data marketplace. Therefore, we made several assumptions in order to understand and discuss data matching in the data market. First, in addition to data providers and users, there are several other stakeholders, such as data brokers and analysts [19, 28, 29, 30]. Although matching problems with two or more players have been considered [31], for the sake of simplicity, we only considered the matching between data providers and users. Second, it has been noted that data users do not always have sufficient knowledge about data, which makes it difficult for them to describe the data [14]. However, in this study, to simplify the model, we assumed that data users can sufficiently describe the data they want in the form of data requests. Third, for data matching, we considered a data-specific feature: duplicability. In the marriage problem, every participant can be matched with at most one partner, and this case is called one-to-one matching. In contrast, there is many-to-one matching for firms and workers [32]. As data can be duplicated easily, we must consider many-to-many matching. In this paper, we discuss the matching possibility using the similarity of data structures based on common variables, without using the concept of preference. The most important contribution of TEEDA is that it makes it possible to externalize data users’ calls for data, which has not been addressed in previous studies and platform services.
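To make the idea of preference-free, many-to-many matching on common variables concrete, the following is a minimal sketch only. It assumes that each data request and each providable dataset is described by a set of variable names, and it uses Jaccard similarity with a cutoff as one possible similarity measure. The measure, the threshold value, the `DataEntry` record, and all example entries are illustrative assumptions and are not taken from TEEDA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataEntry:
    """Hypothetical record for a data request or a providable dataset."""
    name: str
    variables: frozenset


def jaccard(a, b):
    """Jaccard similarity of two variable sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_many_to_many(requests, provided, threshold=0.3):
    """Return all (request, provided, score) pairs with score >= threshold.

    No preference ordering is used, and an entry may appear in several
    pairs, reflecting that data can be duplicated (many-to-many matching).
    """
    pairs = []
    for r in requests:
        for p in provided:
            score = jaccard(r.variables, p.variables)
            if score >= threshold:
                pairs.append((r.name, p.name, score))
    # Highest similarity first; ties keep their original order.
    return sorted(pairs, key=lambda t: -t[2])


# Illustrative entries (assumed, not real TEEDA data).
requests = [
    DataEntry("tourist flow request",
              frozenset({"date", "location", "visitor count"})),
    DataEntry("weather request",
              frozenset({"date", "temperature"})),
]
provided = [
    DataEntry("station gate counts",
              frozenset({"date", "location", "passenger count"})),
    DataEntry("weather observations",
              frozenset({"date", "location", "temperature", "humidity"})),
]

matches = match_many_to_many(requests, provided)
for request_name, provided_name, score in matches:
    print(f"{request_name} <-> {provided_name}: {score:.2f}")
```

In this sketch, one request can be paired with several providable datasets and one dataset with several requests, which is the property that distinguishes the many-to-many setting from the one-to-one marriage problem.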
The remaining part of this paper is organized as follows. In Section 2, we explain the design of TEEDA based on the description items of data requests and providable data, and we demonstrate the functions of the matching platform. In Section 3, we present the experimental details of this study. In Section 4, we discuss the results obtained from our experiment and mention areas of future work. Finally, we provide concluding remarks in Section 5.