Pricing Personal Data Based on Data Provenance
Abstract
1. Introduction
- A novel pricing model is proposed. The model comprises two functions: price setting and query pricing. A flexible family of pricing methods based on the p-norm [14] is proposed, so that the pricing strategy can be tuned and adapted to the data market. The pricing model satisfies three required properties: it is arbitrage-free, monotonic, and bounded.
- The pricing model first sets prices for source tuples according to their importance and then prices queries based on data provenance, which accounts for both the importance of the data itself and the relationships between the data.
- An exact algorithm computes the exact price of a query with exponential complexity. In addition, a simple approximate algorithm is devised that computes an approximate price of a query in polynomial time.
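The two-stage idea in the contributions above can be illustrated with a minimal sketch. It assumes that the price of a query is the p-norm of the prices of the source tuples in one provenance set; the tuple prices and the helper name `p_norm_price` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable

def p_norm_price(tuple_prices: Dict[str, float],
                 provenance: Iterable[str],
                 p: float) -> float:
    """Aggregate the prices of the tuples in `provenance` with a p-norm (p >= 1)."""
    if p < 1:
        raise ValueError("p must be >= 1 for a norm")
    prices = [tuple_prices[t] for t in provenance]
    if p == float("inf"):
        # The infinity norm is the largest single tuple price.
        return max(prices, default=0.0)
    return sum(x ** p for x in prices) ** (1.0 / p)

# Stage 1 (price setting): hypothetical prices assigned to two source tuples.
tuple_prices = {"t1": 3.0, "t5": 4.0}

# Stage 2 (query pricing): the same provenance set priced under different p.
prices_by_p = {p: p_norm_price(tuple_prices, ["t1", "t5"], p)
               for p in (1, 2, float("inf"))}
```

In this sketch, p = 1 charges the sum of the tuple prices, p = 2 charges their Euclidean norm, and p = ∞ charges only the most expensive tuple, which is one way to read the "tuning" that the p-norm provides.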
2. Data Provenance
2.1. Why-Provenance
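As a hedged illustration of why-provenance in the sense of Buneman et al., the sketch below computes, for each output tuple, its set of witnesses (sets of source tuples that together produce it). The rows are copied from the two example tables in this article; the relation names `students` and `grades` and the query (project the name from a join on the course number) are assumptions chosen for the example.

```python
from collections import defaultdict

# Rows from the article's two example tables, keyed by tuple ID.
students = {"t1": ("John", "00602001"), "t2": ("Tom", "03310117"),
            "t3": ("James", "04110235"), "t4": ("Tom", "05110208")}
grades = {"t5": ("00602001", 85, 2), "t6": ("00602001", 90, 2),
          "t7": ("03310117", 95, 3), "t8": ("05110208", 75, 1)}

def why_provenance(students, grades):
    """Map each output name to its witnesses (frozensets of source tuple IDs)."""
    witnesses = defaultdict(set)
    for sid, (name, course) in students.items():
        for gid, (g_course, _grade, _credit) in grades.items():
            if course == g_course:  # join condition on the course number
                witnesses[name].add(frozenset({sid, gid}))
    return dict(witnesses)

why = why_provenance(students, grades)
```

Under this query, "Tom" has two witnesses because two different pairs of source tuples each produce that output, while "James" does not appear in the result because no grade row matches his course number.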
2.2. Where-Provenance and How-Provenance
2.3. Problems with Current Provenance
3. Pricing Data
3.1. Pricing Model
3.2. Minimal Provenance
- such that . Build a database instance . Because denotes a minimal provenance of , Moreover, is not the provenance of . In that case, cannot be a minimal provenance, Thus, Since , and . This result violates .
- such that . Build a database instance . Because denotes a minimal provenance of , Moreover, is not the provenance of . In that case, cannot be a minimal provenance, Thus, Since , and . This result violates .
- so that or . Build a database instance . Because denotes a minimal provenance of , Moreover, is not the provenance of since . Additionally, cannot be a provenance of . Since such that , Thus, Since , and . This result violates .
3.3. p-Norm
3.4. Pricing Function
- Equivalent queries should have the same price.
- The pricing function is monotonic.
- The pricing function is bounded.
- The pricing function is arbitrage-free.
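The properties listed above can be checked numerically on a small example. The sketch below is not the paper's proof: it assumes that a set of source tuples is priced by the p-norm of its tuple prices, and it reads "arbitrage-free" as subadditivity over tuple sets (buying two sets separately never costs less than buying their union). The prices and function names are hypothetical.

```python
from itertools import combinations

def price(tuple_prices, subset, p=2.0):
    """p-norm of the prices of the tuples in `subset` (0.0 for the empty set)."""
    return sum(tuple_prices[t] ** p for t in subset) ** (1.0 / p)

def all_subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def check_properties(tuple_prices, p=2.0, eps=1e-9):
    subsets = list(all_subsets(tuple_prices))
    total = sum(tuple_prices.values())
    # Monotonic: a subset never costs more than a superset.
    monotonic = all(price(tuple_prices, s, p) <= price(tuple_prices, t, p) + eps
                    for s in subsets for t in subsets if s <= t)
    # Bounded: no set costs more than the sum of all tuple prices.
    bounded = all(price(tuple_prices, s, p) <= total + eps for s in subsets)
    # Arbitrage-free (assumed reading): the union is never dearer than its parts.
    arbitrage_free = all(
        price(tuple_prices, s | t, p)
        <= price(tuple_prices, s, p) + price(tuple_prices, t, p) + eps
        for s in subsets for t in subsets)
    return {"monotonic": monotonic, "bounded": bounded,
            "arbitrage_free": arbitrage_free}

result = check_properties({"t1": 3.0, "t2": 1.5, "t5": 4.0, "t7": 2.0})
```

Because the price here depends only on the set of source tuples, two queries with the same tuple set receive the same price in this sketch, which matches the first property in the list.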
4. Pricing Algorithm
4.1. Exact Algorithm
Algorithm 1: Exact pricing.
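The body of Algorithm 1 is not reproduced here, so the following is only a sketch of how an exact, exponential-time pricing procedure could look, under stated assumptions: each output tuple has a known set of witnesses, the query price is the cheapest p-norm over all ways of picking one witness per output tuple, and the prices are hypothetical. It is a brute-force enumeration, not the authors' algorithm.

```python
from itertools import product

def exact_price(witnesses_per_output, tuple_prices, p=2.0):
    """Return (price, tuple set) of the cheapest union of one witness per output."""
    best_price, best_set = float("inf"), frozenset()
    # Enumerate every combination of witnesses: exponential in the number of outputs.
    for choice in product(*witnesses_per_output.values()):
        chosen = frozenset().union(*choice)
        cost = sum(tuple_prices[t] ** p for t in chosen) ** (1.0 / p)
        if cost < best_price:
            best_price, best_set = cost, chosen
    return best_price, best_set

# Hypothetical witnesses and tuple prices for two output tuples.
witnesses = {
    "John": {frozenset({"t1", "t5"}), frozenset({"t1", "t6"})},
    "Tom": {frozenset({"t2", "t7"}), frozenset({"t4", "t8"})},
}
tuple_prices = {"t1": 2.0, "t2": 2.0, "t4": 1.0, "t5": 3.0,
                "t6": 1.0, "t7": 4.0, "t8": 2.0}

best_price, best_set = exact_price(witnesses, tuple_prices)
```

The number of combinations is the product of the witness counts, which is why an exhaustive search of this kind does not scale to large query results.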
4.2. Approximate Algorithm
4.2.1. Define
Algorithm 2: Approximate pricing.
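The body of Algorithm 2 is likewise not reproduced here. As a rough illustration of a polynomial-time alternative to exhaustive search, the sketch below uses a generic greedy heuristic: for each output tuple it picks the witness whose not-yet-purchased tuples add the least cost. This is an assumed stand-in, not the authors' approximate algorithm, and no approximation guarantee is claimed for it; the data is hypothetical.

```python
def greedy_price(witnesses_per_output, tuple_prices, p=2.0):
    """Greedily pick one witness per output tuple and price the union by p-norm."""
    chosen = set()
    for output in sorted(witnesses_per_output):
        def marginal(witness):
            # Cost contribution of tuples that have not been selected yet.
            return sum(tuple_prices[t] ** p for t in witness - chosen)
        # Break ties deterministically by the sorted tuple IDs.
        best = min(witnesses_per_output[output],
                   key=lambda w: (marginal(w), sorted(w)))
        chosen |= best
    cost = sum(tuple_prices[t] ** p for t in chosen) ** (1.0 / p)
    return cost, frozenset(chosen)

# Hypothetical witnesses and tuple prices for two output tuples.
witnesses = {
    "John": {frozenset({"t1", "t5"}), frozenset({"t1", "t6"})},
    "Tom": {frozenset({"t2", "t7"}), frozenset({"t4", "t8"})},
}
tuple_prices = {"t1": 2.0, "t2": 2.0, "t4": 1.0, "t5": 3.0,
                "t6": 1.0, "t7": 4.0, "t8": 2.0}

approx_price, approx_set = greedy_price(witnesses, tuple_prices)
```

Each output tuple is visited once and each of its witnesses is scored once, so the running time is polynomial in the size of the provenance. When witnesses share tuples, a greedy choice can miss the cheapest overall combination, so the result should be read as an upper bound on the exact price under the same assumptions.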
4.2.2. Approximability
5. Experimental Results
5.1. Effectiveness
5.2. Efficiency
6. Related Work
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Shen, Y.C.; Guo, B.; Shen, Y.; Duan, X.L.; Dong, X.Q.; Zhang, H. A pricing model for Big Personal Data. Tsinghua Sci. Technol. 2016, 21, 482–490.
- Guo, B.; Li, Q.; Duan, X.L.; Shen, Y.C.; Dong, X.Q.; Zhang, H.; Shen, Y.; Zhang, Z.L.; Luo, J. Personal Data Bank: A New Mode of Personal Big Data Asset Management and Value-Added Services Based on Bank Architecture. Chin. J. Comput. 2017, 40, 126–143.
- Durkee, D. Why cloud computing will never be free. Commun. ACM 2010, 53, 62–69.
- Moiso, C.; Minerva, R. Towards a user-centric personal data ecosystem The role of the bank of individuals’ data. In Proceedings of the International Conference on Intelligence in Next Generation Networks, Berlin, Germany, 8–11 October 2012; pp. 202–209.
- Chen, R.; Fung, B.C.M.; Mohammed, N.; Desai, B.C.; Wang, K. Privacy-preserving trajectory data publishing by local suppression. Inf. Sci. 2013, 231, 83–97.
- Ng, I.C.L.; Ho, S.Y. Creating New Markets in the Digital Economy: Value and Worth; Cambridge University Press: Cambridge, UK, 2014; pp. 124–125.
- Koutris, P.; Upadhyaya, P.; Balazinska, M.; Howe, B.; Dan, S. Query-Based Data Pricing. J. ACM 2015, 62, 1–44.
- Dai, C.; Dan, L.; Bertino, E.; Kantarcioglu, M. An Approach to Evaluate Data Trustworthiness Based on Data Provenance. In Proceedings of the Workshop on Secure Data Management, Auckland, New Zealand, 24 August 2008; pp. 82–98.
- Huang, L.; Cheng, H.B. Query Optimization Based on Data Provenance. Adv. Mater. Res. 2011, 186, 586–590.
- Xing, N.; Kapoor, R.; Glavic, B.; Gawlick, D.; Zhen, H.L.; Radhakrishnan, V. Provenance-Aware Query Optimization. In Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, USA, 19–22 April 2017; pp. 473–484.
- Buneman, P.; Khanna, S.; Wang, C.T. Why and Where: A Characterization of Data Provenance. In Proceedings of the International Conference on Database Theory, London, UK, 4–6 January 2001; pp. 316–330.
- Cui, Y.; Widom, J. Practical Lineage Tracing in Data Warehouses. In Proceedings of the 16th International Conference on Data Engineering (Cat. No.00CB37073), San Diego, CA, USA, 29 February–3 March 2000; pp. 367–378.
- Green, T.J.; Karvounarakis, G.; Tannen, V. Provenance semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Beijing, China, 11–13 June 2007; pp. 31–40.
- Gallone, F. Quantum Mechanics in Hilbert Space; World Scientific Publishing: Singapore, 2015; pp. 611–696.
- Cheney, J.; Chiticariu, L.; Tan, W.C. Provenance in Databases: Why, How, and Where. Found. Trends Databases 2007, 1, 379–474.
- Muschalle, A.; Stahl, F.; Löser, A.; Vossen, G. Pricing Approaches for Data Markets. In Proceedings of the 38th International Conference on Very Large Databases, Istanbul, Turkey, 27 August 2012; pp. 129–144.
- Balazinska, M.; Howe, B.; Suciu, D. Data Markets in the Cloud: An Opportunity for the Database Community. In Proceedings of the 37th International Conference on Very Large Data Bases, Seattle, Washington, 1 August 2011; pp. 1482–1485.
- Xiao, Y.L. Optimization Algorithms for the Minimum-Cost Satisfiability Problem. Ph.D. Thesis, Department of Electrical and Computer Engineering, North Carolina State University, North Carolina, NC, USA, 2004.
- Khanna, S.; Trevisan, L.; Williamson, D.P. The Approximability of Constraint Satisfaction Problems. SIAM J. Comput. 2001, 30, 1863–1920.
- MovieLens 1M Dataset. 2003. Available online: https://grouplens.org/datasets/movielens/1m/ (accessed on 16 August 2019).
- Niu, C.Y.; Zheng, Z.Z.; Wu, F.; Tang, S.J.; Gao, X.F.; Chen, G.H. Unlocking the Value of Privacy: Trading Aggregate Statistics over Private Correlated Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 19–23 August 2018; pp. 2031–2040.
- Tang, R.; Wu, H.; Bao, Z.; Bressan, S.E.; Valduriez, P. The Price is Right: Models and Algorithms for Pricing Data. In Proceedings of the 24th International Conference on Database and Expert Systems Applications, Prague, Czech Republic, 29 August 2013; pp. 380–394.
- Balazinska, M.; Howe, B.; Koutris, P.; Dan, S.; Upadhyaya, P. A Discussion on Pricing Relational Data; Springer: Berlin, Germany, 2013; pp. 167–173.
- Deep, S.; Koutris, P. The Design of Arbitrage-Free Data Pricing Schemes. In Proceedings of the 20th International Conference on Database Theory, Venice, Italy, 21–24 March 2017; pp. 1–18.
- Lin, B.R.; Kifer, D. On arbitrage-free pricing for general data queries. In Proceedings of the 40th International Conference on Very Large Databases, Hangzhou, China, 1–5 September 2014; pp. 757–768.
- Deep, S.; Koutris, P. QIRANA: A Framework for Scalable Query Pricing. In Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, 14–19 May 2017; pp. 699–713.
- Gkatzelis, V.; Aperjis, C.; Huberman, B.A. Pricing private data. Electron. Mark. 2015, 25, 109–123.
- Riederer, C.; Erramilli, V.; Chaintreau, A.; Krishnamurthy, B.; Rodriguez, P. For sale: Your data: by: You. In Proceedings of the 10th ACM SIGCOMM Workshop on Hot Topics in Networks, Cambridge, MA, USA, 14–15 November 2011; pp. 1–6.
- Li, C.; Li, D.Y.; Miklau, G.; Suciu, D. A Theory of Pricing Private Data. Commun. ACM 2017, 60, 79–86.
- Acquisti, A.; John, L.; George, L. What Is Privacy Worth? J. Leg. Stud. 2013, 42, 249–274.
- Niu, C.Y.; Zheng, Z.Z.; Wu, F.; Gao, X.F.; Chen, G.H. Achieving Data Truthfulness and Privacy Preservation in Data Markets. IEEE Trans. Knowl. Data Eng. 2019, 31, 105–119.
- Nget, R.; Cao, Y.; Yoshikawa, M. How to Balance Privacy and Money through Pricing Mechanism in Personal Data Market. In Proceedings of the 2017 SIGIR Workshop On eCommerce, Tokyo, Japan, 7–11 August 2017; pp. 1–10.
| ID | Name | Course No. |
|---|---|---|
| t1 | John | 00602001 |
| t2 | Tom | 03310117 |
| t3 | James | 04110235 |
| t4 | Tom | 05110208 |

| ID | Course No. | Grade | Credit |
|---|---|---|---|
| t5 | 00602001 | 85 | 2 |
| t6 | 00602001 | 90 | 2 |
| t7 | 03310117 | 95 | 3 |
| t8 | 05110208 | 75 | 1 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Shen, Y.; Guo, B.; Shen, Y.; Wu, F.; Zhang, H.; Duan, X.; Dong, X. Pricing Personal Data Based on Data Provenance. Appl. Sci. 2019, 9, 3388. https://doi.org/10.3390/app9163388