An Organized Repository of Ethereum Smart Contracts’ Source Codes and Metrics
Abstract
1. Introduction
2. Related Work
2.1. Previous Literature on Software Corpus Analysis
2.2. Static Analysis on Smart-Contract Code
2.3. Related Projects
2.3.1. GitHub
- The smart-contract source codes collected in GitHub typically have no direct reference to a smart contract deployed on the blockchain at an Ethereum address; it is therefore hard to tell whether they have actually been tested or used on the blockchain.
- GitHub does not provide a search engine to filter smart contracts by specific software metrics, such as the number of modifiers or payable functions, since some of these metrics are specific to the language used to write smart contracts, i.e., Solidity.
- GitHub provides no information on the use of smart contracts in a real blockchain scenario, such as the number of transactions invoking them or the number of tokens associated with each of them.
- GitHub does not provide smart-contract ABIs or opcodes.
2.3.2. Ethereum Block Explorers
- Ethplorer (https://ethplorer.io/) provides an API to access a wide range of Ethereum data, such as the balances for a specified token and the description of a specific address, but it does not give access to smart contracts’ source code. The full documentation of the Ethplorer API is available at https://github.com/EverexIO/Ethplorer/wiki/Ethplorer-API. Requests to the API are limited to 5 per second, 50 per minute, 200 per hour, 2000 per 24 h and 3000 per week.
- EtherChain (https://etherchain.org/) is an explorer for the Ethereum blockchain. Unlike Ethplorer, it claims to provide smart-contract code, although what it actually displays is the contract bytecode and the constructor arguments for a given smart-contract address. EtherChain provides an API only for its gas price oracle predictions (https://www.etherchain.org/api/gasPriceOracle), not for Ethereum data; users who want to gather Ethereum data from EtherChain must parse its HTML pages.
- BlockScout (https://blockscout.com/poa/xdai/) provides an API to access Ethereum data. Its API claims to give access to the source code of only a few verified smart contracts; moreover, the list of verified smart-contract addresses is not available in BlockScout.
- EtherScan allows users to explore and search the Ethereum blockchain for smart contracts. However, this block explorer has some limitations when it comes to downloading smart contracts’ source code. First, the number and size of smart contracts are huge (on the giga scale, by our estimate), while the API is rate-limited to 100 submissions per day per user, each retrieving a single smart contract, which makes a complete download of the data impossible (https://etherscan.io/apis#contracts). Second, the EtherScan API does not provide a way to obtain the list of smart-contract addresses, since the existing API calls mainly allow navigation from one block to another. Third, a researcher cannot directly and easily explore a smart contract’s source code but, rather, has to first inspect blocks in Ethereum and then look for all the transactions that involve an address associated with the smart contract.
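As a minimal sketch of how a crawler could respect the Ethplorer limits listed above, the following Python class enforces several sliding windows at once; the class and function names are hypothetical and not part of any Ethplorer client. It also includes the simple arithmetic behind the EtherScan limitation: at 100 requests per day and one contract per request, the download time grows linearly with the number of contracts.

```python
import math
from collections import deque

# Ethplorer limits quoted above, as (max requests, window length in seconds).
ETHPLORER_LIMITS = [
    (5, 1),             # 5 per second
    (50, 60),           # 50 per minute
    (200, 3600),        # 200 per hour
    (2000, 86400),      # 2000 per 24 h
    (3000, 7 * 86400),  # 3000 per week
]


class MultiWindowLimiter:
    """Sliding-window limiter that enforces several (limit, window) pairs at once."""

    def __init__(self, limits):
        self.limits = limits
        self.horizon = max(window for _, window in limits)  # longest window
        self.sent = deque()  # timestamps of sent requests, oldest first

    def wait_time(self, now):
        """Seconds to wait before a request may be sent at time `now` (0.0 if allowed)."""
        # Forget requests that fall outside even the longest window.
        while self.sent and self.sent[0] <= now - self.horizon:
            self.sent.popleft()
        wait = 0.0
        for limit, window in self.limits:
            recent = [t for t in self.sent if t > now - window]
            if len(recent) >= limit:
                # Enough old requests must expire to leave room for one more.
                wait = max(wait, recent[-limit] + window - now)
        return wait

    def record(self, now):
        """Register a request sent at time `now`."""
        self.sent.append(now)


def days_to_download(n_contracts, per_day=100):
    """Days needed to fetch n_contracts sources at one contract per request."""
    return math.ceil(n_contracts / per_day)
```

A crawler would call `wait_time(time.monotonic())`, sleep for the returned number of seconds, send the request and then call `record`. Keeping a single timestamp log and checking every window against it is simpler than maintaining one token bucket per window, at the cost of a linear scan per check.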
3. Research Methodology
- data retrieving,
- data cleaning,
- data modelling and
- data querying.
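The four steps above can be pictured as a chain of functions, each consuming the output of the previous one. The sketch below assumes a hypothetical record shape (`address`, `source`) and toy cleaning and modelling rules; it illustrates the structure of such a pipeline, not the authors’ actual implementation.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def retrieve(pages: Iterable[List[Record]]) -> List[Record]:
    """Data retrieving: flatten the pages returned by some source (e.g. a block explorer)."""
    return [record for page in pages for record in page]


def clean(records: List[Record]) -> List[Record]:
    """Data cleaning: drop records without source code and duplicate addresses."""
    seen = set()
    cleaned = []
    for record in records:
        address = str(record.get("address", "")).lower()
        if not record.get("source") or address in seen:
            continue
        seen.add(address)
        cleaned.append({**record, "address": address})
    return cleaned


def model(records: List[Record]) -> List[Record]:
    """Data modelling: attach metrics to each record (here only a line count)."""
    return [{**record, "lines": len(str(record["source"]).splitlines())} for record in records]


def query(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Data querying: keep the records that satisfy the user's predicate."""
    return [record for record in records if predicate(record)]


def run_pipeline(pages, predicate):
    """Run the four steps in order."""
    return query(model(clean(retrieve(pages))), predicate)
```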
3.1. Retrieving Data
3.2. Cleaning Data
3.3. Modelling Data
3.3.1. Smart Contracts’ Intrinsic Metrics
3.3.2. Smart Contracts’ Extrinsic Metrics
3.4. Filtered Data
- Relational databases tend to degrade in performance once data sets exceed a certain size, whereas a document-oriented database such as MongoDB comes with a built-in load balancer, which makes it better suited to applications with high data loads [23]. The MongoDB database is updated daily to generate the data archive.
- Unlike relational databases, where data are stored in rows and columns, document-oriented databases store data in documents. Documents typically use a structure similar to JSON (JavaScript Object Notation), which provides a natural way to model data that is closely aligned with object-oriented programming: each document plays the role of an object, and in a document-oriented database each document is a JSON-like record. The schema of a document database is dynamic: each document may contain a different number of fields, which is useful when modelling unstructured and polymorphic data. Document databases also support flexible queries: any combination of fields in a document can be used to query the data [24].
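To illustrate the dynamic schema and field-combination queries described above, the sketch below evaluates MongoDB-style filters (the `$gt`, `$gte`, `$lt`, `$lte` and `$eq` comparison operators) over in-memory Python dictionaries, so it runs without a database server; a filter dictionary of this shape is what pymongo’s `Collection.find` accepts. The sample values are copied from the responses in Appendix A. The `matches` and `find` helpers are hypothetical and cover only a small subset of MongoDB’s query semantics (for instance, MongoDB’s handling of missing fields and null values is not reproduced).

```python
import operator

# A subset of MongoDB's comparison operators, evaluated in memory.
OPS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, flt):
    """True if `doc` satisfies every field condition in the MongoDB-style filter `flt`."""
    for field, cond in flt.items():
        if field not in doc:  # in this sketch, a missing field never matches
            return False
        if isinstance(cond, dict):
            if not all(OPS[op](doc[field], arg) for op, arg in cond.items()):
                return False
        elif doc[field] != cond:  # a bare value means equality
            return False
    return True


def find(collection, flt):
    """Return the documents of `collection` that match `flt`."""
    return [doc for doc in collection if matches(doc, flt)]


# Documents need not share a schema: the third one has different fields.
contracts = [
    {"address": "0xb7f4c286851cbf0cbf2fe8ebf40412b196c0e8ad", "functions": 27, "modifiers": 1, "payable": 1},
    {"address": "0x755cebe8cc53c7cb1e1bb641026a17d37d4aea91", "functions": 31, "modifiers": 1, "payable": 4},
    {"address": "0x536c7efeebff067a69393133b1c87a163a6b0598", "transactions": 639, "balance": 0},
]
```

For example, `find(contracts, {"functions": {"$gt": 20}, "payable": {"$gt": 1}})` combines two fields in one query and returns only the second document.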
3.5. User Interface
3.5.1. Smart Corpus HTML User Interface
- At the top, the user finds the form used to filter smart contracts. The form consists of a number of drop-down lists, one per metric, and a submit button that runs the search. The form lets the user inspect smart contracts by metadata, such as the “pragma version”, and by software metrics, such as the number of “modifiers” and/or the number of “payable” functions.
- Below the form, the smart contracts matching the user’s filter are displayed. For readability, only a subset of the smart-contract metrics is shown in the table, and each column header indicates the name of a metric associated with the smart contracts. Although the HTML GUI displays only some metrics, the user can access all the metrics and the smart contracts’ source codes by selecting the checkbox to the right of a smart-contract address and clicking the red “download” button. The user can also access the original repository from which the smart contract was retrieved, i.e., the EtherScan service.
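One way to turn the drop-down selections described above into a single query is to map each list title to a stored metric name and each option to a condition, combining all conditions with a logical AND. In this sketch the titles, the field names and every option other than “greater than zero” are assumptions; the filter uses MongoDB’s operator syntax only as a convenient notation.

```python
# Hypothetical mapping from drop-down titles to stored metric names.
FIELD_BY_TITLE = {
    "pragma version": "pragma",
    "number of modifiers": "modifiers",
    "number of payables": "payable",
}

# Hypothetical mapping from options to conditions; only "greater than zero"
# is mentioned in the text.
CONDITION_BY_OPTION = {
    "greater than zero": {"$gt": 0},
    "equal to zero": {"$eq": 0},
}


def form_to_filter(selections):
    """Combine {drop-down title: selected option} into one conjunctive filter."""
    flt = {}
    for title, option in selections.items():
        if option in ("", "any"):
            continue  # an unselected drop-down adds no constraint
        field = FIELD_BY_TITLE[title]
        # Any other option (e.g. a pragma version string) is treated as equality.
        flt[field] = dict(CONDITION_BY_OPTION.get(option, {"$eq": option}))
    return flt
```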
3.5.2. Smart Corpus GraphQL Application
- To reduce the data-transfer overhead of Representational State Transfer (REST)-like web services, both in the amount of data transferred unnecessarily and in the number of separate requests needed to obtain it.
- To reduce the risk of errors caused by invalid client queries. In particular, the GQL application supports “type introspection”, i.e., the user can examine the type and properties of an object at runtime. For example, introspection queries let the user discover both the intrinsic and the extrinsic metrics associated with a smart contract while typing the query.
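GraphQL clients usually send a query as a JSON body of the form `{"query": ..., "variables": ...}` in an HTTP POST request. The sketch below builds such a body for a standard introspection query (`__type`), which asks the server which fields a type exposes. The type name `Metrics` and the endpoint URL passed to `post_graphql` are hypothetical, since the text does not state them.

```python
import json
import urllib.request

# Standard GraphQL introspection; the type name "Metrics" is an assumption.
INTROSPECTION_QUERY = '{ __type(name: "Metrics") { fields { name type { name kind } } } }'


def graphql_payload(query, variables=None):
    """Encode a GraphQL request body as UTF-8 JSON."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return json.dumps(body).encode("utf-8")


def post_graphql(url, query, variables=None, timeout=30):
    """POST a query to a (hypothetical) GraphQL endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=graphql_payload(query, variables),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Because the client names exactly the fields it wants, the same endpoint serves both the introspection query and the metric queries of Appendix A without over-fetching.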
3.6. Use Case
- connect to the service through the link: https://aphd.github.io/smac-corpus/,
- select the option “version 6.0” from the drop-down menu entitled “pragma version”,
- select the option “greater than zero” from the drop-down menu entitled “number of payables” and
- submit the form by clicking on the button “submit”.
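The same use case can be expressed as a query in the style of Appendix A. The helper below assembles such query strings from a dictionary of filter arguments. The argument names `payable_gt` and `pragma_eq` and the value `"0.6.0"` are assumptions modelled on the `functions_gt` and `address_eq` arguments shown in Appendix A, not documented parameters of the service.

```python
import json


def build_metrics_query(filters, fields):
    """Build a query string of the form { metrics(query:{...}) { fields } }."""
    # json.dumps yields GraphQL-compatible literals for simple strings and numbers.
    args = ", ".join(f"{name}: {json.dumps(value)}" for name, value in filters.items())
    return "{ metrics(query:{" + args + "}) { " + " ".join(fields) + " } }"


# Hypothetical equivalent of the use case: pragma 0.6.x and at least one payable.
use_case = build_metrics_query(
    {"pragma_eq": "0.6.0", "payable_gt": 0},
    ["contractAddress", "pragma", "payable"],
)
```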
4. Results
5. Conclusions and Future Works
Author Contributions
Funding
Conflicts of Interest
Abbreviations
ABI | The smart-contract Application Binary Interface |
GQL | Graph Query Language |
JSON | JavaScript Object Notation |
SLOC | Source lines of code |
REST | Representational State Transfer |
Appendix A. Queries
{ metrics(query:{functions_gt: 20}) { contractAddress events functions modifiers payable } }
{ "data": { "metrics": [ { "contractAddress": "0xb7f4c286851cbf0cbf2fe8ebf40412b196c0e8ad", "events": 7, "functions": 27, "modifiers": 1, "payable": 1 }, { "contractAddress": "0x755cebe8cc53c7cb1e1bb641026a17d37d4aea91", "events": 4, "functions": 31, "modifiers": 1, "payable": 4 }, { "contractAddress": "0xb92aa4a864daf0d6a509e73a9364feba44384965", "events": 3, "functions": 24, "modifiers": 1, "payable": 1 }, ... ] } }
{ metrics(query:{address_eq: "0x536c7efeebff067a69393133b1c87a163a6b0598"}) { contractAddress transactions balance } }
{ "data": { "metrics": [ { "contractAddress": "0x536c7efeebff067a69393133b1c87a163a6b0598", "transactions": 639, "balance": "0 Ether" } ] } }
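Because the service answers with plain JSON, responses like the first one above can be post-processed with any JSON library. The sketch below parses the three records actually shown in that response (the elided ones are omitted) and computes a few summary values; the `summarize` helper is hypothetical. Under the GraphQL specification, a failed query reports its problems in an `errors` list, which the sketch checks first.

```python
import json

# The three records shown in the first sample response (elided entries omitted).
RESPONSE = """
{"data": {"metrics": [
  {"contractAddress": "0xb7f4c286851cbf0cbf2fe8ebf40412b196c0e8ad",
   "events": 7, "functions": 27, "modifiers": 1, "payable": 1},
  {"contractAddress": "0x755cebe8cc53c7cb1e1bb641026a17d37d4aea91",
   "events": 4, "functions": 31, "modifiers": 1, "payable": 4},
  {"contractAddress": "0xb92aa4a864daf0d6a509e73a9364feba44384965",
   "events": 3, "functions": 24, "modifiers": 1, "payable": 1}
]}}
"""


def summarize(response_text):
    """Parse a metrics response and return a few summary values."""
    body = json.loads(response_text)
    if body.get("errors"):
        raise ValueError(f"query failed: {body['errors']}")
    metrics = body["data"]["metrics"]
    largest = max(metrics, key=lambda m: m["functions"])
    return {
        "contracts": len(metrics),
        "most_functions": largest["contractAddress"],
        "total_payable": sum(m["payable"] for m in metrics),
    }
```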
References
- O’Donovan, P.; O’Sullivan, D.T.J. A Systematic Analysis of Real-World Energy Blockchain Initiatives. Future Internet 2019, 11, 174.
- Zheng, Z.; Xie, S.; Dai, H.N.; Chen, W.; Chen, X.; Weng, J.; Imran, M. An overview on smart contracts: Challenges, advances and platforms. Future Gener. Comput. Syst. 2020, 105, 475–491.
- Shala, B.; Trick, U.; Lehmann, A.; Ghita, B.; Shiaeles, S. Blockchain and Trust for Secure, End-User-Based and Decentralized IoT Service Provision. IEEE Access 2020, 8, 119961–119979.
- Ibba, S.; Pinna, A.; Lunesu, M.; Marchesi, M.; Tonelli, R. Initial Coin Offerings and Agile Practices. Future Internet 2018, 10, 103.
- Mense, A.; Flatscher, M. Security Vulnerabilities in Ethereum Smart Contracts. In Proceedings of the 20th iiWAS, Yogyakarta, Indonesia, 19 November 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 375–380.
- Amani, S.; Bégel, M.; Bortin, M.; Staples, M. Towards Verifying Ethereum Smart Contract Bytecode in Isabelle/HOL. In Proceedings of the 7th ACM SIGPLAN CPP, Los Angeles, CA, USA, 8 January 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 66–77.
- Tran, H.; Menouer, T.; Darmon, P.; Doucoure, A.; Binder, F. Smart Contracts Search Engine in Blockchain. In Proceedings of the 3rd ICFNDS, Paris, France, 1 July 2019; Association for Computing Machinery: New York, NY, USA, 2019.
- Tonelli, R.; Destefanis, G.; Marchesi, M.; Ortu, M. Smart Contracts Software Metrics: A First Study. arXiv 2018, arXiv:cs.SE/1802.01517.
- Jaccheri, L.; Osterlie, T. Open Source Software: A Source of Possibilities for Software Engineering Education and Empirical Software Engineering. In Proceedings of the First International Workshop on Emerging Trends in FLOSS Research and Development, Minneapolis, MN, USA, 20–26 May 2007; p. 5.
- Bragagnolo, S.; Rocha, H.; Denker, M.; Ducasse, S. Ethereum Query Language. In Proceedings of the 1st WETSEB, Gothenburg, Sweden, 7 May 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1–8.
- Zhou, Y.; Davis, J. Open Source Software Reliability Model: An Empirical Approach. In Proceedings of the Fifth Workshop on Open Source Software Engineering; Association for Computing Machinery: New York, NY, USA, 2005; pp. 1–6.
- Kratzke, N. Volunteer Down: How COVID-19 Created the Largest Idling Supercomputer on Earth. Future Internet 2020, 12, 98.
- Gabel, M.; Su, Z. A Study of the Uniqueness of Source Code. In Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, Santa Fe, NM, USA, 7 November 2010; Association for Computing Machinery: New York, NY, USA, 2010; pp. 147–156.
- Tempero, E.; Anslow, C.; Dietrich, J.; Han, T.; Li, J.; Lumpe, M.; Melton, H.; Noble, J. The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies. In Proceedings of the 2010 Asia Pacific Software Engineering Conference, Sydney, NSW, Australia, 30 November–3 December 2010; pp. 336–345.
- Hegedus, P. Towards Analyzing the Complexity Landscape of Solidity Based Ethereum Smart Contracts. In Proceedings of the 2018 IEEE/ACM 1st International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), Gothenburg, Sweden, 27 May–3 June 2018; pp. 35–39.
- Pinna, A.; Ibba, S.; Baralla, G.; Tonelli, R.; Marchesi, M. A Massive Analysis of Ethereum Smart Contracts Empirical Study and Code Metrics. IEEE Access 2019, 7, 78194–78213.
- Pierro, G.A.; Tonelli, R. PASO. Conference Proceedings on Object-Oriented Programming Systems, Languages, and Applications. 2020. Available online: https://ieeexplore.ieee.org/document/9050263 (accessed on 10 July 2020).
- Loeliger, J. Version Control with Git; O’Reilly Media: Sebastopol, CA, USA, 2012.
- Gousios, G.; Spinellis, D. Mining Software Engineering Data from GitHub. In Proceedings of the 39th International Conference on Software Engineering Companion, Buenos Aires, Argentina, 20–28 May 2017; pp. 501–502.
- Bistarelli, S.; Mazzante, G.; Micheletti, M.; Mostarda, L.; Tiezzi, F. Analysis of Ethereum Smart Contracts and Opcodes. In Advanced Information Networking and Applications; Barolli, L., Takizawa, M., Xhafa, F., Enokido, T., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 546–558.
- Chidamber, S.R.; Kemerer, C.F. Towards a Metrics Suite for Object Oriented Design. In Conference Proceedings on Object-Oriented Programming Systems, Languages, and Applications, Phoenix, AZ, USA, 6–11 October 1991; Association for Computing Machinery: New York, NY, USA, 1991; pp. 197–211.
- Chodorow, K. MongoDB: The Definitive Guide; O’Reilly Media, Inc.: Newton, MA, USA, 2013.
- Diogo, M.; Cabral, B.; Bernardino, J. Consistency Models of NoSQL Databases. Future Internet 2019, 11, 43.
- Baker Effendi, S.; van der Merwe, B.; Balke, W.T. Suitability of Graph Database Technology for the Analysis of Spatio-Temporal Data. Future Internet 2020, 12, 78.
Project’s Name | Home Page | REST API URL | Limitations |
---|---|---|---|
GitHub | https://github.com/ | https://developer.git... | Some repositories have restricted access. |
Ethplorer | https://ethplorer.io/ | https://api.ethplorer... | Requests are limited to 3000/week. |
EtherScan | https://etherscan.io/ | https://etherscan... | Smart contracts’ addresses are not immediately available. |
EtherChain | https://www.etherch... | https://www.etherch... | Smart contracts’ source codes are not available. |
BlockScout | https://blockscout.com/ | https://blockscout.com... | Smart contracts’ source codes are not available. |
Name | Description |
---|---|
Pragma | “Pragma” indicates which version(s) of the Solidity compiler the source code targets, preventing it from being compiled with incompatible future compiler versions. |
SLOC | “SLOC” indicates the number of lines in a smart contract’s source code. |
Modifiers | “Modifiers” indicates the number of function modifiers in a smart contract. |
Payable | “Payable” indicates the number of payable functions in a smart contract. |
Mapping | “Mapping” indicates the number of variables of mapping type in a smart contract. |
Address | “Address” indicates the number of variables of address type in a smart contract. |
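As a rough illustration of how some of the intrinsic metrics above could be computed, the sketch below applies regular expressions to Solidity source after stripping comments. This is an approximation, not the authors’ extraction method: it counts SLOC as non-blank, non-comment lines (an assumption, since the table only says “number of lines”), counts every `mapping(` occurrence (so nested mappings are counted more than once), may count a function with an `address payable` parameter as payable, and does not handle `//` inside string literals. The Address metric is omitted because a regular expression cannot reliably tell variable declarations apart from other uses of `address`; a robust implementation would rely on a Solidity parser.

```python
import re


def intrinsic_metrics(source: str) -> dict:
    """Approximate some intrinsic metrics of a Solidity source with regular expressions."""
    # Remove block comments, then line comments (naive: ignores string literals).
    code = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", "", code)
    pragma = re.search(r"pragma\s+solidity\s+([^;]+);", code)
    return {
        "pragma": pragma.group(1).strip() if pragma else None,
        "sloc": sum(1 for line in code.splitlines() if line.strip()),
        "modifiers": len(re.findall(r"\bmodifier\s+\w+", code)),
        # A function header (up to "{" or ";") containing the keyword payable.
        "payable": len(re.findall(r"\bfunction\b[^{;]*\bpayable\b", code)),
        "mapping": len(re.findall(r"\bmapping\s*\(", code)),
    }
```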
Name | Description |
---|---|
Transactions | “Transactions” is the total number of transactions sent or received by the smart contract. |
Balance | “Balance” is the amount of cryptocurrency associated with a smart-contract address. |
EtherValue | “EtherValue” is the dollar value associated with a smart-contract address. |
Token | “Token” is the value of each token associated with a smart-contract address. |
Last_seen | “Last_seen” is the timestamp of the most recent transaction sent or received by the smart contract. |
First_seen | “First_seen” is the timestamp of the first transaction sent or received by the smart contract. |
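Given a list of transactions, the Transactions, First_seen and Last_seen metrics reduce to a filter followed by a minimum and maximum over timestamps. The sketch assumes a hypothetical transaction shape (`from`, `to`, `timestamp` in Unix seconds, with `to` set to `None` for contract creations) and only counts transactions in which the address appears as sender or receiver, so internal transactions and the creation transaction itself are missed unless the caller includes them in that form.

```python
from datetime import datetime, timezone


def extrinsic_metrics(address, transactions):
    """Count the transactions involving `address` and find its first/last activity."""
    addr = address.lower()  # hex addresses may be mixed-case (checksummed)
    involved = [
        tx for tx in transactions
        if tx["from"].lower() == addr or (tx.get("to") or "").lower() == addr
    ]
    stamps = [tx["timestamp"] for tx in involved]
    return {
        "transactions": len(involved),
        "first_seen": min(stamps) if stamps else None,
        "last_seen": max(stamps) if stamps else None,
    }


def to_iso(timestamp):
    """Render a Unix timestamp (seconds) as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
```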
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Pierro, G.A.; Tonelli, R.; Marchesi, M. An Organized Repository of Ethereum Smart Contracts’ Source Codes and Metrics. Future Internet 2020, 12, 197. https://doi.org/10.3390/fi12110197