Thirty Thousand 3D Models from Thingiverse

This dataset contains files and geometrical analysis of 3D model data, acquired from the thingiverse online repository. More than thirty thousand stereolithography files (STL) were retrieved and analysed. The geometrical analysis of the respective models is presented along with model renderings in both GIF and PNG format, and pre-sliced machine instructions as GCode. This dataset is intended to be used as a basis for further research in Additive Manufacturing (AM), such as 3D printing time estimation, printability assessment or slicing algorithm development. All files retrieved are user-generated, with the respective user and associated licence presented in the overview. The dataset was acquired between 2016 and 2017.


Introduction
There is a large number of online repositories for digital models for consumers, e.g., https://grabcad.com, https://3dwarehouse.sketchup.com, or https://www.tinkercad.com/things. Some of these repositories are aimed specifically at digital models for additive manufacturing or 3D printing. Additive manufacturing (AM) and 3D printing are used synonymously in this work. The most prominent is thingiverse [1]. Such repositories enable users to share their work, and to acquire and mix other users' content. Furthermore, thingiverse caters for its users by offering community functionality that enables discussions on specific 3D printing related topics, such as models, parameter selection, handling and management issues, and 3D printer selection.
The focus of thingiverse is consumer-centric, with a number of abilities to upload, host, share, re-mix, discuss, and acquire objects via 3rd party providers. Some online repositories also enable monetisation, i.e., users can sell their designs. An indication of this community centricity is the availability of public statistics on the object, e.g., download numbers and view counter, the integration of third party social media capabilities, e.g., share on Facebook [2] or Twitter [3], and the availability of "like" and other indicators of appreciation.
Thingiverse can be regarded as the most prominent repository of 3D printing model data by the number of results in the Google Scholar index for scientific publications. Searching this index for thingiverse yields 2640 results, compared with 54 results for Repables, 174 for YouMagine, 1010 for GrabCAD, 740 for tinkerCAD, 1970 for 3D Warehouse, and 134 for MakePrintable, which are other available repositories of digital models for 3D printing. All counts exclude citations and patents. Furthermore, thingiverse has been identified as the largest model repository for this domain in Alcock et al. [4], as one of the two most common sources for digital models in Evans [5], and as a highly recommended source for digital models by Mallon [6].

Acquisition
The model files were downloaded from thingiverse using a custom bash script. Models on thingiverse are part of so-called Things, which are identified by a unique integer number, the ThingID. Multiple models and other files, such as renderings or pre-sliced machine instructions, can be part of a Thing. Each Thing has one creator. Furthermore, each Thing can have: Each Thing can be directly accessed via a Web browser by its ThingID. For the acquisition of the models, an upper limit of 1,609,996 for the ThingID was set following a previous experiment. In the data collection script, 100,000 randomly chosen ThingIDs were tested for availability. If a Thing with the chosen ThingID was available, the first file from it was downloaded and the above information stored.
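The sampling step described above can be sketched in Python as follows. This is a minimal illustration and not the author's bash script: the URL pattern in thing_url, the function names, and the choice to sample without replacement are assumptions, and the availability check and download are passed in as hypothetical callables so that the sketch makes no network requests by itself.

```python
import random

MAX_THING_ID = 1_609_996  # upper limit stated in the text
N_TRIES = 100_000         # number of ThingIDs tested in the text


def sample_thing_ids(n=N_TRIES, upper=MAX_THING_ID, seed=None):
    """Draw n distinct ThingIDs uniformly from 1..upper.

    Assumption: sampling without replacement; the source does not say
    whether the same ThingID could be tried twice.
    """
    rng = random.Random(seed)
    return rng.sample(range(1, upper + 1), n)


def thing_url(thing_id):
    """Assumed URL pattern for viewing a Thing in a Web browser."""
    return f"https://www.thingiverse.com/thing:{thing_id}"


def collect(ids, is_available, download_first_file):
    """Test each ID and download the first file of every available Thing.

    is_available and download_first_file are hypothetical callables
    supplied by the caller (e.g. thin wrappers around an HTTP client).
    """
    hits = []
    for thing_id in ids:
        if is_available(thing_id):
            download_first_file(thing_id)
            hits.append(thing_id)
    return hits
```

Passing the network operations in as callables keeps the sampling logic testable offline; a fixed seed makes a run reproducible.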
The downloaded files were subsequently analysed geometrically by custom scripts written in BASH and Python. The files were furthermore converted to GCode, using the Slic3r software, and to AMF. See Section 5 for details on the utilized software. Renderings of the models were created in PNG and GIF format.
The usage of the acquired data is covered under the German Urheberrechts-Wissensgesellschafts-Gesetz (https://www.bmjv.de/SharedDocs/Gesetzgebungsverfahren/Dokumente/BGBl-UrhWissG.html), which allows for the re-distribution of 15% of the original content for the purpose of research and, furthermore, allows for the text and data mining of existing corpora. Thingiverse claims to have 981,600 models indexed (https://www.thingiverse.com/about); the extracted 31,121 models, approximately 3.2% of that number, are far fewer than the allowed 15%. Under US law, fair use (https://www.copyright.gov/fair-use) of this data is claimed, as no commercial interest is pursued by the author.

Results
A total of 31,121 STL files were downloaded using the method described in Section 2. The data is uploaded to Zenodo under DOI: 10.5281/zenodo.1098527. For this manuscript, an overview of the retrieved data is presented as an HTML (Hypertext Markup Language) document, with links to the respective files. This overview document is available as supplementary material to this manuscript, see below. Zenodo does not allow large numbers of individual files in depositions; therefore, the deposition contains aggregations as archives. The archives are in ZIP format and are separated by file type. The following archives are available. Archives containing fewer than 31,121 files reflect conversion or creation errors.
To illustrate the nature of the acquired files, the following figures show the models with the highest numbers of downloads.

• Thing ID: 86187, "Royal manticore navy", by "jamesH" with 104,701 downloads.
In Section 4, the content of the individual columns is explained. Furthermore, the overview is provided as a Comma-separated Values (CSV) file for ease of use in software.

Dataset Description
The dataset is provided as an overview in the form of an HTML document. The individual columns of the tabular representation are presented in the following with short explanations. Missing entries are indicated by N/A. From the CSV overview document, 57 of the 31,121 entries were removed due to encoding errors, so the CSV file contains 31,064 entries.
Vertices, the number of vertices in the model file as reported by the analysis script utilizing the freecad library.
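As a usage illustration, the CSV overview could be read in Python roughly as shown below. This is a sketch under assumptions: only the column name Vertices and the N/A marker come from the description above, while the ThingID column in the sample data and the function names are hypothetical.

```python
import csv
import io


def read_overview(fileobj):
    """Read the CSV overview and map the missing-entry marker N/A to None."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({key: (None if value == "N/A" else value)
                     for key, value in row.items()})
    return rows


def vertex_counts(rows):
    """Collect the vertex counts of all rows that have a Vertices entry."""
    return [int(r["Vertices"]) for r in rows if r.get("Vertices") is not None]


# Usage with in-memory sample data; the ThingID column name is hypothetical.
sample = io.StringIO("ThingID,Vertices\n1,100\n2,N/A\n")
rows = read_overview(sample)
counts = vertex_counts(rows)
```

For the real file, the io.StringIO object would be replaced by a file opened with open(path, newline="", encoding="utf-8"), where the path and the encoding are likewise assumptions.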

Software
In this section, the utilized software is described. The complete analysis is performed on a desktop computer (Intel(R) Core(TM) i5-4570, 16 GiB RAM) running the Linux operating system (ArchLinux, https://www.archlinux.org).

GPX
GPX [13] is a conversion software that prepares files for 3D printing on Makerbot [14] 3D printers. The Makerbot 3D printers use a proprietary GCode format called X3G. Makerbot is the owner of thingiverse. This software is used to calculate a printing time estimation. The version used is 2.5.3.

Slic3r
Slic3r [15] is an open-source slicing software that prepares STL model files for 3D printing by producing a machine instruction file, i.e., a GCode file. The parameters and configuration used to create the GCode files are presented in Appendix A. The version used is 0.9.11-dev.

Admesh
Admesh [16] is a software tool to read, analyse, and repair STL files, both binary and ASCII. It is used to generate geometrical information on the model. The version used is 0.98.2.

Metric Hash
In order to make model files comparable, a metric is devised that is based on the geometrical representation of the model. For this metric, a random sample of 20% of all vertices is selected and grouped into pairs. The distance between the two vertices of each pair is calculated and normalized by the maximum distance between all vertices in this sample. Ten buckets of equal width between 0.0 and 1.0 are created. The metric is derived by counting how often a pairwise distance falls into each of these buckets. The 10 values are then converted to a hexadecimal representation for readability. This metric is designed to yield similar results for geometrically similar objects. The metric has not yet been evaluated. See Appendix B for the implementation of the metric calculation as a BASH [17] script that converts the individual elements to a hexadecimal representation.
Figure 4 illustrates the principle used to derive the metric for similarity. In this figure, the vertices V_x with x = 1, . . . , 15, are all vertices for one single layer or height of the model. From the totality of the vertices in each layer n, a subset is randomly selected and the distance between each vertex pair is calculated, see d_m with m = 1, . . . , 6. The distances are then normalized and processed as described in Appendix B.
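The bucketing principle can be sketched in Python as follows. This is not the BASH implementation from Appendix B, and several details are assumptions made for illustration: the sampled vertices are paired consecutively after random selection, the normalization uses the largest of the computed pair distances rather than the maximum over all vertex combinations in the sample, the layer-wise selection shown in Figure 4 is ignored, and the ten hexadecimal counts are joined with hyphens.

```python
import math
import random


def metric_hash(vertices, fraction=0.2, buckets=10, seed=None):
    """Return (bucket counts, hexadecimal string) for a list of 3D vertices."""
    if len(vertices) < 2:
        raise ValueError("at least two vertices are required")
    rng = random.Random(seed)

    # Sample roughly `fraction` of the vertices; use an even count so
    # that every sampled vertex has exactly one partner.
    k = max(2, int(len(vertices) * fraction))
    k -= k % 2
    sample = rng.sample(vertices, k)

    # Euclidean distance of each consecutive pair in the random sample.
    dists = [math.dist(sample[i], sample[i + 1]) for i in range(0, k, 2)]

    # Assumption: normalize by the largest computed pair distance.
    d_max = max(dists)

    counts = [0] * buckets
    for d in dists:
        ratio = d / d_max if d_max > 0 else 0.0
        # A ratio of exactly 1.0 is assigned to the last bucket.
        index = min(int(ratio * buckets), buckets - 1)
        counts[index] += 1

    # Assumption: one hexadecimal number per bucket, joined by hyphens.
    return counts, "-".join(format(c, "x") for c in counts)
```

Because the sample is random, two runs on the same model generally give different counts unless a seed is fixed, which matters when the values are compared across models.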

Summary
This dataset is provided to facilitate research in AM. User-generated content from the thingiverse repository was selected, retrieved and analyzed to further research in areas such as historical development of user-generated content and associated meta-data, slicing algorithm development, 3D printing time estimation, or complexity analysis. The data was retrieved in an automated manner via the repository's website. The data was analyzed and processed using open-source software. This manuscript presents an overview of the complete dataset as an HTML document.

Abbreviations
The following abbreviations are used in this manuscript: