MyChEMBL: A Virtual Platform for Distributing Cheminformatics Tools and Open Data

MyChEMBL is an open virtual platform that provides a free, secure, standardised and easy-to-use chemoinformatics environment for bioactivity data mining, machine learning, application development, learning and teaching. The main technical features of myChEMBL, along with its applications and future plans, are discussed here.


Introduction
MyChEMBL [1] is an open virtual platform that combines public domain bioactivity data with open source web, database and chemoinformatics technologies. MyChEMBL consists of a Linux (Ubuntu) Virtual Machine (VM), with key installed components including a PostgreSQL version of the ChEMBL database [2] and the latest RDKit chemoinformatics toolkit and chemistry cartridge [3].
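To illustrate what pairing the ChEMBL database with the RDKit cartridge makes possible, the following minimal sketch builds a parameterised substructure search. It only constructs the SQL text and its parameters; it does not connect to a database. The table and column names (`rdk.mols`, `molregno`, `m`) are assumptions borrowed from the RDKit cartridge tutorial and may differ from the schema shipped in the VM; the `%s` placeholders assume a psycopg2-style driver. The `@>` substructure operator and `mol_from_smiles` are provided by the cartridge.

```python
from typing import Tuple


def substructure_query(smiles: str, limit: int = 10) -> Tuple[str, tuple]:
    """Return (sql, params) for a substructure search via the RDKit cartridge.

    Assumed layout (not confirmed by the source): a table rdk.mols with a
    'molregno' key column and a column 'm' of the cartridge's 'mol' type.
    """
    if not smiles:
        raise ValueError("a non-empty SMILES string is required")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    sql = (
        "SELECT molregno "
        "FROM rdk.mols "                           # assumed table name
        "WHERE m @> mol_from_smiles(%s::cstring) " # '@>' tests 'm contains query'
        "LIMIT %s"
    )
    # Values are passed separately so the driver handles quoting.
    return sql, (smiles, limit)


if __name__ == "__main__":
    query, params = substructure_query("c1ccccc1", limit=5)
    print(query)
    print(params)
```

With a database driver installed, the returned pair could be handed directly to a cursor's `execute` method; keeping the values out of the SQL string avoids manual quoting of user-supplied structures.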

The primary aim of the system is to remove the technical hurdles often associated with building and deploying chemoinformatics platforms, thus giving both novice and expert users easy access to domain-specific data and tools. In addition to the ChEMBL database and RDKit libraries, the myChEMBL VM also provides secure local access to the ChEMBL Web Services [4], interactive IPython notebook tutorials [5], the phpPgAdmin PostgreSQL schema browser [6] and example KNIME [7] workflows. Furthermore, these components are linked together by middleware developed in-house, which abstracts common tasks such as interaction with the database and API, and networking. Access to all of these tools and services, along with additional documentation, is provided through the myChEMBL LaunchPad landing page.
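The idea of local access to the web services can be sketched as follows: requests are addressed to the VM itself, so query identifiers never leave the local machine. The base URL, port and path layout below (`http://localhost:8000/chemblws`, `compounds/<id>.json`) are hypothetical placeholders chosen for illustration, not values taken from the myChEMBL documentation, and the helper names are likewise invented for this example.

```python
from urllib.parse import quote, urlsplit

# Hypothetical address of the web services inside the VM (an assumption).
LOCAL_BASE = "http://localhost:8000/chemblws"

# Host names that count as "local" for the purposes of this sketch.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def compound_url(chembl_id: str, base: str = LOCAL_BASE) -> str:
    """Build a URL for a compound record; the path layout is assumed."""
    identifier = chembl_id.strip().upper()
    if not (identifier.startswith("CHEMBL") and identifier[6:].isdigit()):
        raise ValueError(f"not a ChEMBL identifier: {chembl_id!r}")
    return f"{base.rstrip('/')}/compounds/{quote(identifier)}.json"


def is_local(url: str) -> bool:
    """Return True if the URL points at the local machine."""
    return urlsplit(url).hostname in LOCAL_HOSTS


if __name__ == "__main__":
    url = compound_url("chembl25")
    print(url, is_local(url))
```

A guard such as `is_local` could be called before sending any request that contains proprietary structures, which reflects the security argument made for running the platform locally.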

Results and Discussion
Based on the technical features of myChEMBL described above, the platform has several applications and advantages:
• No costs: myChEMBL uses exclusively free and open source tools and libraries, so it removes the expensive licensing costs often associated with similar applications.
• Security: myChEMBL runs locally behind a firewall, therefore the typical concerns regarding submission of sensitive data to web-based applications do not apply.
• Application development: the source code is available for all myChEMBL applications, so developers can use this as a starting point for applications they wish to develop in the future.
• Ease of use: due to the availability of interactive, web- and GUI-based tools, myChEMBL requires no prior programming experience or knowledge.
• Learning: myChEMBL provides a versatile platform for learning chemical data mining and chemoinformatics in an intuitive and straightforward way. The combination of data with relevant pre-installed tools effectively lowers the 'activation barrier' and shifts the focus to hands-on programming and learning.
• Training: myChEMBL is a proven resource for training scientists on the use of essential tools in the field of chemoinformatics and computer-aided drug discovery.

Conclusions
In conclusion, the primary goal of the myChEMBL project (currently in its second release) has been to provide a truly open chemoinformatics platform, combining open data with open tools and tutorials. Although a fairly recent development, myChEMBL has already been adopted by both academic and industrial groups as a standardised chemoinformatics resource. Looking forward, we envisage broadening the scope of myChEMBL by integrating more open tools, such as Beaker [8], along with adding completely new functionality, such as a compound registration mechanism and bioactivity curation interface. The latter could be linked to an open electronic lab notebook (eLNB), thus offering a complete solution for reporting, storing and querying experimental data. Furthermore, it is hoped that the availability of a completely free, self-contained and extendable version of ChEMBL will catalyze further innovation and development in emerging economies and Open Science/Data projects in areas such as malaria and TB research [9]. Finally, due to the open philosophy of this project, we encourage the community to provide feedback, new ideas, IPython notebooks or complete tools, in order to enhance and improve the current functionality.