Assessing the impact of chemicals on the environment and addressing subsequent issues are two central challenges to their safe use. Environmental data are continuously expanding, requiring flexible, scalable, and extendable data management solutions that can harmonize multiple data sources with potentially differing nomenclatures or levels of specificity. Here, we present the methodological steps taken to construct a rule-based labeled property graph database, the “Meta-analysis of the Global Impact of Chemicals” (MAGIC) graph, for potential environmental impact chemicals (PEIC) and its subsequent application harmonizing multiple large-scale databases. The resulting data encompass 16,739 unique PEICs attributed to their corresponding chemical class, stereo-chemical information, valid synonyms, use types, unique identifiers (e.g., Chemical Abstract Service registry number CAS RN), and others. These data provide researchers with additional chemical information for a large amount of PEICs and can also be publicly accessed using a web interface. Our analysis has shown that data harmonization can increase up to 98% when using the MAGIC graph approach compared to relational data systems for datasets with different nomenclatures. The graph database system and its data appear more suitable for large-scale analysis where traditional (i.e., relational) data systems are reaching conceptional limitations.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited