An Open Access Data Set Highlighting Aggregation of Dyes on Metal Oxides

Abstract: The adsorption of a dye onto a metal oxide surface such as TiO2, NiO or ZnO leads to deprotonation and often to undesirable aggregation of the dye molecules, which in turn affects the photophysical properties of the dye. While controlled aggregation is useful for some applications, it can lower the performance of dye-sensitized solar cells. To understand this phenomenon better, we conducted an extensive search of the literature and identified over 4000 records of absorption spectra measured in solution and after adsorption onto a metal oxide. The data set comprises over 3500 unique compounds, with observed absorption maxima in solution and after adsorption on the semiconductor electrode. This data may provide further insight into the structure-property relationships governing dye aggregation behaviour.
Dataset: The data set can be queried at https://vvishwesh.github.io/dyeaggregation. Search results can be exported as CSV files.


Background & Summary
Aggregation in small molecules has been shown to alter the shape of the absorption bands [1]. This phenomenon has been used advantageously in organic photoconductive materials for xerography [2], sensors [3], colour filters for fluorescence detection [4], organic photovoltaics [5,6] and optoelectronics [7][8][9]. For dye-sensitized solar cells (DSSCs) in particular, aggregation often reduces device efficiency. This is largely due to dye-dye interactions on the metal oxide surface during sensitization, which facilitate the formation of aggregates [1,6]. These events in turn shift the absorption spectrum relative to that of the dye in solution. While H-aggregates lead to a hypsochromic (blue) shift, J-aggregates result in a bathochromic (red) shift of the absorption spectrum. Such shifts can be either substantial (indicating considerable aggregation) or negligible [10]. While J-aggregation can enhance device performance (owing to broadening of the absorption spectrum), H-aggregation invariably lowers the light-harvesting capability.
In identifying new dyes for DSSCs, multiple criteria such as a broad absorption spectrum, stable binding to the metal oxide and reduced aggregation need to be considered. Among these, aggregation is a commonly occurring phenomenon that is not easily eliminated. In many cases, cholic acid derivatives such as chenodeoxycholic acid (CDCA) are used [11][12][13][14] to prevent π-π stacking. This reduces aggregation but has the unwanted side effect of lowering the surface concentration (dye loading) of the sensitizer. Owing to their planar and extended π configurations, organic dyes show an increased tendency to aggregate, leading to inefficient electron injection and lower performance. To impede this process, the dye structure can be modified by adding long-chain alkyl groups or bulky substituents. In a recent study, CDCA was attached covalently to triarylamine sensitizers with encouraging results [15].
We have recently proposed a machine learning route to predict whether dye adsorption on titania is likely to induce a change in its absorption characteristics [10]. In this article, we present a new data set containing experimentally determined UV/vis absorption maxima in solution and on the metal oxide. This work builds on previously published research on dye-sensitized solar cells [16] and aims to enable the scientific community to conduct new studies in data-driven materials discovery for photovoltaic and optoelectronic applications.

Data Acquisition
The data was extracted from over 1500 literature articles. For each dye, the absorption maxima in solution ($\lambda_{\max}^{\mathrm{soln}}$) and after deposition on the metal oxide ($\lambda_{\max}^{\mathrm{MO}}$) were taken from tabulated values or, alternatively, from the images of the spectra. Where possible, the chemical name-to-structure parser OPSIN [17] was used; failing that, structures were drawn using MarvinSketch [18]. Experimental data obtained in the presence of additives such as chenodeoxycholic acid were excluded. The structures were subsequently saved in the SMILES format. The overall format of the data records is shown in Table 1. The type of aggregation was determined from the difference between the solution-phase and solid-state maxima, given by:
$$\Delta\lambda = \lambda_{\max}^{\mathrm{MO}} - \lambda_{\max}^{\mathrm{soln}}$$
In order to distinguish dyes that show low levels of aggregation from those that show large red/blue shifts, the values of $\Delta\lambda$ were grouped into three categories (hypsochromic, bathochromic and unchanged) according to the following criteria:
$$\text{Shift} = \begin{cases} \text{Bathochromic} & \text{if } \Delta\lambda > 10\ \text{nm} \\ \text{Unchanged} & \text{if } -10\ \text{nm} \le \Delta\lambda \le 10\ \text{nm} \\ \text{Hypsochromic} & \text{otherwise} \end{cases}$$
The selected threshold is an arbitrary choice, but it provides a simple scheme that enables easy distinction between H- and J-aggregates. The value is also kept small to allow for limited experimental variation. Thus, an absorption shift within ±10 nm implies a low degree of molecular aggregation on the nanoparticle surface, while larger values on either side suggest moderate to substantial hypsochromic/bathochromic aggregation. This categorization was further used to create machine learning models [10] that were able to predict the nature of the shift with good accuracy.
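The three-way categorization described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function and argument names are hypothetical, and the sign convention (metal oxide maximum minus solution maximum, so that a red shift is positive) is an assumption. Only the ±10 nm tolerance is taken from the text.

```python
from typing import Literal

ShiftCategory = Literal["bathochromic", "hypsochromic", "unchanged"]

# Tolerance quoted in the text; shifts within +/-10 nm count as "unchanged".
THRESHOLD_NM = 10.0


def classify_shift(
    lambda_soln_nm: float,
    lambda_mo_nm: float,
    threshold_nm: float = THRESHOLD_NM,
) -> ShiftCategory:
    """Label the shift of an absorption maximum upon adsorption.

    Assumed convention: delta = lambda_MO - lambda_soln, so a positive
    delta is a red (bathochromic) shift and a negative one is a blue
    (hypsochromic) shift.
    """
    delta = lambda_mo_nm - lambda_soln_nm
    if delta > threshold_nm:
        return "bathochromic"
    if delta < -threshold_nm:
        return "hypsochromic"
    return "unchanged"
```

With this convention, a dye absorbing at 450 nm in solution and 470 nm on the oxide would be labelled bathochromic, whereas a maximum of 460 nm on the oxide would fall inside the tolerance and be labelled unchanged.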

Data Analysis
The data set contains a total of 4035 entries, spanning 3685 unique dyes. While studies have largely focused on titania (TiO2), other metal oxides such as ZnO and NiO have also been used (see Table 2). Spectroscopic properties show a significant solvent dependence. To account for solvatochromic effects, the data set collates information on the solvents used for the spectroscopic measurements. A total of 37 solvents, comprising 18 pure solvents and 19 solvent mixtures, have been studied experimentally. Table 3 gives a solvent-wise distribution of the dyes tested. For simplicity and readability, we do not distinguish between mixtures containing the same solvents in different volume ratios. The sensitization solvents are often chosen based on the solubility of the dyes. In general, poorly soluble dyes can exhibit aggregation [19]. Solvent polarity can exert considerable influence on aggregation. For example, high-polarity solvents may cause a red shift of the absorption peaks [6,10,20,21]. Figure 1 provides an overview of the impact of polarity on aggregation. For solvents such as N-methyl-2-pyrrolidone and acetic acid, only single entries were available. For N-methyl-2-pyrrolidone, the spectrum was hypsochromically shifted, while for acetic acid the change on titania was negligible.
In order to uncover frequently occurring moieties, we used the MolBlocks [22] software with the minimum fragment size set to four atoms. A total of 2126 fragments were generated by applying RECAP [23] rules.
The fragments include a number of donor scaffolds such as triphenylamine, phenothiazine, phenoxazine, BODIPY, ruthenium, carbazole, porphyrin, julolidine, indoline and coumarin; functional groups that include various benzene derivatives (phenol, anisole, aniline, naphthalene, anthracene); anchoring/acceptor moieties (cyanoacrylic acid, benzoic acid, 2-cyanoprop-2-enoic acid, malononitrile); alkyl chains of varying lengths (butyl, hexyl, octyl, dodecyl); and conjugation groups such as pyrene, quinoxaline, thiophene, furan, pyrrole and pyridopyrazine [24]. Selected fragments that are found in many dyes are shown in Figure 2.
Among these fragments, we focused on the aggregation behaviour of the different donor classes and anchoring groups. The pie charts shown in Figure 3A,B summarize the data in terms of the dye class and the type of anchoring group, respectively. While zinc porphyrin and ruthenium dyes show a lower tendency to aggregate (as seen from the relatively small proportion of dyes showing hypsochromic/bathochromic shifts), metal-free dyes clearly show a higher tendency to aggregate. Further examination of Figure 3A shows that dyes containing imidazole form J-aggregates, while other dye classes tend to exhibit both blue and red shifts. Analysis of the anchoring groups (see Figure 3B) does not suggest a clear explanation of the behaviour, echoing what was seen for the donors. Nonetheless, the dyes containing catechol show a significant propensity for J-aggregation, which may be attributed to the fact that these groups bind more strongly to the metal oxide than other anchoring groups [25].
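The grouping used in this section (entries tallied per solvent without regard to volume ratio, and shift categories summarized per dye class) can be sketched as follows. This is an illustrative sketch only: the record keys (`solvent`, `dye_class`, `shift`) and the `A/B (x:y)` label format for mixtures are assumptions, and the example records are invented to exercise the code rather than taken from the data set.

```python
import re
from collections import Counter, defaultdict


def normalise_solvent(label: str) -> str:
    """Collapse a solvent label so that volume ratios are ignored.

    Assumes mixtures are written like 'DCM/MeOH (1:1)'; the ratio in
    parentheses is dropped and the components are sorted alphabetically.
    """
    without_ratio = re.sub(r"\(.*?\)", "", label)
    parts = [p.strip() for p in without_ratio.split("/") if p.strip()]
    return "/".join(sorted(parts, key=str.lower))


def shift_proportions(records, group_key):
    """Return, per group, the fraction of entries in each shift category."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[group_key(rec)][rec["shift"]] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }


# Invented records, for illustration only.
records = [
    {"solvent": "DCM/MeOH (1:1)", "dye_class": "porphyrin", "shift": "unchanged"},
    {"solvent": "MeOH/DCM (4:1)", "dye_class": "coumarin", "shift": "hypsochromic"},
    {"solvent": "THF", "dye_class": "coumarin", "shift": "bathochromic"},
    {"solvent": "THF", "dye_class": "coumarin", "shift": "hypsochromic"},
]

by_solvent = shift_proportions(records, lambda r: normalise_solvent(r["solvent"]))
by_class = shift_proportions(records, lambda r: r["dye_class"])
```

Passing the grouping as a function keeps the same tally usable for solvents, dye classes or anchoring groups.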

Usage Notes
The data set can be queried using an online search utility (see Figure 4) at https://vvishwesh.github.io/dyeaggregation. Search queries can be based on solvent, dye class, anchoring group, metal oxide type (TiO2/ZnO/NiO/Al2O3) and absorption maxima range. Alternatively, a structure/substructure-based search can be performed. Comparison and search of molecular structures in the web browser are enabled by the open-source JavaScript library Kekule.js [26]. The JavaScript Molecule Editor (JSME) [27] provides interactive editing of molecules. The results are displayed as a table, with molecular structures rendered using the SmilesDrawer [28] JavaScript library, and can be saved as a tab-separated file for future use.
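Once search results have been saved, the same kind of filtering (by metal oxide and absorption maxima range) can be repeated offline. The sketch below uses only the Python standard library. The column names (`dye_id`, `metal_oxide`, `lambda_max_soln`, `lambda_max_mo`) are hypothetical, since the exported header is not reproduced here, and the sample rows are invented placeholders.

```python
import csv
import io


def filter_records(lines, metal_oxide, lambda_min_nm, lambda_max_nm, delimiter="\t"):
    """Select rows for one metal oxide whose on-oxide maximum lies in a range.

    `lines` is any iterable of text lines (an open file or io.StringIO).
    Column names are assumed, not taken from the actual export.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    selected = []
    for row in reader:
        try:
            lam = float(row["lambda_max_mo"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric maximum
        if row.get("metal_oxide") == metal_oxide and lambda_min_nm <= lam <= lambda_max_nm:
            selected.append(row)
    return selected


# Invented tab-separated sample standing in for an exported file.
SAMPLE_EXPORT = (
    "dye_id\tmetal_oxide\tlambda_max_soln\tlambda_max_mo\n"
    "dye_A\tTiO2\t450\t470\n"
    "dye_B\tZnO\t430\t425\n"
    "dye_C\tTiO2\t520\t540\n"
    "dye_D\tTiO2\t410\tn/a\n"
)

hits = filter_records(io.StringIO(SAMPLE_EXPORT), "TiO2", 400, 500)
```

For a comma-separated export, the same function can be called with `delimiter=","`.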

Conflicts of Interest:
The authors declare no conflict of interest.