Sampling CASE Application for the Quality Control of Published Natural Product Structures

Structure elucidation with NMR correlation data is dicey, as there is no way to tell how ambiguous the data set is and how reliably it will define a constitution. Many different software tools for computer-assisted structure elucidation (CASE) have become available over the past decades, all of which could improve the quality of the elucidation process, but their use is still not common. Since 2011, WebCocon has been able to generate theoretical NMR correlation data from an existing structural proposal, so that this theoretical data can then be used for CASE. WebCocon can now also read the recently presented NMReDATA format, giving straightforward access to CASE with experimental data. With these capabilities, WebCocon presents itself as an easily accessible web tool for the quality control of proposed new natural products. Results of this application to several molecules from the literature are shown and demonstrate how CASE can help improve the reliability of structure elucidation with NMR correlation data.


Introduction
Software tools for computer-assisted structure elucidation (CASE) of small molecules have been under development for over 50 years now. The methods used can roughly be divided into stochastic (S), deterministic (D), and hybrid (H), with hybrid representing various combinations of stochastic and deterministic methods. The most prominent methods are fragment assemblers (H) [1-6], expert systems (H) [7-9], databases of ¹³C NMR chemical shifts or neural networks (S) [10-13], structure generation by reduction (S/H) [14], logic engines (D) [15], stochastic structure generators (S) [16], combinatorial brute force (D) [17-21], combinatorial brute force with restraints (D/H) [22,23], genetic algorithms (S) [24-26], simulated annealing (S) [27], convergent structure generation (S) [28,29], evolutionary algorithms (S) [30], fuzzy structure generation (S) [31], and expert systems with DFT (H) [32]. Altogether, 14 different methods were implemented in more than 20 different tools, some of which are freely usable, and many of which have not seen further development by their authors after the first publication. It can be observed that, out of the eight stochastic methods, only database methods are still actively developed, whereas both purely deterministic methods see ongoing development. However, the most active development happens for some of the hybrid methods, which combine stochastic and deterministic elements in various ways. Despite all these efforts, none of these tools has become popular enough to be used for quality control of published structures by a wider community of researchers or journal editors. The ¹³C chemical shift-based tools occupy a special position in this list, as they do not take NMR correlation data into account.
They can either map chemical shifts to chemical neighborhoods, which in turn are assembled into a proposed constitution, or determine chemical neighborhoods for each carbon of a proposed constitution and predict a chemical shift. One such free-to-use tool is CSEARCH [10,13,33], which allows for an automated verification of ¹³C chemical shift assignments for a proposed structure, combining database and neural network verification. This approach might fail if few similar structures are available, or if incorrect data [...] differentiation normally is not possible, the incorporation of this data is more difficult. For such ambiguous data, the use of CASE for the interpretation is recommended. Other experiments have been developed over the years to provide additional information about a molecule's skeleton, most prominently the ²,³J_HN-HMBC, 1,1-ADEQUATE, and H2BC. The ²,³J_HN-HMBC is similar to the carbon version described before but correlates to a nitrogen. The 1,1-ADEQUATE [22,34,35,46,47] and H2BC [48-50] are experiments that provide a ²J_HC-HMBC type of correlation and can be used complementary to the aforementioned ²,³J_HC-HMBC data. Theoretical ²,³J_HN-HMBC and 1,1-ADEQUATE correlations were not used in this analysis because this data is not routinely acquired and, therefore, also not included in publications.
WEBCOCON is a web-based interface to the program COCON. First, WEBCOCON helps create the input file for COCON, offering different functionalities for this purpose. With this input file, COCON creates a list of all constitutions that are compatible with the provided correlation data. These constitutions are ranked on the server by different methods and then transferred back to the web interface for visualization. Alternatively, the suggested constitutions can be downloaded as an SDF file and inspected offline by the user.
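For offline inspection of a downloaded SDF file, a minimal Python sketch such as the following could be used to split the file into its suggested constitutions. It assumes that the records use the V2000 molfile layout and are terminated by `$$$$` lines, as in the general SDF convention; the exact output of WEBCOCON is not documented here, and the function names and the `RANK` data field in the usage note are hypothetical.

```python
from typing import Dict, List, Optional


def _parse_record(lines: List[str]) -> Dict[str, object]:
    """Parse one SDF record: V2000 molblock plus optional data items."""
    title = lines[0].strip() if lines else ""
    # V2000 counts line (4th line): columns 1-3 hold the atom count,
    # columns 4-6 the bond count. This layout is an assumption.
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])
    fields: Dict[str, List[str]] = {}
    key: Optional[str] = None
    for line in lines:
        if line.startswith(">") and "<" in line:
            # Data item header of the form "> <NAME>".
            start = line.index("<") + 1
            key = line[start:line.index(">", start)]
            fields[key] = []
        elif key is not None:
            if line.strip():
                fields[key].append(line)
            else:
                key = None  # a blank line closes the data item
    return {
        "title": title,
        "n_atoms": n_atoms,
        "n_bonds": n_bonds,
        "fields": {k: "\n".join(v) for k, v in fields.items()},
    }


def parse_sdf(text: str) -> List[Dict[str, object]]:
    """Split SDF text on '$$$$' delimiter lines and parse each record."""
    records: List[Dict[str, object]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(item.strip() for item in current):
        records.append(_parse_record(current))  # file without final '$$$$'
    return records
```

Calling `parse_sdf(open("constitutions.sdf").read())` (file name hypothetical) would return one dictionary per suggested constitution, so that, for example, a ranking field written by the server could be read from `record["fields"]` if such a field exists.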

Results
The results summarized in Table 1 are just samples from the literature. Further automation of the ranking process will streamline the analysis of many more molecules.

Luteolin 8-C-E-Propenoic Acid
Luteolin 8-C-E-propenoic acid [51] was published as molecule 1-A from Figure 1. WEBCOCON suggested 25 constitutions with the theoretical NMR correlation data set, including the published molecule. Out of those, only 6 underwent further inspection, as the others contained highly strained ring systems. The calculated force field total energies strongly favor molecules 1-A and 1-B.
The distinction of 1-A from the other suggested constitutions can be made using ¹³C chemical shifts of selected atoms, when comparing experimental data with back-calculated data in Table 2. The back-calculated ¹³C

Tomentodiplacone
Tomentodiplacone [52] was published as molecule 2-A from Figure 2. WEBCOCON suggested 2 constitutions with the theoretical NMR correlation data set, including the published molecule. The force field total energy calculated for both molecules is almost equal. The distinction of 2-A from the other suggested constitution can be made using ¹³C chemical shifts of selected atoms. The average deviation of the back-calculated chemical shifts from the experimental values is considerably smaller for 2-A, as shown in Table 3. The back-calculated ¹³C chemical shift values for all carbons of all verified constitutions are in Figure A2.

Kadangustin A
Kadangustin A [53] was published as molecule 3-A from Figure 3. WEBCOCON suggested 13 constitutions with the theoretical NMR correlation data set, 5 of which are shown here, including the published molecule. The force field total energy calculated for these 5 molecules is very similar. The distinction of 3-A from the other suggested constitutions is possible using ¹³C chemical shifts of some selected atoms, when comparing experimental data with back-calculated data in Table 4 using the average deviation (∆). However, the standard deviation (σ) and the range (∆max − ∆min) clearly favor 3-D as the solution, in spite of its higher total energy. This case should be verified in more detail. The back-calculated ¹³C chemical shift values for all carbons of all verified constitutions are in Figure A3.
Table 4. Comparison of experimental and back-calculated ¹³C chemical shifts for kadangustin A (3-A) and the 4 suggested alternative constitutions. ∆ is the average deviation, σ is the standard deviation of the chemical shifts, and ∆max − ∆min is the range of the deviations of the calculated chemical shifts from the experimental values. E_total is the total force field energy calculated in kcal/mol for the constitutions.
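The three statistics used for Table 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: ∆ is taken as the mean of the absolute deviations, σ as the population standard deviation of the signed deviations, and the range as the largest minus the smallest signed deviation. The text does not specify these details, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def deviation_stats(exp: Sequence[float], calc: Sequence[float]) -> Dict[str, float]:
    """Summarize deviations (calculated minus experimental) in ppm."""
    if len(exp) != len(calc) or len(exp) == 0:
        raise ValueError("need two non-empty shift lists of equal length")
    dev = [c - e for e, c in zip(exp, calc)]
    return {
        # Assumption: average deviation = mean absolute deviation.
        "delta": mean(abs(d) for d in dev),
        # Assumption: population standard deviation of signed deviations.
        "sigma": pstdev(dev),
        # Range of the deviations, i.e. delta_max - delta_min.
        "range": max(dev) - min(dev),
    }
```

A candidate with a small average deviation but a large σ or range, as discussed for 3-A versus 3-D, would show up directly when the three values are printed side by side for each constitution.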

Berkeleyamide D
Berkeleyamide D [54] was published as molecule 4-A from Figure 4. WEBCOCON suggested 2 constitutions with the theoretical NMR correlation data set, including the published molecule. The force field total energy calculated for both molecules clearly favors 4-B. The confirmation of 4-A as the correct solution is also not possible using ¹³C and ¹H chemical shift values of selected atoms, as the average deviation is smaller for 4-B, as shown in Table 5. The back-calculated ¹³C and ¹H chemical shift values using CS Chemdraw for all atoms of both verified constitutions are in Figures A4 and A5, respectively.
Table 5. Comparison of experimental and back-calculated ¹³C and ¹H chemical shifts for berkeleyamide D (4-A) and the suggested alternative constitution. ∆ is the average of the deviations of the calculated from the experimental values. E_total is the total force field energy calculated in kcal/mol for the constitutions. Deviations ≥ 10 ppm are labeled with "*".
Although the average deviations shown in Table 5 indicate that 4-B matches the experimental chemical shift values better, both suggested constitutions contain one carbon with a chemical shift deviation ≥ 10 ppm (carbon 8 for 4-A and carbon 11 for 4-B). Therefore, all ¹³C chemical shift values back-calculated by CS Chemdraw (M-I) and CSEARCH (M-II) for all carbon atoms were included in a comparison in Table 6.
Table 6. Comparison of experimental and back-calculated ¹³C chemical shifts for all carbons in berkeleyamide D (4-A) and the suggested alternative constitution, using CS Chemdraw (M-I) and CSEARCH (M-II). σ is the standard deviation and ∆max − ∆min is the range of the deviations between the back-calculated and experimental values.
[...] is very high, indicating that neither of the constitutions might be correct. An analysis with experimental correlation data would be the next step, followed by a new verification of back-calculated carbon chemical shift values for all alternatives. If the NMR data were available as NMReDATA, this would be straightforward.
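The labeling of large deviations used for Table 5 could be automated along the following lines. This is a sketch only: the 10 ppm threshold comes from the text, whereas treating it as inclusive, the dictionary layout, and the function name are assumptions, and the numbers in the usage note are made up for illustration.

```python
from typing import Dict


def flag_large_deviations(
    exp: Dict[str, float],
    calc: Dict[str, float],
    threshold_ppm: float = 10.0,
) -> Dict[str, float]:
    """Return signed deviations (calc - exp) whose magnitude reaches the threshold."""
    flagged: Dict[str, float] = {}
    for atom, exp_shift in exp.items():
        deviation = calc[atom] - exp_shift
        # Assumption: the threshold is inclusive (>= 10 ppm).
        if abs(deviation) >= threshold_ppm:
            flagged[atom] = deviation
    return flagged
```

For example, with hypothetical shifts `exp = {"C8": 170.0, "C9": 55.0}` and `calc = {"C8": 158.5, "C9": 57.0}`, only `C8` would be flagged.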

14-Norpseurotin A
Figure 5. 14-Norpseurotin A (5-A) and the alternative constitution suggested by WEBCOCON. ¹³C and ¹H chemical shifts were back-calculated for the atoms numbered in the molecules.
The distinction of 5-A from 5-B is also not possible using ¹³C chemical shifts, when comparing experimental data with back-calculated data for selected atoms, as can be seen in Table 7. In fact, the back-calculated ¹³C and ¹H chemical shifts of 5-B match the experimental values better, as the average deviation shows. This case should be verified in more detail. The back-calculated ¹³C and ¹H chemical shift values for all verified constitutions are in Figures A6 and A7, respectively.
Table 7. Comparison of experimental and back-calculated ¹³C and ¹H chemical shifts for 14-norpseurotin A (5-A) and the suggested alternative constitution. ∆ is the average of the deviations of the calculated from the experimental values. E_total is the total force field energy calculated in kcal/mol for the constitutions.

Feruloylpodospermic Acid A
Feruloylpodospermic acid A [56] was published as molecule 6-A from Figure 6. WEBCOCON suggested 2 constitutions with the theoretical NMR correlation data set, including the published molecule. The force field total energy calculated for both molecules slightly favors 6-B.
Figure 6. Feruloylpodospermic acid A (6-A) and the alternative constitution suggested by WEBCOCON. ¹³C chemical shifts were back-calculated for the atoms numbered in the molecules.
A comparison of the experimental and back-calculated ¹³C chemical shifts for the 5 carbons numbered in Figure 6 is shown in Table 8. The observed average deviation also slightly favors 6-B. This case should be verified in more detail. The back-calculated ¹³C chemical shift values for all carbons of all verified constitutions are in Figure A8.

Cochinchistemoninone
Cochinchistemoninone [57] was published as molecule 7-A from Figure 7. WEBCOCON suggested 3 constitutions with the theoretical NMR correlation data set, including the published molecule. The force field total energy calculated for the molecules favors 7-C.
Figure 7. Cochinchistemoninone (7-A) and the alternative constitution suggested by WEBCOCON. ¹³C chemical shifts were back-calculated for the atoms numbered in the molecules.

A comparison of the experimental and back-calculated ¹³C chemical shifts for the 5 carbons numbered in Figure 7 is shown in Table 9. The observed average deviation favors 7-A as the correct solution, but, in all suggested constitutions, the experimental value of the chemical shift of carbon 2 is considerably different from the back-calculated value. This case should be verified in more detail. The back-calculated ¹³C chemical shift values for all carbons of all verified constitutions are in Figure A9.
Table 9. Comparison of experimental and back-calculated ¹³C chemical shifts for cochinchistemoninone (7-A) and the 2 suggested alternative constitutions. ∆ is the average of the deviations of the calculated from the experimental values. E_total is the total force field energy calculated in kcal/mol for the constitutions.

Milicifoline B
Milicifoline B [58] was published as molecule 8-A from Figure 8. When using theoretical NMR correlation data from 8-A, WEBCOCON suggested 3 constitutions, including the published molecule. The force field total energy calculated for the three molecules is very similar.
In their publication, the authors used 3 NOEs (see Figure 9) in order to decide between the two possible constitutions, so this information was also included in WEBCOCON. The distances measured after the WEBCOCON run are shown in Table 11, and they favor the originally published constitution. However, the distances measured in the calculated molecules are all > 500 pm and, therefore, at the limit of measurable NOEs [59]. Due to this, no reliable decision between the two suggested constitutions can be made.

5α-Cyprinol Sulfate
5α-Cyprinol sulfate [60] was published as molecule 9-A from Figure 10. When using theoretical NMR correlation data from 9-A, WEBCOCON suggested only one solution. However, for this molecule, experimental data is available in the NMReDATA format. Thus, the WEBCOCON analysis was repeated using the published experimental data, resulting in 4 suggested constitutions, including the published molecule. The force field total energy calculated for the four molecules is very similar. All constitutions have the same skeleton; they differ only in the position of the sulfate group. Hence, ¹³C chemical shifts are expected to vary mainly for the carbons adjacent to the sulfate group. The experimental and back-calculated ¹³C chemical shifts for these 4 carbons of all constitutions are shown in Table 12. The calculated average deviation over the 4 carbons for the suggested constitutions clearly favors the constitution 9-A, matching the publication. The back-calculated ¹³C chemical shift values for all carbons of all verified constitutions are in Figure A11.
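The NOE distance check described for milicifoline B can be illustrated with a short Python sketch. It assumes that 3D coordinates of the calculated structure are available in ångström (for example, from the atom block of an SDF file) and applies the 500 pm limit mentioned in the text; the atom labels, the data layout, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

NOE_LIMIT_PM = 500.0  # limit of measurable NOEs as stated in the text


def distance_pm(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points given in angstrom, returned in pm."""
    return math.dist(a, b) * 100.0  # 1 angstrom = 100 pm


def check_noes(
    coords: Dict[str, Tuple[float, float, float]],
    pairs: Sequence[Tuple[str, str]],
    limit_pm: float = NOE_LIMIT_PM,
) -> List[Tuple[str, str, float, bool]]:
    """For each proton pair, report the distance and whether it is within the limit."""
    results = []
    for first, second in pairs:
        d = distance_pm(coords[first], coords[second])
        results.append((first, second, d, d <= limit_pm))
    return results
```

If every reported pair comes out beyond the limit, as was the case for the calculated molecules here, the NOEs cannot reliably discriminate between the candidate constitutions.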

Viridiol aka TAEMC161
Viridiol (10-C) was originally published in 1969 as a derivative of viridin [61]. The crystal structure was published in 2013 [62], and, in recent years, the total synthesis has become available [63-66]. In 2000, TAEMC161 was isolated as a metabolite from Trichoderma hamatum, and 10-B was suggested as its structure [67]. Shortly thereafter, the structure was revised as being identical to viridiol [68]. In 2010, the structure revision was reviewed with CASE software, which confirmed the previous findings [69]. The revisions were based mainly on the comparison of experimental and calculated carbon chemical shifts, using DFT and database methods.
The theoretical correlation data-based analysis of TAEMC161 reveals that this constitution could be described unambiguously by NMR correlation data. The same analysis with viridiol yields the 3 constitutions with similar total energies shown in Figure 11, plus three more with complex bridged ring systems and much higher total energies, which are not shown. Of these 3 constitutions, 10-B is the originally published metabolite TAEMC161, meaning that NMR correlation data cannot distinguish between viridiol and TAEMC161. Thus, other means of distinction need to be used. For this analysis, we performed carbon chemical shift calculations for all atoms in all 3 constitutions using several methods: M-I was CSEARCH, M-II was CS Chemdraw, M-IIIa was DFT with ORCA v5.0.1 using the conformation obtained by WEBCOCON, and M-IIIb was DFT with ORCA v5.0.1 using a DFT-optimized conformation. The results are shown in Table 13 for 10-A, Table 14 for TAEMC161 (10-B), and Table 15 for viridiol (10-C).
The resulting RMSD values for the chemical shift deviations observed for the different methods and constitutions show that the empirical methods outperform the DFT method under all conditions, as has been observed previously [70,71]. In this case, 10-B and 10-C cannot be distinguished by DFT, whereas both empirical methods clearly favor 10-C as the solution. Thus, the analysis confirms the published revisions but also shows that the correlation data of viridiol alone are not enough to distinguish 10-B from 10-C.
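The RMSD comparison can be sketched in Python as follows. The snippet assumes that experimental and back-calculated shifts are given as lists in the same atom order; the function names and the candidate labels in the usage note are hypothetical, and no values from Tables 13 to 15 are reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rmsd(exp: Sequence[float], calc: Sequence[float]) -> float:
    """Root-mean-square deviation between calculated and experimental shifts (ppm)."""
    if len(exp) != len(calc) or len(exp) == 0:
        raise ValueError("need two non-empty shift lists of equal length")
    squared = sum((c - e) ** 2 for e, c in zip(exp, calc))
    return math.sqrt(squared / len(exp))


def rank_candidates(
    exp: Sequence[float],
    predictions: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort candidate constitutions by increasing RMSD for one prediction method."""
    scored = [(name, rmsd(exp, calc)) for name, calc in predictions.items()]
    return sorted(scored, key=lambda item: item[1])
```

Running `rank_candidates` once per prediction method (for example, once with database-derived and once with DFT-derived shifts) gives one ranking per method, which makes it easy to see whether the methods agree on the preferred constitution.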

Discussion & Conclusions
CASE revealed more than one solution for all molecules discussed here. Since the suggested constitutions are compatible with the theoretical NMR correlation data set of the originally published structure, other means of identifying the correct constitution had to be explored. First, the results were inspected with "a chemist's eye", supported by force field total energies, as WEBCOCON might have generated structures that are not likely to exist. In one case, the inclusion of NOE data from the original publication favored one of the possible solutions, but was not enough for a decision. Second, ¹³C NMR chemical shifts were calculated for the suggested constitutions and compared to the experimental data. However, for most examples, this was also not enough for a final decision.
The obtained results suggest that many of the new structures published using NMR correlation data could not be described unambiguously by theoretical NMR correlation data. Taking into account that experimental data sets usually contain fewer data, even more new structures could be called into question. Surveys of revised structures [32,69,71-73] clearly show that improvements in how new small molecules are published are urgently needed. The use of any of the available software tools for CASE, together with the popularization of more comprehensive data formats, such as NMReDATA, will lead to improved traceability of the structure elucidation process, as shown here for 5α-cyprinol sulfate. As it is, WEBCOCON could already be integrated easily into a quality control workflow.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Figure A6. 14-Norpseurotin A and the alternative constitution suggested by WEBCOCON with all back-calculated ¹³C chemical shifts. The ¹³C chemical shifts in red were used for the discussion.