A Data Resource for Sulfuric Acid Reactivity of Organic Chemicals

We describe a dataset of the quantitative reactivity of organic chemicals with concentrated sulfuric acid. As well as being a key industrial chemical, sulfuric acid is of environmental and planetary importance. In the absence of measured reaction kinetics, the reaction rate of a chemical with sulfuric acid can be estimated from the reaction rates of structurally related chemicals. To allow such an approximate prediction, we collected 589 sets of kinetic data on the reaction of organic chemicals with sulfuric acid from 262 literature sources. We then used a functional group-based approach to build a model of how the functional groups would react at any sulfuric acid concentration from 60% to 100% and at any temperature between −20 °C and 100 °C. The dataset provides the original reference data and kinetic measurements, parameters, intermediate computation steps, and a set of first-order rate constants for the functional groups across the range of conditions −20 °C to 100 °C and 60% to 100% sulfuric acid. The dataset will be useful for a range of studies in chemistry and atmospheric sciences where the reaction rate of a chemical with sulfuric acid is needed but has not been measured.


Summary
The chemistry of sulfuric acid is of practical importance to industry [1], as well as to atmospheric sciences and astronomy. The kinetics of reaction with sulfuric acid (whether a substance will react, and how fast, under a wide range of conditions) is, therefore, of great practical importance. The reactivity of a molecule in sulfuric acid can be calculated when two things are known: the molecule's pKa (and hence its ionization state in acid of different concentrations) and the reaction mechanism (reviewed in [1][2][3][4]). However, these properties are rarely known unless the kinetics have already been measured experimentally for the compound of interest or for a closely related compound. Estimating the stability of new substances in sulfuric acid must, therefore, rely on comparison with diverse chemicals containing functional groups similar to those of the test molecule. The dataset presented in this paper provides two things. The first is a set of kinetic examples from the literature for such reactivity estimation and modeling. The second is a set of extrapolated kinetic parameters based on those experimentally derived examples.
Data 2021, 6, 24
Data on the reactivity of chemicals in sulfuric acid are scattered across diverse literature and use diverse reporting methods and units. We compiled data on the reaction kinetics of 589 reactions in concentrated sulfuric acid from 262 literature sources and converted them to a dataset in a unified system of units. Literature searches drew both on citations in books and papers on sulfuric acid chemistry and on keyword searches. In the case of phosphine, a model of reactivity was built from scattered literature hints to fill in a crucial part of chemical space, for reasons described below. The compiled data were then used to build two further sets of data. The first is a dataset of extrapolated reaction rates for a set of 130 functional groups liable to breakdown or solvolysis in concentrated sulfuric acid. The second is a complementary set of reaction parameters for sulfonation of phenyl rings.
The dataset can be used to give a pragmatic indication of the stability of an arbitrary chemical in sulfuric acid. Detailed mechanistic studies (either ab initio quantum calculations of reaction paths or detailed experimental kinetic studies) would give much more accurate predictions but are extremely time-consuming. In their absence, we believe that the dataset presented in this paper will be useful for any researcher exploring the reactivity of chemicals in sulfuric acid. Applications include, but are not limited to, stratospheric chemistry and planetary atmospheric modeling. The dataset should also be useful in industry, where sulfuric acid is a widely used reagent. In this regard, future work could extend this study to the reactivity of organic compounds with nitric acid and with nitric and sulfuric acid mixtures, which are widely used in nitration reactions [5].

Summary of Data
The dataset is a compilation of literature data on the sulfuric acid reactivity of molecules, converted to a unified system of units. The data fall into three parts:
The derived kinetic constants are extrapolated from the literature data using the functional group definitions, as described below in Section 3.5.
The structure of the data is illustrated in Figure 1.
Figure 1. Bottom row: types of data and applications that can be taken from the current dataset, for use as a literature resource, a resource for modeling, or a lookup table for chemical stability.
Throughout, we have used the simplified molecular-input line-entry system (SMILES) notation to code molecular structure [6], together with its extended SMARTS search notation [7]. This widely used notation allows molecular topology to be coded as text.

Summary Statistics
The original data comprise 589 reactions of 489 compounds identified from 262 literature references. Most date from the period 1950–1980, but in total they cover the period 1907–2019. Each reaction was classified along two axes. The first is whether it was measured at a SINGLE acid concentration or at MULTIPLE acid concentrations. The second is whether it was measured at one temperature ("No_Q10"), at multiple temperatures for one acid concentration ("SINGLE_Q10"), or at multiple temperatures at multiple acid concentrations ("MULTIPLE_Q10"). The coverage of these combinations is summarized in Table 1.
Table 1. Measurements of the kinetics of reactions in sulfuric acid, classified by the data available. SINGLE = rate measured at a single sulfuric acid concentration. MULTIPLE = rate measured at multiple concentrations. NO_Q10 = rate measured at one temperature; SINGLE_Q10 = rate measured at multiple temperatures at one sulfuric acid concentration; MULTIPLE_Q10 = rate measured at multiple temperatures and at multiple acid concentrations.
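As an illustration of this two-way classification, the following Python sketch assigns one entry to its categories from its raw measurements. It assumes, hypothetically, that each entry is available as a list of (acid %, temperature °C, rate) tuples. Neither this record layout nor the function name `classify_entry` comes from the dataset itself.

```python
def classify_entry(measurements):
    """Return (concentration_class, temperature_class) for one reaction entry.

    `measurements` is a hypothetical list of (acid_percent, temp_c, rate)
    tuples describing every rate reported for a single reaction.
    """
    # SINGLE vs MULTIPLE: how many distinct acid concentrations were used?
    concentrations = {acid for acid, _, _ in measurements}
    conc_class = "MULTIPLE" if len(concentrations) > 1 else "SINGLE"

    # Collect the temperatures measured at each acid concentration.
    temps_by_acid = {}
    for acid, temp, _ in measurements:
        temps_by_acid.setdefault(acid, set()).add(temp)

    # Concentrations at which the temperature was actually varied.
    varied = [acid for acid, temps in temps_by_acid.items() if len(temps) > 1]
    if not varied:
        temp_class = "No_Q10"
    elif len(varied) == 1:
        temp_class = "SINGLE_Q10"
    else:
        temp_class = "MULTIPLE_Q10"
    return conc_class, temp_class
```

Under this reading, an entry whose temperature was varied at only one of several concentrations is classed as MULTIPLE with SINGLE_Q10.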

Data Schema
The data are presented as an Excel file with multiple sheets; further files provide each sheet in tab-separated values (.TSV) format. All cells are free-form (i.e., not fixed-length). All numbers are floating-point unless indicated otherwise. The data fields are summarized in Table 2.

Table 2. Data fields (Field Name, Field Description).
Sheet "Data" (in columns)

Literature Search for Data
The literature was searched for papers presenting kinetic data as follows. Only papers that presented original rate measurements were considered; papers that presented only summary kinetic equations or qualitative statements of rate were not included. Papers were identified in three ways: (a) from citations in standard texts on sulfuric acid chemistry [1,2,4]; (b) by searching Google Scholar using the keywords "concentrated sulfuric" or "concentrated sulphuric" and "kinetics" or "rate of reaction"; and (c) by searching all papers cited in, or citing, papers found by approaches (a) and (b). Roughly 1500 papers were scanned, of which 262 yielded usable data.
All these papers reported a kinetic analysis of the reaction of a compound containing a specific chemical group, here termed a functional group. Each molecule contained one and only one reacting functional group, because the objective of each paper was to study the mechanism of the reaction of that functional group. The molecule might contain other structures, but these were selected not to react under the conditions used. For example, the solvolysis of benzamides is studied in compounds that, by definition, contain phenyl rings [8,9]. However, the amide group is broken down many orders of magnitude faster than phenyl rings are sulfonated, so in effect the molecule can be considered to have only the amide group as a reacting moiety.
The data for the reactivity of phosphine were compiled differently. Because of interest in phosphine in the acid clouds of Venus [10], the reactivity of this molecule was of particular interest. However, no systematic kinetic data exist for the reaction of phosphine with sulfuric acid, despite the industrial importance of this reaction [11]. Single reaction points were, therefore, identified in five sources [12][13][14][15][16]. These points were matched to Q10 values and to the m and c values of Equation (2) below. This is not ideal, because these are semi-anecdotal reports gathered up to a century ago under a wide range of conditions. It illustrates the need for more systematic measurement of some literature values.

Data Checking and Validation
The data within each paper were assessed for consistency in two ways: plots of rate vs. concentration were checked against the paper's own conclusions, and they were checked for a smooth rate–concentration relationship. Outliers or sudden changes in curve gradient suggested an error. Such errors were often introduced in transcription, but sometimes they were in the original publication (usually typographical errors, such as a missing minus sign). The data were also validated by internal reference within the dataset, as follows. During compilation of kinetic parameters (see below), the scatter of predicted rates for a class of compounds was plotted for three sets of conditions. Typically, the rates spanned 1–2 log units. Outliers differing from the mean by more than 2 log units were checked against the original paper for transcription errors.
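The internal outlier check can be sketched in a few lines, assuming the predicted log10 rates for one compound class under one set of conditions are already available as a list. The function name `flag_outliers` is hypothetical. Including each value in the mean it is compared against is also an assumption, not a detail of the authors' procedure.

```python
from statistics import mean


def flag_outliers(predicted_log_rates, threshold=2.0):
    """Return indices of predicted log10 rates lying more than `threshold`
    log units from the mean of the whole compound class."""
    centre = mean(predicted_log_rates)  # class mean, outliers included
    return [i for i, value in enumerate(predicted_log_rates)
            if abs(value - centre) > threshold]
```

Running it once per set of conditions mirrors the three sets of conditions described above. Flagged indices would then be traced back to the original papers.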
No attempt was made to check a paper's underlying scientific rationale. Not only was this impractical, but it was also not in the spirit of the database, which is to report what the literature says, not to critique it.

Rate Units
Rates were assumed to be first order with respect to the target chemical, i.e., represented by a rate constant k with units of time⁻¹. This is true of almost all reactions studied. For those where second-order kinetics apply at higher concentrations (e.g., [17,18]), first-order kinetics are approximated at low concentrations. If rate constants or raw data were reported in time units other than seconds, the rate was converted according to Equation (1):

$$\text{rate(second)} = \frac{\text{rate(units)}}{\text{seconds/unit}} \qquad (1)$$

where rate(second) is the rate in units of s⁻¹, rate(units) is the rate in some other time unit, and (seconds/unit) is the number of seconds in that time unit. Concentrations were converted to % acid. Common concentration units were molar acid, molar water, and Hammett acidity H₀ [1]. Because the relationship between these units and % acid is nonlinear, specific conversion tables were constructed from [1,2,4,19,20]. These lookup tables are provided in the sheet "Concentration Lookup" in the dataset.
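The unit conversion of Equation (1) and the table-based concentration conversion can be sketched as follows. The unit labels and function names are hypothetical. Linear interpolation between tabulated points is an assumption, since the text states only that lookup tables were used. Real table values should be taken from the sheet "Concentration Lookup".

```python
from bisect import bisect_left

# Number of seconds in each reporting time unit (keys are hypothetical labels).
SECONDS_PER_UNIT = {"s": 1.0, "min": 60.0, "h": 3600.0, "day": 86400.0}


def rate_per_second(rate, unit):
    """Equation (1): rate(second) = rate(units) / (seconds/unit)."""
    return rate / SECONDS_PER_UNIT[unit]


def to_percent_acid(value, table):
    """Convert a concentration in another unit (e.g. molarity or H0) to % acid
    by linear interpolation in `table`, a list of (value, percent_acid) pairs
    sorted by increasing value."""
    xs = [x for x, _ in table]
    ys = [y for _, y in table]
    if not xs[0] <= value <= xs[-1]:
        raise ValueError("value lies outside the lookup table")
    j = bisect_left(xs, value)
    if xs[j] == value:  # exact tabulated point
        return ys[j]
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
```

Because H₀ becomes more negative as acid strength rises, a table keyed on H₀ would need to be sorted by increasing H₀ before use with this sketch.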
For some measurements, a rate of reaction at just a single concentration of sulfuric acid is reported (measurements we have designated SINGLE). For others, reaction rates at multiple concentrations are reported; such measurements are often used to determine the reaction mechanism. Reactions with rates measured at several sulfuric acid concentrations were designated MULTIPLE. For MULTIPLE entries, rates were fitted to Equation (2):

$$\log(\text{rate}) = m \times (\text{conc.}) + c \qquad (2)$$

where m (the gradient of the linear plot of log(rate) vs. acid concentration) and c (the intercept of that plot) are constants, rate is the rate in the reported units (see above), and conc. is the concentration of sulfuric acid. For MULTIPLE entries, m and c were recorded in the dataset. A log-linear relationship was found to give a reasonable fit to almost every dataset for concentrations between 60% and 98%. Below 60%, some (but not all) reactions show substantially nonlinear kinetics with acid concentration due to competing reaction mechanisms. Above 98%, this simple formalism is less reliable, for two reasons. Species such as SO₃ and H₂S₂O₇ become significant participants in the reaction chemistry [21], and Hammett acidity becomes highly nonlinear with concentration [1].
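A sketch of how m and c in Equation (2) can be obtained for a MULTIPLE entry by ordinary least squares. It assumes base-10 logarithms and rates already converted to s⁻¹, and it requires at least two distinct concentrations. The function name `fit_log_linear` is hypothetical.

```python
import math


def fit_log_linear(concentrations, rates):
    """Least-squares fit of log10(rate) = m * conc + c (Equation (2)).

    Returns (m, c). Needs at least two distinct concentrations.
    """
    ys = [math.log10(r) for r in rates]
    n = len(concentrations)
    x_mean = sum(concentrations) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept from the means.
    sxx = sum((x - x_mean) ** 2 for x in concentrations)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(concentrations, ys))
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c
```

Restricting the input to roughly 60–98% acid keeps the fit within the range where the text reports the log-linear form to be reasonable.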

Calculation of Q10 Values
Reactions almost invariably proceed faster at higher temperatures. However, the degree to which a reaction is accelerated by an increase in temperature depends on its activation energy, which itself depends on the mechanism and is not readily inferred. The temperature dependence of the rate is conveniently parameterized as a Q10 value: the factor by which a reaction rate increases for a 10 °C increase in temperature. Temperature data reported in a wide range of forms were converted to a uniform Q10 parameter for the 83 reports that included rates at different temperatures. Q10 was calculated as follows. For a table of rate values and temperatures, the lowest temperature was taken as the reference temperature T₀, and rate₀ was taken as the rate at T₀. The gradient of a plot of log(rate/rate₀) vs. (T − T₀)/10, where rate is the rate at temperature T, gives log(Q10).
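The Q10 procedure follows from rate = rate₀ · Q10^((T − T₀)/10), so log(rate/rate₀) is linear in (T − T₀)/10 with slope log(Q10). A Python sketch is given below. Using an ordinary least-squares gradient, base-10 logarithms, and the name `q10_from_series` are all assumptions.

```python
import math


def q10_from_series(temps_c, rates):
    """Estimate Q10 from rates measured at several temperatures.

    T0 and rate0 are taken at the lowest temperature; log10(rate/rate0) is
    regressed on (T - T0)/10 and the least-squares slope is log10(Q10).
    """
    pairs = sorted(zip(temps_c, rates))  # lowest temperature first
    t0, r0 = pairs[0]
    xs = [(t - t0) / 10.0 for t, _ in pairs]
    ys = [math.log10(r / r0) for _, r in pairs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return 10.0 ** slope
```

A rate that doubles with every 10 °C step, as in the usage below, returns a Q10 of 2.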
Some papers only allowed Q10 to be calculated at a single concentration of sulfuric acid (measurements designated SINGLE_Q10). This value was reported in the database, together with the concentration at which it was measured. Others allowed Q10 to be calculated at several acid concentrations (measurements designated MULTIPLE_Q10). For these, Q10 was plotted as a function of acid concentration. The dataset records the gradient n and intercept d of a least-squares straight-line fit through those data, forming Equation (3):

$$Q_{10} = n \times (\text{conc.}) + d \qquad (3)$$

Note that the majority of entries had no Q10 data, as shown in Table 1 (by contrast, all entries, by definition, had kinetic rate data).

Inference of Unmeasured Kinetic Parameters for Solvolysis Reactions
Ideally, every entry would have both MULTIPLE and MULTIPLE_Q10 data. As shown in Table 1, this is not the case, and the data needed to predict the reaction of a chemical at an arbitrary temperature and acid concentration are largely missing. For a complete picture of sulfuric acid reactivity, the missing rate values must be reconstructed from two inputs: the single measured values, and the typical way in which the rate of reaction changes with acid concentration for that class of chemicals. The pipeline used to do this is summarized in Figure 2.
Reactions that resulted in solvolysis or other breakdown of the molecule (for example, hydrolysis of esters, oxidation of thiols, or dehydration of alcohols) were treated differently from reactions involving sulfonation (addition of an SO₃ group to a molecule). Solvolysis reactions have kinetics that depend on the nature of the group being affected, and there is a wide range of possible groups. Sulfonation, by contrast, happens almost exclusively on phenyl rings (for reasons discussed below) and hence has more uniform chemistry.
Solvolysis reactions are predicted on the basis of functional groups. A functional group is defined as a small group of atoms with characteristic chemical reactivity. It is expected that each functional group would have a characteristic sensitivity to attack by sulfuric acid and that molecules containing a given functional group would have similar mechanisms of reaction with sulfuric acid.
Each of the compounds whose reactions were characterized was assigned to a functional group on the basis of the group that was being affected by sulfuric acid. As noted above, each compound contains only one functional group that is relevant under the conditions of the experiment. Hence, each compound could be uniquely associated with one and only one functional group. Two sets of average properties for functional groups were then assigned: A mean gradient for Q10 as a function of acid concentration for each functional group. If a functional group has no MULTIPLE_Q10 values, then the mean gradient of Q10 with acid concentration over the entire dataset is used instead. The result is designated GROUP_Q10_DEFAULT_GRADIENT.
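The per-group averaging with a dataset-wide fallback can be sketched as follows. The record layout, dicts with hypothetical keys `group` and `q10_gradient`, is assumed for illustration. `q10_gradient` is None unless the entry is MULTIPLE_Q10. This layout does not describe the dataset's actual columns.

```python
from statistics import mean


def group_q10_default_gradients(entries):
    """Mean Q10-vs-acid gradient per functional group, falling back to the
    dataset-wide mean for groups with no MULTIPLE_Q10 entries."""
    measured = [e for e in entries if e["q10_gradient"] is not None]
    dataset_mean = mean(e["q10_gradient"] for e in measured)
    by_group = {}
    for e in measured:
        by_group.setdefault(e["group"], []).append(e["q10_gradient"])
    groups = {e["group"] for e in entries}
    return {g: mean(by_group[g]) if g in by_group else dataset_mean
            for g in groups}
```

`statistics.mean` raises `StatisticsError` if no entry at all has a measured gradient. This is the behavior one would want, since no fallback value exists in that case.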
The following algorithm was then used to compile a set of reaction rates for each of the functional groups:

1. Fill in a complete set of parameters for Equations (2) and (3). The constant d in Equation (3) was calculated according to Equation (5):

$$d = \text{Measured\_Q10} - n \times \text{concentration} \qquad (5)$$

where Measured_Q10 is the single measured value of Q10, and concentration is the acid concentration at which it was measured.
   e. For No_Q10 values (i.e., measurements for which no data on the dependence of rate on temperature were published):
      i. Gradient n in Equation (3)

2. Calculate the rate constant of each reaction at each chosen temperature and acid concentration using Equation (6):

$$k = 10^{\,m \times \text{acid} + c} \times \left(n \times \text{acid} + d\right)^{(T - T_o)/10} \qquad (6)$$

where T_o is the temperature at which that reaction was originally measured, T is the temperature for which we wish to predict the rate, acid is the acid concentration (in %) at which we wish to predict the reaction rate, m, c, n, and d are the parameters derived above, and k is the rate constant in s⁻¹. The selection of T and acid is arbitrary. However, to build a systematic table in which values can easily be looked up, a matrix of temperatures from −20 °C to 100 °C and acid concentrations from 60% to 100% was used.

3. Average the rates of all reactions in each functional group to obtain an average rate for that functional group.

The final output of the algorithm is a table of reaction rates for each of the functional groups over the whole matrix of conditions modeled in Equation (6). This matrix is provided in the sheet "Group Rates Matrix" as log10(rate constant).
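The following sketch combines the three steps of the algorithm. It assumes that Equation (5) takes the form d = Measured_Q10 − n × concentration. It also assumes that Equation (6) combines Equations (2) and (3) as k = 10^(m·acid + c) × (n·acid + d)^((T − T_o)/10), with base-10 logarithms. Each reaction's parameters are held as a hypothetical (m, c, n, d, T_o) tuple. Averaging is done on log10(k), which fits the matrix being stored as log10 values but is not stated in the text. All function names are hypothetical.

```python
import math
from statistics import mean


def intercept_from_single_q10(measured_q10, concentration, n):
    """Assumed form of Equation (5): d = Measured_Q10 - n * concentration."""
    return measured_q10 - n * concentration


def log10_rate(m, c, n, d, t_o, temp, acid):
    """log10(k) for k = 10**(m*acid + c) * (n*acid + d)**((temp - t_o)/10)."""
    q10 = n * acid + d  # Equation (3) evaluated at this acid concentration
    if q10 <= 0:
        # A linear Q10 model can turn non-positive when extrapolated too far.
        raise ValueError("non-positive Q10 at this acid concentration")
    return m * acid + c + (temp - t_o) / 10.0 * math.log10(q10)


def group_matrix(reactions, temps, acids):
    """Mean log10(k) over one group's reactions for every (temp, acid) cell.

    `reactions` is a list of hypothetical (m, c, n, d, t_o) tuples.
    """
    return {(t, a): mean(log10_rate(*r, temp=t, acid=a) for r in reactions)
            for t in temps for a in acids}
```

Raising an error on a non-positive Q10, rather than clamping it, makes problematic extrapolations visible instead of silently filling the matrix.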

Inference of Unmeasured Kinetic Parameters for Sulfonation Reactions
Unlike solvolysis reactions, sulfonation reactions are effectively the chemistry of phenyl rings. Sulfonation occurs only in concentrated sulfuric acid and at elevated temperatures. A range of reagents can sulfonate other groups, and can sulfonate phenyl rings at lower temperatures [2], but this chemistry is not relevant to the chemistry of sulfuric acid itself. Groups that are less stable to solvolysis than phenyl rings (e.g., furan rings) are attacked and degraded before they are sulfonated. Nitrogen-containing aromatics are resistant to sulfuric acid chemistry and can only be sulfonated by more potent sulfonating agents, such as "oleum" (solutions of SO₃ in pure sulfuric acid) [22,23].
For phenyl rings, sulfonation is accelerated or retarded depending on the electronic effects of substituents on the ring. Sulfonation rates for a phenyl ring with arbitrary substituents were calculated as follows. A base rate for sulfonation of the phenyl ring was obtained directly from experimental data, as described above. The sulfonation rate R of a substituted phenyl ring was estimated as

$$R = R_p \prod_i S_i^{\,z}$$

where R_p is the rate of sulfonation of benzene, S_i is the multiplicative factor for substituent i, and z is the number of substituents of type i on the ring. The multiplicative factors were then estimated using a simulated annealing algorithm [9]. The algorithm minimized the difference between predicted and experimental values for the range of substituted phenyls present in the original experimental dataset. This was done under the same matrix of conditions described in Section 3.5, algorithm step 2 above.
The output of the algorithm has two parts. The first is a table of "base rates" of reaction for unmodified benzene at the same selected temperatures and acid concentrations as for the solvolysis reactions in Section 3.5 above. The second is a set of $S_i^{z}$ parameters in the sheet "group defaults".
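A sketch of the substituent model R = R_p × Π S_i^z and of a simulated annealing fit is given below. The fit works in log10 space, so the product of factors becomes a sum. Several choices are assumptions for illustration and are not specified in the paper: the substituent labels, the linear cooling schedule, the Gaussian step size, the sum-of-squared-log-residuals objective, and the function names.

```python
import math
import random


def predict_log_rate(log_rp, log_factors, counts):
    """log10 R = log10 R_p + sum over substituent types of z * log10 S_i."""
    return log_rp + sum(z * log_factors[s] for s, z in counts.items())


def sse(log_rp, log_factors, observations):
    """Sum of squared log10 residuals over (counts, observed_log_rate) pairs."""
    return sum((predict_log_rate(log_rp, log_factors, counts) - obs) ** 2
               for counts, obs in observations)


def anneal_factors(log_rp, substituents, observations,
                   steps=20000, t_start=1.0, step_sd=0.1, seed=0):
    """Fit the factors S_i by simulated annealing; returns ({s: S_i}, cost)."""
    rng = random.Random(seed)
    current = {s: 0.0 for s in substituents}  # start from S_i = 1
    current_cost = sse(log_rp, current, observations)
    best, best_cost = dict(current), current_cost
    for k in range(steps):
        temperature = t_start * (1.0 - k / steps) + 1e-9  # linear cooling
        candidate = dict(current)
        candidate[rng.choice(substituents)] += rng.gauss(0.0, step_sd)
        cost = sse(log_rp, candidate, observations)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if cost < current_cost or rng.random() < math.exp((current_cost - cost) / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = dict(candidate), cost
    return {s: 10.0 ** v for s, v in best.items()}, best_cost
```

For a purely log-additive model, a linear least-squares solver would also work. Simulated annealing is shown here because it is the technique the text names, and it copes with objectives that are not linear in the parameters.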

User Notes
The dataset is available for downloading from zenodo.org as an Excel 2010 spreadsheet and as separate TSV files representing each of the Excel sheets, which, together with a "readme" file, are presented as a *.zip archive. We suggest that there are three ways the data could be used.

• Lookup of reaction rates for specific functional groups, using the tables in the sheets "group defaults" (for sulfonation) and "group rates matrix" (for solvolysis), for a specific set of conditions;
• Lookup of interpolated kinetic data from the sheet "data" to build a customized database of rates, either for a customized set of conditions or using a different averaging scheme;
• Use of the primary data from the sheet "data" to build a new algorithm for "filling in" missing data and/or to trace the original data from the references listed.
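For conditions between tabulated grid points, a user might interpolate within the "group rates matrix". The sketch below assumes, hypothetically, that the log10 rate constants for one functional group have already been read into a nested list indexed by temperature and acid concentration. The function names, this in-memory layout, and bilinear interpolation in log space are assumptions rather than features of the dataset.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (i, fraction) with axis[i] <= x <= axis[i + 1]."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError("outside the tabulated range; not extrapolating")
    j = min(bisect_right(axis, x), len(axis) - 1)
    return j - 1, (x - axis[j - 1]) / (axis[j] - axis[j - 1])


def bilinear_log_rate(temps, acids, grid, temp, acid):
    """Bilinearly interpolate log10(k) at (temp, acid).

    temps and acids are strictly increasing axes with at least two points;
    grid[i][j] is log10(k) at temps[i] and acids[j] (hypothetical layout).
    """
    i, u = _bracket(temps, temp)
    j, v = _bracket(acids, acid)
    return ((1 - u) * (1 - v) * grid[i][j]
            + u * (1 - v) * grid[i + 1][j]
            + (1 - u) * v * grid[i][j + 1]
            + u * v * grid[i + 1][j + 1])
```

Interpolating log10(k) rather than k keeps the estimate consistent with the log-linear form of Equation (2). Refusing to extrapolate keeps queries inside the −20 °C to 100 °C and 60–100% range that the matrix covers.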