Analysis of the Equilibrium Distribution of Ligands in Heterogeneous Media–Approaches and Pitfalls

The equilibrium distribution of small molecules (ligands) between binding agents in heterogeneous media is an important property that determines their activity. Heterogeneous systems containing proteins and lipid membranes are particularly relevant due to their prevalence in biological systems, and their importance to ligand distribution, which, in turn, is crucial to ligand’s availability and biological activity. In this work, we review several approaches and formalisms for the analysis of the equilibrium distribution of ligands in the presence of proteins, lipid membranes, or both. Special attention is given to common pitfalls in the analysis, with the establishment of the validity limits for the distinct approaches. Due to its widespread use, special attention is given to the characterization of ligand binding through the analysis of Stern–Volmer plots of protein fluorescence quenching. Systems of increasing complexity are considered, from proteins with single to multiple binding sites, from ligands interacting with proteins only to biomembranes containing lipid bilayers and membrane proteins. A new formalism is proposed, in which ligand binding is treated as a partition process, while considering the saturation of protein binding sites. This formalism is particularly useful for the characterization of interaction with membrane proteins.


Introduction
Biological systems are very heterogeneous media, with macromolecules or supramolecular aggregates dispersed in an aqueous medium. The hydrophobic effect is the major force that drives the formation of these structures [1,2] where the dispersed structures are characterized by a polar or ionic shell and a nonpolar core. Small molecules distribute between the aqueous phase and the non-aqueous structures as a result of the established interactions. The hydrophobic effect is the major force for non-specific binding, while polar interactions (type, configuration and geometry) are responsible for interaction specificity.
( 3) In the case of association between two entities, it may be easily identified that the physically meaningful solution is always x − , because x + is higher than [B T ] and/or [L T ].
Where not evident, the physically meaningful solution may be decided by calculating x ± over a large range of parameter and concentration values, with the constraints that it must be >0, <[B T ] and <[L T ]. When the protein saturation is experimentally followed, the concentration of ligand is the independent variable, and the association affinity (K b or K d ) may be found through the best fit of the correct solution of the quadratic equation to the experimental results. The solution of Equation (3) is valid in all conditions, low-or high-affinity binding, comparable or dissimilar concentrations of ligand and protein. Figure 1 shows the dependence of the protein saturation, with [L T ] (top plots) and with the [L T ]/[B T ] ratio (bottom plots) for several values of the association affinity. As the affinity of the ligand to the protein increases (higher K b or lower K d ), the protein becomes saturated at lower ligand concentrations and at lower ratios of total ligand to proteinbinding sites. The representation of the x axis in the logarithmic scale (plot B and D), which is proportional to the natural thermodynamic variable (the chemical potential), facilitates the interpretation of the results. For high-affinity binding, the protein is fully saturated for In the previous figures it was considered that protein saturation was being experimentally followed. If a property of the ligand is monitored instead (e.g., ligand fluorescence), refs. [29,[57][58][59][60] the dependent variable is now the fraction of ligand bound, and Note that the asymmetry of the isotherms obtained for high-affinity binding is due to the representation as a function of [L T ]. If the concentration of free ligand is used instead, symmetric isotherms are always obtained, as shown in Figure S1. However, the free ligand concentration is not a known parameter, as it depends on the unknown binding affinity. Thus, to evaluate the adequacy of the conditions regarding the characterization of the binding affinity, one has to consider the known total ligand concentration. From Figure 1 one notes that the binding isotherms for the two highest affinities considered (red and black) are mostly overlapped. This shows that for the total concentration of protein considered in those simulations (10 µM), the highest affinity that may be accurately characterized is K b = 10 7 M −1 (K d = 0.1 µM). For the characterization of higher affinity binding, a lower protein concentration must be used. Illustrating this point, simulations for 1 µM total protein concentration ( Figure S2) show that the isotherms for K d = 0.1 µM and 0.01 µM no longer fully overlap in these conditions, allowing their experimental discrimination. The upper limit for accurate estimation of the binding affinity is now K d = 0.01 µM (note the overlap of black and grey symbols in Figure S2A).
In the previous figures it was considered that protein saturation was being experimentally followed. If a property of the ligand is monitored instead (e.g., ligand fluorescence), refs. [29,[57][58][59][60] the dependent variable is now the fraction of ligand bound, and the total concentration of protein is the independent variable. The concentration of ligand bound is again calculated from the x − solution of Equation (3), from which its fraction may be calculated.
The fraction of bound ligand for several values of the association affinity is shown in Figure 2, as a function of the concentration of binding sites. As expected, the curves are equivalent to those shown in Figure 1 when saturation of the protein is being followed. In fact, the variations observed in the experimentally followed variable (∆S) are proportional to [L B ], which is the same for a given set of K d , [B T ] and [L T ], Equation (4). The normalization of ∆S by its limit value (∆S max corresponding to full protein saturation) leads exactly to the same curve shape as when a property of the protein or ligand is being followed, Equation (5).

Approximations Commonly Used
Although the analysis methods currently available allow the easy use of the full equ tion, the use of simplified equations to quantitatively characterize the association betwe ligands and proteins is still frequent. The two most common approximations are p

Approximations Commonly Used
Although the analysis methods currently available allow the easy use of the full equation, the use of simplified equations to quantitatively characterize the association between ligands and proteins is still frequent. The two most common approximations are presented below, with a critical analysis of their validity.
When protein saturation is followed, it is commonly assumed that the ligand is in large excess ([L T ] [B T ]). Under those conditions (corresponding to negligible ligand depletion due to binding), Equation (3) simplifies to the classical hyperbolic equation: This approximation of excess ligand was originally considered in the context of enzyme kinetics [61], where for in vitro studies the approximation is usually valid because the enzyme is in very low concentrations. However, as shown below, when studying the association between ligands and proteins this approximation may easily become inapplicable and lead to incorrect estimations of the binding affinity.
When binding is followed through a ligand property, it is the fraction of bound ligand that is accessed. The total concentration of protein is usually the independent variable that is varied until nearly all ligand is bound. If it may be considered that the protein is in large excess ([B T ] [L T ]), the general equation simplifies to: Figure 3 (plot A) represents the real saturation of the protein (calculated from Equation (3), circles), and that predicted when assuming excess ligand (continuous lines, Equation (6)). The concentration of binding sites was always 10 µM, and ligand concentration was varied between 10 −1 and 10 4 µM. For the lower affinity considered (K d =100 µM), half saturation occurs at a ligand concentration close to 100 µM, where the condition of excess ligand is a good approximation. The best fit of Equation (6) leads to K d = 105 µM, differing by only 5% from the true value. However, as expected, higher affinities lead to significant binding at ligand concentrations comparable to or smaller than the concentration of binding sites. Under those conditions, the approximation of excess ligand is not valid, and the best fit of Equation (6) leads to incorrect parameter estimations. For K d = 10 µM this parameter is overestimated by 50%, and the approximation is totally inappropriate for lower dissociation constants (higher affinities). This inadequacy results from the fact that the most relevant part of the saturation curve for parameter estimation occurs for ligand concentrations that are lower than the concentration of binding sites, violating the assumption of excess ligand. An equivalent behaviour is obtained when the approximation of excess protein is used ( Figure S3). In this case, the concentration of binding sites is the independent variable and the fractional binding of ligand is the dependent one.
It should be noted that the general equations for ligand binding to proteins with a single type of binding sites, Equation (1), are valid at all ligand and protein concentrations, for low-and high-affinity binding. The limits of validity of Equations (6) and (7) arise from the substitution of the concentration of free species (ligand or binding sites) with their total concentration. The inadequacy of the approximations of excess ligand or protein is well-known and has been repeatedly highlighted in the literature [61,62]. However, although more careful researchers and specialized software take this into consideration, refs. [47,[63][64][65][66] it is still frequently overlooked. the substitution of the concentration of free species (ligand or binding sites) with th total concentration. The inadequacy of the approximations of excess ligand or protei well-known and has been repeatedly highlighted in the literature [61,62]. However, hough more careful researchers and specialized software take this into consideration, r [47,[63][64][65][66] it is still frequently overlooked.  (6) to the saturation of binding sites observed (note the logarithmic s in plot (B) and linear scale in plot (C)). The dissociation constants obtained from the best fit of Eq tion (6) are indicated in plot (B).

Formalism-Microscopic and Macroscopic Association Constants
A commonly encountered difficulty when studying protein/ligand association occ when the number of binding sites per protein is not well-characterized. If protein satu tion may be achieved and followed quantitatively, the macroscopic binding associati may be obtained and from their dependence on protein saturation some conclusions m be drawn regarding the number of binding sites, their similarity and independence some conditions, one may also obtain information regarding the microscopic binding sociation to the various binding sites and the degree of cooperativity. Excellent revie are available in the literature for the full and accurate analysis of protein saturation for case of multiple binding sites (e.g., [67]). The cases of three independent binding s (equal or different) will be briefly analysed below, calling attention to common proble when oversimplifying the analysis of the data.
There are three possibly distinct binding sites in the system (in this case all thre the same protein, but the same formalism applies if each is in a distinct protein beca they are independent), so binding is described by the following scheme:

Formalism-Microscopic and Macroscopic Association Constants
A commonly encountered difficulty when studying protein/ligand association occurs when the number of binding sites per protein is not well-characterized. If protein saturation may be achieved and followed quantitatively, the macroscopic binding associations may be obtained and from their dependence on protein saturation some conclusions may be drawn regarding the number of binding sites, their similarity and independence. In some conditions, one may also obtain information regarding the microscopic binding association to the various binding sites and the degree of cooperativity. Excellent reviews are available in the literature for the full and accurate analysis of protein saturation for the case of multiple binding sites (e.g., [67]). The cases of three independent binding sites (equal or different) will be briefly analysed below, calling attention to common problems when oversimplifying the analysis of the data.
There are three possibly distinct binding sites in the system (in this case all three in the same protein, but the same formalism applies if each is in a distinct protein because they are independent), so binding is described by the following scheme: (8) where B i and L B i correspond respectively to the free and occupied binding sites i, and K i b is the respective microscopic association constant, with i representing the distinct binding sites (I, II, or III). The association of the ligand with each binding site cannot be characterized independently. Instead, a general equation considering ligand association with all binding sites must be defined and solved. If the binding sites are equivalent and independent, the solution is simply to consider the total concentration of binding sites ([B] = B I + B II + B III ) in Equation (2) and use the solution of the quadratic equation as indicated above. However, if the binding sites are not equivalent, the system is described by a set of n + 2 equations where n is the number of binding sites, and must be solved numerically. If the concentration of protein is known and each protein contains the three distinct binding sites, protein saturation with ligand is best described by the macroscopic overall association constants β i , or the stepwise macroscopic association constants K i , defined in Equations (9) and (10), respectively.
Note that PL i represents a sub-ensemble of protein-ligand complexes with i ligands bound per protein. It does not contain information regarding the binding sites where the ligands are bound. For example, PL 1 represents the collection of protein molecules with only one ligand bound, which may be in site 1, 2 or 3. The relative fraction of ligand bound to each of the possible binding sites depends on the microscopic binding affinities (K i b ). The same holds for PL 2 and PL 3 , although in this case the relative fractions are not simply a result of the relative values of K i b but also of possible dependencies between the distinct binding sites.
The two sets of macroscopic association constants (overall and stepwise) are interrelated by: The set of mass balance equations that needs to be solved to calculate the concentration of free ligand in solution is: where L B x represents ligand bound to site x.
Protein saturation as a function of the concentration of free ligand is given by: where i is the average number of ligands bound per protein molecule. The ligand saturation is given by: which corresponds to the fraction of ligand molecules that is bound.
If the binding sites are independent, the stepwise macroscopic and the microscopic association constants are related by: The derivation of these equations, the relations between the stepwise macroscopic association constants for the case of dependent binding sites, and details regarding the numerical methods used to calculate the free ligand concentration may be found in the literature [4,46,62,67,68].

Characterization of the Binding Parameters from Protein Saturation Curves
The saturation of a protein with 3 independent binding sites is shown in Figure 4 with a high affinity for binding site I (K I b = 10 6 M −1 ), and distinct relations to the other binding sites to which the ligand binds with equal or lower affinity. The total concentration considered for the protein is 10 µM and the total ligand concentration is varied to cover the full range of protein saturation.   When the affinities for the distinct binding sites differ by 3 orders of magnitude (upper plots), saturation of the distinct sites occurs at very different ligand concentrations (plot A), and proteins associate with 1, 2 or 3 ligands at different ligand concentrations (plot B), leading to a saturation curve with well-defined steps (plot C). A similar behaviour is observed for binding affinities that differ by 2 orders of magnitude (plots D to F), although the stepwise saturation of protein with ligand becomes less pronounced, with extensive overlap of the ligand concentration range for formation of PL 1 , PL 2 , and PL 3 . This overlap is even more pronounced when the affinities differ by only one order of magnitude (plots G to I), in which case the saturation steps can no longer be observed in the protein saturation curve. The lower plots show the case where all 3 binding sites in the protein have equal affinity for the ligand. In this case, proteins with only 1 or 2 ligands have a relatively low abundance, and over a very limited ligand concentration range. In this case, the narrowness of the ligand concentration range where the proteins have intermediate numbers of ligands bound arises from the following two reasons. First, taking each binding site from 10 to 90% saturation involves nearly two orders of magnitude variation in [L W ], from 0.11 K d to 9 K d . When the binding affinities are very different, significant occupation of each binding site occurs at very distinct ligand concentrations, leading to a wide concentration range for protein saturation. In turn, for similar binding affinities, the concentration ranges of significant occupation with ligand overlap. Second, for similar binding affinities and intermediate values of the average protein saturation, it is equally probable that the ligand is associated with different binding sites in the same protein, or to the same or a different type of binding site in distinct proteins, thus decreasing the relative population of protein molecules with intermediate values of ligands bound.
The best fit of the saturation curves with a model considering an adjustable number of equal and independent binding sites per protein (solution x − of Equation (3)) is shown in the plots on the right. As expected, a perfect fit and the correct binding parameters (n = 3 and K b = 10 6 M −1 ) are obtained when the binding affinity for the distinct binding sites is the same (plot L). On the other extreme, if the distinct binding sites have well separated ligand affinities, this simplified model leads to clear misfits to the saturation curve, with no risk of accidental inadequate use (plots C and F). However, the full description of the protein saturation curve requires varying the ligand concentration over 8 orders of magnitude. Such a wide range of ligand concentrations is not usually evaluated experimentally. If only 2 or 3 orders of magnitude are considered, only the binding of the first ligand would be observed, leading to a good fit. For binding affinities differing by only an order of magnitude (plot I), the stepwise association is not evident in the shape of the saturation curve, which becomes reasonably well described by the simplified model considering equal and independent binding sites. The best fit leads to a similar value for the number of binding sites, but to an intermediate value of the association constant. The simulations shown in Figure 4 assume that the protein concentration is well known and that protein saturation may be quantitatively and directly followed. Obviously, the model that assumes one binding site per protein is not adequate to describe the saturation curve in this case (dashed grey lines).
The analysis of the simulations depicted in Figure 4 shows that when the number of ligands per protein may be directly obtained, the analysis of the results with a model considering a variable number of equal and independent binding sites is a good first option. If the model adequately describes the shape of the saturation curve (plots I and L in this case study), the association constant estimated corresponds to the microscopic association constant if the binding sites are equal and independent. If the binding affinity for the binding sites is different, the value obtained is close to the average value (within the same order of magnitude) of all association constants. If the model does not adequately describe the shape of the saturation curve, the values obtained are therefore devoid of physical meaning, and more complicated models must be used [56,[62][63][64][65][67][68][69][70]. The accurate description of the saturation curve permits obtainment of the global association constants, Equation (9), and the stepwise macroscopic association constants, Equation (10). If the binding sites are independent, the number of microscopic and macroscopic binding constants is the same and it may in principle be possible to obtain one set of constants from the other. However, if the binding sites are not independent, the number of microscopic binding constants exceeds the number of parameters that may be obtained from the protein saturation curves, and therefore cannot be quantitatively characterized. In this case, only some relations between them may be obtained, as well as possible indication of cooperativity for binding to the distinct binding sites.

Scatchard Plots and Evaluation of Cooperativity
The analysis of the protein saturation curve with complex models requires the use of complicated fitting procedures, including numerical methods to obtain the concentration of free ligand and protein, the equivalent to Equation (12) but adjusted to the number of binding sites. To avoid the laborious procedure of data analysis (or, in an attempt to complement the analysis), specific graphical representations of the data are usually performed to evaluate for the presence of cooperativity between the binding sites. Scatchard plots are a prominent example, where n P /[L W ] is represented as a function of n P . A linear dependence between both variables is expected for the case of a protein with n independent binding sites, and deviations from linearity are interpreted as indicative of cooperativity.
Before analysing the results obtained with Scatchard plots, some remarks must be made regarding this and other linearization procedures. A first big limitation is that, because the binding constants are still not known (the objective is to obtain them from this analysis), if the concentrations of bound or free ligand are not directly measured, [L W ] cannot be calculated. In these situations, [L T ] is commonly used, leading to incorrect interpretations. Another important concern is that these linearization procedures do not comply with the assumptions imposed by non-linear least-square regression analysis, e.g., that the x-axis must be an independent variable with negligible error. Also, while the uncertainty associated with concentrations may be assumed to be normally distributed and to be constant within the range analysed (allowing the use of regressions not weighted by the uncertainty in the variables represented in the x and y axis), this is not the case for the inverse of concentration (which shows a much higher uncertainty for large values of the variable, corresponding to low concentrations). The confidence intervals are also not symmetric for inverse concentrations, with both properties leading to best fits that do not reflect the average behaviour of the system [71]. Finally, those linearization procedures were introduced at a time when computers were not available for data analysis. While the risk of misinterpreting the data through the use of linearization procedures was acceptable then, it is certainly not justified now.
To further evaluate the adequacy of data linearization and representation in Scatchard plots, this procedure was applied to the case of 3 independent binding sites in the conditions considered previously for Figure 4. The results are shown in Figure 5. Plot A represents the protein saturation divided by the concentration of free ligand, as a function of protein saturation. As expected, when the ligand affinity to all binding sites is the same (black), a linear dependence is observed with the intercept in the y-axis being equal to n K b (3 × 10 6 M −1 ), the x-axis intercept being n, and the slope equal to K b . In the other extreme, when the binding affinities differ by 3 orders of magnitude (blue), only the population of the highest affinity site is captured (although in the concentration range analysed they all become occupied, as indicated in the x-axis that goes up to n P = 3), with a linear variation leading to n K b (1.0 × 10 6 M −1 ) and n = 1.0. A similar behaviour is observed when the affinities differ by two orders of magnitude (green), the best linear fit for low values of n P leading to n K b (9.9 × 10 5 M −1 ) and n = 1.0. When the binding affinities differ by only one order of magnitude (red), non-linear Scatchard plots are obtained, with the best fit for small n P leading to n K b (9.0 × 10 5 M −1 ) and n = 1.2. The non-linear dependence may be incorrectly interpreted as negative cooperativity, despite resulting from binding to different, but independent, binding sites. Therefore, unless the protein being studied is known to have several equivalent binding sites (as is usually the case for multimeric proteins) and the sample is of high purity, negative cooperativity cannot be inferred from non-linear Scatchard plots.
The lines in plot A are best fits of straight lines to the region of low protein saturation. Note that in plot (A) the protein saturation in the y -axis is divided by the concentration of free ligand, while in plot (B) the total ligand concentration is used.

Indirect Characterization of the Binding Parameters Indirectly from Changes in a Protein Property
In the previous analysis, it was assumed that the number of ligands per protein could be directly characterized. However, if protein saturation is not directly accessed, but rather inferred from a property that varies with protein saturation ( S Δ ), the number of binding sites can only be evaluated from the analysis of the results, and inadequate procedures may lead to an incorrect description of the system. This will be addressed in the current section, considering proteins with 3 independent binding sites and several cases with distinct relative values of the binding affinities.
The signal from the protein at a given concentration of ligand, S , depends on the concentration of all different proteins species, , and on the signal characteristic of each species, i s , Equation (17).
Note that i PL does not represent a protein with i ligands, which would be i PL , in the notation followed in this work. A protein with a given number of ligands corresponds to many distinct species, depending on the binding sites to which the ligands are bound. Thus, in Equation (17), N is not equal to the number of binding sites ( n), but rather to the number of possible combinations of bound ligands. Assuming that the same signal is obtained for all combinations with the same number of bound ligands, Equation (17) becomes The lines in plot A are best fits of straight lines to the region of low protein saturation. Note that in plot (A) the protein saturation in the y-axis is divided by the concentration of free ligand, while in plot (B) the total ligand concentration is used.
As expected, if the total concentration of ligand is used instead of the free concentration (plot B), the information obtained is meaningless and misleading. In this case, non-linear dependences are obtained for all cases analysed, with the obtained downward curvatures suggesting positive cooperativity. Unfortunately, this incorrect data analysis may be found in the literature. Hopefully this review will alert inexperienced readers to possible pitfalls and will contribute to a proper data analysis.

Indirect Characterization of the Binding Parameters Indirectly from Changes in a Protein Property
In the previous analysis, it was assumed that the number of ligands per protein could be directly characterized. However, if protein saturation is not directly accessed, but rather inferred from a property that varies with protein saturation (∆S), the number of binding sites can only be evaluated from the analysis of the results, and inadequate procedures may lead to an incorrect description of the system. This will be addressed in the current section, considering proteins with 3 independent binding sites and several cases with distinct relative values of the binding affinities.
The signal from the protein at a given concentration of ligand, S, depends on the concentration of all different proteins species, PL i , and on the signal characteristic of each species, s i , Equation (17).
Note that PL i does not represent a protein with i ligands, which would be PL i , in the notation followed in this work. A protein with a given number of ligands corresponds to many distinct species, depending on the binding sites to which the ligands are bound. Thus, in Equation (17), N is not equal to the number of binding sites (n), but rather to the number of possible combinations of bound ligands. Assuming that the same signal is obtained for all combinations with the same number of bound ligands, Equation (17) becomes and may be expressed in terms of the macroscopic binding associations, Equation (19).
In this section, it is considered that the signal variation is the same when the ligand binds to any of the binding sites, and that the magnitude of the signal variation is proportional to the number of bound ligands, (s i − s 0 ) = ∆s max (i/n). This is a good approximation when properties such as the overall mass or charge of the ligand-protein complex are being followed. However, the observed property may vary differently, and this aspect will be discussed in Section 2.3 in the context of quenching of protein fluorescence by bound ligand.
From the analysis of the signal variation as a function of the ligand concentration, it is in principle possible to obtain the intrinsic signal variation for each number of bound ligands (s i − s 0 ), and the macroscopic association constants β i . In turn, from the relations between the macroscopic association constants, some information may be obtained regarding the similarity and/or dependency of the different binding sites, Equation (20).
1/i − 1 = 1 identical and independent binding sites < 1 non − identical binding sites or identical but not independent (negative cooperativity) > 1 non − identical binding sites or identical but not independent (positive cooperativity) (20) In the case of equal and independent binding sites, the macroscopic association constants depend on the microscopic association constant according to [4]: The simulations shown in Figure 6A were performed for [P T ] = 1 µM, K I b = 10 6 M −1 , and at different relative values of the binding affinity to the other two binding sites in the protein ( The lines correspond to the best fit of Equation (3) assuming an adjustable number of equal and independent binding sites (n ≥ 1, solid thin lines), or considering only one binding site per protein (n = 1, dashed thick lines). The two sets of fitting curves cannot be visually distinguished and lead to the same parameters for the cases considered of binding sites with different affinities (curves in blue and green). The quality of the best fit for different values of n is presented in Figure S4, showing a small improvement as n decreases. The fitting parameters are thus the same for both models, with the association constant obtained from the best fit being similar to that of the binding site with intermediate affinity (K II b = 10 5 M −1 ) and n = 1. However, different parameters are obtained for the best fit of the simulation results when all binding sites are equal ( Figure 6, curve and parameters in red). In this case, the best fit with adjustable n shows a minimum in the squared deviation for n = 3 (plot B), with reasonable sensitivity for both lower and higher values of n. Nevertheless, the representation of the best fit curves for values of n from 1 to 4 (plot C) shows that the fitting sensitivity is insufficient for the accurate characterization in the case of experimental data due to the unavoidable presence of noise (here represented as large symbols for the simulated results). lower and higher values of n. Nevertheless, the representation of the best fit curves for values of n from 1 to 4 (plot C) shows that the fitting sensitivity is insufficient for the accurate characterization in the case of experimental data due to the unavoidable presence of noise (here represented as large symbols for the simulated results).  In all cases considered, the magnitude of the signal variation per ligand bound decreases as the number of binding sites considered increases, such that the total variation in the signal is obtained only when the protein is fully saturated ( Figures 6B and S4, plots A and C). Somewhat surprisingly, the association constant obtained from the best fit decreases with the decrease in the number of binding sites, leading to a large variation in the overall protein binding affinity obtained from the best fit: 10 K K K K = = (plots I and J). The analysis of the data presented in Figures 6 and S5 shows that the sensitivity of the best fit to n depends on the fraction of ligand associated with the protein, which is presented in Figure 7 for the various conditions considered. If the ligand is in large excess, the shape of the protein saturation curve is not dependent on the considered number of The lines are the best fits considering only one type of binding sites, Equation (3), with the number of binding sites fixed at 1 (thick dashed lines) or as an adjustable variable (thin continuous lines). The parameters obtained from the best fit are also shown. Plot (B): effect of the number of binding sites per protein in terms of the quality of the best fit (•), association constant (•) and signal variation per ligand bound ( ), for the case In all cases considered, the magnitude of the signal variation per ligand bound decreases as the number of binding sites considered increases, such that the total variation in the signal is obtained only when the protein is fully saturated (Figures 6B and S4, plots A and C). Somewhat surprisingly, the association constant obtained from the best fit decreases with the decrease in the number of binding sites, leading to a large variation in the overall protein binding affinity obtained from the best fit: nK b varies from 5.4 × 10 5 M −1 for n = 1 to 5.9 × 10 6 M −1 for n = 4 in the case of equal binding sites, with the correct description of the system for n = 3 leading to nK b = 3.0 × 10 6 M −1 . However, the quality of the distinct best fits to experimental results would not allow a discrimination between the distinct set of parameters, and the simpler model would thus be selected. This would underestimate the global binding affinity by nearly one order of magnitude. Attempting to understand this effect, distinct simulations have been performed at lower and higher affinities. The results are shown in Figure S5. For K I b = 10 4 M −1 (plots A to C), the quality of the best fit with a variable number of equal and independent binding sites is even less sensitive to n, with K b being always approximately 1 × 10 4 M −1 for the case of equal binding sites (plot B and C), and close to the average K i b for different affinity binding sites. In the case of high-affinity binding, K I b = 10 8 M −1 (plot D), a high sensitivity to n is observed for the case of equal binding sites (plots E and F) and for (plots I and J). The analysis of the data presented in Figure 6 and S5 shows that the sensitivity of the best fit to n depends on the fraction of ligand associated with the protein, which is presented in Figure 7 for the various conditions considered. If the ligand is in large excess, the shape of the protein saturation curve is not dependent on the considered number of binding sites. In this case, it is not possible to obtain the number of binding sites from the best fit of the saturation curve. On the other hand, if the fraction of ligand bound to the protein is significant, increasing the number of binding sites leads to a decrease in the fraction of ligand free in solution, and consequently to differences in the shape of the protein saturation curve. In this case, the protein saturation curve is a straight line until almost full saturation. When the data is tentatively described by a model with fewer binding sites, the predicted fraction of free ligand is higher and the saturation curve linearly reaches saturation at lower total ligand concentrations. The best fit then obtained underestimates saturation at high values and overestimates it at low saturation, leading to a lower association constant. protein saturation curve. In this case, the protein saturation curve is a straight line until almost full saturation. When the data is tentatively described by a model with fewer binding sites, the predicted fraction of free ligand is higher and the saturation curve linearly reaches saturation at lower total ligand concentrations. The best fit then obtained underestimates saturation at high values and overestimates it at low saturation, leading to a lower association constant. In conclusion, when protein saturation with ligand is followed indirectly for proteins with several similar binding sites, information regarding the number of binding sites and their respective association constants can only be obtained if binding is limited by ligand availability. Otherwise, the dependence of the property on the total ligand concentration follows a hyperbola and is insensitive to the number of binding sites in the protein. In this situation, there is no justification to use complex models, and the simplest one (n = 1) is usually used. The association constant obtained is similar to or smaller than the true binding association, thus leading to a significant underestimation of the fraction of ligand bound to the protein. If binding is limited by ligand availability but the data is noisy, it will also be impossible to distinguish between models with different n. The description of the system with the simplest model may in this case lead to a severe underestimation of the amount of ligand associated with the protein. It is therefore essential to perform the experiment at different protein concentrations, to check for consistency and find the most adequate model and parameter values.

Quenching of Protein Fluorescence: Stern-Volmer Plots
The characterization of binding to proteins through ligand quenching of protein fluorescence is a method that is very commonly used and deserves special attention. The fluorescence decrease is usually described by the Stern-Volmer equation, where 0 S and S represent the fluorescence intensity in the absence and in the presence of ligand, respectively. The latter depends on the concentration of all different protein In conclusion, when protein saturation with ligand is followed indirectly for proteins with several similar binding sites, information regarding the number of binding sites and their respective association constants can only be obtained if binding is limited by ligand availability. Otherwise, the dependence of the property on the total ligand concentration follows a hyperbola and is insensitive to the number of binding sites in the protein. In this situation, there is no justification to use complex models, and the simplest one (n = 1) is usually used. The association constant obtained is similar to or smaller than the true binding association, thus leading to a significant underestimation of the fraction of ligand bound to the protein. If binding is limited by ligand availability but the data is noisy, it will also be impossible to distinguish between models with different n. The description of the system with the simplest model may in this case lead to a severe underestimation of the amount of ligand associated with the protein. It is therefore essential to perform the experiment at different protein concentrations, to check for consistency and find the most adequate model and parameter values.

Quenching of Protein Fluorescence: Stern-Volmer Plots
The characterization of binding to proteins through ligand quenching of protein fluorescence is a method that is very commonly used and deserves special attention. The fluorescence decrease is usually described by the Stern-Volmer equation, where S 0 and S represent the fluorescence intensity in the absence and in the presence of ligand, respectively. The latter depends on the concentration of all different protein species, PL i , and on the signal characteristics of each species, s i , Equation (17) in Section 2.2.4.
The equation for S 0 /S is thus: Remember that PL i does not represent a protein with i ligands, but rather each different combination of i ligands bound to the distinct binding sites. Thus, in Equations (17) and (23) N is not equal to the number of binding sites (n), but rather to the number of possible combinations of bound ligands. Assuming that the same quenching efficiency is obtained for all combinations with the same number of bound ligands, Equation (23) may be expressed in terms of the macroscopic binding associations, where N is now equal to n: It should be highlighted that the validity of this approximation is difficult to evaluate and will depend both on the protein and ligand. For example, for proteins with several fluorescent groups (e.g., bovine serum albumin, which has two tryptophan residues), the fluorescence quantum yield of each group may be different, and each of them will most likely be quenched with different efficiencies by ligands in the distinct binding sites. This approximation is also invalid if ligand binding at some of the binding sites in the protein causes conformational changes that affect the protein fluorescence quantum yield and quenching efficiency by subsequent ligands. In the following analysis the possible nonvalidity of this approximation will be ignored, but this should be kept in mind whenever the results obtained by protein fluorescence quenching do not agree with those from other approaches.
Equation (24) is further simplified if it may be considered that the residual protein fluorescence is zero when any of the binding sites is occupied with ligand (s i = 0 f or i = 0, corresponding to 100% efficient fluorescence quenching by the ligand): Although usually assumed, full quenching of the protein fluorescence by the ligand is frequently not supported experimentally, and in fact it is commonly not expected. Attention will therefore be given to the consequences of its non-validity later in this section.
The relation between the Stern-Volmer constant and the binding constants to the different binding sites may be obtained by comparing Equation (25) with Equation (22), leading to: If the protein has only one binding site, Equation (26) simplifies to Equation (27), and the Stern-Volmer constant corresponds to the association constant: Thus, for proteins with a single binding site, when ligand binding leads to full quenching of protein fluorescence, and if the free (not the total) ligand concentration is used as the independent variable, the slope of a linear Stern-Volmer plot is equal to the association binding constant. If any of the above conditions is violated, K SV will deviate from K b .
If the proteins contain several binding sites, there is no simple correspondence between K SV and K b . In fact, in this case K SV is not a constant, its value is dependent on the concentration of ligand, and thus non-linear Stern-Volmer plots are obtained. This is frequently observed when characterizing ligand association with proteins containing several binding sites, such as serum albumin. Although this is the expected behaviour, the authors usually analyse the results with the simplified Equation (22) which is unable to capture the observed behaviour and leads to misinterpretation. However, some relevant information may be obtained when the linear range of S 0 /S variation is considered (that is, for low ligand concentrations). In this case, Equation (26) simplifies to Equation (28). Therefore, in the case of proteins with several binding sites with full quenching by the ligand in any of the binding sites, the value of K SV obtained from the linear fit at low ligand concentrations provides an estimate of the overall binding affinity.
In the simplest case, with n identical and independent binding sites, β 1 = n K b . If the quenching efficiency is not 100%, the limit at low ligand concentrations may still provide useful information. From the general Equation (23) one may obtain Equation (29). The derivation of Equations (28) and (29) is provided in Section S3 of the Supplementary Materials.
The quenching efficiency is usually an unknown parameter, which prevents researchers from using Equation (29). If Equation (28) is used but the quenching efficiency is not 100%, the value obtained for the parameter β 1 will underestimate the true value. For example, if the quenching efficiency is only 10% (s i = 0.9), the estimated value for β 1 will be an order of magnitude lower than its true value. Higher quenching efficiencies lead to a more accurate estimation of β 1 .
To exemplify some expected Stern-Volmer plots for quenching of a protein with several binding sites, a set of conditions has been simulated for ligand binding to a protein with 3 binding sites, Figures 8 and 9. In Figure 8 the 3 binding sites are considered equal and independent, with the ligand affinity changing from low (K b = 10 4 M −1 ), to intermediate (K b = 10 6 M −1 ), and high (K b = 10 8 M −1 ). However, the possibility of different quenching efficiencies for the ligand bound to each binding site was accounted for in the examined situations: 100% quenching efficiency regardless of the binding site to which the ligand is associated (s i = 0 for i = I to III, circles in black), s I = 0.6, s II = 0.3, and s III = 0 (circles in red), and only ligand in binding site III leading to fluorescence quenching s I = s II = 1, and s III = 0 (circles in blue). For the case of distinct quenching efficiencies, it was considered that when two binding sites are occupied the fluorescence quenching efficiency is given by the product of the residual fluorescence when the ligand is in each of the binding sites (s I&II = s I s II ). The concentration of protein with a given combination of ligands PL i was calculated from [PL i ] and the relative value of the binding affinities that lead to that ligand occupation. As an example, the concentration of protein with ligand only in binding site I is equal to [PL 1 ]K I b / K I b + K II b + K III b , and with ligand in binding sites I and II is equal The first observation from the results shown in Figure 8 is that Stern-Volmer plots with an upward curvature are generally obtained. The sole exceptions occur when only the ligand bound to site I I I leads to fluorescence quenching (blue circles). This situation is formally equivalent to having only one binding site (i.e., the other binding sites are "silent"), and in this case a linear dependence of S 0 /S with [L W ] is in fact expected, Equation (27). However, if (incorrectly) the total concentration of ligand is considered (bottom plots), an upward curvature is observed even in this case when the binding affinity is intermediate or high (plots E and F).
Due to the non-linear Stern-Volmer plots, the value of K SV obtained from the best fit of the simplified Equation (22) depends on the considered range of ligand concentrations and the corresponding magnitude of fluorescence variation. The dashed lines shown in Figure 8 were obtained from the best fit for S 0 /S between 1 and 1.1. When the concentration of free ligand is considered (upper plots), the values obtained for K SV lead to reasonable estimates of K b (within the same order of magnitude) for all considered conditions, in agreement with the predictions from Equations (28) and (29). As predicted from Equation (29), the value of K SV depends on the extent of protein fluorescence quenching by the ligand. The larger the quenching efficiency, the higher the value obtained for K SV . The upper limit of K SV is obtained for s i = 0, i = 0 and should have been equal to K I b + K II b + K III b , Equation (28). However, somewhat higher values are obtained in some situations due to the upward curvature of S 0 /S. The above discussion shows that the order of magnitude of the binding affinity may be obtained from the analysis of the linear region of Stern-Volmer plots (Equations (22), (27), (28) or (29), depending on the specific conditions) if the concentration of free ligand is known. However, unless measured directly, the concentration of free ligand cannot be known before calculation of the binding affinity. If the total concentration of ligand is considered instead, deviations by several orders of magnitude may be obtained (e.g., for The first observation from the results shown in Figure 8 is that Stern-Volmer plots with an upward curvature are generally obtained. The sole exceptions occur when only the ligand bound to site III leads to fluorescence quenching (blue circles). This situation is formally equivalent to having only one binding site (i.e., the other binding sites are "silent"), and in this case a linear dependence of 0 is in fact expected, Equation (27). However, if (incorrectly) the total concentration of ligand is considered (bottom plots), an upward curvature is observed even in this case when the binding affinity is intermediate or high (plots E and F). of proteins with multiple binding sites cannot be obtained from Stern-Volmer plots of protein fluorescence quenching by the ligand. At most, the order of magnitude of the overall binding affinity may be obtained. It is therefore preferable to consider the simplest models, with only one binding site per protein, or several equal and independent binding sites at most. The simulations shown in this section considered proteins with several binding sites, and it was shown that an upward curvature in the Stern-Volmer plots is expected, Figures  8 and 9. The remainder of this section will be dedicated to the simpler case of proteins with a single binding site, and it will be seen that in this case downward curvature is the general behaviour. A downward curvature in Stern-Volmer plots is commonly observed To increase the accuracy of K SV as an estimate of the protein-binding ability, its evaluation must include the calculation of the corresponding free ligand concentrations. This approach is not feasible for proteins with several binding sites with distinct ligand affinities and/or binding cooperativity, because in these cases [L W ] must be calculated numerically for each evaluation of K SV (and corresponding K b ). A reasonable compromise between accuracy and feasibility may be achieved by considering that all binding sites are equal and independent. In this case, [L W ] may be calculated from [L T ] − [L B ], with [L B ] being the x − solution of Equation (3). In this case (considering equal and independent binding sites), the binding affinity for the binding sites (K b ) may be obtained from the dependence of S 1 /S vs. [L W ] through several alternative approaches: (i) K SV may be obtained from the best fit of a straight line at low quenching, S 1 /S between 1 and 1.1 being usually a good balance between sensitivity and little deviation from linearity. The binding constant K b is then obtained from K SV using Equation (28), considering 100% quenching efficiency (s i = 0, i = 0), or (29) if the quenching efficiency is known. In this approach it is assumed that each protein has n equal and independent binding sites, with the corresponding [B T ] in Equation (3) being given by n[P T ]. (ii) K b may also be obtained from the best fit of the whole range of S 1 /S, assuming that the same quenching efficiency is attained for proteins with 1 or more ligands bound (s i /s 0 = s x /s 0 , i = 0), and with S 1 /S given by Equation (30).
with β i being related with K b through Equation (21) in Section 2.2.4, for the case of n equal and independent binding sites. A Microsoft Excel file has been developed for the analysis of Stern-Volmer-like plots for protein-ligand binding experiments and will be provided to interested readers upon request.
The fitting procedure indicated above was applied to the simulated data to evaluate its applicability limits. For low binding affinity (K b = 10 4 M −1 ) and 100% quenching efficiency, the best linear fits to S 1 /S < 1.1 lead to good estimates of n K b , independently of the number of binding sites considered (n = 1 to 4). When the whole curve was analysed, assuming n = 1 leads to a poor fit and overestimation of n K b . A better fit and more accurate parameter estimates were obtained for n = 3, suggesting that this analysis may provide some information on the number of binding sites per protein. As observed before (Figure 8), as the quenching efficiency is progressively decreased, n K b tends to be underestimated, both from the linear best fit at low S 1 /S or from the whole range of S 1 /S. An equivalent situation is obtained for intermediate binding affinities (K b = 10 6 M −1 ), with the linear best fit leading to good estimates and some discrimination regarding the number of binding sites when the whole curve is analysed. For high binding affinities and 100% quenching efficiency, only n = 1 could be used for the best linear fit to S 1 /S < 1.1. Setting n > 1 leads to poor sensitivity and estimates of n K b diverging to infinity. This is due to the upward curvature that occurs even at very low quenching. Again, as the quenching efficiency is decreased, the estimates obtained for n K b deviate significantly from the true values, underestimating this parameter by more than two orders of magnitude for s I = s II = 1, s III = 0. The values obtained for the parameters of all conditions considered are shown in the supplementary materials (Table S1). Importantly, the previous results show that the analysis of Stern-Volmer plots per se underestimates the values of high-affinity binding constants, and thus cannot be relied upon for their estimation. For high-affinity binding, the direct quantification of the free ligand concentration is required. This allows understanding of the observation that the binding affinities obtained from the analysis of Stern-Volmer plots are never much higher than 10 6 M −1 , leading to dependences of n K b for homologous series (e.g., with increased hydrophobicity) that level-off close to 10 6 M −1 [72].
In the above analysis, the models used considered that the binding sites in the protein were equal and independent. This was justified when describing the simulated results as they were in fact obtained for equal and independent binding sites. This is usually not the case in real systems where the complexity of the system may be significantly higher, with the binding sites in the protein being different and/or displaying cooperativity. If the general Equation (25) is used without any assumption regarding the relation between the macroscopic binding constants, from their relations the dissimilarity of the binding sites or eventual cooperativity in ligand binding could in principle be evaluated, Equation (20) in Section 2.2.4. However, this procedure is strongly discouraged because it could lead to severe misinterpretation of the data. To elucidate this, the simulations obtained assuming that the protein contains 3 equal and independent binding sites with K b = 10 6 M −1 and a fluorescence quenching efficiency dependent on the binding site (s 1 = 0.6, s 2 = 0.3, s 3 = 0) was analysed with several possible combinations of β i . The results obtained are shown in Figure 9.
As expected, the variation of S 0 /S predicted from Equation (25) for the set of binding parameters used to obtain the simulation, n = 3 and K i b = 1.0 × 10 6 M −1 for all i (green line on plot A), does not accurately describe the simulation results. This is because only partial fluorescence quenching was considered, s 1 = 0.6, s 2 = 0.3, s 3 = 0, which is not in agreement with the approximations required to obtain Equation (25). To obtain a good fit it is necessary to adjust the binding parameters which will thus deviate from the "true" ones. Many different combinations of parameter estimates yield equally good fits of the observed dependence of S 0 /S with [L W ]. The examples shown in Figure 9 correspond to the best fit of equation (25), allowing β i to vary while fixing the number of binding sites at n = 3 (plot A), n = 2 (plot B), or n = 1 (plot C). The best fit with n as an adjustable parameter and considering infinite cooperativity, a procedure commonly followed to describe Stern-Volmer plots with upward curvature [53,72], is also shown in plot C. The ratio between the values of the concentration of free ligand obtained from each best fit, [L W ] Fit , and its "true" value, is shown in plots D to F. For this particular situation this ratio is almost always higher than 1, to compensate for the apparent weaker binding affinity that results from the inefficient fluorescence quenching. The values obtained for β i vary widely, suggesting positive cooperativity in some cases and negative cooperativity in others, with the binding affinity ranging from ∼ = 10 6 M −1 (close to the true one) to over 10 8 M −1 .
This shows very clearly that detailed information regarding the binding properties of proteins with multiple binding sites cannot be obtained from Stern-Volmer plots of protein fluorescence quenching by the ligand. At most, the order of magnitude of the overall binding affinity may be obtained. It is therefore preferable to consider the simplest models, with only one binding site per protein, or several equal and independent binding sites at most.
The simulations shown in this section considered proteins with several binding sites, and it was shown that an upward curvature in the Stern-Volmer plots is expected, Figures 8 and 9. The remainder of this section will be dedicated to the simpler case of proteins with a single binding site, and it will be seen that in this case downward curvature is the general behaviour. A downward curvature in Stern-Volmer plots is commonly observed in protein-binding experiments, and it is usually interpreted as indication that there are several fluorophores in the protein and not all are being quenched by the ligand. For the case of proteins containing several tryptophan residues this is a particularly convenient interpretation. However, the reason behind this behaviour may be simply that the presence of the ligand in the binding site does not lead to full quenching (s i = 0). Figure 10 shows simulations for a protein with one binding site, that binds the ligand with low, moderate or high affinity, leading to full or only partial quenching of the protein fluorescence. The simulation results for S 0 /S were analysed to obtain estimates of the binding affinity and fluorescence quenching efficiency. The equations used for the linear best fit at small S 0 /S and for the best fit of the whole variation in S 0 /S are easily obtained from Equations (29) and (30), respectively, and are shown below for completeness.
The slope at S 0 /S close to unity is related to the binding affinity and quenching efficiency by: and the whole variation of S 0 /S is related to the binding affinity and quenching efficiency by: Equation (32) is formally equivalent to the equation commonly used to account for downward curvature in Stern-Volmer plots, with s 1 /s 0 being interpreted as the fraction of fluorescent groups in the protein that are not accessible to the ligand (1 − f a ). However, the interpretation of the results obtained from both approaches is fundamentally different. In Equation (32), s 1 /s 0 provides information regarding the proximity of the ligand to the protein fluorophores in a single binding site protein. In contrast, in the usual analysis, f a is interpreted as the fraction of ligands in a given binding site, assuming a priori and without any support that quenching is 100% efficient and that the protein has multiple binding sites. W and the incorrect assumption that 1 0 s s = 0 (red) will also lead to underestimation of b K (for 1 0 s s = 0.8, the value obtained for SV K is 2.7 × 10 5 M −1 , leading to the same value for b K if full quenching is assumed). If both [ ] W L from the fit and the correct 1 0 s s are considered (blue), SV K is 1.5 × 10 6 M −1 leading to b K = 7.5 × 10 6 M −1 , which although closer still underestimates the true value.  (1) and x − solution of (3) assuming that K b = K SV (•, and ), or using Equation (31) to calculate K b from K SV (•, and ). The full variation of S 0 /S was also analysed with Equation (32), to obtain directly K b and s 1 /s 0 (•, and ). The values of K SV obtained from the best fit at low S 0 /S are shown in the plots with the same colour as the respective best fit lines. The parameters obtained from the best fit of the whole variation of S 0 /S are shown in green.
In Figure 10, the quenching efficiency was varied from 100% to 20% (s 1 /s 0 = 0, 0.2, 0.5, and 0.8, left to right plots), and the binding affinity was varied from low to high (K b = 10 4 , 10 6 , and 10 8 M −1 , top to bottom). The dashed lines in each plot correspond to the best fit of Equation (22) for S 0 /S close to unity, with the relation of K SV with K b given by Equation (31), and the concentration of free ligand calculated from the x − solution of Equation (3). The circles and dashed lines in red assume full quenching, while the value of s 1 /s 0 considered in the simulations is used for the circles and dashed lines in blue. To highlight the inadequacy of using [L T ] instead of [L W ], this condition is also shown with black circles and dashed lines. Finally, the best fit of Equation (32) to the whole variation of S 0 /S is shown as green circles and a continuous line. In this case, the quenching efficiency was an adjustable parameter. The estimated parameter obtained from the distinct best fits is shown in the respective plots with the same colours as the corresponding curves.
When the data is analysed with the best fit of the linear region, only one parameter may be estimated, K SV . If the quenching efficiency is known, K b may be calculated accurately from K SV using Equation (31), otherwise K b will be underestimated. For low-affinity binding (plots A to D), the value obtained for K SV is essentially independent on the fitting procedure because in this case binding of the ligand to the protein does significantly deplete it in the aqueous phase. As the quenching efficiency decreases, the K SV obtained decreases as well, underestimating K b if s 1 /s 0 is assumed to be zero. For intermediate binding affinity (plots E to H), significant depletion of solute in the aqueous medium is observed and underestimates of K b will be obtained if [L T ] is considered (black). For the other fitting procedures, the conclusions are the same as given above for low-affinity binding.
The case of high binding affinity is represented in plots I to L (and M to P with a logarithmic scale for the x axis). Neglect of ligand depletion leads to underestimation of K b by several orders of magnitude (black). Considering [L W ] obtained from the best fit and the incorrect assumption that s 1 /s 0 = 0 (red) will also lead to underestimation of K b (for s 1 /s 0 = 0.8, the value obtained for K SV is 2.7 × 10 5 M −1 , leading to the same value for K b if full quenching is assumed). If both [L W ] from the fit and the correct s 1 /s 0 are considered (blue), K SV is 1.5 × 10 6 M −1 leading to K b = 7.5 × 10 6 M −1 , which although closer still underestimates the true value.
The correct parameters (K b and s 1 /s 0 ) are obtained when the full variation of S 0 /S is analysed with Equation (32) (green) for all binding affinities and quenching efficiencies. However, some care must be taken regarding the sensitivity of the best fit to the estimated parameters. This is illustrated in Figure S6 for low, intermediate and high affinity. At the concentration of protein considered in those simulations (1 µM), the sensitivity is high for low binding affinity, with the best fit, considering ratios of 1.5 or 0.67 between the estimated and the true K b , leading to clear misfits (dark blue and orange). A good sensitivity is still obtained for the case of intermediate affinity, although when the x axis is represented in the logarithmic scale it is seen that a clear misfit is only obtained for ratios of 2 and 0.5. This is due to the significant ligand depletion, which is dependent on the estimated K b and shifts the curve of S 0 /S in the x axis. This effect becomes more evident for high-affinity binding (right plots), where essentially the same quality is obtained for the best fit with the estimated parameter varying from 0.5 to 2 times the true K b . A clear misfit of the experimental S 0 /S is only observed for an underestimation of K b by at least one order of magnitude, and essentially no sensitivity is observed for overestimation of K b . A better sensitivity in the estimation of K b is obtained if the total concentration of protein is decreased, with [P T ] = 10 −8 M leading to a good parameter estimation when In conclusion, quenching of protein fluorescence due to ligand binding is an appropriate methodology to quantitatively characterize the binding to single binding site proteins, providing both the binding affinity and the quenching efficiency. However, the range of ligand concentrations must be large enough to lead to non-linear Stern-Volmer plots, and the whole variation of S 0 /S should be analysed with Equation (32). Additionally, to ensure a good sensitivity regarding the estimated binding affinity, the total concentration of protein should be lower than the estimated value for the dissociation constant. If the solubility of the ligand in the aqueous medium does not allow for reaching high enough concentrations to generate non-linear Stern-Volmer plots, the variation of S 0 /S must be analysed with Equation (22), and K b assumed equal to K SV . This procedure will underestimate K b if the quenching efficiency is not 100%. If s 1 /s 0 may be obtained independently, an accurate estimation of K b may be obtained from K SV using Equation (31).
When the proteins have several binding sites, the quenching of protein fluorescence may very easily lead to incorrect estimations of the binding parameters. To increase the accuracy, a wide range of ligand concentrations should be used to lead to non-linear Stern-Volmer plots (which in this case are expected to show upward curvature) and S 0 /S should be analysed with models considering several equal and independent binding sites, Equations (21) and (30). This procedure is expected to yield good estimates of the total binding affinity. If moderate-to-high binding affinity is obtained, the experiments should be repeated at a lower total concentration of protein to increase the confidence in the parameter estimated for high binding affinity.
In any case, the use of the simple Stern-Volmer equation with [L W ] replaced by [L T ] is an incorrect procedure that may underestimate the binding affinity by several orders of magnitude. The concentration of free ligand is dependent on the binding affinity and must be calculated for each set of parameters. The assumption that all binding sites in the protein are equal and independent leads to a simple procedure for the calculation of [L W ], being a good compromise between accuracy and convenience.

Characterization of the Binding Parameters from Ligand Saturation Curves
It is sometimes easier or more convenient to follow the interaction of the ligand with the protein through a variation in the properties of the former. This is a convenient situation in the case of fluorescent ligands that change their fluorescence properties upon binding. It may also be a necessity if the ligand solubility does not allow attainment of protein saturation with ligand, limiting the maximum concentration achieved and/or leading to the formation of aggregates.
If the protein has only one binding site and the concentration of protein is known, the formalism is simple, and following protein saturation or ligand saturation leads to equivalent and accurate results. Equation (3) or Equation (7) can be used for this purpose, the latter equation only if the protein is in large excess. However, accuracy and equivalence are unwarranted when the protein has several binding sites, and/or when its concentration is not well-characterized. Some specific examples will be analysed below.

Ligand Binding to Proteins with Multiple Binding Sites
The ligand saturation curves obtained for the case of proteins with three independent binding sites are shown in Figure 11 for different binding affinities and distinct relations between the affinities for the distinct binding sites in the protein.
in the protein are equal and independent leads to a simple procedure for the calculation of [ ] W L , being a good compromise between accuracy and convenience.

Characterization of the Binding Parameters from Ligand Saturation Curves
It is sometimes easier or more convenient to follow the interaction of the ligand with the protein through a variation in the properties of the former. This is a convenient situation in the case of fluorescent ligands that change their fluorescence properties upon binding. It may also be a necessity if the ligand solubility does not allow attainment of protein saturation with ligand, limiting the maximum concentration achieved and/or leading to the formation of aggregates.
If the protein has only one binding site and the concentration of protein is known, the formalism is simple, and following protein saturation or ligand saturation leads to equivalent and accurate results. Equation (3) or Equation (7) can be used for this purpose, the latter equation only if the protein is in large excess. However, accuracy and equivalence are unwarranted when the protein has several binding sites, and/or when its concentration is not well-characterized. Some specific examples will be analysed below.

Ligand Binding to Proteins with Multiple Binding Sites
The ligand saturation curves obtained for the case of proteins with three independent binding sites are shown in Figure 11 for different binding affinities and distinct relations between the affinities for the distinct binding sites in the protein.
The first observation is that a good fit is always obtained when the fraction of bound ligand is described by a model that considers an adjustable number of equal and independent binding sites, as in Equation (3). This clearly shows that the shape of the ligand saturation curve does not allow to distinguish between equal or different binding sites in the protein. For intermediate and high-affinity binding, it is observed that the best fit with an adjustable number of binding sites leads to a better description of the ligand saturation curve than when n = 1 is imposed (continuous vs. dashed lines, respectively). The misfit for n = 1 is clear in the case of high-affinity binding, but for low or intermediate affinity it would not be possible to distinguish between the two situations due to the noise in the real data.
The bottom plots in Figure 11 show the best fit of the simplified Equation (7), which considers that the protein is in large excess. As expected, a good fit is obtained for the case of low affinity (binding occurs at protein concentrations of several hundred µM and the ligand concentration is 1 µM). A clear misfit is obtained for high-affinity binding, and a reasonable fit for intermediate affinity.   (3) with an adjustable number of binding sites n with equal affinity K b (continuous lines), or for a fixed number of binding sites n = 1 (dashed lines). The equivalent is shown in the lower plots but showing the best fit of the simplified Equation (7), considering that the protein is in large excess. The total concentration of ligand is 1 µM in all plots.
The first observation is that a good fit is always obtained when the fraction of bound ligand is described by a model that considers an adjustable number of equal and independent binding sites, as in Equation (3). This clearly shows that the shape of the ligand saturation curve does not allow to distinguish between equal or different binding sites in the protein. For intermediate and high-affinity binding, it is observed that the best fit with an adjustable number of binding sites leads to a better description of the ligand saturation curve than when n = 1 is imposed (continuous vs. dashed lines, respectively). The misfit for n = 1 is clear in the case of high-affinity binding, but for low or intermediate affinity it would not be possible to distinguish between the two situations due to the noise in the real data.
The bottom plots in Figure 11 show the best fit of the simplified Equation (7), which considers that the protein is in large excess. As expected, a good fit is obtained for the case of low affinity (binding occurs at protein concentrations of several hundred µM and the ligand concentration is 1 µM). A clear misfit is obtained for high-affinity binding, and a reasonable fit for intermediate affinity.
The next question is whether the good fits lead to accurate estimations of the ligand overall binding affinity of the protein. Because different binding sites cannot be distinguished, the relevant properties are the overall binding affinity (nK b vs. ∑ K i b ), the highest binding affinity (K b vs. K I b ), and the average binding affinity (K b vs. Because the sites are all assumed equal in the models considered to describe the ligand saturation profiles, the estimates obtained for the highest affinity and for the average affinity deviate significantly from the true values when the binding sites are different. However, the overall binding affinity may be estimated with good accuracy for most conditions. The estimates obtained from the best fit of ligand saturation curves for a large set of conditions are presented in Figure 12. The bottom plots in Figure 12 correspond to the estimates when the best fit of the fraction of ligand bound is obtained with the simplified equations that assume excess protein, Equation (5). As expected, the quality of the estimates is not dependent on the num- The fitting approach that yields the best estimate of the overall protein binding affinity is the use of the full equation considering an adjustable number of equal binding sites (plot A). This procedure leads to an excellent prediction of this property for low and intermediate affinities (less than 15% deviation) and reasonable predictions for high-affinity binding. In this case, the quality of the estimate gets poorer for higher ligand concentrations and for binding sites with similar (but not equal) affinities. This is due to a significant population of binding sites with intermediate affinity, which is not captured by the model that considers only one type of binding sites. When the number of binding sites is fixed at n = 1 (plot B) the predictions are usually good for low-, intermediate-and high-affinity binding at very low ligand concentrations (symbols in green), but overestimates the binding affinity by several orders of magnitude for high ligand concentrations (symbols in red). It should be noted that at high ligand concentrations the quality of the fit was very poor (Figure 11 plot C, dashed lines), which is related to the inadequate calculation of the free ligand concentration, as discussed when interpreting Figure 7.
The bottom plots in Figure 12 correspond to the estimates when the best fit of the fraction of ligand bound is obtained with the simplified equations that assume excess protein, Equation (5). As expected, the quality of the estimates is not dependent on the number of binding sites, as the results in plot C are equal to those in plot D, although the best fit was obtained for n = 1 in plot C. This is because in the used equation n is always associated with K b . The parameters obtained are only good in the case of low affinity binding, and lead to an underestimation of the overall binding affinity for intermediate and high-affinity binding. Surprisingly at first, the quality of the estimates improves as the concentration of ligand decreases, changing from nK This is because, at very low ligand concentrations, significant ligand binding occurs only at total protein concentrations that are in fact much higher than the ligand concentration, as assumed in the derivation of the simplified solution.
From this detailed analysis it may be concluded that, when analysing ligand saturation without knowledge of the detailed binding details (number of binding sites and whether they are equal or different), it is a reasonable approach to consider that all binding sites are equal, and perform the analysis in terms of binding sites using the full equation. If a good fit is obtained, this procedure is expected to yield an accurate prediction of the overall protein-binding affinity. The correct number of binding sites may also be obtained if they are all equal and if binding leads to a significant depletion of the total protein. However, if the binding sites are not equal, the value of n estimated from the best fit is an underestimation. The deviation between the estimated and the real number of binding sites in the protein depends on the values of the distinct binding affinities, the relation being n K b = ∑ K i b when a good estimate of the overall protein-binding affinity is obtained. Concomitantly, following the ligand saturation does not lead to good estimates of the highest binding affinity. Exceptions occur when the affinity is the same for all binding sites, or when the difference between the distinct binding sites is very large (more than one order of magnitude), in this case leading only to the characterization of the highest affinity binding site.
There are several examples in the literature where this procedure was followed to characterize the association of fluorescent amphiphiles with proteins [29,52,[57][58][59][60][73][74][75][76][77]. In most cases the objective was only to evaluate the overall binding affinity. As explained above, this parameter is obtained accurately, even for the case of proteins with multiple binding sites when very low concentrations of ligands are used [29,[58][59][60]73,74].

Ligand Binding as a Partition Coefficient
In some situations, the protein under study is not well-characterized (e.g., in terms of protein molecular weight, and number of binding sites), and only the total mass of protein in solution is known. In these cases, it is useful to describe binding as a partition coefficient between the aqueous medium and the protein. It may also be convenient to describe binding in terms of partition for the case of proteins with several binding sites such as serum albumin [72,78,79]. If the concentration of the ligand is much lower than that of the protein, the volume of protein available for binding (V P ) is maintained throughout the titration, and the equilibrium distribution of ligand is given by Equation (33).
The partition coefficient obtained is related with the overall equilibrium association by: where K obs b is the observed association constant when considering the whole protein, and V P is the molar volume of the whole protein.
The assumption made (V w ∼ = V T ) is valid for dilute solutions, which is the case for studies performed using model systems. Its validity must be verified if this formalism is applied to biological systems or when very high concentrations of binding agents are used.
The concentrations of ligand associated with the protein and in the aqueous phase are given by: If the concentration of ligand is comparable to that of the protein, it is not correct to assume that the volume of the protein available for binding remains constant throughout the titration. In this case, it is necessary to consider the available volume of protein, V * P , which is given by: where # is the number of binding sites per molar volume. The scheme describing the partition of the ligand between the aqueous medium and the protein is obtained replacing V P by V * P in Equation (33). The volume of protein available to interact with the ligand is calculated from the quadratic equation: where the physically meaningful solution is always (V * P ) + . The concentration of ligand associated with the protein or in the aqueous phase may be calculated from the equivalent to Equation (35) but with the volume of the protein being replaced by V * P , which is given by the solution of Equation (37). The application of this formalism to the analysis of ligand binding to proteins with multiple binding sites leads to results identical to those shown in Figures 11 and 12. A good fit is always obtained when considering a variable number of binding sites, and the volume of free protein from Equation (37). The partition coefficient is a good estimate for the overall binding affinity, with K obs b being close to ∑ K i b . The simplified Equation (35) is only valid for very low ligand concentrations or very small K W→P P , which in both cases leads to a low saturation of the protein.
The advantage of this formalism is that it allows to recover the overall binding affinity even when the concentration of protein is not known, only its total mass or volume. If the ligand concentration is high enough to saturate the protein, the best fit is dependent on n, and an estimate is obtained for the number of binding sites in the protein (considered equal and independent). From this it is possible to estimate the volume per binding site. However, if the best fit is not dependent on n, only the partition coefficient is obtained, and it is not possible to calculate the observed binding constant. Nevertheless, from the partition coefficient one may directly calculate the fraction of ligand associated with the protein for a given amount of protein, which is usually the relevant property.
Another advantage of this formalism is the ease of its application to the description of ligand distribution in heterogeneous media with increasing complexity. An example is when membranes and proteins are present in the same aqueous solution, as observed in all biological systems. This situation will be considered in the next section.

Ligand Distribution in the Presence of Proteins and Membranes
Ligand association with membranes is usually treated as a partition because the membranes do not have well-defined binding sites and because they can accumulate relatively high amounts of ligand without changing their binding properties. For small uncharged or singly charged ligands, the membrane may accommodate at least up to 5 mol% of the former without changing their binding properties [33,34]; depending on the ligand and membrane composition, local ligand concentrations as high as 20 mol% may be achieved without significant changes in the membrane binding affinity [80][81][82]. Considering the typical molar volume of a membrane lipid (0.8 dm 3 /mol), the local ligand concentration in the membrane may thus exceed 0.1 M.
The use of the partition formalism allows a direct comparison of the binding affinity of lipid membranes and proteins. For a protein with V P = 50 dm 3 /mol (corresponding to a molecular weight of~70 kg/mol), a moderate binding affinity K obs b = 10 6 M −1 corresponds to K W→P P equal to 2 × 10 4 . At low ligand concentrations, for an equivalent volume of protein and lipid bilayer the ligand will be equally distributed for K W→Lb P equal to 2 × 10 4 . However, at high ligand concentrations the protein becomes saturated while the membrane is still able to accommodate the ligand. This will shift the distribution of ligand from the protein to the membrane. The maximum local concentration of ligand in the protein is equal to 1/ V P /# . For a protein with V P = 50 dm 3 /mol and a single binding site this concentration is 0.02 M, and is thus lower than the maximum ligand concentration achievable in a membrane without changing its properties as a binding agent.
A quantitative analysis of the ligand distribution between the aqueous medium, the protein and the lipid membrane will be presented in the following sections. Focus will be given to the effect of ligand concentrations regarding the fraction of ligand bound to the protein and membrane. Several distinct conditions will be considered: proteins soluble in the aqueous phase (case I), and membrane proteins (case II to IV), as shown in Figure 13. this concentration is 0.02 M, and is thus lower than the maximum ligand concentration achievable in a membrane without changing its properties as a binding agent. A quantitative analysis of the ligand distribution between the aqueous medium, the protein and the lipid membrane will be presented in the following sections. Focus will be given to the effect of ligand concentrations regarding the fraction of ligand bound to the protein and membrane. Several distinct conditions will be considered: proteins soluble in the aqueous phase (case I), and membrane proteins (case II to IV), as shown in Figure 13.

Lipid Membranes and Proteins in Solution
When using the partition formalism, the equations that describe the distribution of ligand between the aqueous phase and the different binding agents are easily obtained. This distribution is given by Equation (38) for the case of lipid membranes and aqueous soluble proteins (case I).

Lipid Membranes and Proteins in Solution
When using the partition formalism, the equations that describe the distribution of ligand between the aqueous phase and the different binding agents are easily obtained. This distribution is given by Equation (38) for the case of lipid membranes and aqueous soluble proteins (case I).
The volume of lipid bilayer available to receive ligand is considered constant (nonsaturable binding agent), while that of the protein depends on the amount of ligand bound. The volume of protein available to the ligand is obtained from the solution of the quadratic equation given in Equation (39), the physically meaningful solution being always x + : The effect of ligand concentration on the membrane volume and that of protein available for ligand binding is represented in Figure 14A for the case of moderate affinity of the ligand for the protein (K obs b = 5 × 10 5 M −1 , K W→P P = 10 4 ), and low affinity for the lipid bilayer (K W→Lb P = 10 2 ). In the absence of ligand, at the protein and lipid concentrations considered, the volume of protein available for ligand binding is slightly higher than that of the lipid membrane. However, as the concentration of ligand is increased, the protein binding sites are occupied and the protein volume available for binding decreases. In contrast, the membrane volume available for binding is maintained. The fraction of ligand in the distinct media is presented in plot B. At very low ligand concentrations, most ligand is associated with the protein. However, as the protein becomes saturated with the ligand, it accumulates in the aqueous medium and in the membrane. For the low membrane affinity considered in this simulation, only about 10% of the ligand is associated with the membrane. A higher membrane affinity is considered in plots C (K W→Lb P = 10 3 ) and D (K W→Lb P = 10 4 ). In these conditions the fraction of ligand associated with the membrane becomes significant and may represent up to 90%. It should be noted that in plot D, the overall affinities of the ligand for the protein and membrane are equal K W→P P = K W→Lb P , and that in the absence of ligand the volume of protein available is slightly higher than that of the membrane. However, the membrane becomes the dominant binding agent for ligand concentrations higher than 9 µM.

Membranes Containing Lipids and Membrane Proteins
In this case one has to consider three possibilities for the equilibration of ligand with the membrane protein: from the aqueous phase only (Case II), from the membrane only (Case III), or from both (Case IV).
The equations that describe the distribution of ligand at equilibrium for case II are the same as those for soluble proteins, Equations (38) and (39). All the observations from the previous section are thus valid in this case, with the presence of the membrane decreasing the apparent affinity of the ligand for the protein.
becomes significant and may represent up to 90%. It should be noted that in plot D, the overall affinities of the ligand for the protein and membrane are equal ( ) W P W Lb P P K K → → = , and that in the absence of ligand the volume of protein available is slightly higher than that of the membrane. However, the membrane becomes the dominant binding agent for ligand concentrations higher than 9 µM.

Membranes Containing Lipids and Membrane Proteins
In this case one has to consider three possibilities for the equilibration of ligand with the membrane protein: from the aqueous phase only (Case II), from the membrane only (Case III), or from both (Case IV).
The equations that describe the distribution of ligand at equilibrium for case II are the same as those for soluble proteins, Equations (38) and (39). All the observations from If ligand binding to the protein occurs only after ligand association with the membrane (Case III), the equilibrium distribution of the ligand may be calculated from Equations (40) The results obtained for a moderate relative affinity between the membrane and the protein K Lb→P P = 10 are shown in Figure 15, at affinities between the aqueous medium and the membrane, K W→Lb P , increasing from 10 2 (plot A) to 10 4 (plot C). In contrast to what is observed for proteins in the aqueous medium (Figure 14), increasing the affinity of the ligand for the lipid bilayer leads to an apparent increase in the ligand affinity for the protein. This result was expected because it is the ligand in the membrane that is able to interact with the protein. However, a closer analysis of Figure 15 shows that this increased affinity for the protein is only significant for low ligand concentrations. As the protein-binding sites become saturated, most ligand accumulates in the membrane despite the lower intrinsic affinity. of the ligand for the lipid bilayer leads to an apparent increase in the ligand affinity for the protein. This result was expected because it is the ligand in the membrane that is able to interact with the protein. However, a closer analysis of Figure 15 shows that this increased affinity for the protein is only significant for low ligand concentrations. As the protein-binding sites become saturated, most ligand accumulates in the membrane despite the lower intrinsic affinity. Figure 15. Distribution of a ligand between the aqueous media, the lipid bilayer and the protein in a membrane composed of a lipid bilayer with embedded membrane proteins. The protein and lipid concentrations are the same as in Figure 14, leading to 56% of the membrane being occupied by protein and 44% by the lipid bilayer. The partition coefficient between the lipid bilayer and the protein is the same in all plots ( Lb P Even though this model assumes that the ligand cannot associate with the membrane protein directly from the aqueous medium, this step closes a thermodynamic circle and Even though this model assumes that the ligand cannot associate with the membrane protein directly from the aqueous medium, this step closes a thermodynamic circle and thus the apparent partition coefficient may be calculated from the other two equilibria, Equation (42).
Due to the thermodynamic cycle involving the ligand in the aqueous medium and associated with the lipid bilayer or with the protein, Case IV in Figure 13 cannot be formally distinguished from Case III in what concerns the equilibrium distribution of the ligand. In fact, Case II is also formally indistinguishable from Case III and IV, Equations (40) and (41) being obtained if K W→P P in Equations (38) and (39) is replaced by its relation with K W→Lb P and K Lb→P P given by Equation (42). Another mechanism could additionally be considered, with the ligand interaction with the lipid bilayer being mediated by interaction with the membrane protein, which is also described by the same formalism. This situation was not explicitly considered because interaction with the lipid membrane is usually non-specific, being observed for the vast majority of ligands with moderate and high hydrophobicity.
The formalism developed in this section may be of particular relevance for the characterization of ligand association with membrane proteins. It provides a simple method to quantitatively compare the ligand affinity for the lipid bilayer and for the proteins, and to evaluate the effect of ligand concentration and relative concentration of the membrane proteins. Those aspects will be further explored in a dedicated manuscript. For now, we will just define another important experimental observable, the partition coefficient between the aqueous medium and the membrane as a whole, as shown in Equation (43).
Thus, the characterization of the partition coefficient at low ligand concentrations in the presence and absence of the membrane protein, allows obtaining the water to protein and the lipid bilayer to protein partition coefficients. This may be of high relevance when studying interactions with biological membranes.
The situations discussed above correspond to different mechanisms of signal transduction and ligand transport through biological membranes with impact in the areas of biochemistry, pharmacology, neurochemistry and endocrinology. Distinction between the mechanisms of interaction with the membrane protein has been attempted, namely on the basis of the dependence on ligand hydrophobicity [83,84]. However, an increase in ligand lipophilicity would lead to an increased apparent affinity for the membrane protein in all cases, provided that the relative affinity for the lipid bilayer and the protein is maintained. The mechanisms may eventually be distinguished if the kinetics of the interaction is accessed. If the ligand in the aqueous medium can only interact with the membrane, a relative accumulation of ligand in the membrane is expected at short times followed by its redistribution to the membrane protein. The reverse is expected if the ligand in the aqueous medium can only interact with the membrane protein. If direct interaction is established with both binding agents, the time evolution in each medium would depend on the relative rate and equilibrium constants. Funding: This research was funded by the European Regional Development Fund, through COMPE TE2020-Operational Program for Competitiveness and Internationalization, and Portuguese funds via FCT-Fundação para a Ciência e a Tecnologia, under projects UIDB/04539/2020, UIDP/04539/2020, UIDB/00313/2020 and UIDP/00313/2020 (Portugal).