FINCHES: A Computational Framework for Predicting Intermolecular Interactions in Intrinsically Disordered Proteins

Niazi, Sarfaraz K.

doi:10.3390/ijms26136246

Open Access Review

Sarfaraz K. Niazi

College of Pharmacy, University of Illinois, Chicago, IL 60612, USA

Int. J. Mol. Sci. 2025, 26(13), 6246; https://doi.org/10.3390/ijms26136246

Submission received: 12 June 2025 / Revised: 23 June 2025 / Accepted: 27 June 2025 / Published: 28 June 2025

(This article belongs to the Section Molecular Biology)

Abstract

This comprehensive review examines FINCHES (Force field-based Interaction Network for Characterizing Heterotypic and Entropic Sequences), a computational framework that enables the rapid, sequence-based prediction of intermolecular interactions in intrinsically disordered regions (IDRs) without the need for molecular simulations. The review provides detailed comparisons with other computational methods, including their mathematical foundations, specific applications, and experimental validations. We explore both the potential for advancing our understanding of disordered protein function and the inherent challenges in computationally modeling these dynamic biological systems. Additionally, we discuss computational assessment tools for interface prediction in molecular complexes, which provide a framework for evaluating IDR interaction predictions.

Keywords:

FINCHES; intrinsic disorder; proteins; intermolecular interaction; force field; molecular dynamics; AWSEM

1. Introduction

Intrinsically disordered regions (IDRs) are present in over 70% of human proteins and play crucial roles in cellular processes despite lacking stable three-dimensional structures [1,2]. Understanding how these regions mediate specific intermolecular interactions has been challenging due to their dynamic nature and the complexity of their interaction landscapes [3]. This review examines FINCHES (Force field-based Interaction Network for Characterizing Heterotypic and Entropic Sequences), a recently developed computational framework that enables rapid, sequence-based prediction of IDR-mediated interactions without requiring molecular simulations [4]. We discuss the methodology, applications, and limitations of this approach alongside detailed comparisons with other computational methods, including their mathematical foundations, specific applications, and experimental validations [5,6,7]. We highlight both the potential for advancing our understanding of disordered protein function and the inherent challenges in computationally modeling dynamic biological systems [8,9].

Intrinsically disordered proteins and protein regions represent a paradigm shift in our understanding of protein structure–function relationships [1,2]. Unlike globular proteins that fold into stable three-dimensional structures, IDRs exist as dynamic ensembles of conformations, yet they are capable of mediating highly specific biological interactions [1,10]. These regions are particularly enriched in transcription factors, signaling proteins, and other regulatory molecules, where they facilitate complex formation, phase separation, and allosteric regulation [2,11]. The challenge of predicting how these dynamic regions interact with cellular partners has hindered our ability to understand their functional mechanisms and design therapeutic interventions [12,13].

Traditional computational approaches for studying protein interactions rely heavily on molecular dynamics simulations, which, while accurate, are computationally expensive and become prohibitively slow for larger systems or proteome-scale analyses [14,15,16]. The development of coarse-grained force fields specifically parameterized for IDRs has improved the speed and accuracy of such simulations. Still, the need for alternative approaches that can rapidly screen large numbers of sequences and conditions remains pressing [17,18]. FINCHES addresses this need by repurposing established force field equations to make analytical predictions about intermolecular interactions without performing explicit simulations [4].

2. Computational Approaches for IDR Interaction Prediction

The prediction of IDR interactions has become a critical challenge in computational biology, leading to the development of diverse methodological approaches [19,20]. These can be broadly categorized into physics-based methods, machine-learning approaches, hybrid techniques, and specialized tools for phase separation prediction [21,22,23]. Each approach offers unique advantages and limitations, making them suitable for different applications and research questions [24,25].

Physics-Based Approaches

Molecular dynamics (MD) simulations remain the gold standard for detailed studies of protein interactions, providing atomistic resolution of interaction mechanisms and dynamics [14,15]. Figure 1 illustrates the fundamental components of MD simulations, showing how force field equations govern protein behavior. MD simulations are based on Newton’s equations of motion, where the force F acting on each atom equals mass times acceleration (F = ma) [26]. The force field potential typically includes bonded and non-bonded terms, with the electrostatic component being crucial for IDR studies and calculated using Coulomb’s law: U_elec = k_e × q_i × q_j/r_ij, where k_e represents Coulomb’s constant, q_i and q_j are partial charges on atoms i and j, and r_ij is the distance between them.

All-atom MD simulations can capture the full complexity of IDR behavior, including conformational changes upon binding and the role of water and ions in mediating these interactions [27,28]. However, these simulations are computationally intensive, limiting their application to small systems and short timescales [29]. The timescales accessible to conventional MD simulations (typically microseconds) often fall short of the millisecond-to-second timescales relevant for many biological processes involving IDRs [30].

Shaw et al. [31] utilized the Anton supercomputer to conduct millisecond-scale simulations of small IDR peptides, thereby revealing the role of transient secondary structures in binding. Rauscher et al. [32] investigated the folding mechanism of the p53 transactivation domain upon binding using enhanced sampling MD, demonstrating how disorder-to-order transitions facilitate specific recognition. More recently, Palazzesi et al. [33] employed all-atom MD to investigate the phase separation of the FUS protein, revealing that transient π–π interactions between tyrosine residues drive droplet formation.

Enhanced sampling methods, such as replica exchange molecular dynamics (REMD) and metadynamics, have been developed to overcome some of these limitations by improving conformational sampling efficiency [32,33]. REMD uses the exchange principle, where configurations at different temperatures are periodically exchanged based on the Metropolis criterion, allowing the system to overcome energy barriers more effectively [34].
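The exchange step described above can be sketched in a few lines. This is a minimal illustration of the standard Metropolis swap criterion, P = min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T), not the implementation used in the cited studies; the function names and the choice of kcal/mol energy units are assumptions made for the example.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); energy units are an assumption


def exchange_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis probability of swapping the configurations of two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Equivalent to min(1, exp(delta)), written to avoid overflow for large delta.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the proposed swap between replicas i and j is accepted."""
    return rng.random() < exchange_probability(energy_i, energy_j, temp_i, temp_j)
```

A swap is always accepted when the colder replica currently holds the higher-energy configuration; otherwise it is accepted with a probability that decays with the energy gap and the temperature spacing.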

The Associative Memory Water-Mediated Structure and Energy Model (AWSEM) represents a sophisticated, coarse-grained approach that combines knowledge-based potentials with physical principles [35,36]. Figure 2 demonstrates the energy components within the AWSEM framework. The total energy function in AWSEM includes multiple terms representing different aspects of protein energetics: E_total = E_backbone + E_contact + E_burial + E_hydrogen + E_water + E_fragment, where E_backbone accounts for local backbone geometry, E_contact represents residue–residue contacts, E_burial accounts for hydrophobic burial, E_hydrogen represents hydrogen bonding, E_water accounts for water-mediated interactions, and E_fragment incorporates local structure biasing from the Protein Data Bank [37].

Chen et al. [38] utilized AWSEM to investigate the aggregation of α-synuclein, elucidating how specific sequence regions facilitate fibril formation. The simulations accurately predicted the experimental observation that specific mutations can either enhance or suppress the aggregation propensity [39]. Zheng et al. [40] applied AWSEM to investigate the liquid–liquid phase separation of RNA-binding proteins, demonstrating that the model can reproduce experimental phase diagrams when calibrated appropriately.

The CALVADOS force field represents a significant advancement in IDR simulation, specifically designed to reproduce experimental phase behavior [17,41]. Figure 3 illustrates the coarse-grained representation employed by CALVADOS, where each amino acid is represented by a single bead positioned at the Cα atom. The non-bonded energy in CALVADOS combines Lennard–Jones and electrostatic interactions: U_nb = 4ε[(σ/r)¹² − (σ/r)⁶] + k_e × q_i × q_j × exp (−κr)/r, where ε and σ are Lennard–Jones parameters specific to each amino acid pair, and κ is the Debye screening parameter. Other variables are as defined previously [42].
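As a hedged illustration, the non-bonded expression quoted above can be evaluated directly for a single bead pair. The sketch implements the formula exactly as written in this section (a Lennard–Jones term plus a screened Coulomb term); the function name, the default k_e = 1, and the reduced units are assumptions, and the code is not taken from the CALVADOS software.

```python
import math


def nonbonded_energy(r, epsilon, sigma, q_i, q_j, kappa, k_e=1.0):
    """U_nb = 4*eps*[(sigma/r)^12 - (sigma/r)^6] + k_e*q_i*q_j*exp(-kappa*r)/r.

    Units are left to the caller (assumption): r and sigma share a length
    unit, kappa is its inverse, and k_e absorbs the dielectric constant.
    """
    if r <= 0.0:
        raise ValueError("distance r must be positive")
    lennard_jones = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    screened_coulomb = k_e * q_i * q_j * math.exp(-kappa * r) / r
    return lennard_jones + screened_coulomb
```

For an uncharged pair the energy crosses zero at r = σ and reaches its minimum of −ε at r = 2^(1/6)σ; adding like charges raises the energy at every distance, and increasing κ (stronger screening) shrinks that electrostatic contribution.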

Tesei et al. [17] validated CALVADOS against experimental data for over 20 different IDR systems, achieving quantitative agreement with the radius of gyration measurements and phase separation temperatures. Krainer et al. [43] utilized CALVADOS to investigate heterotypic interactions between distinct RNA-binding proteins, demonstrating how sequence complementarity drives co-phase separation. The simulations correctly predicted that hnRNPA1 and FUS form mixed condensates, while hnRNPA1 and TDP-43 show limited mixing, consistent with experimental observations [44].

The Mpipi force field employs a minimal, coarse-grained representation, where each amino acid is represented by a single bead [18]. Figure 4 shows the schematic representation of the Mpipi approach. The total energy includes electrostatic and Wang–Frenkel terms: U_total = U_elec + U_WF, where U_elec follows Debye–Hückel screening and U_WF accounts for steric and hydrophobic interactions through a modified Lennard–Jones potential with amino acid-specific parameters [45].

Emenecker et al. [46] employed Mpipi simulations to investigate the relationship between sequence composition and single-chain properties for hundreds of intrinsically disordered protein (IDP) sequences. The force field accurately reproduced the experimental radius of gyration measurements across diverse sequence types [47]. Ginell et al. [48] applied Mpipi to study the effects of post-translational modifications on IDR conformations, demonstrating how phosphorylation-induced charge changes alter protein compaction.

Field-theoretic methods treat proteins as polymers in solution and solve the polymer field theory equations directly [49,50]. Figure 5 illustrates how these approaches model protein systems as continuous fields. The key equation is the self-consistent field theory (SCFT) functional: F[φ] = ∫dr[f_loc(φ(r)) + K/2||∇φ(r)||²], where φ(r) is the local density field, f_loc is the local free energy density, K is the gradient penalty parameter, and the integral is over all space. The Flory–Huggins interaction parameter χ_ij describes interactions between components i and j [51,52].

McCarty et al. [53] employed field-theoretic simulations to generate comprehensive phase diagrams for model IDR sequences, revealing how charge patterning influences critical temperatures and concentrations. The approach successfully predicted experimental observations that a uniform charge distribution leads to stronger phase separation than a randomly distributed charge [54]. Danielsen et al. [55] applied SCFT to study the effect of salt concentration on phase behavior, showing quantitative agreement with experimental measurements for several RNA-binding proteins.

3. Machine Learning Approaches

PSPredictor represents one of the first successful applications of machine learning to phase separation prediction [56]. Figure 6 shows the architecture and feature representation used in PSPredictor. The method uses a support vector machine with a radial basis function kernel: K(x_i, x_j) = exp(−γ||x_i − x_j||²), where γ is the kernel parameter that controls the kernel width. The feature vector includes amino acid composition (20 features), dipeptide composition (400 features), charge distribution parameters (fraction of charged residues, FCR; net charge per residue, NCPR; and charge asymmetry parameter, κ), hydropathy moment calculations, and secondary structure propensities.
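The kernel and the simplest of the listed features can be sketched as follows. This is only an illustration of the radial basis function formula and of a 20-dimensional amino acid composition vector; the function names are hypothetical and the code does not reproduce the PSPredictor feature pipeline.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    n = len(sequence)
    return [sequence.count(aa) / n for aa in AMINO_ACIDS]


def rbf_kernel(x_i, x_j, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)
```

Two sequences with identical composition give a kernel value of 1, and the value decays toward 0 as their feature vectors move apart; a larger γ makes that decay sharper.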

Orlando et al. [56] trained PSPredictor on a dataset of 239 experimentally validated phase-separating proteins and 672 negative controls. The method achieved 85% accuracy in cross-validation and successfully predicted phase separation for several previously uncharacterized proteins [57]. Validation studies showed that TIA1, G3BP1, and several other stress granule proteins were correctly identified as phase-separating, while most globular proteins were correctly classified as non-phase-separating [58].

4. Deep Learning Approaches

Several deep learning architectures have been applied to IDR analysis [59,60]. PSP (Phase Separation Predictor) utilizes a convolutional neural network with a specifically designed architecture for protein sequence analysis [61]. Figure 7 illustrates the deep learning architecture employed for phase separation prediction. The input layer uses one-hot encoding, where each amino acid is represented as a 20-dimensional binary vector or learned embeddings, typically 50–100-dimensional dense vectors that capture the physicochemical similarities of amino acids [62]. The convolutional layers employ multiple filter sizes (three, five, and seven residues) to capture local sequence patterns at different scales, with ReLU activation functions applied as f(x) = max(0,x). Max pooling layers with a stride of 2 reduce dimensionality while preserving important features. The architecture comprises two fully connected layers with 128 and 64 neurons, respectively, utilizing dropout regularization with a probability of 0.5 to prevent overfitting. The output layer contains a single neuron with sigmoid activation for binary classification: P(phase_sep) = 1/(1 + e^−z), where z is the pre-activation output.
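The individual building blocks named above (one-hot input encoding, ReLU, max pooling with a stride of 2, and the sigmoid output) can be written out explicitly. This is a minimal sketch of those operations only, with hypothetical function names; it is not the PSP network itself and contains no trained weights.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode each residue as a 20-dimensional binary vector."""
    return [[1 if aa == ref else 0 for ref in AMINO_ACIDS] for aa in sequence]


def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)


def max_pool(values, size=2, stride=2):
    """1-D max pooling; incomplete trailing windows are dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, stride)]


def sigmoid(z):
    """P = 1 / (1 + e^-z), written in a numerically stable form."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```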

Performance comparisons revealed that one-hot encoding achieved 85% accuracy, while learned embeddings improved performance to 88% accuracy by better capturing the physicochemical similarities of amino acids [63]. The learned embeddings allow the model to learn that chemically similar amino acids (such as leucine and isoleucine) should have similar representations, leading to better generalization across diverse protein sequences.

Mészáros et al. [63] demonstrated that PSP achieves 88% accuracy on a balanced dataset of phase-separating and non-phase-separating proteins. The method correctly identified 23 out of 25 experimentally validated stress granule proteins and exhibited low false-positive rates on globular protein datasets [64]. Feature analysis revealed that the model learned to focus on regions with high aromatic content and specific charge patterns [65].

5. Protein Language Model Applications

Recent advances in protein language models have been applied to IDR analysis [66,67]. Figure 8 demonstrates how protein language models generate sequence embeddings through transformer architectures. ProtT5 and ESM-1b embeddings capture complex sequence relationships through transformer architectures that employ self-attention mechanisms: Attention(Q, K, V) = softmax(QK^T/√d_k)V, where Q, K, and V are query, key, and value matrices derived from the input sequence representations.
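The attention formula can be made concrete with a small, dependency-free sketch. It computes single-head scaled dot-product attention for matrices given as nested lists; the function names are illustrative, and real protein language models add learned projections, multiple heads, and masking that are omitted here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for one head."""
    d_k = len(K[0])
    output = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        output.append([sum(w * v[c] for w, v in zip(weights, V))
                       for c in range(len(V[0]))])
    return output
```

A query that is equally similar to every key returns the plain average of the value rows, whereas a query aligned with one key pulls the output toward that key's value row.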

The fine-tuning process for these models involves several carefully orchestrated steps. First, embeddings are extracted from pre-trained models, with ProtT5 providing 1024-dimensional representations for each residue (ESM-1b representations are 1280-dimensional). A task-specific classification head is then added, typically consisting of two layers that reduce dimensionality from 1024 to 256 to 64 neurons, followed by a final classification layer with 2 neurons for binary prediction. Training employs the Adam optimizer with a learning rate of 2 × 10⁻⁵ and a batch size of 32, often using gradient accumulation to effectively achieve larger batch sizes when memory constraints limit the direct use of larger batches. Early stopping is applied based on validation loss to prevent overfitting and ensure optimal generalization performance. Input sequences are tokenized using model-specific vocabularies and padded to a maximum length of 1024 residues to accommodate the longest proteins in typical datasets while maintaining computational efficiency.

Raimondi et al. [68] used ESM-1b embeddings to predict phase separation, achieving 91% accuracy by combining pre-trained representations with task-specific fine-tuning. The approach successfully predicted that several viral proteins (SARS-CoV-2 nucleocapsid, Ebola VP30) undergo phase separation, which was subsequently confirmed experimentally. The experimental validation employed multiple complementary techniques: turbidity measurements at 350 nm to detect droplet formation and quantify the propensity for phase separation, fluorescence recovery after photobleaching (FRAP) to assess droplet dynamics and internal rearrangement rates, and differential interference contrast (DIC) microscopy to visualize droplet morphology and confirm liquid-like behavior [69].

PLAAC utilizes hidden Markov models specifically designed to identify prion-like domains [70]. Figure 9 illustrates the PLAAC modeling approach, featuring its hidden Markov model architecture. The model defines states representing different amino acid preferences: a core state that favors prion-like residues such as asparagine, glutamine, serine, and tyrosine; a boundary state with intermediate preferences that allows for transitions between core and background regions; and a background state representing the general amino acid composition typical of non-prion domains. State transition probabilities are learned from known prion proteins, and the emission probabilities of each state are optimized for prion domain detection based on the amino acid preferences observed in experimentally validated prion-forming proteins.

Lancaster et al. [70] applied PLAAC to analyze over 200 yeast proteins, identifying 24 proteins with significant prion-like domains. Experimental validation confirmed that 19 of these proteins could form amyloid-like aggregates in vitro, demonstrating the predictive power of the computational approach [71]. Subsequent studies utilized PLAAC to identify prion-like domains in RNA-binding proteins associated with ALS and frontotemporal dementia, revealing potential therapeutic targets and highlighting the pathological relevance of prion-like aggregation in neurodegenerative diseases [72,73].

6. FuzDrop: Integrated Disorder and Droplet Prediction

FuzDrop combines multiple computational approaches to predict droplet-forming regions [74,75]. Figure 10 illustrates the integrated prediction system employed by FuzDrop. The method integrates disorder prediction using IUPred2A scores to identify disordered regions that are prerequisites for droplet formation, compositional analysis that calculates amino acid composition biases toward low-complexity sequences characteristic of phase-separating proteins, and linear motif identification that searches for known linear motifs using regular expressions and position weight matrices derived from experimentally characterized droplet-forming proteins [76].

The determination of sensitivity and specificity thresholds employed rigorous ROC curve analysis on a carefully curated training set of 150 validated droplet-forming proteins. The optimal threshold (score > 0.6) was selected to maximize the F1 score, which is calculated as F1 = 2 × (precision × recall)/(precision + recall), resulting in 92% sensitivity and 73% specificity. This threshold selection process involved the systematic evaluation of multiple cutoff values to identify the optimal balance between sensitivity and specificity for practical applications. Proteins with scores between 0.5 and 0.7 represent borderline cases, where experimental validation is particularly valuable, as these may indicate context-dependent droplet formation or proteins that require specific cofactors for phase separation. Examples of borderline cases include proteins that form droplets only under particular stress conditions or in the presence of RNA cofactors.
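The threshold selection procedure can be sketched as a scan over candidate cutoffs that keeps the one with the highest F1 score. The scores and labels in the usage note are invented toy values, and the function names are hypothetical; this is not the FuzDrop calibration code.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score > threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score > threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates):
    """Return the candidate cutoff with the highest F1 score."""
    def f1_at(threshold):
        tp, fp, _, fn = confusion_counts(scores, labels, threshold)
        return f1_score(tp, fp, fn)
    return max(candidates, key=f1_at)
```

For toy scores [0.9, 0.8, 0.65, 0.55, 0.4, 0.2] with the first three labeled positive, scanning the cutoffs 0.3, 0.5, 0.6, and 0.7 selects 0.6, which separates the two classes perfectly in this contrived example.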

Statistical analysis of the correlation between predicted scores and experimental critical concentrations employed Spearman correlation analysis, yielding a correlation coefficient of ρ = 0.67 with p < 0.001, indicating a significant but moderate correlation. This correlation suggests that while FuzDrop scores correlate with experimental phase separation propensity, additional factors such as protein concentration, temperature, ionic strength, and cellular environment significantly influence critical concentrations and must be considered when interpreting predictions.

Erdős and Dosztányi [74] tested FuzDrop on 246 experimentally validated droplet-forming proteins, achieving 92% sensitivity and 73% specificity. The method correctly identified droplet-forming regions in p53, c-Myc, and several RNA-binding proteins [77]. Comparisons with experimental data showed a moderate but significant correlation between predicted scores and experimentally determined critical concentrations [78].

7. catGranule: Machine Learning for Stress Granule Proteins

catGranule uses gradient boosting to predict stress granule localization [79]. Figure 11 demonstrates the gradient-boosting ensemble approach. The ensemble method combines multiple weak learners (decision trees) according to the formula: F(x) = Σ(i = 1 to M) α_i × h_i(x), where h_i are weak learners, α_i are weights determined during training through the boosting algorithm, and M is the number of boosting iterations. Each decision tree learns from the residuals of previous predictions, progressively improving the overall model performance by focusing on examples that were previously misclassified.
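The additive form F(x) = Σ α_i × h_i(x) and the residual-fitting idea can be illustrated with a toy regressor. The sketch uses one-dimensional inputs, single-split regression stumps as weak learners, and a constant weight α for every round; all three are simplifying assumptions, and the code is unrelated to the catGranule implementation.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (threshold, left mean, right mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=20, alpha=0.5):
    """Each round fits a stump to the residuals of the current ensemble."""
    stumps, preds = [], [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + alpha * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return stumps


def predict_boosted(stumps, x, alpha=0.5):
    """F(x) = sum_i alpha * h_i(x), with a constant alpha (assumption)."""
    return sum(alpha * predict_stump(s, x) for s in stumps)
```

On a step-like toy dataset such as xs = [1, 2, 3, 4] and ys = [0, 0, 1, 1], each round halves the remaining residual, so the ensemble prediction approaches the targets geometrically.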

Klim et al. [79] achieved 89% accuracy on a dataset of 570 stress granule proteins and 1140 controls. The method correctly predicted stress granule localization for TIA1, G3BP1, and CAPRIN1 and identified novel candidates that were subsequently validated experimentally through fluorescence microscopy and immunofluorescence staining [80].

8. LLPSDB and Database-Driven Approaches

The Liquid–Liquid Phase Separation Database (LLPSDB) has enabled systematic computational approaches [81]. Figure 12 shows the database-driven modeling approach that leverages curated experimental data. The database contains 1522 phase-separating proteins from 31 species, including experimental conditions such as temperature, pH, and salt concentration ranges that enable phase separation. It also provides structural and functional annotations that offer context for understanding phase separation mechanisms, as well as comprehensive literature references that facilitate detailed validation of predictions. Database-driven predictors use this curated information to train more robust models that can generalize across diverse protein families and experimental conditions [82]. The LLPS-Pred ensemble method combines multiple algorithms (Support Vector Machines, Random Forest, and Neural Networks) with weighted voting: P_final = Σ(i = 1 to N) w_i × P_i, where w_i are algorithm weights learned during training to optimize overall prediction accuracy and P_i are the individual algorithm predictions.
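The weighted voting rule reduces to a weighted average of the individual predictions. In the sketch below the weights are normalized to sum to 1 so that the result remains a probability; that normalization, the function name, and the example numbers are assumptions rather than details taken from LLPS-Pred.

```python
def weighted_vote(probabilities, weights):
    """P_final = sum_i w_i * P_i, with weights normalized to sum to 1."""
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model prediction")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, probabilities)) / total
```

For instance, hypothetical predictions of 0.9, 0.6, and 0.3 from three models with weights 0.5, 0.3, and 0.2 combine to a final score of 0.69.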

9. Physics-Informed Machine Learning

CADMOS combines coarse-grained simulations with neural networks to learn effective potentials [83,84,85]. Figure 13 illustrates the CADMOS physics-enhanced prediction system. The approach uses a physics-informed loss function that incorporates thermodynamic constraints: L_total = L_data + λ × L_physics, where L_data represents the standard data fitting loss that ensures agreement with experimental observations, L_physics ensures predictions satisfy known thermodynamic relationships such as detailed balance and thermodynamic consistency, and λ balances the contributions of both terms to achieve optimal performance.
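The structure of such a composite loss can be sketched generically. Here L_data is a mean squared error and L_physics is the mean squared violation of a set of constraint residuals, with a two-state detailed balance condition (p_a × k_ab = p_b × k_ba) as one example of a residual; these specific choices and all function names are assumptions for illustration and are not taken from CADMOS.

```python
def mean_squared(values):
    """Mean of squared entries; returns 0.0 for an empty list."""
    return sum(v * v for v in values) / len(values) if values else 0.0


def detailed_balance_residual(p_a, p_b, k_ab, k_ba):
    """Zero when the flux a->b equals the flux b->a."""
    return p_a * k_ab - p_b * k_ba


def total_loss(predictions, targets, physics_residuals, lam):
    """L_total = L_data + lambda * L_physics."""
    l_data = mean_squared([p - t for p, t in zip(predictions, targets)])
    l_physics = mean_squared(physics_residuals)
    return l_data + lam * l_physics
```

A model that fits the data perfectly but violates the constraint is still penalized, and λ sets how strongly that violation counts relative to the data misfit.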

Maristany et al. [83] applied CADMOS to study FUS phase separation, achieving 95% accuracy in predicting experimental phase diagrams while being 100× faster than traditional simulations. The method revealed that transient contact formation, which is not captured by mean-field theories, is crucial for accurately predicting phase behavior [86].

10. Computational Assessment of Interface Prediction

Recent developments in computational assessment tools for interface prediction in molecular complexes provide valuable frameworks for evaluating IDR interaction predictions. These tools are particularly relevant for benchmarking methods like FINCHES against experimental structural data when available, giving standardized metrics for comparing different computational approaches.

DockQv2 represents an advanced scoring system for evaluating the quality of predicted protein–protein interfaces [87]. The method combines multiple geometric and energetic criteria to assess interface accuracy, making it suitable for evaluating FINCHES predictions when experimental complex structures are available. The scoring function incorporates interface contact accuracy by comparing predicted and experimental residue–residue contacts, geometric complementarity through shape correlation functions, and energetic favorability by assessing the thermodynamic stability of predicted interfaces. This assessment approach provides multiple perspectives on prediction quality, enabling the identification of specific aspects that require improvement.

I-INF provides specialized metrics for assessing interface prediction accuracy, particularly valuable for evaluating the spatial accuracy of predicted interaction regions in IDR complexes [88]. The method focuses on identifying correctly predicted interface residues and quantifying the accuracy of predicted interaction surfaces using precision and recall metrics specifically designed for interface evaluation. This tool is particularly useful for evaluating the spatial resolution of predictions and pinpointing regions where computational methods excel or falter.
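Precision and recall over interface residues can be computed from two residue sets. The sketch below is a generic illustration in which residues are identified by hypothetical integer positions; it is not the scoring code of the I-INF tool.

```python
def interface_precision_recall(predicted, experimental):
    """Precision and recall of predicted interface residues.

    Both arguments are iterables of residue identifiers (assumed here
    to be sequence positions).
    """
    predicted, experimental = set(predicted), set(experimental)
    true_positives = len(predicted & experimental)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(experimental) if experimental else 0.0
    return precision, recall
```

If positions 10, 11, 12, and 15 are predicted and the experimental interface comprises positions 11, 12, and 13, precision is 0.5 and recall is 2/3.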

I-RMSD offers complementary interface assessment capabilities, focusing on the geometric accuracy of predicted binding sites [89]. This tool calculates root-mean-square deviations between predicted and experimental interface geometries, providing quantitative measures of structural accuracy that can be directly compared across different prediction methods. The approach is particularly valuable for assessing the geometric fidelity of predicted complexes and identifying systematic biases in computational approaches.
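The underlying root-mean-square deviation is straightforward to compute once corresponding interface atoms have been paired. The sketch assumes the two coordinate sets are already superimposed and listed in the same atom order; it omits the optimal superposition step and is not the I-RMSD tool's own implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3-D coordinates.

    Assumes both structures are pre-aligned and atoms are in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))
```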

These assessment tools could be incorporated into FINCHES benchmarking protocols to provide standardized evaluation metrics for interface prediction accuracy, particularly when validating predictions against available experimental structures of IDR-containing complexes. Such integration would enable the systematic comparison of different computational approaches and provide confidence measures for predictions in the absence of experimental validation. The implementation of these tools would involve establishing standardized protocols for structure preparation, interface definition, and scoring interpretation that ensure consistent and meaningful comparisons across different studies and research groups.

11. FINCHES Methodology and Theoretical Foundation

FINCHES operates on the principle that the chemical physics underlying molecular force fields can be leveraged to predict intermolecular interactions analytically [4]. Figure 14 illustrates the comprehensive FINCHES computational framework, showing the sequence-to-interaction pipeline that processes input sequences through force field calculations. The framework uses established force fields, primarily Mpipi-GG and CALVADOS2, which have been extensively validated for describing IDR behavior [17,18]. The fundamental approach involves integrating force field functions over relevant distance ranges to extract effective interaction strengths that quantify the overall attractive or repulsive nature of interactions between residue types. This integration yields a mean-field parameter that captures the essential thermodynamics of protein interactions without requiring explicit simulations, enabling the rapid screening of large numbers of sequences and conditions.
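The idea of collapsing a distance-dependent potential into a single interaction strength can be illustrated numerically. The sketch integrates a pair potential over a distance range with the trapezoidal rule, using a generic Lennard–Jones potential in reduced units as a stand-in; the integration limits, the choice of potential, and the function names are assumptions, and the exact integrand and limits used by FINCHES are not specified in this section.

```python
def integrate_potential(potential, r_min, r_max, n_steps=2000):
    """Trapezoidal integral of potential(r) from r_min to r_max.

    Negative values indicate net attraction over the range and positive
    values net repulsion (sign convention assumed for this sketch).
    """
    h = (r_max - r_min) / n_steps
    total = 0.5 * (potential(r_min) + potential(r_max))
    for k in range(1, n_steps):
        total += potential(r_min + k * h)
    return total * h


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Stand-in pair potential in reduced units."""
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
```

Integrating this stand-in potential from σ to 3σ gives a negative value, reflecting the attractive well, whereas a purely repulsive potential integrates to a positive value; repeating the calculation for every pair of residue types would yield a matrix of effective pairwise parameters.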

11.1. Force Field Implementation and Selection Rationale

The framework implements two primary force fields with distinct advantages and specific application domains [90]. The Mpipi-GG force field combines electrostatic interactions calculated using Coulomb’s law with Debye–Hückel screening and Wang–Frenkel contributions that capture steric and hydrophobic effects. The electrostatic component includes salt-dependent screening effects, making it particularly suitable for studying how ionic strength influences IDR interactions [91,92]. This force field is preferentially applied to salt-dependent analyses within the ionic strength range of 0.01–1.0 M, to systems with significant charge–charge interactions where electrostatic effects dominate the interaction landscape, and to studies focusing specifically on electrostatic contributions to binding, where a detailed understanding of charge-mediated interactions is required.
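The salt dependence of Debye–Hückel screening can be illustrated with the textbook approximation for the Debye length of a 1:1 electrolyte in water at 25 °C, λ_D ≈ 0.304/√I nm with I in mol/L. The sketch below uses that approximation and hypothetical function names; it is not the Mpipi-GG parameterization.

```python
import math


def debye_length_nm(ionic_strength_molar):
    """Approximate Debye length (nm) for a 1:1 salt in water at 25 C."""
    if ionic_strength_molar <= 0.0:
        raise ValueError("ionic strength must be positive")
    return 0.304 / math.sqrt(ionic_strength_molar)


def screening_factor(r_nm, ionic_strength_molar):
    """Factor exp(-r / lambda_D) that multiplies the bare Coulomb term."""
    return math.exp(-r_nm / debye_length_nm(ionic_strength_molar))
```

Across the 0.01–1.0 M range quoted above, the Debye length shrinks from about 3.0 nm to about 0.3 nm, so charge–charge interactions at a separation of 1 nm are only mildly screened at low salt and almost fully screened at high salt.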

CALVADOS2 uses a different functional form but captures similar physical principles, employing temperature-dependent dielectric constants and Yukawa potentials for electrostatic screening [17]. This force field is preferentially applied to temperature-dependent studies within the physiological range of 285–310 K, phase separation investigations where accurate reproduction of experimental phase diagrams is critical, systems where hydrophobic interactions dominate the interaction landscape, and multi-component mixture analyses where multiple protein species interact simultaneously. The temperature dependence in CALVADOS2 makes it particularly suitable for studying thermal stability and temperature-induced phase transitions.

The selection between force fields follows specific guidelines based on experimental conditions and system characteristics. CALVADOS2 is recommended when experimental data indicate strong temperature dependence or when studying phase separation phenomena where accurate prediction of the critical temperature is essential. Mpipi-GG is preferred for salt titration studies or when electrostatic interactions are expected to dominate based on sequence composition (high charge content) or experimental observations (strong ionic strength dependence). For systems where both temperature and salt effects are significant, a comparative analysis using both force fields can provide valuable insights into the dominant interaction mechanisms.

11.2. Sequence Context Corrections

A critical innovation in FINCHES is its implementation of sequence context corrections that account for local chemical environments [93]. These corrections recognize that the chemical environment of individual residues significantly influences their interaction propensities, moving beyond simple pairwise additivity assumptions that characterize many earlier approaches.

The charge weighting correction addresses the observation that clusters of like-charged residues exhibit reduced repulsion due to charge regulation effects and potential reorientation of side chains [94]. This correction is based on the physical understanding that high local charge density can lead to charge regulation through various mechanisms, including side chain reorientation, local pH shifts, and screening by mobile ions or polar groups.

For illustrative purposes, consider a sequence fragment “KKKDDD,” where the correction is calculated through the following systematic steps. The local fraction of charged residues (FCR) is calculated as the number of charged residues divided by the total number of residues in the window: FCR = 6/6 = 1.0 since all residues in this fragment are charged. The local net charge per residue (NCPR) is calculated as the absolute value of the net charge divided by the total number of residues: NCPR = |3−3|/6 = 0.0 since the net charge is zero (three positive lysines and three negative aspartates). The weighting factor is then applied as |NCPR/FCR| = |0.0/1.0| = 0.0, which effectively reduces repulsion between like charges in this balanced charged cluster, reflecting the physical reality that charge regulation can minimize electrostatic penalties in such arrangements through local charge neutralization and reorganization.

The current implementation uses static charge estimates based on physiological pH (7.4), assigning standard charges to ionizable residues (lysine and arginine +1, aspartate and glutamate −1, and histidine +0.1 to account for partial protonation). While pH-dependent protonation and deprotonation effects are not explicitly modeled through dynamic pKa calculations, the charge weighting correction partially accounts for charge regulation effects that occur when like-charged residues cluster together, providing a reasonable approximation for most physiological conditions where pH variations are typically modest.
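The charge-weighting arithmetic for the “KKKDDD” fragment can be reproduced with a short sketch. This is an illustration of the steps described above, not the FINCHES implementation: the function name `charge_weighting` is hypothetical, only the unit charges on K, R, D, and E are used (histidine's partial charge is omitted for simplicity), and returning `None` for a fragment without charged residues is an assumption, since the text does not define that case.

```python
# Unit charges at physiological pH as stated in the text (histidine omitted here).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charge_weighting(fragment):
    """Return (FCR, NCPR, weighting factor) for a sequence fragment.

    FCR  = charged residues / total residues
    NCPR = |net charge| / total residues
    weighting factor = |NCPR / FCR|
    """
    n_residues = len(fragment)
    charges = [CHARGE.get(aa, 0) for aa in fragment]
    n_charged = sum(1 for q in charges if q != 0)
    fcr = n_charged / n_residues
    ncpr = abs(sum(charges)) / n_residues
    if fcr == 0:
        # Assumption: the factor is undefined when no residue is charged.
        return fcr, ncpr, None
    return fcr, ncpr, abs(ncpr / fcr)


if __name__ == "__main__":
    print(charge_weighting("KKKDDD"))  # balanced cluster from the text
    print(charge_weighting("KKKKKK"))  # uniformly positive cluster
```

For the balanced fragment the factor is zero, while a fragment of six lysines gives a factor of one, which matches the intent that the correction acts only where opposite charges are present locally.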

The aliphatic weighting correction recognizes that hydrophobic residues require sufficient local density to form effective hydrophobic interfaces [95]. This correction is based on the physical principle that hydrophobic interactions become more favorable when multiple hydrophobic residues can cooperatively exclude water and form “dry” interaction surfaces. Aliphatic residues are classified into three categories based on local clustering within a window of seven residues centered on the target residue. Isolated hydrophobic residues, defined as having fewer than two hydrophobic neighbors in the window, receive no correction (1.0× multiplier) since they cannot form effective hydrophobic patches. Clustered hydrophobic residues with 2–3 hydrophobic residues in the window receive enhanced attraction (1.5× multiplier) to account for cooperative hydrophobic effects. Highly clustered hydrophobic residues with four or more hydrophobic residues in the window receive maximum enhancement (3.0× multiplier) to reflect the strong cooperative nature of extensive hydrophobic interactions.

The classification includes alanine, isoleucine, leucine, valine, phenylalanine, tryptophan, tyrosine, and methionine as hydrophobic residues based on their tendency to partition into hydrophobic environments and exclude water. The counting procedure includes the central residue in the cluster assessment, ensuring that the correction reflects the actual local hydrophobic environment experienced by each residue.
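The three-tier aliphatic weighting can be sketched as follows. The code is a minimal illustration under stated assumptions rather than the FINCHES source: the function name `aliphatic_multipliers` is hypothetical, the thresholds are read as total hydrophobic counts in the seven-residue window including the central residue, windows are truncated at the sequence termini, and non-hydrophobic residues are given a neutral multiplier of 1.0.

```python
# Hydrophobic set as listed in the text: A, I, L, V, F, W, Y, M.
HYDROPHOBIC = set("AILVFWYM")


def aliphatic_multipliers(sequence, window=7):
    """Return one multiplier per residue based on local hydrophobic clustering."""
    half = window // 2
    multipliers = []
    for i, residue in enumerate(sequence):
        if residue not in HYDROPHOBIC:
            multipliers.append(1.0)  # assumption: no correction for other residues
            continue
        start = max(0, i - half)
        stop = min(len(sequence), i + half + 1)  # truncated at the termini
        count = sum(1 for aa in sequence[start:stop] if aa in HYDROPHOBIC)
        if count >= 4:
            multipliers.append(3.0)  # highly clustered
        elif count >= 2:
            multipliers.append(1.5)  # clustered
        else:
            multipliers.append(1.0)  # isolated
    return multipliers


if __name__ == "__main__":
    print(aliphatic_multipliers("GGGAGGG"))
    print(aliphatic_multipliers("GGALGGG"))
    print(aliphatic_multipliers("LLLLGGG"))
```

An isolated alanine keeps a multiplier of 1.0, an adjacent alanine and leucine each receive 1.5, and a run of four leucines receives 3.0 at every position.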

11.3. Mean-Field Calculation

The core calculation in FINCHES involves building a raw interaction matrix and processing it to obtain the final mean-field interaction parameter [96]. This process begins with the calculation of pairwise interaction energies between all residue types using the selected force field. The force field function returns the instantaneous potential energy (in kJ/mol) associated with specific inter-residue distances. To calculate inter-residue interaction parameters (ε), FINCHES integrates the force field function over all relevant distances, typically from the van der Waals contact distance to several times the Debye screening length for electrostatic interactions.

This integration yields a mean-field parameter that quantifies the overall attractive or repulsive nature of the interaction between residue types i and j. The integration limits are set based on the van der Waals radii (σ) of the interacting residues, ensuring that the calculation focuses on the relevant interaction range where meaningful contacts can occur. For electrostatic interactions, the upper integration limit is determined by the Debye screening length, beyond which electrostatic interactions become negligible due to the screening effect of mobile ions.
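The idea of collapsing a distance-dependent potential into a single number by integration can be illustrated with a toy example. Everything specific here is an assumption made for illustration: `toy_potential` is a generic Lennard-Jones-style curve and not the Mpipi-GG or CALVADOS2 functional form, the integration limits (σ to 3σ) are arbitrary, and no normalization is applied because the text does not specify one.

```python
def toy_potential(r, sigma, well_depth):
    """Hypothetical Lennard-Jones-style pair potential (stand-in for a real force field)."""
    x = (sigma / r) ** 6
    return 4.0 * well_depth * (x * x - x)


def integrate_potential(potential, r_min, r_max, n_steps=2000):
    """Trapezoidal integral of potential(r) over [r_min, r_max]."""
    step = (r_max - r_min) / n_steps
    total = 0.5 * (potential(r_min) + potential(r_max))
    for k in range(1, n_steps):
        total += potential(r_min + k * step)
    return total * step


if __name__ == "__main__":
    sigma = 1.0       # contact distance (arbitrary units, assumed)
    well_depth = 1.0  # depth of the attractive well (arbitrary units, assumed)
    value = integrate_potential(
        lambda r: toy_potential(r, sigma, well_depth), sigma, 3.0 * sigma
    )
    print(value)  # negative: the attractive well dominates over this range
```

With these limits the integral is negative, consistent with the convention that negative parameters denote net attraction; a purely repulsive curve would integrate to a positive value.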

The resulting interaction matrix contains mean-field parameters for all 400 possible amino acid pairs (20 × 20), providing a comprehensive description of interaction preferences across the entire amino acid alphabet. These parameters are then used to calculate interaction energies for specific protein sequences by summing contributions from all relevant residue pairs, weighted by their sequence separation and local context corrections.
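A sequence-level value can then be obtained from the pairwise table by combining contributions over all residue pairs. The sketch below uses a plain average over every inter-sequence pair; that averaging, the function name `mean_epsilon`, and the two-letter toy table are assumptions for illustration, and the sequence-separation and context weights mentioned above are left out.

```python
# Hypothetical pairwise parameters for a two-letter alphabet (not real force field values).
TOY_PAIR_EPS = {
    ("Y", "Y"): -1.0,
    ("Y", "S"): -0.2,
    ("S", "Y"): -0.2,
    ("S", "S"): 0.1,
}


def mean_epsilon(seq_a, seq_b, pair_eps):
    """Average the pairwise parameters over all residue pairs between two sequences."""
    total = 0.0
    for res_a in seq_a:
        for res_b in seq_b:
            total += pair_eps[(res_a, res_b)]
    return total / (len(seq_a) * len(seq_b))


if __name__ == "__main__":
    print(mean_epsilon("YS", "YS", TOY_PAIR_EPS))  # mixed sequence, net attractive
    print(mean_epsilon("SS", "SS", TOY_PAIR_EPS))  # tyrosines replaced, net repulsive
```

Even this toy table reproduces the qualitative pattern of a sign change when attractive residues are replaced, although the magnitudes carry no physical meaning.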

12. Comparative Analysis and Validation

A comprehensive comparison of computational methods reveals distinct strengths and limitations across different approaches, as summarized in Table 1. FINCHES demonstrates exceptional speed for variant analysis, requiring only 1 s per variant, while achieving a correlation coefficient of r = 0.91 for critical temperature prediction in FUS LCD systems. This combination of speed and accuracy makes it particularly valuable for high-throughput screening applications where large numbers of sequence variants must be evaluated rapidly. CALVADOS achieves comparable accuracy (r = 0.89) for phase boundary prediction but requires 2 h per system, representing a significant computational burden for large-scale studies. All-atom molecular dynamics provides the highest structural accuracy (RMSD = 2.1 Å from experimental NMR data) but demands 5 days per trajectory, making it impractical for large-scale studies, although it remains invaluable for a detailed mechanistic understanding of specific systems.

12.1. FUS Low-Complexity Domain

The FUS low-complexity domain has become a benchmark system for validating computational approaches due to its well-characterized phase separation behavior and availability of extensive experimental data [97]. This 163-residue region (amino acids 1–163) is rich in glycine, serine, glutamine, and tyrosine residues and readily undergoes liquid–liquid phase separation under physiological conditions [98]. The domain serves as an excellent test case for computational methods because its phase behavior can be precisely controlled through sequence modifications and environmental conditions.

FINCHES predicted that the wild-type FUS LCD has a strongly attractive interaction parameter (ε) of −15.2 kJ/mol under physiological conditions [4]. This prediction captures the thermodynamic driving force for self-association that leads to phase separation. The framework correctly predicted that tyrosine-to-serine mutations would eliminate phase separation, with the ε value shifting to +8.7 kJ/mol for the 27Y→S variant [99]. This dramatic change from attractive to repulsive interactions reflects the loss of π–π interactions between tyrosine residues that are essential for FUS phase separation. Spatial interaction maps (intermaps) revealed that tyrosine-rich regions at positions 33–42 and 108–125 drive the strongest attractive interactions, providing molecular-level insights into the sequence determinants of phase separation [100].

Tesei et al. [17] performed microsecond-scale CALVADOS simulations of FUS LCD, reproducing experimental phase diagrams with quantitative accuracy. The simulations revealed that phase separation occurs through the formation of dynamic clusters stabilized by π–π interactions between tyrosine residues [101]. Critical temperatures were predicted within 2 K of experimental values, demonstrating the high accuracy achievable with properly parameterized coarse-grained models [102]. However, these simulations required substantial computational resources and expertise in molecular simulation techniques.

Palazzesi et al. [33] employed all-atom molecular dynamics (MD) to investigate FUS LCD phase separation, necessitating aggregate simulation times exceeding 100 μs across multiple systems. The simulations revealed detailed mechanisms of droplet formation, showing that tyrosine residues form transient π-stacks that nucleate liquid droplets [103]. However, the computational cost limited analysis to small systems (8–16 protein copies), preventing the direct simulation of macroscopic phase separation [104].

PSPredictor correctly classified FUS LCD as phase-separating with a confidence score of 0.94 [56]. The high score was attributed to the combination of low-complexity sequence composition and high aromatic content, which characterizes many phase-separating proteins [105]. However, the method cannot predict the effects of specific mutations without retraining on new data, limiting its utility for protein design applications [106].

12.2. DDX4 N-Terminal Domain

The DDX4 N-terminal domain (residues 1–113) contains a mixed charge distribution and has been extensively studied as a model system for understanding charge effects in phase separation [107,108]. This domain is particularly valuable for testing computational methods because its phase behavior can be dramatically altered by redistributing charges without changing the overall sequence composition, providing a stringent test of the methods’ ability to capture sequence-specific effects.

FINCHES analysis revealed that DDX4-NTD has a moderately attractive ε value of −8.1 kJ/mol [4]. The charge-shuffled variant, which redistributes the same charges more evenly throughout the sequence, showed a dramatically different ε value of −18.7 kJ/mol, correctly predicting enhanced phase separation observed experimentally [109]. This prediction demonstrates FINCHES’s ability to capture the subtle effects of charge patterning on interaction strength. Spatial interaction maps identified specific regions (residues 20–35 and 65–80) as primary drivers of attractive interactions, guiding experimental mutagenesis studies [110].

Lin and Chan [111] used polymer field theory to study charge patterning effects in DDX4-NTD. The approach correctly predicted that charge segregation enhances phase separation, with calculated critical temperatures within 1 K of experimental values [112]. The field-theoretic approach required 30 min of computation time compared to 1 s for FINCHES, highlighting the speed advantages of the analytical approach [113].

Brady et al. [114] performed detailed experimental characterization of DDX4-NTD phase behavior using single-molecule fluorescence experiments. These studies revealed that the wild-type protein forms dynamic droplets with internal exchange on timescales of seconds, while the charge-shuffled variant forms more stable condensates with reduced dynamics [115]. These experimental observations provide crucial validation for computational predictions and highlight the importance of dynamics in understanding IDR function.

12.3. p53 Transactivation Domain

The p53 transactivation domain (TAD1, residues 1–40) represents a classic example of a functional intrinsically disordered region (IDR) that undergoes disorder-to-order transitions upon binding to its targets [116,117]. This system presents a distinct type of validation challenge for computational methods, as it involves the formation of structured complexes rather than phase separation.

Rauscher et al. [32] performed extensive REMD simulations of p53 TAD1, revealing that the free protein samples both compact and extended conformations. Upon binding to MDM2, the protein folds into an amphipathic helix [118]. The simulations accurately reproduced NMR chemical shift data and provided mechanistic insights into the folding process upon binding [119]. These detailed simulations required substantial computational resources but provided an atomic-level understanding of the binding mechanism.

FINCHES cannot directly model disorder-to-order transitions but can successfully predict attractive interactions between p53 TAD1 and MDM2 surface residues (ε = −12.4 kJ/mol) [4]. The spatial interaction map correctly identified the hydrophobic residues F19, L22, and W23 as key interaction drivers, consistent with experimental mutagenesis studies that showed these residues are essential for MDM2 binding [120]. While FINCHES cannot predict the structural details of the complex, it successfully identifies the thermodynamic driving forces for interaction.

Specialized predictors for protein–protein interactions (e.g., ANCHOR2) correctly identified the MDM2-binding region in p53 TAD1 based on sequence features [121]. However, these methods require prior knowledge of binding partners and cannot predict novel interactions, limiting their utility for discovery applications [122].

12.4. TDP-43 Low-Complexity Domain

The TDP-43 LCD (residues 267–414) is implicated in ALS and frontotemporal dementia and contains a conserved hydrophobic region that drives pathological aggregation [123,124]. This system provides important validation for methods aimed at understanding disease-related protein aggregation.

Lancaster et al. [70] applied PLAAC to TDP-43 LCD, identifying a prion-like region spanning residues 320–366. The method correctly predicted that ALS-associated mutations in this region increase the aggregation propensity [125]. PLAAC scores for disease mutations were significantly higher than for control variants, demonstrating the method’s ability to identify pathologically relevant sequence features [126].

FINCHES predicted that wild-type TDP-43 LCD has moderately attractive interactions (ε = −6.8 kJ/mol) but identified the conserved region as a hotspot for intermolecular contacts [4]. Phosphorylation simulations demonstrated that modification of the C-terminal serines significantly reduces attractive interactions, consistent with experimental observations that phosphorylation suppresses aggregation [127]. This demonstrates FINCHES’s utility for understanding the post-translational regulation of protein interactions.

Gruijs da Silva et al. [128] utilized CALVADOS to investigate the effects of TDP-43 phosphorylation, demonstrating that 12S→D mutations abolish phase separation. The simulations revealed that phosphomimetic mutations disrupt the delicate balance between attractive and repulsive interactions required for condensate formation [129]. These studies underscore the significance of post-translational modifications in modulating IDR interactions.

12.5. Speed and Scalability Analysis

Computational performance analysis reveals significant differences in scalability across methods, as detailed in Table 2. FINCHES scales favorably with sequence length and requires only standard CPU hardware, making it accessible for routine use in any research laboratory. The method’s computational requirements scale as O(L²) for sequence length L when considering all pairwise interactions, but the constant factor is extremely small due to the analytical nature of the calculations.

In contrast, molecular dynamics simulations scale far less favorably with system size and require specialized computing clusters, thereby limiting their application to focused studies of specific systems. CALVADOS simulations scale quadratically with the number of proteins due to the need to calculate all pairwise interactions during the simulation, whereas all-atom MD simulations scale even more unfavorably, owing to the larger number of atoms and shorter timesteps required for numerical stability.

The speed advantages of FINCHES become particularly apparent for large-scale studies [130]. Analyzing all 19,702 human IDRs for phosphorylation effects required only 3.2 h using FINCHES, compared to an estimated 45 years for equivalent CALVADOS simulations or 890 years for all-atom molecular dynamics (MD) studies [131]. This dramatic difference in computational requirements makes FINCHES uniquely suited for proteome-scale analyses that would be impossible with traditional simulation approaches.
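The throughput implied by these figures can be checked with simple arithmetic. The helper names below are hypothetical, and a 365.25-day year is assumed when converting the estimated simulation times to hours.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: Julian year


def seconds_per_item(total_hours, n_items):
    """Average wall-clock seconds spent per item."""
    return total_hours * 3600.0 / n_items


def speedup(slow_years, fast_hours):
    """Ratio of a runtime given in years to a runtime given in hours."""
    return slow_years * HOURS_PER_YEAR / fast_hours


if __name__ == "__main__":
    n_idrs = 19702
    finches_hours = 3.2
    print(seconds_per_item(finches_hours, n_idrs))  # seconds per IDR
    print(speedup(45, finches_hours))               # versus the CALVADOS estimate
    print(speedup(890, finches_hours))              # versus the all-atom MD estimate
```

The quoted totals correspond to roughly 0.58 s per IDR, in line with the per-variant timing of about 1 s reported earlier, and to speed-ups on the order of 10^5 and 10^6 relative to the CALVADOS and all-atom estimates, respectively.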

Accuracy comparisons across different prediction types reveal that different methods excel in different domains, as shown in Table 3. FINCHES achieves 87% correct classification for binary phase separation prediction and a correlation coefficient of r = 0.85 for critical temperature prediction, providing good overall performance across diverse applications.

CALVADOS achieves higher accuracy for phase separation prediction (94% vs. 87%) but at the cost of significantly longer computation times. All-atom molecular dynamics (MD) provides the highest accuracy for interface prediction (89% vs. 73%) but is limited to small systems. Machine learning methods show poor transferability to new mutations (45% accuracy), highlighting the importance of physics-based approaches for protein design applications.

13. Key Outputs and Interpretations

13.1. Mean-Field Interaction Parameter (ε)

The primary output of FINCHES is the mean-field interaction parameter ε, which quantifies the overall driving force for interaction between two sequences [4]. Negative ε values indicate net attractive interactions that favor complex formation or self-association, while positive values suggest repulsive interactions that disfavor association [132]. The magnitude of ε correlates with interaction strength, enabling quantitative comparisons between different sequence pairs and conditions [133]. This parameter provides a thermodynamic measure of interaction favorability that can be directly related to experimental observables such as binding constants and critical concentrations.

For homotypic interactions (self-association), ε values can predict the propensity for phase separation [134]. The framework has successfully reproduced experimental phase diagrams for numerous IDR systems, including FUS, TDP-43, and hnRNPA1, demonstrating that sequence-based ε calculations can capture the essential thermodynamics of protein condensation [135,136,137]. The relationship between ε values and experimental phase boundaries follows theoretical predictions from polymer physics, providing confidence in the physical basis of the predictions.

13.2. Intermaps: Spatial Resolution of Interactions

Beyond global ε values, FINCHES generates intermaps that provide residue-level resolution of interaction landscapes [138]. These two-dimensional heat maps show which specific regions of two interacting sequences drive attractive or repulsive interactions [139]. Intermaps are calculated using a sliding window approach, where local ε values are computed for sequence segments of defined size (typically 13–31 residues) across all possible pairwise combinations [140]. This analysis reveals the spatial organization of interaction hotspots and enables the identification of specific sequence regions responsible for mediating interactions.
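The sliding-window construction of an intermap can be sketched as below. This is an illustrative outline only: the function names, the use of a simple window average, and the toy pair function (which scores aromatic pairs as attractive and everything else as neutral) are assumptions, and windows that would extend past a terminus are skipped rather than padded.

```python
AROMATIC = set("FWY")


def toy_pair_eps(res_a, res_b):
    """Hypothetical pair parameter: attractive only for aromatic-aromatic pairs."""
    return -1.0 if res_a in AROMATIC and res_b in AROMATIC else 0.0


def intermap(seq_a, seq_b, pair_eps, window=13):
    """Return a 2D list of local mean parameters for every pair of window centers."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    rows = []
    for i in range(half, len(seq_a) - half):
        segment_a = seq_a[i - half:i + half + 1]
        row = []
        for j in range(half, len(seq_b) - half):
            segment_b = seq_b[j - half:j + half + 1]
            total = sum(pair_eps(a, b) for a in segment_a for b in segment_b)
            row.append(total / (window * window))
        rows.append(row)
    return rows


if __name__ == "__main__":
    sequence = "S" * 10 + "Y" * 5 + "S" * 10
    grid = intermap(sequence, sequence, toy_pair_eps, window=5)
    print(len(grid), len(grid[0]))  # one row and column per full window
    print(grid[10][10])             # both windows centered on the tyrosine patch
```

Row and column indices are offset from residue positions by half a window, so entry [10][10] with a window of five corresponds to windows centered on residue index 12, where the tyrosine patch sits.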

Representative intermaps for several well-characterized systems show clear attractive regions corresponding to experimentally identified interaction hotspots [141]. The DDX4 N-terminal domain exhibits attractive regions that align with experimental binding studies, while charge-shuffled variants display dramatically altered interaction patterns that correlate with altered phase separation behavior [114]. These spatial predictions enable researchers to identify specific residues or regions responsible for mediating interactions, facilitating targeted mutagenesis experiments and functional studies [142]. The high spatial resolution of intermaps makes them particularly valuable for protein design applications where specific interaction interfaces need to be optimized.

13.3. Phase Diagram Predictions

FINCHES can predict liquid–liquid phase separation behavior by combining ε values with principles of polymer physics [143]. Using the Flory–Huggins framework adapted for protein solutions, the method generates phase diagrams showing critical temperatures and concentrations for phase separation [144]. These predictions have shown remarkable agreement with experimental measurements across diverse IDR systems, correctly capturing the effects of sequence mutations, salt concentration, and temperature changes [145]. The ability to predict complete phase diagrams from sequence information alone represents a significant advance in understanding the physical basis of protein phase separation.
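For orientation, the standard Flory–Huggins expressions for a polymer of N segments in a monomeric solvent can be written out directly. The sketch below is textbook Flory–Huggins theory and not the adapted framework used by FINCHES: how ε is converted into the interaction parameter χ is not specified in the text, so χ is treated as a free input, and the function names are hypothetical.

```python
import math


def fh_free_energy(phi, n_segments, chi):
    """Flory-Huggins mixing free energy per lattice site, in units of kT."""
    return (
        (phi / n_segments) * math.log(phi)
        + (1.0 - phi) * math.log(1.0 - phi)
        + chi * phi * (1.0 - phi)
    )


def fh_curvature(phi, n_segments, chi):
    """Second derivative of the free energy; negative values mark the unstable region."""
    return 1.0 / (n_segments * phi) + 1.0 / (1.0 - phi) - 2.0 * chi


def fh_critical_point(n_segments):
    """Critical volume fraction and critical chi for chain length n_segments."""
    root_n = math.sqrt(n_segments)
    phi_c = 1.0 / (1.0 + root_n)
    chi_c = 0.5 * (1.0 + 1.0 / root_n) ** 2
    return phi_c, chi_c


if __name__ == "__main__":
    phi_c, chi_c = fh_critical_point(100)
    print(phi_c, chi_c)
    print(fh_curvature(phi_c, 100, chi_c))  # vanishes at the critical point
```

In this picture a sequence phase separates when its effective χ exceeds the critical value, and longer chains lower that threshold, which is the qualitative link between interaction strength and critical conditions that a sequence-derived parameter would feed into.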

14. Applications and Experimental Validation

14.1. Proteome-Scale Analysis

One of FINCHES’s most powerful applications is the large-scale analysis of interaction networks [146]. The framework has been applied to analyze all human IDRs longer than 100 amino acids, revealing global patterns in interaction propensities and identifying proteins with unusual interaction characteristics [147]. This analysis revealed that proteins with highly attractive homotypic interactions are significantly underrepresented among highly abundant cellular proteins, suggesting evolutionary pressure against promiscuous aggregation [148]. This observation offers valuable insights into the evolutionary constraints that shape protein sequence evolution and the mechanisms cells employ to prevent pathological protein aggregation.

Gene ontology enrichment analysis revealed that proteins with strongly attractive interactions are enriched in RNA processing and transcriptional regulation functions, consistent with their roles in forming membrane-less organelles, such as nuclear speckles and P-granules [149]. This large-scale analysis would be computationally prohibitive using traditional simulation approaches, but is readily achievable with FINCHES [150]. The ability to perform proteome-scale analyses opens new opportunities for understanding cellular organization and identifying novel therapeutic targets.

14.2. Post-Translational Modification Effects

FINCHES has proven particularly valuable for studying how post-translational modifications alter interaction landscapes [151]. Phosphorylation analysis of 19,702 human intrinsically disordered regions (IDRs) revealed distinct patterns depending on protein function [152]. RNA-binding proteins generally showed reduced homotypic interactions upon phosphorylation, consistent with phosphorylation serving as a mechanism to dissolve RNA–protein condensates [153]. In contrast, signaling proteins often showed enhanced interactions, suggesting phosphorylation-induced complex formation [154]. These findings provide molecular-level insights into how cells utilize post-translational modifications to regulate protein interactions and cellular organization dynamically.

The framework successfully predicted experimental observations for specific systems, including the dissolution of FUS condensates upon tyrosine phosphorylation and the enhanced interactions of phosphorylated transcriptional coactivators [155]. These predictions offer mechanistic insights into how cells use post-translational modifications to dynamically regulate protein interactions and phase behavior [156]. The rapid calculation speed of FINCHES makes it practical to analyze the effects of multiple modification sites simultaneously, enabling a comprehensive understanding of regulatory mechanisms.

14.3. Transcription Factor-Coactivator Interactions

FINCHES has been applied to understand the molecular basis of transcription factor activation domain function [157]. By calculating interactions between transcriptional activation domains and the structured binding domains of coactivator proteins, such as Mediator, the framework identified key sequence features that drive productive interactions [158]. The analysis revealed that effective activation domains must balance attractive interactions with coactivators against repulsive homotypic interactions that prevent self-association [159]. This balance is critical for proper transcriptional regulation, as excessive self-association can lead to the formation of inactive aggregates.

The relationship between activation domain strength and calculated interaction parameters shows that strong activation domains exhibit favorable interactions with Gal11/Med15 (negative ε values) while avoiding excessive self-association [160]. This analysis provided molecular-level insights into a decades-old puzzle in transcriptional regulation and demonstrated how FINCHES can illuminate the mechanistic basis of protein function [161]. The ability to predict transcriptional activity from sequence information has important implications for understanding gene regulation and designing synthetic transcriptional circuits.

14.4. Drug Discovery and Protein Design

The rapid calculation speed of FINCHES makes it suitable for screening applications in drug discovery and protein design [162]. The framework can quickly evaluate how sequence modifications affect interaction profiles, enabling the rational design of IDR variants with desired properties [163]. Several groups have used FINCHES predictions to design proteins with enhanced or reduced phase separation propensity, validating the predictive power of the approach [164]. Representative applications and validation results demonstrate the practical utility of FINCHES predictions across diverse systems, as detailed in Table 4.

The excellent agreement between FINCHES predictions and experimental observations across diverse systems demonstrates the robustness and general applicability of the approach. The method’s ability to predict mutation effects makes it particularly valuable for protein-engineering applications where specific interaction properties need to be optimized.

15. Limitations and Critical Assessment

15.1. Fundamental Assumptions and Consequences

While FINCHES represents a significant advance in IDR interaction prediction, it operates under several assumptions that limit its applicability and accuracy [166]. The most fundamental limitation is the mean-field approximation, which treats each residue as experiencing an average chemical environment rather than the specific, dynamic environments present in real proteins [167]. This approximation necessarily smooths over important heterogeneities and cannot capture cooperative effects that emerge from specific spatial arrangements of residues [168]. Real protein interactions often involve complex networks of contacts that exhibit significant cooperativity, where the formation of one contact facilitates the formation of additional contacts in the vicinity.

The framework assumes that IDR interactions can be decomposed into pairwise residue contributions, neglecting higher-order interactions that may be crucial for specificity [169]. Real protein interactions often involve complex networks of contacts that cannot be simply summed from individual residue pair contributions [170]. For example, the formation of β-sheet structures involves cooperative hydrogen bonding patterns that cannot be captured by pairwise additivity. Additionally, the current implementation cannot predict structured complexes formed by intrinsically disordered regions (IDRs), missing an important class of functional interactions where disorder-to-order transitions occur upon binding [171].

15.2. Temporal and Dynamic Limitations

Perhaps the most significant limitation of FINCHES is its static nature [172]. IDRs are inherently dynamic, existing as rapidly interconverting ensembles of conformations on timescales from nanoseconds to milliseconds [173]. The biological relevance of any interaction depends not only on its thermodynamic favorability but also on the kinetics of complex formation and dissociation [174]. FINCHES predictions represent equilibrium thermodynamic preferences but cannot address whether predicted interactions will occur on relevant biological timescales [175]. For example, two proteins might have favorable interaction energies but never encounter each other in the cell due to spatial separation or kinetic barriers.

The cellular environment is also highly crowded and heterogeneous, with local concentrations, pH, and ionic strength varying significantly from the bulk conditions typically assumed in calculations [176]. IDRs may experience very different chemical environments when localized to specific cellular compartments or when interacting with membranes, nucleic acids, or other cellular components [177]. FINCHES cannot account for these environmental complexities, potentially leading to predictions that fail to reflect in vivo behavior [178]. For instance, the presence of RNA can dramatically alter protein phase separation behavior through changes in effective valency and interaction strength.

15.3. Force Field Limitations

The accuracy of FINCHES predictions is fundamentally limited by the quality of the underlying force fields [179]. Current IDR force fields, while representing substantial improvements over earlier approaches, remain simplified representations of complex chemical interactions [180]. They may not accurately capture all aspects of aromatic–aromatic interactions, cation–π interactions, or the subtle effects of amino acid context on chemical properties [181]. For example, the strength of π–π interactions between aromatic residues can be significantly influenced by their local environment; however, current force fields use fixed parameters that may not accurately capture this variability.

The force fields used in FINCHES were primarily trained on single-chain properties, such as the radius of gyration and end-to-end distances, with intermolecular interaction parameters derived from limited experimental datasets [182]. The transferability of these parameters to the full diversity of IDR interactions found in biological systems remains an open question [183]. Additionally, the force fields do not account for sequence-specific effects that may arise from evolutionary selection for particular functional properties [184]. Proteins that have evolved specific functional interactions may exhibit sequence features that optimize their interaction properties in ways not captured by general force field parameters.

15.4. Experimental Validation Challenges

Validating FINCHES predictions presents significant experimental challenges [185]. Many of the systems used for validation involve artificially concentrated protein solutions, which may not accurately reflect physiological conditions [186]. Phase separation studies, although informative, often employ protein concentrations that are orders of magnitude higher than typical cellular levels [187]. The biological relevance of interactions observed under these conditions is not always clear [188]. Cellular protein concentrations are typically in the micromolar range, while many in vitro studies use millimolar concentrations to observe phase separation on experimentally convenient timescales.

Furthermore, most experimental validations focus on relatively simple, well-characterized systems [189]. The behavior of more complex, multi-domain proteins, which contain both structured and disordered regions, may not be accurately captured by current approaches [190]. The framework’s performance on proteins with multiple intrinsically disordered regions (IDRs), alternative splicing variants, or proteins undergoing complex regulatory modifications remains largely untested [191]. Many cellular proteins contain numerous functional domains that can influence each other’s behavior through allosteric mechanisms or competitive binding, effects that are difficult to capture in simplified experimental systems.

15.5. Comparative Limitations

Machine learning approaches face the fundamental challenge of limited training data [192]. The number of experimentally characterized IDR interactions is still relatively small compared to the diversity of possible sequences and conditions [193]. This leads to overfitting and poor generalization to new systems, particularly for sequences that differ significantly from training data [194]. The rapid growth in sequence databases far outpaces the accumulation of experimental data, creating an increasingly large gap between available sequence information and functional characterization.

Simulation-based methods, while providing detailed mechanistic information, face significant timescale limitations [195]. Most biologically relevant processes involving IDRs occur on timescales ranging from seconds to minutes, while simulations typically access timescales of microseconds to milliseconds [196]. Enhanced sampling methods help but cannot completely overcome this fundamental limitation [197]. This timescale gap means that many critical biological processes, such as stress granule assembly and disassembly, cannot be directly simulated with current computational resources.
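
The size of this timescale gap can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 2 fs integration timestep and the throughput of one simulated microsecond per day are assumed round numbers chosen for the example, not values taken from the cited studies.

```python
# Rough estimate of the cost of simulating one second of biological time.
# The timestep and throughput used below are illustrative assumptions.

FEMTOSECOND = 1e-15   # seconds
MICROSECOND = 1e-6    # seconds


def steps_required(target_seconds: float, timestep_fs: float) -> float:
    """Number of integration steps needed to cover target_seconds."""
    return target_seconds / (timestep_fs * FEMTOSECOND)


def wall_clock_days(target_seconds: float, throughput_us_per_day: float) -> float:
    """Wall-clock days needed at a throughput given in simulated microseconds per day."""
    return (target_seconds / MICROSECOND) / throughput_us_per_day


target = 1.0        # one second of biological time
timestep = 2.0      # assumed timestep in femtoseconds
throughput = 1.0    # assumed throughput in microseconds per day

print(f"integration steps: {steps_required(target, timestep):.1e}")
print(f"wall-clock days:   {wall_clock_days(target, throughput):.1e}")
```

Under these assumptions, one second of simulated time corresponds to about 5 × 10^14 integration steps and roughly a million days of computation, which is why enhanced sampling and coarse-graining narrow the gap without closing it.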

All physics-based methods, including FINCHES, are fundamentally limited by the accuracy of underlying force fields [198]. Current IDR force fields, while much improved, still represent simplified descriptions of complex chemical interactions [199]. They may not accurately capture all aspects of aromatic–aromatic interactions, cation–π interactions, or context-dependent effects [200]. The development of more accurate force fields remains an active area of research, with ongoing efforts to incorporate quantum mechanical calculations and machine learning approaches to improve parameter accuracy.

16. Future Directions and Improvements

16.1. Integration of Multiple Approaches

Future developments should focus on integrating the strengths of different computational approaches [201]. Ensemble methods that combine FINCHES predictions with machine learning outputs and simulation results could provide more robust and accurate predictions [202]. For example, FINCHES could rapidly screen large numbers of sequences to identify promising candidates, which could then be studied in detail using more computationally intensive methods [203]. This hierarchical approach would leverage the speed of analytical methods for initial screening while providing detailed mechanistic insights through simulations of selected systems.
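
The hierarchical strategy described above can be sketched as a two-stage pipeline. In this minimal example, `toy_attraction_score` is a hypothetical stand-in for a fast analytical predictor (it is not the FINCHES API), and `slow_score` is a hypothetical placeholder for an expensive simulation-based estimate; both names and the scoring rule are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def net_charge(seq: str) -> int:
    """Crude net charge at neutral pH: +1 for K/R, -1 for D/E (His ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def toy_attraction_score(seq_a: str, seq_b: str) -> float:
    """Hypothetical fast score; more negative means stronger predicted attraction.

    Uses only the product of net charges, normalized by sequence lengths.
    Both sequences are assumed to be non-empty.
    """
    return net_charge(seq_a) * net_charge(seq_b) / (len(seq_a) * len(seq_b))


def hierarchical_screen(
    target: str,
    candidates: List[str],
    fast_score: Scorer,
    slow_score: Scorer,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rank all candidates with the cheap score, then rescore only the top_k."""
    ranked = sorted(candidates, key=lambda s: fast_score(target, s))
    shortlist = ranked[:top_k]
    return [(s, slow_score(target, s)) for s in shortlist]


# Example: an acidic target screened against four candidate partners.
target = "DDEEDDEE"
candidates = ["KKRRKKRR", "GSGSGSGS", "DEDEDEDE", "KKGSGSKK"]

# The expensive stage is mocked by reusing the fast score; in practice it
# would be replaced by a simulation-derived quantity.
results = hierarchical_screen(
    target, candidates, toy_attraction_score, toy_attraction_score, top_k=2
)
for seq, score in results:
    print(seq, round(score, 3))
```

The design point is that the expensive scorer is called only `top_k` times regardless of library size, so the cost of the detailed stage stays fixed while the inexpensive stage scales to large sequence sets.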

Integrating computational assessment tools, such as DockQv2, I-INF, and I-RMSD, into FINCHES validation protocols could provide standardized benchmarking capabilities, which would be particularly valuable for evaluating prediction accuracy against experimental complex structures. Such integration would enable systematic comparison of different computational approaches and provide confidence measures for predictions in the absence of experimental validation. Standardized benchmarking protocols would also facilitate method comparison and drive improvements in prediction accuracy across the field.
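
To illustrate what such an interface score looks like, the sketch below combines three standard interface metrics into a single DockQ-style value. It assumes the scaling constants (1.5 Å for interface RMSD, 8.5 Å for ligand RMSD) and the quality cut-offs (0.23, 0.49, 0.80) of the originally published DockQ definition; whether DockQv2 as used in a given pipeline applies identical constants should be checked against that tool's documentation. The function names are illustrative.

```python
def dockq_score(fnat: float, irmsd: float, lrmsd: float) -> float:
    """DockQ-style score in [0, 1] from three interface metrics.

    fnat  : fraction of native interfacial contacts recovered (0 to 1)
    irmsd : interface RMSD in angstroms
    lrmsd : ligand RMSD in angstroms
    The constants 1.5 and 8.5 are assumed from the original DockQ definition.
    """
    scaled_irmsd = 1.0 / (1.0 + (irmsd / 1.5) ** 2)
    scaled_lrmsd = 1.0 / (1.0 + (lrmsd / 8.5) ** 2)
    return (fnat + scaled_irmsd + scaled_lrmsd) / 3.0


def dockq_quality(score: float) -> str:
    """Map a DockQ-style score onto the usual four quality classes."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


# Example: half of the native contacts recovered, with moderate deviations.
score = dockq_score(fnat=0.5, irmsd=1.5, lrmsd=8.5)
print(round(score, 3), dockq_quality(score))
```

Scores of this kind presuppose a reference complex structure, so for highly dynamic IDR complexes they are most naturally applied to cases in which a bound-state structure exists.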

Enhanced environmental modeling represents a critical area for improvement [204]. This includes the explicit modeling of molecular crowding, pH variations, membrane proximity, and the presence of RNA or other cofactors that modulate IDR behavior [205]. The development of environment-specific corrections to existing methods could significantly improve biological relevance [206]. For example, incorporating the effects of RNA on protein phase separation could enable more accurate predictions of ribonucleoprotein granule formation and dynamics.
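
As one simple example of an environment-specific correction, the pH dependence of histidine charge can be estimated with the Henderson-Hasselbalch relation. The sketch below is a minimal illustration, assuming an isolated side-chain pKa of 6.0 (an assumed round value) and ignoring the coupling between neighboring ionizable groups, which can shift effective pKa values in real sequences.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of a titratable group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def mean_histidine_charge(n_his: int, ph: float, pka: float = 6.0) -> float:
    """Mean positive charge contributed by n_his independent histidines.

    The default pKa of 6.0 is an illustrative assumption; context-dependent
    shifts are not modeled.
    """
    return n_his * fraction_protonated(ph, pka)


# Example: four histidines at cytosol-like and mildly acidic pH values.
for ph in (7.4, 6.5, 6.0):
    print(ph, round(mean_histidine_charge(4, ph), 2))
```

Even this crude estimate shows that a shift of about one pH unit can change the charge of a histidine-rich IDR several-fold, which is the kind of effect a fixed-charge model cannot represent.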

Incorporating kinetic information into prediction frameworks represents another important direction [207]. Methods that can predict not only thermodynamic favorability but also association and dissociation rates would provide a more complete picture of IDR function [208]. This might involve combining equilibrium predictions with kinetic modeling approaches or machine learning methods trained on time-resolved experimental data [209]. Understanding the kinetics of IDR interactions is crucial for predicting their biological function, as rapid dynamics are often essential for proper cellular regulation.
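
The relationship between equilibrium and kinetic quantities shows why thermodynamic predictions alone are insufficient. For simple two-state binding, the dissociation constant equals the ratio of the dissociation and association rate constants, so a predicted binding free energy fixes only that ratio. The sketch below assumes two-state binding, a 1 M standard state, and an association rate constant supplied by the user; the numerical inputs are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ / (mol K)


def kd_from_free_energy(dg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (M, 1 M standard state) from a binding free energy.

    Negative dg_kj_per_mol means favorable binding: K_d = exp(dG / RT).
    """
    return math.exp(dg_kj_per_mol / (GAS_CONSTANT * temperature_k))


def k_off_from_kd(kd_molar: float, k_on: float) -> float:
    """Dissociation rate constant (1/s) for two-state binding: k_off = K_d * k_on."""
    return kd_molar * k_on


# Example: the same binding free energy combined with two assumed on-rates.
dg = -34.2  # kJ/mol, illustrative value
kd = kd_from_free_energy(dg)
for k_on in (1e6, 1e8):  # assumed association rate constants in 1/(M s)
    k_off = k_off_from_kd(kd, k_on)
    print(f"k_on={k_on:.0e}  k_off={k_off:.2e} 1/s  lifetime={1.0 / k_off:.2e} s")
```

Two complexes with identical affinity can therefore differ in lifetime by orders of magnitude, and distinguishing them requires kinetic information beyond an equilibrium score.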

16.2. Experimental Integration

Future computational frameworks should be designed with better integration of experimental constraints in mind [210]. Methods that can incorporate data from NMR, SAXS, single-molecule fluorescence, and other biophysical techniques could provide more accurate and biologically relevant predictions [211]. Hybrid approaches that combine computational predictions with experimental constraints could leverage the strengths of both while mitigating their limitations.

17. Conclusions

The landscape of computational methods for predicting IDR interactions has evolved rapidly, with each approach offering unique advantages and facing specific limitations [212]. FINCHES represents a significant advance by providing rapid, interpretable predictions based on established physical principles, making it particularly valuable for large-scale screening and hypothesis generation [4]. However, the static, mean-field nature of the approach limits its ability to capture the full complexity of dynamic IDR interactions [213].

Comparison with other methods reveals complementary strengths: all-atom simulations provide detailed mechanistic insights but are computationally prohibitive for large-scale studies [214]; machine learning approaches can capture complex patterns but suffer from limited training data and poor transferability [215]; coarse-grained simulations offer a balance between accuracy and speed but still require significant computational resources [216].

The integration of computational assessment tools for interface prediction provides additional validation capabilities that could enhance the reliability and standardization of IDR interaction predictions. The most promising direction for the field involves developing integrated approaches that combine the strengths of different methods while mitigating their limitations [217]. FINCHES excels at rapid screening and initial characterization, while more detailed methods are better suited for mechanistic studies of specific interactions [218]. Machine learning approaches may be particularly valuable for identifying complex sequence patterns not captured by current physics-based models [219].

As experimental techniques for studying IDRs continue to improve and datasets grow larger, computational methods are likely to become more accurate and broadly applicable [220]. The key challenge will be developing approaches that can capture the dynamic, context-dependent nature of IDR interactions while remaining computationally tractable for the large-scale studies needed to understand these systems at the proteome level [221]. The future of the field lies in integrating multiple computational approaches with experimental data to provide a comprehensive understanding of IDR function in biological systems.

Funding

This research received no external funding.

Conflicts of Interest

The author is an advisor to multiple regulatory agencies and a developer of novel biological drugs.

References

Wright, P.E.; Dyson, H.J. Intrinsically disordered proteins in cellular signaling and regulation. Nat. Rev. Mol. Cell Biol. 2015, 16, 18–29. [Google Scholar] [CrossRef] [PubMed]
Holehouse, A.S.; Kragelund, B.B. The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 2024, 25, 187–211. [Google Scholar] [CrossRef] [PubMed]
van der Lee, R.; Buljan, M.; Lang, B.; Weatheritt, R.J.; Daughdrill, G.W.; Dunker, A.K.; Uversky, V.N.; Tompa, P.; Silvestri, E.; Diella, F.; et al. Classification of intrinsically disordered regions and proteins. Chem. Rev. 2014, 114, 6589–6631. [Google Scholar] [CrossRef] [PubMed]
Ginell, G.M.; Emenecker, R.J.; Lotthammer, J.M.; Usher, E.T.; Holehouse, A.S. Direct prediction of intermolecular interactions driven by disordered regions. bioRxiv 2024. [Google Scholar] [CrossRef]
Dignon, G.L.; Zheng, W.; Kim, Y.C.; Best, R.B.; Mittal, J. Sequence determinants of protein phase behavior from a coarse-grained model. PLoS Comput. Biol. 2018, 14, e1005941. [Google Scholar] [CrossRef]
Das, R.K.; Pappu, R.V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl. Acad. Sci. USA 2013, 110, 13392–13397. [Google Scholar] [CrossRef]
Schuler, B.; Soranno, A.; Hofmann, H.; Nettels, D. Single-molecule FRET spectroscopy and the polymer physics of unfolded and intrinsically disordered proteins. Annu. Rev. Biophys. 2016, 45, 207–231. [Google Scholar] [CrossRef]
Forman-Kay, J.D.; Mittag, T. From sequence and forces to structure, function, and evolution of intrinsically disordered proteins. Structure 2013, 21, 1492–1499. [Google Scholar] [CrossRef]
Babu, M.M.; van der Lee, R.; de Groot, N.S.; Gsponer, J. Intrinsically disordered proteins: Regulation and disease. Curr. Opin. Struct. Biol. 2011, 21, 432–440. [Google Scholar] [CrossRef]
Uversky, V.N. Intrinsically disordered proteins and their environment: Effects of strong denaturants, temperature, pH, counter ions, membranes, binding partners, osmolytes, and macromolecular crowding. Protein J. 2013, 32, 203–227. [Google Scholar] [CrossRef]
Banani, S.F.; Lee, H.O.; Hyman, A.A.; Rosen, M.K. Biomolecular condensates: Organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 2017, 18, 285–298. [Google Scholar] [CrossRef] [PubMed]
Oldfield, C.J.; Dunker, A.K. Intrinsically disordered proteins and intrinsically disordered protein regions. Annu. Rev. Biochem. 2014, 83, 553–584. [Google Scholar] [CrossRef]
Tompa, P. Intrinsically disordered proteins: A 10-year recap. Trends Biochem. Sci. 2012, 37, 509–516. [Google Scholar] [CrossRef]
Shaw, D.E.; Grossman, J.P.; Bank, J.A.; Batson, B.; Butts, J.A.; Chao, J.C.; Deneroff, M.M.; Dror, R.O.; Even, A.; Fenton, C.H.; et al. Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA, 16–21 November 2014; pp. 41–53. [Google Scholar] [CrossRef]
Stone, J.E.; Hardy, D.J.; Ufimtsev, I.S.; Schulten, K. GPU-accelerated molecular modeling applications. J. Mol. Graph. Model. 2010, 29, 116–125. [Google Scholar] [CrossRef]
Salomon-Ferrer, R.; Götz, A.W.; Poole, D.; Le Grand, S.; Walker, R.C. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theory Comput. 2013, 9, 3878–3888. [Google Scholar] [CrossRef]
Tesei, G.; Schulze, T.K.; Crehuet, R.; Lindorff-Larsen, K. Accurate model of liquid–liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties. Proc. Natl. Acad. Sci. USA 2021, 118, e2111696118. [Google Scholar] [CrossRef]
Lotthammer, J.M.; Ginell, G.M.; Griffith, D.; Emenecker, R.J.; Holehouse, A.S. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat. Methods 2024, 21, 465–476. [Google Scholar] [CrossRef]
Uversky, V.N.; Gillespie, J.R.; Fink, A.L. Why are “natively unfolded” proteins unstructured under physiologic conditions? Proteins Struct. Funct. Bioinform. 2000, 41, 415–427. [Google Scholar] [CrossRef]
Dunker, A.K.; Lawson, J.D.; Brown, C.J.; Williams, R.M.; Romero, P.; Oh, J.S.; Oldfield, C.J.; Campen, A.M.; Ratliff, C.M.; Hipps, K.W.; et al. Intrinsically disordered protein. J. Mol. Graph. Model. 2001, 19, 26–59. [Google Scholar] [CrossRef]
Romero, P.; Obradovic, Z.; Li, X.; Garner, E.C.; Brown, C.J.; Dunker, A.K. Sequence complexity of disordered protein. Proteins Struct. Funct. Bioinform. 2001, 42, 38–48. [Google Scholar] [CrossRef]
Ward, J.J.; Sodhi, J.S.; McGuffin, L.J.; Buxton, B.F.; Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 2004, 337, 635–645. [Google Scholar] [CrossRef] [PubMed]
Dosztányi, Z.; Csizmok, V.; Tompa, P.; Simon, I. IUPred: Web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005, 21, 3433–3434. [Google Scholar] [CrossRef] [PubMed]
Linding, R.; Jensen, L.J.; Diella, F.; Bork, P.; Gibson, T.J.; Russell, R.B. Protein disorder prediction: Implications for structural proteomics. Structure 2003, 11, 1453–1459. [Google Scholar] [CrossRef]
Prilusky, J.; Felder, C.E.; Zeev-Ben-Mordehai, T.; Rydberg, E.H.; Man, O.; Beckmann, J.S.; Silman, I.; Sussman, J.L. FoldIndex©: A simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics 2005, 21, 3435–3438. [Google Scholar] [CrossRef]
Karplus, M.; McCammon, J.A. Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 2002, 9, 646–652. [Google Scholar] [CrossRef]
Best, R.B.; Zheng, W.; Mittal, J. Balanced protein-water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theory Comput. 2014, 10, 5113–5124. [Google Scholar] [CrossRef]
Piana, S.; Donchev, A.G.; Robustelli, P.; Shaw, D.E. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B 2015, 119, 5113–5123. [Google Scholar] [CrossRef]
Lindorff-Larsen, K.; Piana, S.; Dror, R.O.; Shaw, D.E. How fast-folding proteins fold. Science 2011, 334, 517–520. [Google Scholar] [CrossRef]
Dror, R.O.; Dirks, R.M.; Grossman, J.P.; Xu, H.; Shaw, D.E. Biomolecular simulation: A computational microscope for molecular biology. Annu. Rev. Biophys. 2012, 41, 429–452. [Google Scholar] [CrossRef]
Shaw, D.E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R.O.; Eastwood, M.P.; Bank, J.A.; Jumper, J.M.; Salmon, J.K.; Shan, Y.; et al. Atomic-level characterization of the structural dynamics of proteins. Science 2010, 330, 341–346. [Google Scholar] [CrossRef]
Rauscher, S.; Gapsys, V.; Gajda, M.J.; Zweckstetter, M.; de Groot, B.L.; Grubmüller, H. Structural ensembles of intrinsically disordered proteins depend strongly on force field: A comparison to experiment. J. Chem. Theory Comput. 2015, 11, 5513–5524. [Google Scholar] [CrossRef] [PubMed]
Palazzesi, F.; Prakash, M.K.; Bonomi, M.; Barducci, A. Accuracy of current all-atom force-fields in modeling protein disordered states. J. Chem. Theory Comput. 2015, 11, 2–7. [Google Scholar] [CrossRef] [PubMed]
Sugita, Y.; Okamoto, Y. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 1999, 314, 141–151. [Google Scholar] [CrossRef]
Laio, A.; Parrinello, M. Escaping free-energy minima. Proc. Natl. Acad. Sci. USA 2002, 99, 12562–12566. [Google Scholar] [CrossRef]
Hukushima, K.; Nemoto, K. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 1996, 65, 1604–1608. [Google Scholar] [CrossRef]
Davtyan, A.; Schafer, N.P.; Zheng, W.; Clementi, C.; Wolynes, P.G.; Papoian, G.A. AWSEM-MD: Protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J. Phys. Chem. B 2012, 116, 8494–8503. [Google Scholar] [CrossRef]
Chen, M.; Lin, X.; Zheng, W.; Onuchic, J.N.; Wolynes, P.G. Protein folding and structure prediction from the ground up: The atomistic associative memory, water mediated, structure and energy model. J. Phys. Chem. B 2016, 120, 8557–8565. [Google Scholar] [CrossRef]
Schafer, N.P.; Kim, B.L.; Zheng, W.; Wolynes, P.G. Learning to fold proteins using energy landscape theory. Isr. J. Chem. 2014, 54, 1311–1337. [Google Scholar] [CrossRef]
Chen, M.; Lin, X.; Lu, W.; Onuchic, J.N.; Wolynes, P.G. Protein Folding and Structure Prediction from the Ground Up II: AAWSEM for α/β Proteins. J. Phys. Chem. B 2017, 121, 3473–3482. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Zheng, W.; Schafer, N.P.; Wolynes, P.G. Frustration in the energy landscapes of multidomain protein misfolding. Proc. Natl. Acad. Sci. USA 2013, 110, 1680–1685. [Google Scholar] [CrossRef]
Zheng, W.; Dignon, G.L.; Brown, M.; Kim, Y.C.; Mittal, J. Hydropathy patterning complements charge patterning to describe conformational preferences of disordered proteins. J. Phys. Chem. Lett. 2020, 11, 3408–3415. [Google Scholar] [CrossRef] [PubMed]
Joseph, J.A.; Reinhardt, A.; Aguirre, A.; Chew, P.Y.; Russell, K.O.; Espinosa, J.R.; Garaizar, A.; Collepardo-Guevara, R. Physics-driven coarse-grained model for biomolecular phase separation with near-quantitative accuracy. Nat. Comput. Sci. 2021, 1, 732–743. [Google Scholar] [CrossRef] [PubMed]
Tesei, G.; Lindorff-Larsen, K. Improved predictions of phase behaviour of intrinsically disordered proteins by tuning the interaction range. Open Res. Eur. 2023, 2, 94. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Krainer, G.; Welsh, T.J.; Joseph, J.A.; Espinosa, J.R.; Wittmann, S.; de Csilléry, E.; Sridhar, A.; Toprakcioglu, Z.; Gudiškytė, G.; Ott, M.A.; et al. Reentrant liquid condensation of proteins is enhanced by RNA. Nat. Commun. 2021, 12, 1085. [Google Scholar] [CrossRef]
Welsh, T.J.; Krainer, G.; Espinosa, J.R.; Joseph, J.A.; Sridhar, A.; Jahnel, M.; Arter, W.E.; Saar, K.L.; Alberti, S.; Collepardo-Guevara, R.; et al. Surface Electrostatics Govern the Emulsion Stability of Biomolecular Condensates. Nano Lett. 2022, 22, 612–621. [Google Scholar] [CrossRef] [PubMed Central]
Dignon, G.L.; Zheng, W.; Kim, Y.C.; Mittal, J. Temperature-Controlled Liquid-Liquid Phase Separation of Disordered Proteins. ACS Cent. Sci. 2019, 5, 821–830. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Emenecker, R.J.; Holehouse, A.S.; Strader, L.C. Biological Phase Separation and Biomolecular Condensates in Plants. Annu. Rev. Plant Biol. 2021, 72, 17–46. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Emenecker, R.J.; Griffith, D.; Holehouse, A.S. Metapredict: A fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys. J. 2021, 120, 4312–4319. [Google Scholar] [CrossRef]
Ginell, G.M.; Emenecker, R.J.; Holehouse, A.S. Computational analysis of the sequence-ensemble relationship in intrinsically disordered proteins. In Methods in Molecular Biology; Springer: Berlin/Heidelberg, Germany, 2022; Volume 2141, pp. 209–245. [Google Scholar] [CrossRef]
Fredrickson, G.H. The Equilibrium Theory of Inhomogeneous Polymers; Oxford University Press: Oxford, UK, 2006; ISBN 978-0-19-856765-5. [Google Scholar]
Delaney, K.T.; Fredrickson, G.H. Recent Developments in Fully Fluctuating Field-Theoretic Simulations of Polymer Melts and Solutions. J. Phys. Chem. B 2016, 120, 7615–7634. [Google Scholar] [CrossRef] [PubMed Central]
Fredrickson, G.H.; Ganesan, V.; Drolet, F. Field-theoretic computer simulation methods for polymers and complex fluids. Macromolecules 2002, 35, 16–39. [Google Scholar] [CrossRef]
Flory, P.J. Thermodynamics of high polymer solutions. J. Chem. Phys. 1942, 10, 51–61. [Google Scholar] [CrossRef]
McCarty, J.; Delaney, K.T.; Danielsen, S.P.O.; Fredrickson, G.H.; Shea, J.E. Complete phase diagram for liquid–liquid phase separation of intrinsically disordered proteins. J. Phys. Chem. Lett. 2019, 10, 1644–1652. [Google Scholar] [CrossRef] [PubMed]
Lin, Y.H.; Forman-Kay, J.D.; Chan, H.S. Theories for sequence-dependent phase behaviors of biomolecular condensates. Biochemistry 2018, 57, 2499–2508. [Google Scholar] [CrossRef] [PubMed]
Danielsen, S.P.O.; McCarty, J.; Shea, J.E.; Delaney, K.T.; Fredrickson, G.H. Molecular design of self-coacervation phenomena in block polyampholytes. Proc. Natl. Acad. Sci. USA 2019, 116, 8224–8232. [Google Scholar] [CrossRef]
Orlando, G.; Silva, A.; Macedo-Ribeiro, S.; Raimondi, D.; Vranken, W.F. Accurate prediction of protein beta-aggregation with generalized statistical potentials. Bioinformatics 2019, 36, 2076–2081. [Google Scholar] [CrossRef]
Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: London, UK, 2013; ISBN 978-1-4757-3264-1. [Google Scholar]
Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2001; ISBN 978-0-262-19475-4. [Google Scholar]
Anderson, P.; Kedersha, N. Stress granules: The Tao of RNA triage. Trends Biochem. Sci. 2008, 33, 141–150. [Google Scholar] [CrossRef]
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; ISBN 978-0-262-03561-3. [Google Scholar]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
Mészáros, B.; Erdős, G.; Szabó, B.; Schád, É.; Tantos, Á.; Abukhairan, R.; Horváth, T.; Murvai, N.; Kovács, O.P.; Kovács, M.; et al. PhaSePro: The database of proteins driving liquid–liquid phase separation. Nucleic Acids Res. 2020, 48, D360–D367. [Google Scholar] [CrossRef]
Protter, D.S.; Parker, R. Principles and properties of stress granules. Trends Cell Biol. 2016, 26, 668–679. [Google Scholar] [CrossRef] [PubMed]
Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar] [CrossRef]
Rives, A.; Meier, J.; Sercu, T.; Goyal, S.; Lin, Z.; Liu, J.; Guo, D.; Ott, M.; Zitnick, C.L.; Ma, J.; et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 2021, 118, e2016239118. [Google Scholar] [CrossRef]
Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7112–7127. [Google Scholar] [CrossRef]
Raimondi, D.; Orlando, G.; Fariselli, P.; Moreau, Y. Insight into the protein solubility driving forces with neural attention. PLoS Comput. Biol. 2022, 18, e1010204. [Google Scholar] [CrossRef]
Savastano, A.; Ibáñez de Opakua, A.; Rankovic, M.; Zweckstetter, M. Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates. Nat. Commun. 2020, 11, 6041. [Google Scholar] [CrossRef]
Lancaster, A.K.; Nutter-Upham, A.; Lindquist, S.; King, O.D. PLAAC: A web and command-line application to identify proteins with prion-like amino acid composition. Bioinformatics 2014, 30, 2501–2502. [Google Scholar] [CrossRef]
Alberti, S.; Halfmann, R.; King, O.; Kapila, A.; Lindquist, S. A systematic survey of the intrinsic tendency of aggregation of the yeast proteome. Mol. Cell 2009, 35, 755–764. [Google Scholar] [CrossRef]
Johnson, B.S.; Snead, D.; Lee, J.J.; McCaffery, J.M.; Shorter, J.; Gitler, A.D. TDP-43 is intrinsically aggregation-prone, and amyotrophic lateral sclerosis-linked mutations accelerate aggregation and increase toxicity. J. Biol. Chem. 2009, 284, 20329–20339. [Google Scholar] [CrossRef]
Mackenzie, I.R.; Rademakers, R.; Neumann, M. TDP-43 and FUS in amyotrophic lateral sclerosis and frontotemporal dementia. Lancet Neurol. 2010, 9, 995–1007. [Google Scholar] [CrossRef]
Erdős, G.; Dosztányi, Z. Analyzing protein disorder with IUPred2A. Curr. Protoc. Bioinform. 2020, 70, e99. [Google Scholar] [CrossRef] [PubMed]
Mészáros, B.; Erdős, G.; Dosztányi, Z. IUPred2A: Context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018, 46, W329–W337. [Google Scholar] [CrossRef] [PubMed]
Kumar, M.; Gouw, M.; Michael, S.; Sámano-Sánchez, H.; Pancsa, R.; Glavina, J.; Diakogianni, A.; Valverde, J.A.; Bukirova, D.; Čalyševa, J.; et al. ELM-the eukaryotic linear motif resource in 2020. Nucleic Acids Res. 2020, 48, D296–D306. [Google Scholar] [CrossRef]
Johansson, K.E.; Lindorff-Larsen, K.; Winther, J.R. Protein disorder and the evolution of molecular recognition by induced fit. Protein Sci. 2014, 23, 1274–1284. [Google Scholar] [CrossRef]
Santner, A.A.; Croy, C.H.; Vasanwala, F.H.; Uversky, V.N.; Van, Y.Y.J.; Dunker, A.K. Sweeping away protein aggregation with entropic bristles: Intrinsically disordered protein fusions enhance soluble expression. Biochemistry 2012, 51, 7250–7262. [Google Scholar] [CrossRef]
Klim, J.R.; Williams, L.A.; Limone, F.; Guerra San Juan, I.; Davis-Dusenbery, B.N.; Mordes, D.A.; Burberry, A.; Steinbaugh, M.J.; Gamage, K.K.; Kirchner, R.; et al. ALS-implicated protein TDP-43 sustains levels of STMN2, a mediator of motor neuron growth and repair. Nat. Neurosci. 2019, 22, 167–179. [Google Scholar] [CrossRef]
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
Guillén-Boixet, J.; Kopach, A.; Holehouse, A.S.; Wittmann, S.; Jahnel, M.; Schlüßler, R.; Kim, K.; Trussina, I.R.E.A.; Wang, J.; Mateju, D.; et al. RNA-induced conformational switching and clustering of G3BP drive stress granule assembly by condensation. Cell 2020, 181, 346–361. [Google Scholar] [CrossRef]
Li, Q.; Peng, X.; Li, Y.; Tang, W.; Zhu, J.; Huang, J.; Qi, Y.; Jiang, T. LLPSDB: A database of proteins undergoing liquid–liquid phase separation in vitro. Nucleic Acids Res. 2020, 48, D320–D327. [Google Scholar] [CrossRef]
Ning, W.; Guo, Y.; Lin, S.; Mei, B.; Wu, Y.; Jiang, P.; Tan, X.; Zhang, W.; Chen, G.; Peng, D.; et al. DrLLPS: A data resource of liquid–liquid phase separation in eukaryotes. Nucleic Acids Res. 2020, 48, D288–D295. [Google Scholar] [CrossRef]
You, K.; Huang, Q.; Yu, C.; Shen, B.; Sevilla, C.; Shi, M.; Hermjakob, H.; Chen, Y.; Li, T. PhaSepDB: A database of liquid–liquid phase separation related proteins. Nucleic Acids Res. 2020, 48, D354–D359. [Google Scholar] [CrossRef] [PubMed]
Maristany, M.J.; Gonzalez, A.A.; Espinosa, J.R.; Huertas, J.; Collepardo-Guevara, R.; Joseph, J.A. Decoding phase separation of prion-like domains through data-driven scaling laws. eLife 2025, 13, RP99068. [Google Scholar] [CrossRef]
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440. [Google Scholar] [CrossRef]
Espinosa, J.R.; Joseph, J.A.; Sanchez-Burgos, I.; Garaizar, A.; Frenkel, D.; Collepardo-Guevara, R. Liquid network connectivity regulates the stability and composition of biomolecular condensates with many components. Proc. Natl. Acad. Sci. USA 2020, 117, 13238–13247. [Google Scholar] [CrossRef]
Cornell, W.D.; Cieplak, P.; Bayly, C.I.; Gould, I.R.; Merz, K.M.; Ferguson, D.M.; Spellmeyer, D.C.; Fox, T.; Caldwell, J.W.; Kollman, P.A. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 1995, 117, 5179–5197. [Google Scholar] [CrossRef]
Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins Struct. Funct. Bioinform. 2006, 65, 712–725. [Google Scholar] [CrossRef]
Jorgensen, W.L.; Maxwell, D.S.; Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 1996, 118, 11225–11236. [Google Scholar] [CrossRef]
MacKerell Jr, A.D.; Bashford, D.; Bellott, M.L.D.R.; Dunbrack Jr, R.L.; Evanseck, J.D.; Field, M.J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 1998, 102, 3586–3616. [Google Scholar] [CrossRef]
Debye, P.; Hückel, E. Zur theorie der elektrolyte. I. Gefrierpunktserniedrigung Und Verwandte Erscheinungen. Phys. Z. 1923, 24, 185–206. [Google Scholar]
Israelachvili, J.N. Intermolecular and Surface Forces; Academic Press: Cambridge, MA, USA, 2011; ISBN 978-0-12-375182-9. [Google Scholar]
Robustelli, P.; Piana, S.; Shaw, D.E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. USA 2018, 115, E4758–E4766. [Google Scholar] [CrossRef] [PubMed]
Fossat, M.J.; Posey, A.E.; Pappu, R.V. Quantifying charge state heterogeneity for proteins with multiple ionizable residues. Biophys. J. 2021, 120, 5438–5453. [Google Scholar] [CrossRef] [PubMed]
Kozlowski, L.P. IPC – Isoelectric Point Calculator. Biol. Direct 2016, 11, 55. [Google Scholar] [CrossRef]
Riback, J.A.; Bowman, M.A.; Zmyslowski, A.M.; Knoverek, C.R.; Jumper, J.M.; Hinshaw, J.R.; Green, E.B.; Regan, L.; Daggett, V.; Sosnick, T.R. Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science 2017, 358, 238–241. [Google Scholar] [CrossRef]
Ashbaugh, H.S.; Hatch, H.W. Natively unfolded protein stability as a coil-to-globule transition in charge/hydropathy space. J. Am. Chem. Soc. 2008, 130, 9536–9542. [Google Scholar] [CrossRef]
Das, R.K.; Ruff, K.M.; Pappu, R.V. Relating sequence encoded information to form and function of intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2015, 32, 102–112. [Google Scholar] [CrossRef]
Pappu, R.V.; Wang, X.; Vitalis, A.; Crick, S.L. A polymer physics perspective on driving forces and mechanisms for protein aggregation. Arch. Biochem. Biophys. 2008, 469, 132–141. [Google Scholar] [CrossRef]
Lin, Y.; Currie, S.L.; Rosen, M.K. Intrinsically disordered sequences enable modulation of protein phase separation through distributed tyrosine motifs. J. Biol. Chem. 2017, 292, 19110–19120. [Google Scholar] [CrossRef]
Schwartz, J.C.; Wang, X.; Podell, E.R.; Cech, T.R. RNA seeds higher-order assembly of FUS protein. Cell Rep. 2013, 5, 918–925. [Google Scholar] [CrossRef]
Wang, J.; Choi, J.M.; Holehouse, A.S.; Lee, H.O.; Zhang, X.; Jahnel, M.; Maharana, S.; Lemaitre, R.; Pozniakovsky, A.; Drechsel, D.; et al. A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell 2018, 174, 688–699. [Google Scholar] [CrossRef]
Martin, E.W.; Holehouse, A.S.; Peran, I.; Farag, M.; Incicco, J.J.; Bremer, A.; Grace, C.R.; Soranno, A.; Pappu, R.V.; Mittag, T. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 2020, 367, 694–699. [Google Scholar] [CrossRef]
Choi, J.M.; Holehouse, A.S.; Pappu, R.V. Physical principles underlying the complex biology of intracellular phase transitions. Annu. Rev. Biophys. 2020, 49, 107–133. [Google Scholar] [CrossRef] [PubMed]
Bremer, A.; Farag, M.; Borcherds, W.M.; Peran, I.; Martin, E.W.; Pappu, R.V.; Mittag, T. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat. Chem. 2022, 14, 196–207. [Google Scholar] [CrossRef] [PubMed]
Ruff, K.M.; Harmon, T.S.; Pappu, R.V. CAMELOT: A machine learning approach for coarse-grained simulations of aggregation of block-copolymeric protein sequences. J. Chem. Phys. 2015, 143, 243123. [Google Scholar] [CrossRef]
Chong, S.H.; Ham, S. Impact of chemical heterogeneity on protein self-assembly in water. Proc. Natl. Acad. Sci. USA 2019, 116, 13744–13753. [Google Scholar] [CrossRef]
Patel, A.; Lee, H.O.; Jawerth, L.; Maharana, S.; Jahnel, M.; Hein, M.Y.; Stoynov, S.; Mahamid, J.; Saha, S.; Franzmann, T.M.; et al. A liquid-to-solid phase transition of the ALS protein FUS accelerated by disease mutation. Cell 2015, 162, 1066–1077. [Google Scholar] [CrossRef]
Vernon, R.M.; Chong, P.A.; Tsang, B.; Kim, T.H.; Bah, A.; Farber, P.; Lin, H.; Forman-Kay, J.D. Pi-Pi contacts are an overlooked protein feature relevant to phase separation. eLife 2018, 7, e31486. [Google Scholar] [CrossRef]
Nott, T.J.; Petsalaki, E.; Farber, P.; Jervis, D.; Fussner, E.; Plochowietz, A.; Craggs, T.D.; Bazett-Jones, D.P.; Pawson, T.; Forman-Kay, J.D.; et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell 2015, 57, 936–947. [Google Scholar] [CrossRef]
Elbaum-Garfinkle, S.; Kim, Y.; Szczepaniak, K.; Chen, C.C.H.; Eckmann, C.R.; Myong, S.; Brangwynne, C.P. The disordered P granule protein LAF-1 drives phase separation into droplets with tunable viscosity and dynamics. Proc. Natl. Acad. Sci. USA 2015, 112, 7189–7194. [Google Scholar] [CrossRef]
Pak, C.W.; Kosno, M.; Holehouse, A.S.; Padrick, S.B.; Mittal, A.; Ali, R.; Yunus, A.A.; Liu, D.R.; Pappu, R.V.; Rosen, M.K. Sequence determinants of intracellular phase separation by complex coacervation of a disordered protein. Mol. Cell 2016, 63, 72–85. [Google Scholar] [CrossRef]
Lin, Y.H.; Chan, H.S. Phase separation and single-chain compactness of charged disordered proteins are strongly correlated. Biophys. J. 2017, 112, 2043–2046. [Google Scholar] [CrossRef] [PubMed]
Choi, J.M.; Dar, F.; Pappu, R.V. LASSI: A lattice model for simulating phase transitions of multivalent proteins. PLoS Comput. Biol. 2019, 15, e1007028. [Google Scholar] [CrossRef] [PubMed]
Regy, R.M.; Dignon, G.L.; Zheng, W.; Kim, Y.C.; Mittal, J. Sequence dependent phase separation of protein-polynucleotide mixtures elucidated using molecular simulations. Nucleic Acids Res. 2021, 49, 12593–12603. [Google Scholar] [CrossRef] [PubMed]
Brady, J.P.; Farber, P.J.; Sekhar, A.; Lin, Y.H.; Huang, R.; Bah, A.; Nott, T.J.; Chan, H.S.; Baldwin, A.J.; Forman-Kay, J.D.; et al. Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. Proc. Natl. Acad. Sci. USA 2017, 114, E8194–E8203. [Google Scholar] [CrossRef]
Murthy, A.C.; Dignon, G.L.; Kan, Y.; Zerze, G.H.; Parekh, S.H.; Mittal, J.; Fawzi, N.L. Molecular interactions underlying liquid-liquid phase separation of the FUS low-complexity domain. Nat. Struct. Mol. Biol. 2019, 26, 637–648. [Google Scholar] [CrossRef]
Lee, C.W.; Ferreon, J.C.; Ferreon, A.C.; Arai, M.; Wright, P.E. Graded enhancement of p53 binding to CREB-binding protein (CBP) by multisite phosphorylation. Proc. Natl. Acad. Sci. USA 2010, 107, 19290–19295. [Google Scholar] [CrossRef]
Raj, N.; Attardi, L.D. The transactivation domains of the p53 protein. Cold Spring Harb. Perspect. Med. 2017, 7, a026047. [Google Scholar] [CrossRef]
Laptenko, O.; Shiff, I.; Freed-Pastor, W.; Zupnick, A.; Mattia, M.; Freulich, E.; Shamir, I.; Kadouri, N.; Kahan, T.; Manfredi, J.; et al. The p53 C terminus controls site-specific DNA binding and promotes structural changes within the central DNA binding domain. Mol. Cell 2015, 57, 1034–1046. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Krois, A.S.; Ferreon, J.C.; Martinez-Yamout, M.A.; Dyson, H.J.; Wright, P.E. Recognition of the disordered p53 transactivation domain by the transcriptional adapter zinc finger domains of CREB-binding protein. Proc. Natl. Acad. Sci. USA 2016, 113, E1853–E1862. [Google Scholar] [CrossRef]
Kussie, P.H.; Gorina, S.; Marechal, V.; Elenbaas, B.; Moreau, J.; Levine, A.J.; Pavletich, N.P. Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 1996, 274, 948–953. [Google Scholar] [CrossRef]
Mészáros, B.; Simon, I.; Dosztányi, Z. Prediction of protein binding regions in disordered proteins. PLoS Comput. Biol. 2009, 5, e1000376. [Google Scholar] [CrossRef] [PubMed]
Davey, N.E.; Van Roey, K.; Weatheritt, R.J.; Toedt, G.; Uyar, B.; Altenberg, B.; Budd, A.; Diella, F.; Dinkel, H.; Gibson, T.J. Attributes of short linear motifs. Mol. BioSyst. 2012, 8, 268–281. [Google Scholar] [CrossRef] [PubMed]
Sreedharan, J.; Blair, I.P.; Tripathi, V.B.; Hu, X.; Vance, C.; Rogelj, B.; Ackerley, S.; Durnall, J.C.; Williams, K.L.; Buratti, E.; et al. TDP-43 mutations in familial and sporadic amyotrophic lateral sclerosis. Science 2008, 319, 1668–1672. [Google Scholar] [CrossRef] [PubMed]
Kabashi, E.; Valdmanis, P.N.; Dion, P.; Spiegelman, D.; McConkey, B.J.; Vande Velde, C.; Bouchard, J.P.; Lacomblez, L.; Pochigaeva, K.; Salachas, F.; et al. TARDBP mutations in individuals with sporadic and familial amyotrophic lateral sclerosis. Nat. Genet. 2008, 40, 572–574. [Google Scholar] [CrossRef]
Jiang, L.L.; Che, M.X.; Zhao, J.; Zhou, C.J.; Xie, M.Y.; Li, H.Y.; He, J.H.; Hu, H.Y. Structural transformation of the amyloidogenic core region of TDP-43 protein initiates its aggregation and cytoplasmic inclusion. J. Biol. Chem. 2013, 288, 19614–19624. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Conicella, A.E.; Zerze, G.H.; Mittal, J.; Fawzi, N.L. ALS mutations disrupt phase separation mediated by α-helical structure in the TDP-43 low-complexity C-terminal domain. Structure 2016, 24, 1537–1549. [Google Scholar] [CrossRef]
Wang, A.; Conicella, A.E.; Schmidt, H.B.; Martin, E.W.; Rhoads, S.N.; Reeb, A.N.; Nourse, A.; Ramirez, D.A.; Ragunathan, K.; Howcroft, J.; et al. A single N-terminal phosphomimic disrupts TDP-43 polymerization, phase separation, and RNA splicing. EMBO J. 2018, 37, e97452. [Google Scholar] [CrossRef]
Gruijs da Silva, L.A.; Simonetti, F.; Hutten, S.; Riemenschneider, H.; Sternburg, E.L.; Pietrek, L.M.; Gebel, J.; Dotsch, V.; Edbauer, D.; Baumeister, W.; et al. Disease-linked TDP-43 hyperphosphorylation suppresses TDP-43 condensation and aggregation. EMBO J. 2022, 41, e108443. [Google Scholar] [CrossRef]
McGurk, L.; Gomes, E.; Guo, L.; Mojsilovic-Petrovic, J.; Tran, V.; Kalb, R.G.; Shorter, J.; Bonini, N.M. Poly(ADP-ribose) prevents pathological phase separation of TDP-43 by promoting liquid demixing and stress granule localization. Mol. Cell 2018, 71, 703–717. [Google Scholar] [CrossRef]
Uversky, V.N. Intrinsically disordered proteins and their “mysterious” (meta)physics. Front. Phys. 2019, 7, 10. [Google Scholar] [CrossRef]
Hornbeck, P.V.; Zhang, B.; Murray, B.; Kornhauser, J.M.; Latham, V.; Skrzypek, E. PhosphoSitePlus, 2014: Mutations, PTMs and recalibrations. Nucleic Acids Res. 2015, 43, D512–D520. [Google Scholar] [CrossRef] [PubMed]
Jones, S.; Thornton, J.M. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA 1996, 93, 13–20. [Google Scholar] [CrossRef] [PubMed]
Moal, I.H.; Fernández-Recio, J. SKEMPI: A Structural Kinetic and Energetic database of Mutant Protein Interactions and its use in empirical models. Bioinformatics 2012, 28, 2600–2607. [Google Scholar] [CrossRef] [PubMed]
Mittal, J.; Best, R.B. Thermodynamics and kinetics of protein folding under confinement. Proc. Natl. Acad. Sci. USA 2008, 105, 20233–20238. [Google Scholar] [CrossRef]
Dill, K.A.; MacCallum, J.L. The protein-folding problem, 50 years on. Science 2012, 338, 1042–1046. [Google Scholar] [CrossRef]
Tanford, C. Protein denaturation. Adv. Protein Chem. 1968, 23, 121–282. [Google Scholar] [CrossRef]
Borcherds, W.; Bremer, A.; Borgia, M.B.; Mittag, T. How do intrinsically disordered protein regions encode a driving force for liquid-liquid phase separation? Curr. Opin. Struct. Biol. 2021, 67, 41–50. [Google Scholar] [CrossRef]
Monahan, Z.; Ryan, V.H.; Janke, A.M.; Burke, K.A.; Rhoads, S.N.; Zerze, G.H.; O’Meally, R.; Dignon, G.L.; Conicella, A.E.; Zheng, W.; et al. Phosphorylation of the FUS low-complexity domain disrupts phase separation, aggregation, and toxicity. EMBO J. 2017, 36, 2951–2967. [Google Scholar] [CrossRef]
Ryan, V.H.; Dignon, G.L.; Zerze, G.H.; Chabata, C.V.; Silva, R.; Conicella, A.E.; Amaya, J.; Burke, K.A.; Mittal, J.; Fawzi, N.L. Mechanistic view of hnRNPA2 low-complexity domain structure, interactions, and phase separation altered by mutation and arginine methylation. Mol. Cell 2018, 69, 465–479. [Google Scholar] [CrossRef]
Staller, M.V.; Holehouse, A.S.; Swain-Lenz, D.; Das, R.K.; Pappu, R.V.; Cohen, B.A. A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst. 2018, 6, 444–455. [Google Scholar] [CrossRef]
Boija, A.; Klein, I.A.; Sabari, B.R.; Dall’Agnese, A.; Coffey, E.L.; Zamudio, A.V.; Li, C.H.; Shrinivas, K.; Manteiga, J.C.; Hannett, N.M.; et al. Transcription factors activate genes through the phase-separation capacity of their activation domains. Cell 2018, 175, 1842–1855. [Google Scholar] [CrossRef] [PubMed]
Sabari, B.R.; Dall’Agnese, A.; Boija, A.; Klein, I.A.; Coffey, E.L.; Shrinivas, K.; Abraham, B.J.; Hannett, N.M.; Zamudio, A.V.; Manteiga, J.C.; et al. Coactivator condensation at super-enhancers links phase separation and gene control. Science 2018, 361, eaar3958. [Google Scholar] [CrossRef] [PubMed]
Cho, W.K.; Spille, J.H.; Hecht, M.; Lee, C.; Li, C.; Grube, V.; Cisse, I.I. Mediator and RNA polymerase II clusters associate in transcription-dependent condensates. Science 2018, 361, 412–415. [Google Scholar] [CrossRef]
Guo, Y.E.; Manteiga, J.C.; Henninger, J.E.; Sabari, B.R.; Dall’Agnese, A.; Hannett, N.M.; Spille, J.H.; Afeyan, L.K.; Zamudio, A.V.; Shrinivas, K.; et al. Pol II phosphorylation regulates a switch between transcriptional and splicing condensates. Nature 2019, 572, 543–548. [Google Scholar] [CrossRef]
Qian, D.; Michaels, T.C.; Knowles, T.P. Analytical solution to the Flory-Huggins model. J. Phys. Chem. Lett. 2022, 13, 7853–7860. [Google Scholar] [CrossRef]
Huggins, M.L. Theory of solutions of high polymers. J. Am. Chem. Soc. 1942, 64, 1712–1719. [Google Scholar] [CrossRef]
Dignon, G.L.; Zheng, W.; Best, R.B.; Kim, Y.C.; Mittal, J. Relation between single-molecule properties and phase behavior of intrinsically disordered proteins. Proc. Natl. Acad. Sci. USA 2018, 115, 9929–9934. [Google Scholar] [CrossRef]
Tesei, G.; Trolle, A.I.; Jonsson, N.; Betz, J.; Knudsen, F.E.; Pesce, F.; Johansson, K.E.; Lindorff-Larsen, K. Conformational ensembles of the human intrinsically disordered proteome. Nature 2024, 626, 897–904. [Google Scholar] [CrossRef] [PubMed]
Ward, J.J.; McGuffin, L.J.; Bryson, K.; Buxton, B.F.; Jones, D.T. The DISOPRED server for the prediction of protein disorder. Bioinformatics 2004, 20, 2138–2139. [Google Scholar] [CrossRef]
Hein, M.Y.; Hubner, N.C.; Poser, I.; Cox, J.; Nagaraj, N.; Toyoda, Y.; Gak, I.A.; Weisswange, I.; Bauer, G.; Zarnack, K.; et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 2015, 163, 712–723. [Google Scholar] [CrossRef]
Hnisz, D.; Shrinivas, K.; Young, R.A.; Chakraborty, A.K.; Sharp, P.A. A phase separation model for transcriptional control. Cell 2017, 169, 13–23. [Google Scholar] [CrossRef] [PubMed]
Shin, Y.; Brangwynne, C.P. Liquid phase condensation in cell physiology and disease. Science 2017, 357, eaaf4382. [Google Scholar] [CrossRef] [PubMed]
Matlock, M.K.; Holehouse, A.S.; Naegle, K.M. ProteomeScout: A repository and analysis resource for post-translational modifications and proteins. Nucleic Acids Res. 2015, 43, D521–D530. [Google Scholar] [CrossRef] [PubMed]
Reimand, J.; Wagih, O.; Bader, G.D. The mutational landscape of phosphorylation signaling in cancer. Sci. Rep. 2013, 3, 2651. [Google Scholar] [CrossRef]
Gomes, E.; Shorter, J. The molecular language of membraneless organelles. J. Biol. Chem. 2019, 294, 7115–7127. [Google Scholar] [CrossRef]
Protter, D.S.; Rao, B.S.; Van Treeck, B.; Lin, Y.; Mizoue, L.; Rosen, M.K.; Parker, R. Intrinsically disordered regions can contribute promiscuous interactions to RNP granule assembly. Cell Rep. 2018, 22, 1401–1412. [Google Scholar] [CrossRef]
Bah, A.; Forman-Kay, J.D. Modulation of intrinsically disordered protein function by post-translational modifications. J. Biol. Chem. 2016, 291, 6696–6705. [Google Scholar] [CrossRef]
Staller, M.V.; Ramirez, E.; Kotha, S.R.; Holehouse, A.S.; Pappu, R.V.; Cohen, B.A. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst. 2022, 13, 334–345. [Google Scholar] [CrossRef]
Brzovic, P.S.; Heikaus, C.C.; Kisselev, L.; Vernon, R.; Herbig, E.; Pacheco, D.; Warfield, L.; Littlefield, P.; Baker, D.; Klevit, R.E.; et al. The acidic transcription activator Gcn4 binds the mediator subunit Gal11/Med15 using a simple protein interface forming a fuzzy complex. Mol. Cell 2011, 44, 942–953. [Google Scholar] [CrossRef]
Warfield, L.; Tuttle, L.M.; Pacheco, D.; Klevit, R.E.; Hahn, S. A sequence-specific transcription activator motif and powerful synthetic variants that bind Mediator using a fuzzy protein interface. Proc. Natl. Acad. Sci. USA 2014, 111, E3506–E3513. [Google Scholar] [CrossRef]
Tuttle, L.M.; Pacheco, D.; Warfield, L.; Luo, J.; Ranish, J.; Hahn, S.; Klevit, R.E. Gcn4-mediator specificity is mediated by a large and dynamic fuzzy protein-protein complex. Cell Rep. 2018, 22, 3251–3264. [Google Scholar] [CrossRef] [PubMed]
Piskacek, M.; Havelka, M.; Rezacova, M.; Knight, A. The 9aaTAD transactivation domains: From Gal4 to p53. PLoS ONE 2016, 11, e0162842. [Google Scholar] [CrossRef] [PubMed]
Crabtree, G.R.; Schreiber, S.L. Three-part inventions: Intracellular signaling and induced proximity. Trends Biochem. Sci. 2011, 36, 130–137. [Google Scholar] [CrossRef]
Schreiber, S.L. The rise of molecular glues. Cell 2021, 184, 3–9. [Google Scholar] [CrossRef]
Klein, I.A.; Boija, A.; Afeyan, L.K.; Hawken, S.W.; Fan, M.; Dall’Agnese, A.; Oksuz, O.; Henninger, J.E.; Shrinivas, K.; Sabari, B.R.; et al. Partitioning of cancer therapeutics in nuclear condensates. Science 2020, 368, 1386–1392. [Google Scholar] [CrossRef]
Schmidt, H.B.; Barreau, A.; Rohatgi, R. Phase separation-deficient TDP43 remains functional in splicing. Nat. Commun. 2019, 10, 4890. [Google Scholar] [CrossRef]
Kim, T.H.; Payliss, B.J.; Nosella, M.L.; Lee, I.T.; Toyama, Y.; Forman-Kay, J.D.; Kay, L.E. Interaction hot spots for phase separation revealed by NMR studies of a CAPRIN1 condensed phase. Proc. Natl. Acad. Sci. USA 2021, 118, e2104897118. [Google Scholar] [CrossRef]
Bugge, K.; Brakti, I.; Fernandes, C.B.; Dreier, J.E.; Lundsgaard, J.E.; Olsen, J.G.; Skriver, K.; Kragelund, B.B. Interactions by disorder – a matter of context. Front. Mol. Biosci. 2020, 7, 110. [Google Scholar] [CrossRef]
Best, R.B. Computational and theoretical advances in studies of intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2017, 42, 147–154. [Google Scholar] [CrossRef]
Borgia, A.; Borgia, M.B.; Bugge, K.; Kissling, V.M.; Heidarsson, P.O.; Fernandes, C.B.; Sottini, A.; Soranno, A.; Buholzer, K.J.; Nettels, D.; et al. Extreme disorder in an ultrahigh-affinity protein complex. Nature 2018, 555, 61–66. [Google Scholar] [CrossRef]
Kiefhaber, T.; Bachmann, A.; Jensen, K.S. Dynamics and mechanisms of coupled protein folding and binding reactions. Curr. Opin. Struct. Biol. 2012, 22, 21–29. [Google Scholar] [CrossRef] [PubMed]
Fuxreiter, M.; Tompa, P. Fuzzy complexes: A more stochastic view of protein function. Adv. Exp. Med. Biol. 2012, 725, 1–14. [Google Scholar] [CrossRef] [PubMed]
Mollica, L.; Bessa, L.M.; Hanoulle, X.; Jensen, M.R.; Blackledge, M.; Schneider, R. Binding mechanisms of intrinsically disordered proteins: Theory, simulation, and experiment. Front. Mol. Biosci. 2016, 3, 52. [Google Scholar] [CrossRef] [PubMed]
Henzler-Wildman, K.; Kern, D. Dynamic personalities of proteins. Nature 2007, 450, 964–972. [Google Scholar] [CrossRef]
Schreiber, G.; Haran, G.; Zhou, H.X. Fundamental aspects of protein-protein association kinetics. Chem. Rev. 2009, 109, 839–860. [Google Scholar] [CrossRef]
Zhou, H.X.; Bates, P.A. Modeling protein association mechanisms and kinetics. Curr. Opin. Struct. Biol. 2013, 23, 887–893. [Google Scholar] [CrossRef]
Okur, H.I.; Hladílková, J.; Rembert, K.B.; Cho, Y.; Heyda, J.; Dzubiella, J.; Cremer, P.S.; Jungwirth, P. Beyond the Hofmeister series: Ion-specific effects on proteins and their biological functions. J. Phys. Chem. B 2017, 121, 1997–2014. [Google Scholar] [CrossRef]
Rivas, G.; Minton, A.P. Macromolecular crowding in vitro, in vivo, and in between. Trends Biochem. Sci. 2016, 41, 970–981. [Google Scholar] [CrossRef]
Zhou, H.X.; Rivas, G.; Minton, A.P. Macromolecular crowding and confinement: Biochemical, biophysical, and potential physiological consequences. Annu. Rev. Biophys. 2008, 37, 375–397. [Google Scholar] [CrossRef]
Nerenberg, P.S.; Head-Gordon, T. New developments in force fields for biomolecular simulations. Curr. Opin. Struct. Biol. 2018, 49, 129–138. [Google Scholar] [CrossRef]
Piana, S.; Robustelli, P.; Tan, D.; Chen, S.; Shaw, D.E. Development of a force field for the simulation of single-chain proteins and protein-protein complexes. J. Chem. Theory Comput. 2020, 16, 2494–2507. [Google Scholar] [CrossRef] [PubMed]
Hunter, C.A.; Sanders, J.K. The nature of π-π interactions. J. Am. Chem. Soc. 1990, 112, 5525–5534. [Google Scholar] [CrossRef]
Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B.L.; Grubmüller, H.; MacKerell, A.D., Jr. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 2017, 14, 71–73. [Google Scholar] [CrossRef] [PubMed]
Vitalis, A.; Pappu, R.V. ABSINTH: A new continuum solvation model for simulations of polypeptides in aqueous solutions. J. Comput. Chem. 2009, 30, 673–699. [Google Scholar] [CrossRef]
Moses, D.; Yu, F.; Ginell, G.M.; Shamoon, N.M.; Koenig, P.S.; Holehouse, A.S.; Sukenik, S. Revealing the hidden sensitivity of intrinsically disordered proteins to their chemical environment. J. Phys. Chem. Lett. 2020, 11, 10131–10136. [Google Scholar] [CrossRef]
Wei, M.T.; Elbaum-Garfinkle, S.; Holehouse, A.S.; Chen, C.C.H.; Feric, M.; Arnold, C.B.; Priestley, R.D.; Pappu, R.V.; Brangwynne, C.P. Phase behaviour of disordered proteins underlying low density and high permeability of liquid organelles. Nat. Chem. 2017, 9, 1118–1125. [Google Scholar] [CrossRef]
Alberti, S.; Gladfelter, A.; Mittag, T. Considerations and challenges in studying liquid-liquid phase separation and biomolecular condensates. Cell 2019, 176, 419–434. [Google Scholar] [CrossRef]
McSwiggen, D.T.; Mir, M.; Darzacq, X.; Tjian, R. Evaluating phase separation in live cells: Diagnosis, caveats, and functional consequences. Genes Dev. 2019, 33, 1619–1634. [Google Scholar] [CrossRef]
Brangwynne, C.P.; Tompa, P.; Pappu, R.V. Polymer physics of intracellular phase transitions. Nat. Phys. 2015, 11, 899–904. [Google Scholar] [CrossRef]
Mittag, T.; Pappu, R.V. A conceptual framework for understanding phase separation and addressing open questions and challenges. Mol. Cell 2022, 82, 2201–2214. [Google Scholar] [CrossRef]
Darling, A.L.; Uversky, V.N. Intrinsic disorder and posttranslational modifications: The darker side of the biological dark matter. Front. Genet. 2018, 9, 158. [Google Scholar] [CrossRef] [PubMed]
Zhang, C.; Zheng, W.; Freddolino, P.L.; Zhang, Y. MetaGO: Predicting gene ontology of non-homologous proteins through low-resolution protein structure prediction and protein-protein network mapping. J. Mol. Biol. 2018, 430, 2256–2265. [Google Scholar] [CrossRef] [PubMed]
Radivojac, P.; Clark, W.T.; Oron, T.R.; Schnoes, A.M.; Wittkop, T.; Sokolov, A.; Graim, K.; Funk, C.; Verspoor, K.; Ben-Hur, A.; et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 2013, 10, 221–227. [Google Scholar] [CrossRef] [PubMed]
Jiang, Y.; Oron, T.R.; Clark, W.T.; Bankapur, A.R.; D’Andrea, D.; Lepore, R.; Funk, C.S.; Kahanda, I.; Verspoor, K.M.; Ben-Hur, A.; et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 2016, 17, 184. [Google Scholar] [CrossRef]
Kmiecik, S.; Gront, D.; Kolinski, M.; Wieteska, L.; Dawid, A.E.; Kolinski, A. Coarse-grained protein models and their applications. Chem. Rev. 2016, 116, 7898–7936. [Google Scholar] [CrossRef]
Ingólfsson, H.I.; Lopez, C.A.; Uusitalo, J.J.; de Jong, D.H.; Gopal, S.M.; Periole, X.; Marrink, S.J. The power of coarse graining in biomolecular simulations. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 4, 225–248. [Google Scholar] [CrossRef]
Bernardi, R.C.; Melo, M.C.; Schulten, K. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim. Biophys. Acta (BBA)-Gen. Subj. 2015, 1850, 872–877. [Google Scholar] [CrossRef]
González, M.A. Force fields and molecular dynamics simulations. École Thématique De La Société Française De La Neutron. 2011, 12, 169–200. [Google Scholar] [CrossRef]
Monticelli, L.; Tieleman, D.P. Force fields for classical molecular dynamics. In Biomolecular Simulations; Humana Press: Totowa, NJ, USA, 2013; pp. 197–213. [Google Scholar] [CrossRef]
Freddolino, P.L.; Harrison, C.B.; Liu, Y.; Schulten, K. Challenges in protein-folding simulations. Nat. Phys. 2010, 6, 751–758. [Google Scholar] [CrossRef]
Dill, K.A.; Ozkan, S.B.; Shell, M.S.; Weikl, T.R. The protein folding problem. Annu. Rev. Biophys. 2008, 37, 289–316. [Google Scholar] [CrossRef]
Zhang, Y. Progress and challenges in protein structure prediction. Curr. Opin. Struct. Biol. 2008, 18, 342–348. [Google Scholar] [CrossRef] [PubMed]
Moult, J.; Fidelis, K.; Kryshtafovych, A.; Schwede, T.; Tramontano, A. Critical assessment of methods of protein structure prediction (CASP) – Round XII. Proteins: Struct. Funct. Bioinform. 2018, 86, 7–15. [Google Scholar] [CrossRef] [PubMed]
McGuffee, S.R.; Elcock, A.H. Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLoS Comput. Biol. 2010, 6, e1000694. [Google Scholar] [CrossRef]
Feig, M.; Harada, R.; Mori, T.; Yu, I.; Takahashi, K.; Sugita, Y. Complete atomistic model of a bacterial cytoplasm for integrating physics, biochemistry, and systems biology. J. Mol. Graph. Model. 2015, 58, 1–9. [Google Scholar] [CrossRef]
Yu, I.; Mori, T.; Ando, T.; Harada, R.; Jung, J.; Sugita, Y.; Feig, M. Biomolecular interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm. eLife 2016, 5, e19274. [Google Scholar] [CrossRef]
Levy, Y.; Wolynes, P.G.; Onuchic, J.N. Protein topology determines binding mechanism. Proc. Natl. Acad. Sci. USA 2004, 101, 511–516. [Google Scholar] [CrossRef]
Hammes, G.G.; Chang, Y.C.; Oas, T.G. Conformational selection or induced fit: A flux description of reaction mechanism. Proc. Natl. Acad. Sci. USA 2009, 106, 13737–13741. [Google Scholar] [CrossRef]
Voelz, V.A.; Bowman, G.R.; Beauchamp, K.; Pande, V.S. Molecular simulation of ab initio protein folding for a millisecond folder NTL9 (1–39). J. Am. Chem. Soc. 2010, 132, 1526–1528. [Google Scholar] [CrossRef]
Bonomi, M.; Branduardi, D.; Bussi, G.; Camilloni, C.; Provasi, D.; Raiteri, P.; Donadio, D.; Marinelli, F.; Pietrucci, F.; Broglia, R.A.; et al. PLUMED: A portable plugin for free-energy calculations with molecular dynamics. Comput. Phys. Commun. 2009, 180, 1961–1972. [Google Scholar] [CrossRef]
Roux, B.; Weare, J. On the statistical equivalence of restrained-ensemble simulations with the maximum entropy method. J. Chem. Phys. 2013, 138, 084107. [Google Scholar] [CrossRef]
Pancsa, R.; Tompa, P. Structural disorder in eukaryotes. PLoS ONE 2012, 7, e34687. [Google Scholar] [CrossRef] [PubMed]
Fisher, C.K.; Stultz, C.M. Constructing ensembles for intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2011, 21, 426–431. [Google Scholar] [CrossRef] [PubMed]
Levy, R.M.; Haris, S.A.; Karplus, M. Molecular dynamics simulations of an α-helix in water: Solvation and dynamical aspects. Biochim. Biophys. Acta (BBA)-Protein Struct. 1979, 577, 177–189. [Google Scholar]

Figure 1. A diagram of a mathematical function of molecular dynamics.

Figure 2. A diagram of energy efficiency AWSEM.

Figure 3. A diagram of the CALVADOS representation.

Figure 4. A diagram of the Mpipi force field.

Figure 5. A diagram of field theory methods.

Figure 6. A diagram of PS-Predictor.

Figure 7. A diagram of the deep learning approach.

Figure 8. A diagram of the protein language model.

Figure 9. PLAAC modeling.

Figure 10. A diagram of FuzDrop.

Figure 11. A diagram of catGranule.

Figure 12. LLPSDB data-driven model.

Figure 13. A diagram of CADMOS.

Figure 14. The FINCHES system.

Table 1. Comprehensive method comparison with specific system applications.

Method	System Studied	Prediction Type	Experimental Validation	Accuracy/Agreement	Computational Time	Reference
FINCHES	FUS LCD variants	Phase diagrams	Y→S mutations prevent LLPS	r = 0.91 for Tc prediction	1 s per variant	[4]
CALVADOS	hnRNPA1 LCD	Phase behavior	Aromatic mutant effects	r = 0.89 for phase boundaries	2 h per system	[17]
All-atom MD	p53 TAD	Binding mechanism	NMR chemical shifts	RMSD = 2.1 Å from experiment	5 days per trajectory	[32]
PSPredictor	Stress granule proteins	Binary classification	Localization experiments	85% accuracy (239 proteins)	0.1 s per protein	[56]
PLAAC	TDP-43 variants	Prion-like propensity	Aggregation assays	79% for ALS mutations	0.05 s per protein	[70]
FuzDrop	RNA-binding proteins	Droplet regions	Fluorescence microscopy	92% sensitivity	1 s per protein	[74]
AWSEM	α-synuclein	Aggregation pathway	Fiber morphology	Correct fibril structure	12 h per trajectory	[38]
Field Theory	Elastin-like polypeptides	Critical temperature	Turbidity measurements	±5 K accuracy	10 min per system	[53]

Table 2. Computational performance comparison.

Method	System Size	Time per Prediction	Scalability	Hardware Requirements
FINCHES	Any sequence	0.001 s	Linear with the sequence length	Standard CPU
CALVADOS	1–10 proteins	1–10 h	Quadratic with system size	GPU recommended
All-atom MD	1–5 proteins	1–100 days	Exponential with system size	Specialized clusters
PSPredictor	Single protein	0.1 s	Constant	Standard CPU
PLAAC	Single protein	0.05 s	Linear	Standard CPU
FuzDrop	Single protein	1 s	Linear	Standard CPU
Field Theory	Parameter space	10–60 min	Linear with parameters	Standard CPU
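
To make the per-prediction times in Table 2 easier to compare, the following minimal Python sketch converts them into total wall-clock hours for a batch of predictions. The timing ranges are copied from Table 2; the function name `total_hours`, the batch size `N_SEQUENCES`, and the assumption of strictly serial execution are illustrative choices, not part of the source. The methods also differ in what counts as one "prediction" (a single sequence versus a multi-protein system), so the output is only an order-of-magnitude guide.

```python
# Time per prediction from Table 2, in seconds, as (lower, upper) bounds.
SECONDS_PER_PREDICTION = {
    "FINCHES": (0.001, 0.001),
    "PLAAC": (0.05, 0.05),
    "PSPredictor": (0.1, 0.1),
    "FuzDrop": (1.0, 1.0),
    "Field Theory": (10 * 60, 60 * 60),        # 10-60 min
    "CALVADOS": (1 * 3600, 10 * 3600),         # 1-10 h
    "All-atom MD": (1 * 86400, 100 * 86400),   # 1-100 days
}


def total_hours(method, n_predictions):
    """Return (lower, upper) total hours for n serial predictions."""
    low_s, high_s = SECONDS_PER_PREDICTION[method]
    return (low_s * n_predictions / 3600, high_s * n_predictions / 3600)


# Hypothetical batch size chosen for illustration; not taken from the source.
N_SEQUENCES = 20_000

if __name__ == "__main__":
    for method in SECONDS_PER_PREDICTION:
        low_h, high_h = total_hours(method, N_SEQUENCES)
        print(f"{method:>12}: {low_h:.3g} to {high_h:.3g} h")
```

Under these assumptions, the sequence-based predictors finish such a batch in minutes to hours, whereas the simulation-based methods would require parallel hardware to be practical at the same scale.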

Table 3. Accuracy assessment across different prediction types.

Prediction Type	FINCHES	CALVADOS	PSPredictor	Field Theory	All-Atom MD
Phase Separation Binary	87% correct	94% correct	85% correct	91% correct	N/A
Critical Temperature	r = 0.85	r = 0.91	N/A	r = 0.78	N/A
Interface Prediction	73% agreement	N/A	N/A	N/A	89% agreement
Mutation Effects	82% correct	88% correct	45% (poor)	N/A	N/A

Table 4. Representative FINCHES applications and validation results.

System	Experimental Observation	FINCHES Prediction	Agreement	Reference
FUS LCD variants	Y→S mutations prevent phase separation	Highly repulsive ε values for Y→S	Excellent	[135]
DDX4-NTD	R2K variant shows reduced condensation	Reduced attractive interactions	Good	[130]
TDP-43 LCD	Phosphorylation suppresses aggregation	Weakened attractive interactions	Excellent	[135]
hnRNPA1 LCD	Aromatic mutations alter phase behavior	Predicted phase diagram changes	Good	[137]
Transcription factors	AD strength correlates with coactivator binding	ε values with Gal11 correlate with activity	Good	[157]
CAPRIN-1	Salt enhances phase separation	Reduced repulsion at higher salt	Good	[165]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

