Polymer Models of Chromatin Imaging Data in Single Cells

: Recent super-resolution imaging technologies enable tracing chromatin conformation with nanometer-scale precision at the single-cell level. They revealed, for example, that human chromosomes fold into a complex three-dimensional structure within the cell nucleus that is essential to establish biological activities, such as the regulation of the genes. Yet, to decode from imaging data the molecular mechanisms that shape the structure of the genome, quantitative methods are required. In this review, we consider models of polymer physics of chromosome folding that we benchmark against multiplexed FISH data available in human loci in IMR90 ﬁbroblast cells. By combining polymer theory, numerical simulations and machine learning strategies, the predictions of the models are validated at the single-cell level, showing that chromosome structure is controlled by the interplay of distinct physical processes, such as active loop-extrusion and thermodynamic phase-separation.

In the last decade, powerful technologies based on super-resolution microscopy approaches enabled to probe the 3D conformation of the genome with nanometer-scale precision in single nuclei [14][15][16][17]. Those techniques revealed, for instance, that TADs exist at the single-cell level, they broadly vary from cell to cell and correspond to spatially segregated globular 3D conformations, confining, e.g., the activity of the regulators to their proper target genes [15].
Those recent experiments triggered questions on the nature of chromatin contacts and their origin: what are the mechanisms that shape genome 3D structure? What methods can be developed to identify them? In this review, we discuss the application of models from polymer physics to understand the machinery that establish chromosome architecture in the nucleus of the cells. In particular, we focus on two recently proposed models of folding that rely on two different physical processes, respectively, loop-extrusion and polymer phaseseparation. In the first process, spatial proximity between distal DNA sites is achieved by Algorithms 2022, 15, 330 2 of 8 molecular motors that stochastically bind to DNA and extrude a polymer loop in an out-ofequilibrium, active (e.g., ATP-dependent) physical process [18][19][20][21][22][23][24], and in the second, distal genomic sites are tethered together by interactions mediated, e.g., by diffusing cognate molecular particles, such as transcription factors, or by direct interactions produced, for example, by DNA-bound histone molecules [12,[25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40]. By taking as a case study a 2 Mb wide chromatin region in human IMR90 cells where super-resolution multiplexed FISH data are available [15], we combine those distinct physical mechanisms into polymer models that we investigate by massive computer simulations (see Methods). We show that both loop-extrusion and phase-separation significantly recapitulate microscopy data, hinting that both processes can reliably coexist to determine the structure of chromosomes at the single-cell level [28].

Methods
In the studied models, chromatin is represented as a polymer chain of non-overlapping beads, subject to standard physical potentials derived from classical studies of polymer simulations [41,42]. Specifically, adjacent beads along the polymer are tethered by an elastic FENE potential and their overlap is prevented by a repulsive Weeks-Chandler-Anderson (WCA) potential. In the loop-extrusion (LE) model, the extruding motors are modelled as harmonic springs that can extrude loops by a translocation along the polymer chain, i.e., at each simulation step the spring is updated from the bead pair (i,j) to (i − 1,j + 1). As broadly reported in the literature [18][19][20][21][22][23], those springs cannot pass through each other, their number is fixed and they halt translocation when they collide with another extruding motor or anchor sites with opposite orientation, or they stochastically unbind from the polymer. A typical parameter choice in the simulations is to set the LE spring energy constant equal to 10 k B T (k B is the Boltzmann constant and T the temperature) and its rest length to 1.1σ (σ is the bead diameter) [18,19,37]. Additionally, to unveil the roles of the LE ingredients beyond its minimal implementation, we examined a more refined version where LE boundaries are chosen to best reproduce population-averaged contact data and the model anchor sites are present with a specific, finite probability in a single-polymer molecule to model cell-to-cell variation [28,37]. The average domain structure of the model is thus reproduced from bulk data, but the ensemble of its single-molecule structures is validated against independent single-cell microscopy conformations (see below). Another key parameter controlling LE dynamics is the extruder processivity, i.e., the ratio between the extrusion velocity and the unbinding rate from the chain, whose values can range, e.g., from 80 kb up to 750 kb [19,23,37].
In the class of phase-separation based models, we focused on the strings and binders (SBS) model [30,39], in which a chromosome region is modeled as a self-avoiding polymer chain where different specific types of binding sites are located for diffusing cognate molecular particles (called binders). Binders and polymer sites are subject to a Brownian motion regulated by the Langevin equation with standard parameters (i.e., friction coefficient ζ = 0.5 [41]). The binders, via specific attractive interactions represented by a truncated Lennard-Jones potential [34], can bridge their cognate sites on the chain, hence guiding a micro-phase-separation of the polymer into different globules. As the number of the binders (or their energy affinity) increases above a given threshold, the system undergoes a thermodynamic phase-transition from a coil (i.e., randomly folded) to a phase-separated globule state where the polymer self-assembles into distinct, spatially segregated globules via a phase-separation mechanism [39]. Critical binder concentrations, for weak biochemical affinities, fall in the fractions of µmol/L range [39], which is consistent with typical transcription factor concentrations. The genomic locations of the binding sites of the SBS model are inferred by a machine learning procedure based on the PRISMR algorithm [12]. In brief, PRISMR is a recursive Monte Carlo procedure that identifies the minimal set of binding sites to best match input bulk (e.g., ensemble-averaged Hi-C or microscopy) data, as fully detailed in [12,29,39]. Finally, we also discuss a model where the LE and SBS models To generate a statistical ensemble of in silico single-molecule conformations, each model is investigated by massive molecular dynamics simulations until stationarity is fully reached (typically up to 10 8 MD time iteration steps [43]); Langevin dynamics is integrated via the Velocity-Verlet algorithm by using the free available LAMMPS [44] and HOOMD [45] software. All the scripts required to perform the simulations of the models are available at https://github.com/ehsanirani/PhaseSeparation-LoopExtrusion-MD (accessed on 1 August 2022) [28].

Results
In this section, we benchmark our polymer models against single-cell super-resolution microscopy data [15] available at 30 kb resolution in a 2 Mb wide genomic region (Chr21: 28-30 Mb) in human IMR90 cells. We show that the different models capture the complex pattern of chromatin contacts at the cell population-averaged level, as well as the observed 3D conformations of the imaged chromatin region in single DNA molecules [28].

The Models Recapitulate Ensemble-Averaged Microscopy Data
In a first validation of the models, we computed their median distance matrix that we compared against the corresponding map from multiplexed FISH data [15] (Figure 1). on the PRISMR algorithm [12]. In brief, PRISMR is a recursive Monte Carlo procedure that identifies the minimal set of binding sites to best match input bulk (e.g., ensembleaveraged Hi-C or microscopy) data, as fully detailed in [12,29,39]. Finally, we also discuss a model where the LE and SBS models are combined and act simultaneously in a singlepolymer molecule (hereafter indicated as the LE+SBS model).
To generate a statistical ensemble of in silico single-molecule conformations, each model is investigated by massive molecular dynamics simulations until stationarity is fully reached (typically up to 10 8 MD time iteration steps [43]); Langevin dynamics is integrated via the Velocity-Verlet algorithm by using the free available LAMMPS [44] and HOOMD [45] software. All the scripts required to perform the simulations of the models are available at https://github.com/ehsanirani/PhaseSeparation-LoopExtrusion-MD (accessed on 1 August 2022) [28].

Results
In this section, we benchmark our polymer models against single-cell superresolution microscopy data [15] available at 30 kb resolution in a 2 Mb wide genomic region (Chr21: 28-30 Mb) in human IMR90 cells. We show that the different models capture the complex pattern of chromatin contacts at the cell population-averaged level, as well as the observed 3D conformations of the imaged chromatin region in single DNA molecules [28].

The Models Recapitulate Ensemble-Averaged Microscopy Data
In a first validation of the models, we computed their median distance matrix that we compared against the corresponding map from multiplexed FISH data [15] (Figure 1).

Figure 1.
The median spatial distance matrix of the IMR90 locus (Chr21: 28-30 Mb) from multiplexed FISH microscopy [15] is compared against the corresponding in silico matrices of the considered polymer models [28]. Both loop-extrusion (LE) and phase-separation based models (SBS and LE + SBS) well recapitulate the complex experimental pattern of contacts. Adapted from [28].
The median distance map is the ensemble median of the single-molecule distance maps, which, by definition, are symmetric square matrices reporting the Euclidean distances between all pairs of polymer sites. Those matrices are visually represented as a 2D heatmap with a color bar scheme to highlight contacting regions that are closer in 3D space (e.g., TADs or loops, colored in red in Figure 1) or those that have no significant interactions (blue regions in Figure 1). To efficiently compute the distance matrices, we used built-in functions within the Python SciPy package [28]. To reduce the noise within imaging data, we applied a Gaussian filter on single-cell distance maps (standard deviation of the Gaussian kernel = 1) and excluded single-cell conformations whose 3D coordinates have >80% missing values. To quantitatively estimate the similarity between microscopy and model distance matrices, we used the genomic distance-corrected Pearson correlation coefficient, r′, which corrects the usual Pearson coefficient for genomic distance effects [12]. In brief, r′ is the Pearson coefficient computed on distance matrices where each entry is subtracted by the mean value of its diagonal. We found that the different models recapitulate the complex TAD patterns observed in microscopy data, as well as specific pointwise interactions, as highlighted by the high r' correlation values: r′  [15] is compared against the corresponding in silico matrices of the considered polymer models [28]. Both loop-extrusion (LE) and phase-separation based models (SBS and LE + SBS) well recapitulate the complex experimental pattern of contacts. Adapted from [28].
The median distance map is the ensemble median of the single-molecule distance maps, which, by definition, are symmetric square matrices reporting the Euclidean distances between all pairs of polymer sites. Those matrices are visually represented as a 2D heatmap with a color bar scheme to highlight contacting regions that are closer in 3D space (e.g., TADs or loops, colored in red in Figure 1) or those that have no significant interactions (blue regions in Figure 1). To efficiently compute the distance matrices, we used built-in functions within the Python SciPy package [28]. To reduce the noise within imaging data, we applied a Gaussian filter on single-cell distance maps (standard deviation of the Gaussian kernel = 1) and excluded single-cell conformations whose 3D coordinates have >80% missing values. To quantitatively estimate the similarity between microscopy and model distance matrices, we used the genomic distance-corrected Pearson correlation coefficient, r , which corrects the usual Pearson coefficient for genomic distance effects [12]. In brief, r is the Pearson coefficient computed on distance matrices where each entry is subtracted by the mean value of its diagonal. We found that the different models recapitulate the complex TAD patterns observed in microscopy data, as well as specific pointwise interactions, as highlighted by the high r' correlation values: r = 0.49, r = 0.77 and r = 0.70, respectively, for LE, SBS and LE+SBS [28]. To substantiate the statistical significance of the r' correlations of the models, we considered as null control model a self-avoiding chain with the same number of beads as the imaged conformations and found that it returns a significantly lower correlation value (r = 0.11, which is, respectively, four and seven times lower than the values found for LE and SBS/LE+SBS). That indicates that both active processes, like loop-extrusion, and passive mechanisms, e.g., thermodynamic polymer phase-separation, are consistent with the average structure of the considered IMR90 genomic region measured by bulk imaging data.
Next, we checked whether our models also explain local properties of chromatin structure. To this aim, we computed the boundary probability genomic function, i.e., the probability for each genomic position across the studied IMR90 locus to appear as boundary of a single-cell TAD domain [15] (Figure 2). = 0.49, r′ = 0.77 and r′ = 0.70, respectively, for LE, SBS and LE+SBS [28]. To substantiate the statistical significance of the r' correlations of the models, we considered as null control model a self-avoiding chain with the same number of beads as the imaged conformations and found that it returns a significantly lower correlation value (r′ = 0.11, which is, respectively, four and seven times lower than the values found for LE and SBS/LE+SBS). That indicates that both active processes, like loop-extrusion, and passive mechanisms, e.g., thermodynamic polymer phase-separation, are consistent with the average structure of the considered IMR90 genomic region measured by bulk imaging data.
Next, we checked whether our models also explain local properties of chromatin structure. To this aim, we computed the boundary probability genomic function, i.e., the probability for each genomic position across the studied IMR90 locus to appear as boundary of a single-cell TAD domain [15] (Figure 2).

Figure 2.
The genomic boundary probability function of the IMR90 locus is consistently recapitulated by the different models. Adapted from [28].
The boundary function has local peaks corresponding to the main TAD boundaries visible in the median distance map of the IMR90 locus and, interestingly, this function is non-zero across the entire imaged region, indicating a substantial cell-to-cell variability in the genomic position of TAD boundaries. We found that the different models return boundary probabilities with profiles consistent with imaging data, as quantified by the high Pearson correlations between models and experiment: r = 0.83, r = 0.63 and r = 0.65, respectively, for LE, SBS and LE + SBS [28].
Taken together, these analyses show that both loop-extrusion and phase-separation processes can quantitatively explain chromatin structure at the cell population-averaged level. In particular, loop-extrusion (LE) is the best to capture the boundary probability function, while phase-separation based models (SBS and LE + SBS) return overall higher correlation values with ensemble-averaged distance data.

All-against-All Comparison between Single-Molecule Imaged and Model-Derived 3D Structures
As a further step, we aimed to investigate the structural predictions of our polymer models at the single-molecule level. In particular, to assess whether our models do provide a bona-fide representation of the imaged chromatin structures, we used a computational method based on the root-mean-square deviation (RMSD) criterion [39,46]. In brief, the algorithm performs a roto-translational alignment of two conformations (e.g., experimental and model derived) by minimizing the RMSD of their particle positions; in this way, each experimental 3D structure from imaging is univocally associated to a corresponding best-matching conformation of the models by searching for the minimum RMSD of their coordinates. To fairly compare model and imaged 3D conformations, a zscore is performed on both sets of coordinates. To efficiently run the RMSD comparison, we used the MDAnalysis Python library, which employs the fast quaternion-based characteristic polynomial (QCP) algorithm to calculate the least RMSD between two structures [47]. In Figure 3 we report an example of best-matching conformations identified by the RMSD method: in this case, for instance, the single-cell imaged distance map is characterized by two main TAD-like structures, corresponding to distinct globules in 3D space, which are consistently recapitulated by the different models ( Figure 3). The boundary function has local peaks corresponding to the main TAD boundaries visible in the median distance map of the IMR90 locus and, interestingly, this function is non-zero across the entire imaged region, indicating a substantial cell-to-cell variability in the genomic position of TAD boundaries. We found that the different models return boundary probabilities with profiles consistent with imaging data, as quantified by the high Pearson correlations between models and experiment: r = 0.83, r = 0.63 and r = 0.65, respectively, for LE, SBS and LE + SBS [28].
Taken together, these analyses show that both loop-extrusion and phase-separation processes can quantitatively explain chromatin structure at the cell population-averaged level. In particular, loop-extrusion (LE) is the best to capture the boundary probability function, while phase-separation based models (SBS and LE + SBS) return overall higher correlation values with ensemble-averaged distance data.

All-against-All Comparison between Single-Molecule Imaged and Model-Derived 3D Structures
As a further step, we aimed to investigate the structural predictions of our polymer models at the single-molecule level. In particular, to assess whether our models do provide a bona-fide representation of the imaged chromatin structures, we used a computational method based on the root-mean-square deviation (RMSD) criterion [39,46]. In brief, the algorithm performs a roto-translational alignment of two conformations (e.g., experimental and model derived) by minimizing the RMSD of their particle positions; in this way, each experimental 3D structure from imaging is univocally associated to a corresponding best-matching conformation of the models by searching for the minimum RMSD of their coordinates. To fairly compare model and imaged 3D conformations, a z-score is performed on both sets of coordinates. To efficiently run the RMSD comparison, we used the MDAnalysis Python library, which employs the fast quaternion-based characteristic polynomial (QCP) algorithm to calculate the least RMSD between two structures [47]. In Figure 3 we report an example of best-matching conformations identified by the RMSD method: in this case, for instance, the single-cell imaged distance map is characterized by two main TAD-like structures, corresponding to distinct globules in 3D space, which are consistently recapitulated by the different models (Figure 3).
To check the statistical significance of the RMSD analysis, we considered as control the RMSD distribution between random pairs of imaged conformations (Figure 4). For each type of model, we found that the experiment-model best-match distribution is statistically distinguishable from the control (Figure 4a, two-sided Mann-Whitney test p-value = 0), with less than 5% of the entries of the former distribution that fall above the first decile of control (Figure 4b) [28]. Overall, the all-against-all RMSD analysis show that both loop- Example of experiment-model best-matching 3D structures, along with their corresponding single-molecule distance maps, as identified by the RMSD method. Adapted from [28].
To check the statistical significance of the RMSD analysis, we considered as control the RMSD distribution between random pairs of imaged conformations (Figure 4). For each type of model, we found that the experiment-model best-match distribution is statistically distinguishable from the control (Figure 4a, two-sided Mann-Whitney test p-value = 0), with less than 5% of the entries of the former distribution that fall above the first decile of control (Figure 4b) [28]. Overall, the all-against-all RMSD analysis show that both loop-extrusion and phase-separation single-molecule conformations significantly represent the ensemble of single-cell imaged 3D structures of the studied IMR90 cell region.

Discussion
In this work, we discussed the application of polymer physics models to investigate the mechanisms that establish the complex 3D structure of the genome as observed by recent single-cell imaging data [15]. We focused on two main, distinct physical processes that are supported by growing experimental evidence, i.e., loop-extrusion and phase-separation. In the first mechanism, an active motor (e.g., Cohesin or Condensin) extrudes DNA loops between specific anchor sites (envisaged as CTCF binding sites with opposite orientation) in an out-of-equilibrium process; in the second, chromatin contacts between distal genomic sites are established by diffusing molecular agents (such as transcription factors) or direct DNA interactions that, sustained by the thermal bath, spontaneously bridge their cognate sites.
By using as benchmark super-resolution microscopy data available for a 2 Mb wide chromosome region in human fibroblast cells, we showed that both mechanisms can quantitatively recapitulate single-cell imaged conformations, indicating that they can coexist in shaping chromosome folding [28]. By allowing a deeper understanding of the mechanisms driving the organization of the genome, models from polymer physics can  Example of experiment-model best-matching 3D structures, along with their corresponding single-molecule distance maps, as identified by the RMSD method. Adapted from [28].
To check the statistical significance of the RMSD analysis, we considered as control the RMSD distribution between random pairs of imaged conformations (Figure 4). For each type of model, we found that the experiment-model best-match distribution is statistically distinguishable from the control (Figure 4a, two-sided Mann-Whitney test p-value = 0), with less than 5% of the entries of the former distribution that fall above the first decile of control (Figure 4b) [28]. Overall, the all-against-all RMSD analysis show that both loop-extrusion and phase-separation single-molecule conformations significantly represent the ensemble of single-cell imaged 3D structures of the studied IMR90 cell region.

Discussion
In this work, we discussed the application of polymer physics models to investigate the mechanisms that establish the complex 3D structure of the genome as observed by recent single-cell imaging data [15]. We focused on two main, distinct physical processes that are supported by growing experimental evidence, i.e., loop-extrusion and phase-separation. In the first mechanism, an active motor (e.g., Cohesin or Condensin) extrudes DNA loops between specific anchor sites (envisaged as CTCF binding sites with opposite orientation) in an out-of-equilibrium process; in the second, chromatin contacts between distal genomic sites are established by diffusing molecular agents (such as transcription factors) or direct DNA interactions that, sustained by the thermal bath, spontaneously bridge their cognate sites.
By using as benchmark super-resolution microscopy data available for a 2 Mb wide chromosome region in human fibroblast cells, we showed that both mechanisms can quantitatively recapitulate single-cell imaged conformations, indicating that they can coexist in shaping chromosome folding [28]. By allowing a deeper understanding of the mechanisms driving the organization of the genome, models from polymer physics can

Discussion
In this work, we discussed the application of polymer physics models to investigate the mechanisms that establish the complex 3D structure of the genome as observed by recent single-cell imaging data [15]. We focused on two main, distinct physical processes that are supported by growing experimental evidence, i.e., loop-extrusion and phaseseparation. In the first mechanism, an active motor (e.g., Cohesin or Condensin) extrudes DNA loops between specific anchor sites (envisaged as CTCF binding sites with opposite orientation) in an out-of-equilibrium process; in the second, chromatin contacts between distal genomic sites are established by diffusing molecular agents (such as transcription factors) or direct DNA interactions that, sustained by the thermal bath, spontaneously bridge their cognate sites.
By using as benchmark super-resolution microscopy data available for a 2 Mb wide chromosome region in human fibroblast cells, we showed that both mechanisms can quantitatively recapitulate single-cell imaged conformations, indicating that they can coexist in shaping chromosome folding [28]. By allowing a deeper understanding of the mechanisms driving the organization of the genome, models from polymer physics can be used for real-world applications in real experimental contexts [48]. For example, validated polymer models can be efficiently employed to impute missing values or reduce noise effects in large imaging datasets. Additionally, and importantly, they can be employed to predict in silico the structural effects of disease-associated mutations, linked, for instance, to congenital disorders [11,12] or cancer [13,49].
Overall, those studies show that novel data from microscopy can be complemented with quantitative models from physics to understand the mechanisms and function of