The Spatial Organization of Bacterial Transcriptional Regulatory Networks

The transcriptional regulatory network (TRN) is the central pivot of a prokaryotic organism to receive, process and respond to internal and external environmental information. However, little is known about its spatial organization so far. In recent years, chromatin interaction data of bacteria such as Escherichia coli and Bacillus subtilis have been published, making it possible to study the spatial organization of bacterial transcriptional regulatory networks. By combining TRNs and chromatin interaction data of E. coli and B. subtilis, we explored the spatial organization characteristics of bacterial TRNs in many aspects such as regulation directions (positive and negative), central nodes (hubs, bottlenecks), hierarchical levels (top, middle, bottom) and network motifs (feed-forward loops and single input modules) of the TRNs and found that the bacterial TRNs have a variety of stable spatial organization features under different physiological conditions that may be closely related with biological functions. Our findings provided new insights into the connection between transcriptional regulation and the spatial organization of chromosome in bacteria and might serve as a factual foundation for trying spatial-distance-based gene circuit design in synthetic biology.


Introduction
A transcriptional regulatory network (TRN) is composed of the regulatory relationships between transcription factors (TF) and their target genes and serves as the information processing hub to respond to intracellular and environmental signals [1][2][3]. The regulatory relationships in TRN could be positive (activation) or negative (repression) or both (conditionally activation or repression), and the whole TRN is usually scale-free with small numbers of highly connected nodes [4]. Since a TRN is a directed graph, it is usually organized in a hierarchical structure, and genes of different levels in the hierarchy usually bear different functions [5][6][7][8][9]. There are building blocks in TRNs called network motifs that are connected subgraphs that appear more frequently in real networks than in the corresponding randomized networks. Typical network motifs in TRNs are negative autoregulation, feed-forward loop (FFL) and single input module (SIM). The specific topologies of these network motifs give them unique dynamic functions; for example, FFLs have the function of filtering noisy signals and speeding up responses, while SIMs can switch ON/OFF multiple target genes in a temporal sequential order, often acting as a timer during biological development [10]. In a TRN, various network motifs are intertwined to produce rich kinetic functions, which jointly regulate the growth and development of an organism.
In early studies of kinetic modeling, the distribution of genes in space and the time required for transcription factors to search for target genes were often ignored [11]. In recent years, with the development of single-molecule fluorescence tracer technology, it has been found that transcription factors search for their target genes much longer than expected, and the time required for a single lacI protein to find its target gene is 3-5 min [12], while a single dCas9 protein finding its target gene takes even 5 h [13]. This is because the higher the specificity of a transcription factor to its binding site, the lower the search speed, and a compromise is needed between these two. In addition, the spatial distance between a transcription factor and a target gene significantly affects the rate and reliability of transcriptional regulation in bacterial cells, and shorter search time leads to higher regulation efficiency [14]. When a transcription factor is far away from its target gene, the transcription factor may be unable to find the target gene [15]. Moreover, functionally related genes such as the genes whose products participate in protein-protein interactions or in the same metabolic pathways were found spatially clustered in cells [16,17]. The spatial distance between genes also affects the dynamics of gene network. Van et al. studied the dynamics of three-node Repressilator and found that different distribution patterns of the three genes in space will affect whether the oscillation behavior can appear and stably exist [18]. It has been acknowledged that the chromosome architecture affects the basic biological processes of bacteria such as replication, transcription and gene regulation [19,20]. On the other hand, these processes also in turn affect the chromosome architecture through evolutionary pressure [21][22][23].
In recent years, chromatin interaction data of bacteria such as Escherichia coli [24,25] and Bacillus subtilis [26][27][28] have been published, which makes it possible to study the spatial organization of transcriptional regulation on the scale of the entire transcriptional regulatory network. E. coli and B. subtilis are model strains of Gram-negative and Gram-positive bacteria, respectively, with clear genetic background and relatively complete transcriptional regulation data. At the same time, the 3D architecture of chromosomes, the spatial distribution of RNA polymerase and ribosomes and the growth and replication processes are quite different for these two bacterial species [19,29]. In this study, E. coli and B. subtilis were used as the representative species of bacteria, and their TRNs were constructed and combined with the chromatin interaction data under multiple culture conditions. The spatial organization of TRNs was explored in terms of different TRN features such as regulatory direction (positive and negative), central nodes, network hierarchy and network motifs. We found that there are stable organizational features of TRNs under various conditions for the two bacterial species. The results provided new insights into the organization principles of TRN in 3D physical space and might guide spatial-distance-based gene circuit design in synthetic biology.

Chromatin Interaction Data
The chromatin interaction data of E. coli were derived from the original interaction matrix of 3C-seq for wild-type MG1655 strain by Lioy et al. [25]; the dataset corresponds to 5 culture conditions: LB medium early exponential phase at 30 • C and 37 • C (E-LB30, E-LB37), minimal medium early exponential phase (OD 600 = 0.2) at 22 • C and 30 • C (E-MM22, E-MM30) and minimal medium stationary phase (OD 600 = 2) at 30 • C (E-sMM30). The chromatin interaction data of B. subtilis were derived from the Hi-C dataset of wildtype Bacillus subtilis HM1320 by Marbouty et al. [26]; the dataset corresponds to 3 culture conditions: LB medium (B-LB), minimal medium (B-MM) and LB medium cultured then rifampicin treated for 20 min (B-Rif); all cells were grown at 30 • C and harvested during the exponential phase (OD 600 = 0.3). The original chromatin interaction matrixes of E. coli and B. subtilis were normalized using sequential component normalization (SCN) [30]. The normalized interaction matrixes are shown in Figure S1.

Definition of Bins and Localization of Genes
The spatial distance of two genes is represented by the chromatin interaction frequency between two bins (a bin is defined as a DNA fragment of fixed length) that the genes belong to. The bin size is 5 Kb for E. coli and 4 Kb for B. subtilis, respectively. When the gene length is less than the length of a single bin, if a gene has more than a half of its total length in a bin, then the gene belongs to that bin; when the gene length is greater than the length of a single bin, if the gene length in a bin exceeds half of the bin length, then the gene is considered to contain that bin. For the interaction frequency between two genes, if either of them (or both) span multiple bins, the average interaction frequency over all the bins the two genes involved is taken as the interaction frequency between the two genes. In data comparison, the statistical significance test was performed using the function wilcox.test in the "stats" package of R. The following convention symbols indicating significance levels were used: ns: p > 0.05; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001; ****: p ≤ 0.0001. The default significance level is 0.05 unless otherwise specified.

Constructing 3D Models of Bacterial Chromosomes
The normalized interaction matrixes were input into EVR [31] for the reconstruction of 3D chromosome structure models, and the default parameters were adopted: smoothing factor = 2, alpha = 0.5, itemsize = 20.0, mis_dis = 0.001, max_dis = 0.2, iter_num = "auto", number of threads = "auto", seed = "auto". The generated 3D structure models of bacterial chromosomes were visualized using PyMol (https://pymol.org/, accessed on 26 October 2022). The spatial distance between genes is defined as the Euclidean distance calculated based on the 3D coordinates of genes (bins) in space.

Construction of Transcriptional Regulatory Networks
The TRN of E. coli was constructed based on the transcriptional regulation experimental dataset from RegulonDB (version 9.0) [32]. After combining all regulatory relationships, the resulting TRN contained 2273 protein-coding genes (207 transcription factors, 7 sigma factors) and 6788 regulatory edges. The transcriptional regulation data of B. subtilis were adopted from SubtiWiki 2.0 [33], and the extracted gene type was protein-coding or pseudogenes. The resulting TRN contained 2400 nodes (including 192 transcription factors) and 5026 regulatory edges. The reconstructed TRNs (Supplementary Data) of E. coli and B. subtilis are illustrated in Figure S2. Since chromatin interaction is what we were concerned with, a regulatory edge was abandoned if its two nodes (regulator and target) were located in the same bin of chromatin partition. Consequently, there were 4562 positive (activation) and 1959 negative (repression) regulatory edges in the E. coli TRN; there were 3255 positive and 1714 negative regulatory edges in the B. subtilis TRN.

Determination of Network Hierarchy
This study used the method from a previous publication [34] to divide the bacterial TRNs into four layers: top, middle, bottom, and target. The division algorithm is as follows: (1) The genes with no out-degree in the TRN are classified as the target layer (TG), which are mainly the genes encoding enzymes and structural components of the cell; removal of all genes in the target layer resulted in a TRN containing only transcription factors.
(2) In the resulting TRN of only transcription factors, the gene set with "strongly connected component (SCC)" attribute is collapsed into a single super node, where a SCC is a fully connected subnetwork in which each pair of nodes u and v must have u to v and v to u edges; self-regulation loop is also a kind of SCC. The edges of other nodes in the network that are connected to the genes in SCCs are replaced by the connections of the former to the super nodes, thereby resulting in a directed acyclic graph. (3) In the directed acyclic graph, the nodes with no in-degrees are classified as the top layer (T), the nodes with no out-degrees are classified as the bottom layer (B), and the isolated transcription factors are also classified into this layer; the remaining nodes are classified into the middle layer (M).

Discovery of Network Motifs
The software mfinder1.21 [35] was used in searching FFLs in the E. coli and B. subtilis TRNs with parameter settings: r = 1000 times. In the search results, FFLs were identified as network motifs with occurrence frequency = 4599 and z-score = 4.88 for E. coli and occurrence frequency = 2066 and z-score = 9.50 for B. subtilis. SIM was defined according to Konagurthu and Lesk [36]: only one parent (regulating) node and at least two children (regulated) nodes and the in-degree (number of incoming edges) for each child node strictly equal one within the full network not just within the sub-graph. According to this definition, 30 and 58 SIMs were identified in the E. coli and B. subtilis TRNs, respectively.

Chromatin Interaction within TRN
To explore the spatial organization of bacterial TRN, the chromatin interaction data and TRNs for the two model organisms Escherichia coli and Bacillus subtilis were utilized. For E. coli, the chromatin interaction data were measured at 5 culture conditions [25] and denoted as E-LB30, E-LB37, E-MM22, E-MM30 and E-sMM30, correspondingly; for B. subtilis, the chromatin interaction data were measured at 3 culture conditions [26] and denoted as B-LB, B-MM and B-Rif, correspondingly; see Methods for details. The TRNs were constructed based on the data from RegulonDB (version 9.0) [32] for E. coli and from SubtiWiki2.0 [33] for B. subtilis (see Methods for details). First, the chromatin interaction frequencies within TRN (namely, between the regulators and their target genes in the two bacterial TRNs) (denoted as TRN) were calculated and then compared with the chromatin interaction frequencies among all the gene pairs in the whole genome (denoted as All). As shown in Figure 1 and Table S1, TRN is significantly higher than All, meaning that the genes with regulation relationships tend to be closer to each other in the 3D space. For another control, "All" was replaced with "Random" where the regulator-target genepairing relationships in TRN were randomly shuffled, and the comparison result shows exactly the same trend ( Figure S3). These results extend our previous finding that genes in the same regulon or pathway of E. coli tend to minimize their spatial distances [17] to the genes in the TRNs of both Gram-negative (E. coli) and Gram-positive (B. subtilis) bacterial species.
We compared the chromatin interaction frequencies of positive and negative regulations with TRN in the two bacterial TRNs ( Figure 1 and Table S1). We found that the frequencies of positive regulation in almost all culture conditions were significantly lower than the frequency within TRN, while the frequencies of negative regulation were significantly higher than the frequency within TRN. It suggests that negative regulation (repression) prefers shorter spatial distance, while positive regulation (activation) is just the opposite, although the underlying mechanism is still not clear.

Chromatin Interaction of Central Nodes in TRN
In a TRN, different nodes are of different importance, and centrality can measure the significance of a node's impact on the network topology or biological function. We calculated the in-degree, out-degree, betweenness and closeness centrality values for each node and selected the nodes with the top 20% highest centrality values as the central nodes (42 for E. coli and 38 for B. subtilis), denoted as in-hubs, out-hubs, bottlenecks and centers, respectively. We examined the chromatin interaction frequencies intra the central nodes and between the central nodes and other connected nodes in TRN. Compared with the chromatin interaction frequency of TRN ( Figure S4), we found that the chromatin interaction frequencies intra in-hubs, out-hubs and bottlenecks were usually higher than that of the TRN, while the chromatin interaction frequencies between the out-hubs, bottlenecks and their corresponding connected partners were usually lower than that of the TRN. The frequency of chromatin interaction that centers participate in seldom had significant difference with that of TRN for most of the culture conditions of the two species ( Figure S4).
The TFs in out-hubs and bottlenecks need to regulate a large number of genes and usually have a high expression level. For instance, sigma factors are the essential TFs for transcription initiation [37] and are involved in more than a quarter of regulation edges. Most of the sigma factors (6/7 in E. coli and 13/20 in B. subtilis) are attributed as central nodes. Shorter spatial distances between central nodes imply a clustering trend of these genes (TFs) in space, which may benefit from the so-called transcription factories [38] in bacteria to fulfill cost-effective transcription. On the other hand, a huge number of genes are regulated by central nodes, which means these genes are widely distributed in the genome, resulting in relatively lower (than TRN) interaction frequencies between the central nodes and their connected partners. We compared the chromatin interaction frequencies of positive and negative regulations with TRN in the two bacterial TRNs ( Figure 1 and Table S1). We found that the frequencies of positive regulation in almost all culture conditions were significantly lower than the frequency within TRN, while the frequencies of negative regulation were significantly higher than the frequency within TRN. It suggests that negative regulation (repression) prefers shorter spatial distance, while positive regulation (activation) is just the opposite, although the underlying mechanism is still not clear.

Chromatin Interaction of Central Nodes in TRN
In a TRN, different nodes are of different importance, and centrality can measure the significance of a node's impact on the network topology or biological function. We calculated the in-degree, out-degree, betweenness and closeness centrality values for each node and selected the nodes with the top 20% highest centrality values as the central nodes (42 for E. coli and 38 for B. subtilis), denoted as in-hubs, out-hubs, bottlenecks and centers, respectively. We examined the chromatin interaction frequencies intra the central nodes and between the central nodes and other connected nodes in TRN. Compared with the chromatin interaction frequency of TRN ( Figure S4), we found that the chromatin interaction frequencies intra in-hubs, out-hubs and bottlenecks were usually higher than that of Under almost all culture conditions, the chromatin interaction frequency within TRN (denoted as TRN) is significantly higher than that of all the gene pairs in the whole genome (denoted as All); the interaction frequency of positive regulation (denoted as P) is significantly lower than TRN; the interaction frequency of negative regulation (denoted as N) is significantly higher than TRN. The symbols on the figure indicate statistical significance levels: ns: p > 0.05; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001; ****: p ≤ 0.0001.

Chromatin Interaction in the Hierarchy of TRN
According to the network topology, TRNs can be divided into four layers: top, middle, bottom and target (see Methods for details). Except for in the target layer, the genes are transcription factors. In the top, middle and bottom layers ( Figure S5), there are 8, 75 and 131 genes for E. coli and 6, 38 and 148 genes for B. subtilis, respectively, and the two bacterial TRNs show a pyramidal hierarchy. We examined the regulatory relationships within and between the layers and found that the majority of the target layer genes in E. coli and B. subtilis are directly regulated by the middle layer genes and that the regulations intra the middle layer are also dense. In contrast, the bottom layer, even if the number of genes in this layer is more than the middle layer, has less intra regulations. There are many self-regulating edges (76/89 in E. coli and 76/86 in B. subtilis) in the bottom layer and much fewer self-regulating edges (53/335 in E. coli and 24/127 in B. subtilis) in the middle layer. It is evident that the middle layer is the information processing and transmission hub which can make complex decisions. As a mediator level, the bottom layer plays a cascading role to enrich the regulation forms of target genes.
The chromatin interaction frequencies within and between hierarchical levels are compared with that of TRN ( Figure 2, Table S2). Interestingly, the spatial organization of the middle-bottom-target hierarchical structure shows a high degree of stability in all culture conditions of E. coli and B. subtilis: the chromatin interaction frequencies intra the middle layer (middle-middle) and intra the bottom layer (bottom-bottom) and between the bottom-target layers are significantly higher than that of the TRN; the chromatin interaction frequencies between middle-bottom layers and between middle-target layers are significantly lower than that of the TRN; others are of no significant difference compared with TRN.

Chromatin Interaction in the Network Motifs of TRN
Network motifs are subgraphs with specific dynamic functions that are significantly more enriched in real networks than in random networks [10]. A total of 4599 FFLs and 30 SIMs were found in the E. coli TRN; 2066 FFLs and 58 SIMs were found in the B. subtilis TRN (see Methods for details). Only the FFLs with chromatin interaction frequencies on all three regulation edges and the SIMs with frequencies on at least two regulation edges were kept for analysis. The frequencies averaged over all the edges in FFLs and SIMs were calculated and compared with the frequency of the TRN. Under all culture conditions of E. coli, the chromatin interaction frequency within FFLs was significantly higher than that of TRN, while the frequency within SIMs was of no significant difference from that of the TRN (Figure 3, Table S3); under all culture conditions of B. subtilis, the chromatin interaction frequencies within FFLs and SIMs were significantly higher than that of TRN. FFLs and SIMs are network subgraphs that perform specific functions, and the stronger interaction among their nodes (relative to other genes in the TRN) indicates closer spatial distances among them, which means that the spatial distance is indeed a mechanism to ensure the performance of gene circuit function [18].

Chromatin Interaction in the Network Motifs of TRN
Network motifs are subgraphs with specific dynamic functions that are significantly more enriched in real networks than in random networks [10]. A total of 4599 FFLs and 30 SIMs were found in the E. coli TRN; 2066 FFLs and 58 SIMs were found in the B. subtilis TRN (see Methods for details). Only the FFLs with chromatin interaction frequencies on all three regulation edges and the SIMs with frequencies on at least two regulation edges were kept for analysis. The frequencies averaged over all the edges in FFLs and SIMs were calculated and compared with the frequency of the TRN. Under all culture conditions of E. coli, the chromatin interaction frequency within FFLs was significantly higher than that of TRN, while the frequency within SIMs was of no significant difference from that of the TRN (Figure 3, Table S3); under all culture conditions of B. subtilis, the chromatin interaction frequencies within FFLs and SIMs were significantly higher than that of TRN. FFLs and SIMs are network subgraphs that perform specific functions, and the stronger interaction among their nodes (relative to other genes in the TRN) indicates closer spatial distances among them, which means that the spatial distance is indeed a mechanism to ensure the performance of gene circuit function [18]. When the chromatin interaction frequencies on the three edges in FFLs were considered separately, we found that the frequencies of the XY and XZ edges were significantly lower than that of the TRN, while that of the YZ edge was significantly higher than that of the TRN (Figure 4, Table S4), which means that gene X was farther away from genes Y and Z and the spatial distance between genes Y and Z was closer. This mode was highly stable and kept for almost all the physiological conditions of E. coli and B. subtilis. When the chromatin interaction frequencies on the three edges in FFLs were considered separately, we found that the frequencies of the XY and XZ edges were significantly lower than that of the TRN, while that of the YZ edge was significantly higher than that of the TRN (Figure 4, Table S4), which means that gene X was farther away from genes Y and Z and the spatial distance between genes Y and Z was closer. This mode was highly stable and kept for almost all the physiological conditions of E. coli and B. subtilis.
By definition, a SIM has one regulator and at least two target genes and can turn ON/OFF the expression of the target genes in a sequential manner owing to the temporal change in the concentration (expression level) of the regulator. As stated above, the chromatin interaction frequency within SIMs (averaged on all edges) was of no significant difference from that of the TRN in E. coli, which indicates that SIMs may have no special requirement for the absolute spatial distance between a regulator and its target genes. Nonetheless, it was found that the regulation edges within a SIM have similar chromatin interaction frequencies. The standard deviation (SD) of chromatin interaction frequencies on the edges within a SIM was calculated and compared with that of TRN. The statistical results are shown in Figure 5. Under all culture conditions of the two bacteria, the SD of chromatin interaction frequency within SIMs is less than that of the TRN. These results show that the chromatin interaction frequency within a SIM is of higher uniformity, which indicates that a regulator in a SIM has similar spatial distances to its target genes, the target genes encounter similar regulator concentrations at the same time point and the order of state (ON/OFF) switching of target genes is merely determined by the binding affinity between the regulator and the targets. In almost all cases, the chromatin interaction frequency between X and Y/Z in the feed-forward loop is significantly lower than TRN, while the interaction frequency between Y and Z is significantly higher than TRN. Significance level: p < 0.05.
By definition, a SIM has one regulator and at least two target genes and can turn ON/OFF the expression of the target genes in a sequential manner owing to the temporal change in the concentration (expression level) of the regulator. As stated above, the chromatin interaction frequency within SIMs (averaged on all edges) was of no significant difference from that of the TRN in E. coli, which indicates that SIMs may have no special requirement for the absolute spatial distance between a regulator and its target genes. Nonetheless, it was found that the regulation edges within a SIM have similar chromatin interaction frequencies. The standard deviation (SD) of chromatin interaction frequencies on the edges within a SIM was calculated and compared with that of TRN. The statistical results are shown in Figure 5. Under all culture conditions of the two bacteria, the SD of chromatin interaction frequency within SIMs is less than that of the TRN. These results show that the chromatin interaction frequency within a SIM is of higher uniformity, which indicates that a regulator in a SIM has similar spatial distances to its target genes, the target genes encounter similar regulator concentrations at the same time point and the order of state (ON/OFF) switching of target genes is merely determined by the binding affinity between the regulator and the targets. In almost all cases, the chromatin interaction frequency between X and Y/Z in the feed-forward loop is significantly lower than TRN, while the interaction frequency between Y and Z is significantly higher than TRN. Significance level: p < 0.05.

3D Modeling of the Bacterial Chromosome Structures
All the above results were based on chromatin interaction frequencies, what if using real 3D structure models of chromosomes? To answer this question, the 3D structure models of E. coli and B. subtilis chromosomes were constructed using EVR, a 3D modeling software tool particularly developed for bacterial chromosomes with high accuracy and robustness [31]. Based on the same set of chromatin interaction data ( Figure S1), the 3D

3D Modeling of the Bacterial Chromosome Structures
All the above results were based on chromatin interaction frequencies, what if using real 3D structure models of chromosomes? To answer this question, the 3D structure models of E. coli and B. subtilis chromosomes were constructed using EVR, a 3D modeling software tool particularly developed for bacterial chromosomes with high accuracy and robustness [31]. Based on the same set of chromatin interaction data ( Figure S1), the 3D chromosome structure models were constructed ( Figure 6). The analyses of the global TRN (including positive and negative regulations) (Figure 1), the hierarchical structure of TRN ( Figure 2) and the network motifs in TRN (Figures 3-5) were also conducted based on the spatial distances in the 3D chromosome models (Figures S6-S10). It could be found that the comparison results based on the 3D spatial distance are quite consistent with those based on the chromatin interaction frequency, although not identical. What should be noticed is the inverse relationship between chromatin interaction frequency and the spatial distance, namely, the higher the interaction frequency, the shorter the spatial distance. This inverse relationship is reflected in the opposite trends in the corresponding figures. Since 3D models are structural representations of interaction data, it is not surprising that the corresponding results are consistent. In these 3D models, the spatial arrangements of high-level regulatory genes in the 3D architecture of bacterial chromosomes are illustrated.

Spatial Effect in the Dynamic Function of Network Motifs
To conceptually illustrate the effect of spatial organization in gene regulation, we focused on network motifs and developed spatial models for their dynamic functions. By introducing the spatial positions of genes into the dynamic equations, we studied the effect of spatial distance between genes in network motifs. For FFL, the positions of three genes X, Y, Z can determine a plane, and the concentrations of the gene products (proteins) in the two-dimensional space are denoted as X(x, y), Y(x, y), Z(x, y), respectively. Without losing generality, we use a type-I coherent FFL (C1-FFL) [10] with AND gate to illustrate the spatial effect. For such a FFL, gene Y can be activated by gene X when the concentration of the gene X product at the position of gene Y is higher than a threshold δXY, and the gene Z can be activated by a combination of gene X and gene Y (namely, AND gate) when the concentrations of the products of gene X and gene Y at the position of gene Z are higher than the corresponding thresholds δXZ and δYZ, respectively. Considering the diffusion Figure 6. The distribution of high-level regulatory genes in the 3D architecture of bacterial chromosomes. In the 3D models of E. coli and B. subtilis chromosomes, the genes of the top (orange) and middle (blue) layers in the TRN hierarchy are shown as spheres, and the bigger red balls indicate the positions of oriC loci. The E. coli chromosome resembles a letter C, and the top-layer genes (orange) are mainly distributed at both ends of the letter C. The B. subtilis chromosome is spiral-shaped, and the top-layer genes (orange) are mainly distributed in the middle and one end of the spiral. The middle-layer genes (blue) seem evenly distributed in the 3D architecture.

Spatial Effect in the Dynamic Function of Network Motifs
To conceptually illustrate the effect of spatial organization in gene regulation, we focused on network motifs and developed spatial models for their dynamic functions. By introducing the spatial positions of genes into the dynamic equations, we studied the effect of spatial distance between genes in network motifs. For FFL, the positions of three genes X, Y, Z can determine a plane, and the concentrations of the gene products (proteins) in the two-dimensional space are denoted as X (x, y) , Y (x, y) , Z (x, y) , respectively. Without losing generality, we use a type-I coherent FFL (C1-FFL) [10] with AND gate to illustrate the spatial effect. For such a FFL, gene Y can be activated by gene X when the concentration of the gene X product at the position of gene Y is higher than a threshold δ XY , and the gene Z can be activated by a combination of gene X and gene Y (namely, AND gate) when the concentrations of the products of gene X and gene Y at the position of gene Z are higher than the corresponding thresholds δ XZ and δ YZ , respectively. Considering the diffusion process, the product of a source gene (regulator) needs to spread to the target gene, and the further the distance between the two genes, the lower the concentration of the product of source gene when it reaches the position of the target gene. Therefore, the dynamic equations of such a "C1-FFL with AND gate" are as follows: In this equation set, α X /β X /k X , α Y /β Y /k Y , α Z /β Z /k Z are the production/degradation/ diffusion rates of the products of genes X, Y, Z, respectively; S X is the signal to instantaneously transform the gene X product from inactive state to active state; and S XY , S XZ and S YZ are the activation signals for gene Y and gene Z, which depend on the corresponding thresholds as listed in Equations (4)- (6). Suppose the Euclidean distances between genes are denoted as D X_Y , D X_Z and D Y_Z . To illustrate the influence of spatial distance on FFL function, we designed two FFLs with different spatial organization. As shown in Figure 7A, for FFL-1 (grey edges), suppose D X_Y1 = D X_Z1 = 2 √ 2, D Y1_Z1 = 4; for FFL-2 (yellow edges), suppose Y2, Z2 are relatively far from X (D X_Y2 = 2 √ 5, D X_Z2 = 3 √ 2) and close to each other (D Y2_Z2 = √ 2). Except the spatial distribution, the other parameters for these two FFLs are identical. By solving the equation set (1)-(6) (see Table S5 for parameter setting), the time-course curves for the two FFLs are shown in Figure 7A. As can be seen, for the two pulse (S X ) signals of shorter and longer duration, FFL-1 responds to both the two signals, while FFL-2 just responds to the longer one, which means the longer spatial distances between X_Y2 and X_Z2 in FFL-2 help it filter the signal, but meanwhile, the shorter spatial distance between Y2 and Z2 can accelerate the response of Z2.
Similarly, for SIM with P (parent) as the regulator and S (son) as the target gene, we have the following dynamic equations: ∂S (x,y)  Table S5 for the coordinates of X, Y1, Z1, Y2, Y2). FFL-1 responds to both signals, while FFL-2 just responds to the longer one, which means different spatial distributions of genes in FFL can affect dynamic function. (B). In SIM-1 (grey edges), the distances between the three child nodes (S1, S2, S3) and the parent node P are assumed to be identical, and the three child nodes can be regulated in a correct time order. In SIM-2 (yellow edges), the distances between the child nodes and the parent node are randomly arranged, and the original time order disappears. This result indicates that the spatial distribution of genes in SIM can affect dynamic function. Time could have arbitrary unit because it is just for conceptual illustration.
Similarly, for SIM with P (parent) as the regulator and S (son) as the target gene, we have the following dynamic equations:   Table S5 for the coordinates of X, Y1, Z1, Y2, Y2). FFL-1 responds to both signals, while FFL-2 just responds to the longer one, which means different spatial distributions of genes in FFL can affect dynamic function. (B). In SIM-1 (grey edges), the distances between the three child nodes (S 1 , S 2 , S 3 ) and the parent node P are assumed to be identical, and the three child nodes can be regulated in a correct time order. In SIM-2 (yellow edges), the distances between the child nodes and the parent node are randomly arranged, and the original time order disappears. This result indicates that the spatial distribution of genes in SIM can affect dynamic function. Time could have arbitrary unit because it is just for conceptual illustration.
The parameters in this equation set have the same meaning as the corresponding parameters in FFL except that X and Y are replaced with P and S. SIM has the function of sequentially switching ON/OFF the target genes in a temporal order. To illustrate the effect of spatial distance on this function, we designed two SIMs with different spatial distributions. As shown in Figure 7B, for SIM-1 (grey edges), the spatial distances between the regulator P and the targets S i (i = 1, 2, 3) are assumed to be identical (D P_S1 = D P_S2 = D P_S3 ), while for SIM-2 (yellow edges), the spatial distances are assumed to be all different (D P_S1 > D P_S2 > D P_S3 ). Suppose the activation thresholds of the target genes are as follows: δ P_S1 = δ P_S1 < δ P_S2 = δ P_S2 < δ P_S3 = δ P_S3 . By solving the equation set (7)-(9) (see Table S5 for parameter setting), the time-course curves for the two SIMs are shown in Figure 7B. As can be seen, SIM-1 switches on the targets in a correct time order (namely, 1-2-3 sequentially), while the time order for SIM-2 is not correct, which means the uniformity of distances of the regulation edges is important for SIMs to function properly. Although these models are quite rough in parameter settings, the results of dynamic behavior analysis for FFL and SIM clearly indicate that the spatial distribution of genes in network motifs has great effects on their biological functions.

Discussion
Previous studies on 3D genomes mainly focused on the organization and function of chromatin conformation [39]. Although it has been noticed that the spatial distance between a TF and its target gene may affect the efficiency and reliability of transcriptional regulation [12][13][14][15], these studies had to be concentrated on several or a small set of genes due to the lack of genome-wide information for gene spatial position. This work for the first time combined chromatin interaction data with the TRNs of E. coli and B. subtilis and disclosed the spatial organization features of the bacterial TRNs, which provided new insights into prokaryotic 3D genomics and systems biology.
The result that chromatin interaction between genes with regulation relationships is stronger than that of the genomic overall (or random pairing) is in line with our previous finding that chromatin interaction within biological pathways is not random [17] and supports the connection between 3D genome organization and gene regulation [40]. It was found that positive regulation requires a larger but negative regulation requires a smaller spatial distance compared with the TRN, which indicates that different regulation types have different requirements on the spatial arrangement of genes in 3D space to better fulfill their functions. It was also found that the chromatin interactions within FFLs are even stronger than the chromatin interaction of TRN and the chromatin interaction on the Y to Z edge is stronger than that on the other two edges of a FFL, which is quite stable among different experimental conditions. Previous work has shown that FFLs, especially coherent FFLs, can enhance the robustness of TRN [41], which coupled with our findings in this work indicates that FFLs are important for the organization and maintenance of bacterial TRNs. Although the chromatin interaction frequency within SIM is of no significant difference from that of TRN, it has higher uniformity, which may be important for a SIM to function as a timer with the least interference from the spatial distance differences between the regulator and all its target genes. The spatial models of FFL and SIM dynamics were developed to rationalize the above findings (explanations), and the simulation results clearly demonstrate the effects of spatial distance (namely, the diffusion process of regulator) on the functions of these network motifs.
It has long been recognized that genes are not randomly distributed on the linear genome and that genes with regulation relationships have optimized relative positions to each other [42][43][44]. Here we showed that genes in bacterial TRNs have optimized spatial arrangements, which extends the previous observations from 1D to 3D space. What should be emphasized is that the 1D and 3D arrangements of genes are not mutually exclusive but complementary. Indeed, the 3D arrangement could be another layer of optimization in addition to the linear arrangement of genes in the genome. One may expect that during long-term evolution, the gene positions in the linear genome might be optimized by rearrangements, but before that happens, the gene positions in the 3D space might be optimized first to better fulfill the need of biological functions. Such an argument might be explored in the future when more data are available for evolutionary analysis.
Our results are based on the chromatin interaction data and the TRNs of two bacteria species. Similar analysis may be extended to other prokaryotes and eukaryotes when data are available to examine if these features are general among organisms. Actually, our previous work in yeast has found opposite overall trends compared with the current bacterial results [45], possibly due to the obvious difference between eukaryotes and prokaryotes (namely, the existence of karyotheca), but more species should be studied to obtain general conclusions. Meanwhile, our findings suggest that the spatial arrangement of genes has important effects on their regulation relationships and thus their biological functions, and the spatial effect may be exploited in practice to modulate gene expression based on 3D distance through genome editing approaches such as TALEN or CRISPR [46,47] to alter gene positions, which lays a factual foundation for trying spatial-distance-based gene circuit design in synthetic biology [48].
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/microorganisms10122366/s1, Tables S1-S4: Statistical tests for the comparisons of chromatin interaction frequencies. Table S5: Parameter setting for the conceptual models to demonstrate the spatial effect in the dynamic function of network motifs. Figure S1: The normalized chromatin interaction frequency matrixes used in this study. Figure S2: The reconstructed transcriptional regulatory networks (TRNs) of Escherichia coli and Bacillus subtilis. Figure S3: Comparison of chromatin interaction frequencies in global TRN. Figure S4: Chromatin interaction frequencies of central nodes compared with TRN using Wilcoxon rank-sum tests. Figure S5: The hierarchical structures of Escherichia coli and Bacillus subtilis TRNs. Figure S6: Comparison of spatial distances between gene pairs in global TRN. Figure S7: The spatial organization of TRN hierarchy based on 3D distance. Figure S8: Comparison of spatial distances between gene pairs in network motifs with TRN. Figure S9: Comparison of spatial distances between gene pairs in FFL edges with TRN. Figure S10: The distribution of standard deviation (SD) of spatial distance within SIMs (violin plot) and its comparison with that of TRN (red dashed line). Supplementary Data: The reconstructed TRNs of Escherichia coli and Bacillus subtilis, and the statistical data for Figures 1 and 3.