Abstract
Because the regulatory relationships between genes are usually non-stationary, the homogeneity assumption of dynamic Bayesian networks (DBNs) is rarely satisfied and should be relaxed. Various methods that combine multiple changepoint processes with DBNs have been proposed for this purpose. When a non-homogeneous DBN is used to model a gene regulatory network, the changepoints of the gene data must be inferred. This paper therefore first proposes a data-based birth move (ED-birth move), which uses the information in the data itself to infer the changepoints: the greater the Euclidean distance between the means of the data on the two sides of a data point, the more likely the ED-birth move is to select that point as a new changepoint. In brief, the probability of selecting a point as a changepoint is proportional to the Euclidean distance between the data means on its two sides. Furthermore, an improved Markov chain Monte Carlo (MCMC) method is proposed that uses Pearson correlation coefficients (PCCs) to sample the parent node set: the larger the absolute value of the Pearson correlation coefficient between the data of two nodes, the more likely the candidate parent is to be sampled. Compared with other classical models on Saccharomyces cerevisiae data, synthetic data, RAF pathway data, and Arabidopsis data, the proposed PCCs-ED-DBN improves the accuracy of gene network reconstruction and also improves the convergence and stability of the modeling process.
1. Introduction
With the rapidly decreasing cost of genome sequencing and the accelerating acquisition of biological experimental data, one of the key challenges in systems biology is to infer gene regulatory networks from gene expression data. Gene regulatory networks are of great significance in biological development, the maintenance of homeostasis, and the occurrence and progression of diseases [,,,]. Although a large number of known regulatory relationships have been documented in various databases, they still fall far short of the number and complexity of the interactions that actually exist in biological systems. Experiments can generally measure the abundance of elements, but it is difficult to directly discover the complex relationships among them []. Structural learning of dynamic Bayesian networks (DBNs) plays an important role in the construction of gene regulatory networks []. Traditional (homogeneous) DBN models assume that the network parameters stay constant over time. This can lead to biased results and wrong conclusions, because cellular regulatory processes can change over time. Although various methods have been proposed to relax the homogeneity assumption for undirected graphical models [,], relaxing this restriction in DBNs remains an active research topic [,,,]. Various authors have proposed combining multiple changepoint processes with DBNs to relax the homogeneity assumption [,]. Each time series segment is delimited by two changepoints. The parameters of the DBN are node specific, and the conditional probability parameters vary from segment to segment. Under certain regularity conditions, namely parameter independence and conjugacy of the prior, the main advantage of these methods is that the parameters can be integrated out of the likelihood in closed form.
Therefore, the inference task reduces to sampling the network structure and the number and locations of the changepoints from the posterior distribution, which can be done with reversible jump Markov chain Monte Carlo (RJMCMC) [,,,].
Early on, the Bayesian regression model (BR-DBN) proposed by Lèbre et al. became the basic probabilistic model for non-homogeneous DBNs []. However, a disadvantage of the BR-DBN model is that the network structure varies from segment to segment, which leads to overfitting and inflated inference uncertainty for short time series. Grzegorczyk et al. proposed several variants of BR-DBN in which the network structure is shared across segments and only the parameters change [,,]. However, these changepoint processes combined with DBNs have a limitation: data points from different segments must be assigned to different components. If the true allocation of eight time points is [11223311], a changepoint-based allocation can only approximate it as [11223344]. Unlike the changepoint-based CPS-DBN, MIX-DBN can assign data points to components without this restriction [,]. However, it ignores the temporal order of the data points, even though in time series data adjacent data points are more likely to be assigned to the same component than distant ones.
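The allocation example above can be illustrated with a short sketch. It assumes, for illustration only, that a changepoint model is described by the 0-based time indices at which a new segment starts; the function name and this representation are hypothetical and not taken from the cited models.

```python
from typing import List


def allocation_from_changepoints(n_points: int, changepoints: List[int]) -> List[int]:
    """Label each time point with a 1-based component index.

    `changepoints` (a name chosen for this sketch) lists the 0-based time
    indices at which a new segment starts. Every changepoint opens a new
    component, so an earlier label can never be reused.
    """
    starts = set(changepoints)
    labels = []
    component = 1
    for t in range(n_points):
        if t > 0 and t in starts:
            component += 1
        labels.append(component)
    return labels


target = [1, 1, 2, 2, 3, 3, 1, 1]                   # allocation from the text
cps = allocation_from_changepoints(8, [2, 4, 6])    # changepoint-based approximation
print(cps)                                          # [1, 1, 2, 2, 3, 3, 4, 4]
print(sum(a != b for a, b in zip(cps, target)))     # 2 points land in a different component
```

The last two time points cannot return to component 1, which is the restriction that free-allocation and hidden Markov approaches are meant to lift.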
Subsequently, Grzegorczyk et al. proposed a non-homogeneous DBN with a hidden Markov model between changepoints (HMM-DBN). HMM-DBN considers the temporal order of the data points without restricting how data points are allocated to components []. First, HMM-DBN introduces two pairs of new complementary MCMC moves (the Gibbs sampler move and the complementary inclusion move) to improve the allocation sampler; second, it assumes a first-order hidden Markov dependency structure for changepoint inference. Building on HMM-DBN, this paper uses the latent prior knowledge hidden in the data to improve the accuracy of changepoint and network inference, and thereby the accuracy and stability of the inferred network structure and the convergence of the model.
Based on the above, this paper first explores the relationship between a time point being a changepoint and the time-series data of its component. It is assumed that the larger the Euclidean distance between the data means on the two sides of a time point, the more likely that point is to be a changepoint. This idea is applied to the changepoint birth move to make birth proposals more plausible, and RJMCMC is then used to sample the allocation of time points. Second, the relationship between the Pearson correlation coefficient of the node data and the presence of an edge between nodes is discussed: it is assumed that the higher the Pearson correlation coefficient between the data of two nodes, the more likely an edge exists between them. Together, these changes improve the accuracy and stability of the inferred network structure and the convergence of the model.
2. Bayesian Regression Model
A non-homogeneous DBN is an extension of a DBN in processing nonstationary time series data. A traditional dynamic Bayesian network generally contains two critical assumptions [].
(1) First-order Markov hypothesis: Assuming that the edges between nodes cannot span a time component, the value of a node at time is only related to the value of other nodes at that time and the node at time .
(2) Homogeneity hypothesis: The stable distribution of time data points generated by a homogeneous Markov chain requires that the model’s structure and parameters cannot change over time.
However, in practice, most time-series data are nonstationary, and the homogeneity assumption described above cannot be satisfied. Traditional dynamic Bayesian networks therefore cannot adequately model nonstationary data. To deal with nonstationary time series data, a changepoint process is added to the traditional dynamic Bayesian network. That is, the changepoint is added to the time sequence of time length , and it is divided into components.
The hierarchical structure of the non-homogeneous DBN proposed in this paper is shown in Figure 1, and the regression equation is:
Figure 1.
Hierarchy of PCCs-ED-DBN.
In each component k of non-homogeneous dynamic Bayes, where , is the number of nodes; is assigned to the observation vector of component k, the regression coefficient matrix of the regression model, is the set of parent nodes of node g in component k, is the observation matrix of the parent node set of node g in component k, is the noise parameter of the regression model, the mean is 0, and the variance is (Table 1 shows the actual meaning of each symbol). Then, the regression model likelihood is:
Table 1.
Hyperparameters and symbols.
For, , and impose a Gaussian prior and conjugated gamma prior, respectively:
The level-2 hyperparameter is fixed. Then, samples can be generated from the posterior distribution through Gibbs sampling [].
Assuming that the time data points have been allocated, is known. Then, the conditional distribution of and can be obtained as:
where is the maximum number of states allocated by node g, is the number of parent nodes of node g, and the inverse variance hyperparameter can also be sampled from the conditional distribution:
Keeping the parent node set and the component fixed, the MCMC sampling according to Equation (9) and Algorithm 1 can generate samples from the posterior distribution and use Equations (6)–(8) to update the hyperparameters.
| Algorithm 1: Pseudocode for updating the signal-to-noise ratio hyperparameter |
| For each node Input: , , Output: |
| MCMC iteration: ① Sampling a concrete variance hyperparameter from Equation (8) ② Sampling regression parameter vectors ,from Equation (7) set: ③ Sampling a new SNR hyperparameter from Equation (6), and output: |
3. PCCs-ED-DBN Model
The above inference of SNR hyperparameters assumes that the network structure and component vectors are fixed; in fact, these need to be inferred. In this section, the inference of the network structure and component vectors is divided into two parts for description. First, PCCs-ED-DBN infers network structure based on PCCs of data points and assumes fixed component vectors. Second, PCCs-ED-DBN infers component vectors based on Euclidean distances of data points.
3.1. Network Structure M Inference Based on PCCs of Data Points
When inferring the network structure, it is still assumed that the component vector is fixed, and the probability distribution of the network structure is set as:
Infer the parent node set of each node g, that is, obtain the entire network structure. For each node, the conditional probability of its parent node set is:
According to Equation (12), Metropolis–Hastings (M-H) keeps and fixed and moves from the current parent node set to a new set . The move is accepted with probability:
If accepted, set; otherwise, .
This paper introduces the Pearson correlation coefficient [] to explore the causal relationship between nodes. represent nodes, and represents the slack variable; in this paper, . When the parent node is sampled by the Markov chain Monte Carlo sampling method, the node with a high Pearson correlation coefficient is more likely to be sampled. Obtain the according to Equation (13). Algorithm 2 describes the pseudocode of M-H sampling:
| Algorithm 2: Pseudocode for updating the parent node sets |
| For each node Input: , , Output: |
| MCMC iteration: ① Get the system of parent sets : Randomly select node , ,, a = rand(1), if (i) adding the node to else (ii) deleting the node from (iii) exchanging a node for a node . Randomly select a new candidate parent set from ② According to the probability Equation (13). If accepted, set: from . Otherwise, set . Output: |
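The idea that a node with a high Pearson correlation coefficient is more likely to be sampled as a candidate parent can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight of each candidate is taken to be the absolute correlation plus a slack term, the default `slack` value is arbitrary, correlations are computed without a time lag, and the target node is excluded from its own candidates.

```python
import numpy as np


def pcc_proposal_weights(data: np.ndarray, target: int, slack: float = 0.1) -> np.ndarray:
    """Sampling probabilities over candidate parents of `target`.

    data: array of shape (n_nodes, n_timepoints).
    slack: hypothetical value; it keeps weakly correlated nodes reachable.
    Rows with zero variance would give NaN correlations and are not handled.
    """
    corr = np.corrcoef(data)                 # (n_nodes, n_nodes) Pearson matrix
    weights = np.abs(corr[target]) + slack   # assumption: weight = |r| + slack
    weights[target] = 0.0                    # assumption: no self-loop candidate
    return weights / weights.sum()


def sample_candidate_parent(data: np.ndarray, target: int,
                            rng: np.random.Generator, slack: float = 0.1) -> int:
    """Draw one candidate parent index according to the PCC-based weights."""
    probs = pcc_proposal_weights(data, target, slack)
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
x = rng.normal(size=50)
data = np.vstack([x, 2.0 * x + 0.01 * rng.normal(size=50), rng.normal(size=50)])
print(pcc_proposal_weights(data, target=0))
print(sample_candidate_parent(data, target=0, rng=rng))
```

A non-uniform proposal of this kind generally calls for the corresponding proposal ratio in the Metropolis-Hastings acceptance probability, so that the sampler still targets the intended posterior.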
3.2. Component Vector Vg Inference Based on Euclidean Distance of Data Points
In the above sampling process, it is assumed that the component vector is fixed but in the actual process, needs to be sampled. Figure 2 lists the non-homogeneous dynamic Bayesian network with two changepoints divided into three components, namely, . Suppose that the network structure in different components is the same, but the parameters are different.
Figure 2.
Example of a non-homogeneous dynamic Bayesian network with two changepoints.
3.2.1. Component Transition
The component transition of the time data point is determined by the birth move, death move, and inclusion and exclusion move of the changepoint. The following describes the component transition in detail, and Figure 3 gives a specific example.
Figure 3.
Component transition example.
Birth move: Randomly select a component k, randomly select one of the data points allocated to component k, and reallocate the data points allocated to component k to a known new component.
Death move: Randomly select two components, k = 1 and k = 3, and assign the data points of component k = 3 to component k = 1.
Inclusion and exclusion move: It is recommended to redistribute the component vector to k = 1. This is because the surrounding time points and are all assigned to the state k = 1.
Therefore, if the latent prior knowledge in the data can be fully exploited, the positions of the changepoints are more likely to be found accurately, that is, a correct component vector is more likely to be inferred for node g, which ultimately improves the accuracy of the inferred network structure and the stability of the model.
3.2.2. Birth Move Based on the Euclidean Distance
The experimental results show that the Euclidean distance between the data means on the two sides of a true changepoint is generally larger than that at a non-changepoint. It follows that a data point with a large Euclidean distance between the means on its two sides is more likely to be a real changepoint. Based on this observation, this paper proposes an ED-birth move in which the probability of proposing a data point as a new changepoint is proportional to the Euclidean distance between the data means on its two sides. Algorithm 3 shows the flow of the ED-birth move.
| Algorithm 3: Pseudocode for changepoint birth move detection based on the Euclidean distance of data points |
| Input: The component vector of the current node g and the maximum number of changepoint Output: , |
| ① for for u = rand (0,1), if break; end end ② Change the component of all data points with state after to a new component , and update and to calculate the acceptance rate . |
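The proposal step of the ED-birth move can be sketched in a few lines. The sketch assumes that the data of the selected component are available as an array of shape (variables, time points) and that a split position s places columns before s on the left and the rest on the right; the function names and the uniform fallback for identical means are illustrative choices, and the acceptance step of the move is not covered.

```python
import numpy as np


def ed_birth_probabilities(segment: np.ndarray) -> np.ndarray:
    """Proposal probabilities for split positions 1..T-1 of one component.

    segment: array of shape (n_vars, n_timepoints) with n_timepoints >= 2.
    Entry s-1 is proportional to the Euclidean distance between the mean of
    columns [:s] and the mean of columns [s:].
    """
    n_timepoints = segment.shape[1]
    distances = np.empty(n_timepoints - 1)
    for s in range(1, n_timepoints):
        left_mean = segment[:, :s].mean(axis=1)
        right_mean = segment[:, s:].mean(axis=1)
        distances[s - 1] = np.linalg.norm(left_mean - right_mean)
    total = distances.sum()
    if total == 0.0:
        # Assumption: fall back to a uniform proposal when all means coincide.
        return np.full(n_timepoints - 1, 1.0 / (n_timepoints - 1))
    return distances / total


def propose_changepoint(segment: np.ndarray, rng: np.random.Generator) -> int:
    """Draw a split position in 1..T-1 with the ED-based probabilities."""
    probs = ed_birth_probabilities(segment)
    return 1 + int(rng.choice(len(probs), p=probs))


# Toy component whose mean jumps after the fifth time point.
segment = np.hstack([np.zeros((2, 5)), 3.0 * np.ones((2, 5))])
print(ed_birth_probabilities(segment).round(3))
print(propose_changepoint(segment, np.random.default_rng(0)))
```

Drawing with `rng.choice` plays the same role as the cumulative comparison against a uniform random number in step ① of Algorithm 3. In the toy example the largest probability falls on the position where the mean actually jumps.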
respectively, represent the acceptance rates of the birth move, death move, inclusion, and exclusion move actions. The RJ-MCMC algorithm steps for updating the changepoint are shown in Algorithm 4.
| Algorithm 4: Pseudocode of RJ-MCMC sampling changepoint based on Euclidean distance of data points |
| Input: The component vector of the current node and the maximum number of changepoint , network M Output: , |
| ① For each sampling process, calculate based on the current number of conversion points ② Gibbs Sampler move A = rand (0,1) If A < birth move according to Algorithm 3 If A < death move If A < Inclusion and Exclusion move ③ Output: , |
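The move selection in step ② of Algorithm 4 can be sketched as a single uniform draw compared against cumulative thresholds. The arguments `p_birth` and `p_death` are hypothetical placeholders for the move probabilities computed in step ①, the remaining probability mass is assigned to the inclusion and exclusion move, and the Gibbs sampler step is left out of this simplified sketch.

```python
import random


def choose_move(p_birth: float, p_death: float, rng: random.Random) -> str:
    """Pick a changepoint move type from one uniform draw.

    p_birth, p_death: hypothetical move probabilities with p_birth + p_death <= 1.
    The remainder 1 - p_birth - p_death goes to the inclusion/exclusion move.
    """
    a = rng.random()                 # uniform on [0, 1)
    if a < p_birth:
        return "birth"               # would call the ED-birth move (Algorithm 3)
    if a < p_birth + p_death:
        return "death"
    return "inclusion_exclusion"


rng = random.Random(0)
counts = {"birth": 0, "death": 0, "inclusion_exclusion": 0}
for _ in range(1000):
    counts[choose_move(0.3, 0.3, rng)] += 1
print(counts)
```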
The overall flow of the non-homogeneous DBN with multiple changepoints based on PCCs and the Euclidean distance of data points is shown in Algorithm 5.
| Algorithm 5: MCMC sampling pseudocode for the PCCs-ED-DBN model |
| Input: MCMC samples the current state: Output: New MCMC status: |
| ① Keep the current , fixed, and update to according to Algorithm 1. ② Keep the current and fixed, and update to according to Algorithm 2. ③ Keep the current , , fixed, and update to according to Algorithm 4. |
4. Empirical Results
4.1. Evaluation Standard
4.1.1. Convergence Evaluation Criteria
Assuming that the current number of MCMC simulations is , the burning rate is burn_in, and indicates that there is edge when the number of iterations is ; otherwise, . Perform Q independent replicates of MCMC sampling. Plots of a scatterplot with values as the vertical axis and values as the horizontal axis.
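The convergence check described above can be sketched as follows. The sketch assumes that each MCMC run is stored as a stack of binary adjacency matrices, that `burn_in` is the number of initial samples to discard, and that the scatterplot compares the edge scores of two independent runs; these are illustrative assumptions about the data layout, not details confirmed by the text.

```python
import numpy as np


def edge_scores(samples: np.ndarray, burn_in: int) -> np.ndarray:
    """Marginal edge scores of one MCMC run.

    samples: array of shape (n_iterations, n_nodes, n_nodes) whose entries are
    1 where the sampled network contains the edge and 0 otherwise.
    burn_in: number of initial samples to discard (assumed to be a count).
    """
    return samples[burn_in:].mean(axis=0)


def convergence_pairs(run_a: np.ndarray, run_b: np.ndarray, burn_in: int):
    """Flattened edge scores of two independent runs, ready for a scatterplot."""
    return edge_scores(run_a, burn_in).ravel(), edge_scores(run_b, burn_in).ravel()


rng = np.random.default_rng(0)
run_a = (rng.random((200, 5, 5)) < 0.3).astype(float)   # toy stand-in for sampled networks
run_b = (rng.random((200, 5, 5)) < 0.3).astype(float)
x, y = convergence_pairs(run_a, run_b, burn_in=100)
print(float(np.max(np.abs(x - y))))                     # largest disagreement between the runs
```

Points close to the diagonal of the resulting scatterplot indicate that independent runs agree on the edge scores.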
4.1.2. Network Structure Accuracy Evaluation Criteria
indicates that there is an edge , while indicates that there is no edge . Define as the set of all edges whose posterior probability exceeds the threshold for each edge. Calculate true positive , false-positive and false negative for each . Plot a precision-recall (PR) curve with as the ordinate and as the abscissa. A larger area under the PR curve (PR-AUC) [] value indicates better network reconstruction accuracy.
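The PR-AUC computation described above can be sketched with NumPy alone. The sketch takes precision as TP/(TP+FP) and recall as TP/(TP+FN), thresholds tied scores together, starts the curve at recall 0 with the first observed precision, and integrates with the trapezoidal rule; these conventions are assumptions, and other PR-AUC implementations may differ in interpolation.

```python
import numpy as np


def pr_auc(scores: np.ndarray, truth: np.ndarray) -> float:
    """Area under the precision-recall curve for edge scores.

    scores: posterior edge probabilities (any shape).
    truth:  same shape, 1 for a true edge and 0 otherwise.
    """
    s = scores.ravel()
    y = truth.ravel().astype(bool)
    if not y.any():
        raise ValueError("truth must contain at least one true edge")
    order = np.argsort(-s, kind="stable")           # descending by score
    s_sorted, y_sorted = s[order], y[order]
    tp = np.cumsum(y_sorted)                        # true positives above each threshold
    fp = np.cumsum(~y_sorted)                       # false positives above each threshold
    last = np.append(s_sorted[1:] != s_sorted[:-1], True)   # one point per distinct score
    tp, fp = tp[last], fp[last]
    precision = tp / (tp + fp)
    recall = tp / y.sum()                           # TP / (TP + FN)
    recall = np.append(0.0, recall)                 # anchor the curve at recall 0
    precision = np.append(precision[0], precision)
    return float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2.0))


truth = np.array([1, 1, 0, 0])
print(pr_auc(np.array([0.9, 0.8, 0.2, 0.1]), truth))   # 1.0 for a perfect ranking
print(pr_auc(np.array([0.1, 0.2, 0.8, 0.9]), truth))   # lower for a reversed ranking
```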
4.1.3. Criteria for Model Stability
Assume the accuracy of the network structure obtained from different MCMC iteration times , denoted as , can be calculated. Perform P independent experiments to obtain different , and then calculate the variance of all , denoted as . A smaller variance means that the network structure inferred from each independent experiment is similar, i.e., the model is more stable. Draw a variance iteration curve with as the ordinate and as the abscissa. The stability of the network structure can be measured by comparing the curves.
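The stability criterion above amounts to a variance across independent runs at each iteration count. A minimal sketch, assuming the PR-AUC values are collected in an array with one row per independent experiment and one column per iteration checkpoint (the array layout and the use of the population variance are assumptions):

```python
import numpy as np


def stability_curve(auc_by_run: np.ndarray) -> np.ndarray:
    """Variance of PR-AUC across independent runs at each checkpoint.

    auc_by_run: array of shape (P, n_checkpoints); row p holds the PR-AUC of
    run p at successive MCMC iteration counts. Uses the population variance
    (ddof=0); pass through np.var(..., ddof=1) for the sample variance instead.
    """
    return auc_by_run.var(axis=0)


auc_by_run = np.array([[0.5, 0.6],
                       [0.7, 0.6]])
print(stability_curve(auc_by_run))
```

In this toy input the two runs disagree at the first checkpoint and coincide at the second, so the variance shrinks to zero, which is the pattern a stable model should show as the number of iterations grows.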
4.2. Experimental Results
4.2.1. Saccharomyces Cerevisiae
The Saccharomyces cerevisiae data describe a small network of five gene nodes designed by Cantone et al. []. The authors measured the expression levels of these genes in vivo by real-time quantitative polymerase chain reaction at 37 time points. During the experiment, Cantone et al. changed the carbon source from galactose to glucose. There are 16 measurements in galactose and 21 measurements in glucose, and the observed value of each node g is recorded. Because the washing step when changing the carbon source introduces an error, the two first measurement values are removed to obtain a 5 × 35 data set. Figure 4 shows the network structure of Saccharomyces cerevisiae.
Figure 4.
The network structure of Saccharomyces cerevisiae.
It can be seen from Figure 5 that when the number of MCMC iterations is 10,000, the edge scores simulated by 20 independent MCMC simulations are almost the same, and the convergence is almost reached. With the same number of iterations, the convergence of the PCCs-ED-DBN model is better than that of the HMM-DBN.
Figure 5.
Saccharomyces cerevisiae edge convergence scatter plot under HMM-DBN and PCCs-ED-DBN.
In the experiment, this paper follows the hyperparameter settings of Grzegorczyk et al. The number of MCMC iterations is set to 10,000, and the sampled state is saved at every iteration, giving 10,000 network structures. One hundred independent MCMC runs yield 100 network structure accuracies, whose average is taken as the final network structure accuracy (PR-AUC), as shown in Figure 6.
Figure 6.
Accuracy comparison among different models on Saccharomyces cerevisiae dataset.
Figure 6 shows that the non-homogeneous DBNs (PCCs-ED-DBN, HMM-DBN, CPS-DBN, MIX-DBN) [,,] achieve higher network reconstruction accuracy than the homogeneous DBN (HOM-DBN). The PR-AUC value of PCCs-ED-DBN is about 15% higher than that of HOM-DBN, and 12%, 6%, and 4% higher than those of the other non-homogeneous dynamic Bayesian networks MIX-DBN, CPS-DBN, and HMM-DBN, respectively.
The homogeneous dynamic Bayesian network (HOM-DBN) follows the homogeneous Markov assumption: the regulatory network does not change over time, and the regulation strengths follow the same distribution throughout the modeling process. However, when the living environment of the organism changes, it is unrealistic to assume that the distribution of gene regulation strengths remains unchanged. The non-homogeneous dynamic Bayesian networks (PCCs-ED-DBN, HMM-DBN, CPS-DBN, MIX-DBN) combine the DBN with multiple changepoint processes to construct a regulatory network with the same network structure but different parameter distributions. In this way, the model better reflects how natural biological development actually proceeds, and its network reconstruction ability is better.
Figure 7a shows the network structure accuracy of PCCs-ED-DBN and HMM-DBN for different numbers of MCMC iterations. Figure 7b compares the variances of the network structure accuracy, and Table 2 gives some specific numerical comparisons. Figure 7 and Table 2 show that PCCs-ED-DBN achieves better network structure accuracy than HMM-DBN. Moreover, for the same number of MCMC iterations, the variance of the network structure accuracy inferred by PCCs-ED-DBN is smaller than that of HMM-DBN, so the model is more stable.
Figure 7.
PR-AUC and variance under HMM-DBN and PCCs-ED-DBN. The line graph in panel (a) shows the relationship between the network reconstruction accuracy in terms of PR-AUC and the number of MCMC iterations. The line graph in panel (b) shows model stability in terms of PR-AUC variance versus the number of MCMC iterations.
Table 2.
The specific value of network structure variance under different models.
In addition, the ED-birth move is also applied to the globally coupled NH-DBN [] and partially coupled EWC NH-DBN [] models for comparative experiments. In the experiment, this paper follows the hyperparameter settings of Grzegorczyk et al. The number of MCMC iterations is set to 20,000, and the sampled state is saved at every iteration, giving 20,000 network structures. Five hundred independent MCMC runs yield 500 network structure accuracies, whose average is taken as the final network structure accuracy (PR-AUC), as shown in Figure 8a.
Figure 8.
PR-AUC and variance under different models. Panel (a) shows the network reconstruction accuracy in terms of PR-AUC scores incorporating the proposed ED-birth into the Globally Coupled NH-DBN and EWC NH-DBN. Panel (b) shows the relationship between the network reconstruction accuracy in terms of PR-AUC and the number of MCMC iterations under Globally Coupled NH-DBN. Panel (c) shows model stability in terms of PR-AUC variance under globally coupled NH-DBN.
Figure 8a shows that when the ED-birth move is applied to non-homogeneous dynamic Bayesian networks, the network structures sampled by MCMC achieve higher accuracy. The effect is more pronounced for the EWC NH-DBN model, whereas the improvement in network structure accuracy (PR-AUC) for the globally coupled NH-DBN is not significant.
Figure 8b shows the network structure accuracy of the ED-birth move for different numbers of MCMC iterations (globally coupled NH-DBN). Figure 8c compares the variances of the network structure accuracy, and Table 3 gives some specific numerical comparisons.
Table 3.
The specific value of network structure variance (Globally Coupled NH-DBN).
Figure 8 and Table 3 show that, in the globally coupled NH-DBN, the ED-birth move achieves better network structure accuracy than the birth move. Moreover, for the same number of MCMC iterations, the variance of the network structure accuracy is smaller with the ED-birth move than with the birth move, so the model is more stable.
4.2.2. Synthetic Yeast Data
This paper generated synthetic yeast data with K = 4 segments. Comparative experiments between HMM-DBN [] and PCCs-ED-DBN are performed on this dataset.
We analyzed the experimental results on the synthetic yeast dataset under the HMM model. Figure 9a shows the average AUC score, and Figure 9b shows how the AUC difference changes as the number of data points increases. As the number of data points increases, PCCs-ED-DBN detects changepoints better.
Figure 9.
PR-AUC of synthetic dataset. Panel (a) shows the network reconstruction accuracy in terms of PR-AUC scores at different data point lengths. Panel (b) shows the difference in network reconstruction accuracy in terms of PR-AUC scores at different data point lengths.
4.2.3. Gene Regulatory Network in Arabidopsis
Plants are well-suited experimental systems for studying the mechanistic basis of developmental dynamics, given that they are more amenable to in vivo manipulation than, for example, animals. Constructing the Arabidopsis gene regulatory network is currently an active research topic [,,]. Figure 10 shows that the convergence achieved by PCCs-ED-DBN after 50,000 MCMC iterations is approximately the same as that achieved by HMM-DBN after 200,000 MCMC iterations. This means that, to achieve the same convergence, PCCs-ED-DBN saves more than half of the time overhead compared with HMM-DBN. Figure 11 shows the Arabidopsis gene regulatory network inferred using the PCCs-ED-DBN model, restricted to edges with marginal probability greater than 0.5. Since the gene regulatory network of Arabidopsis has not been fully documented in the biological literature, the network reconstruction accuracy cannot be calculated. However, known edges given in some biological literature are marked with bold lines in Figure 11 (GI→CCA1 [], GI→TOC1 [], ELF3→TOC1 [], ELF3→CCA1 [], ELF3→PRR9 [], TOC1→LHY [], LHY→TOC1 [], ELF4→PRR9 []).
Figure 10.
Scatter plot of Arabidopsis edge convergence under HMM-DBN and PCCs-ED-DBN.
Figure 11.
Arabidopsis gene regulatory network inferred by the PCCs-ED-DBN model.
4.2.4. Simulated Data from the RAF Pathway
Figure 12 shows the RAF protein signaling pathway described by Sachs et al. [], which consists of 11 proteins (pip3, plcg, pip2, pkc, p38, raf, pka, jnk, mek, erk, and akt); the edges represent protein interactions. Figure 13 compares the network reconstruction accuracy on the dataset provided by Marco Grzegorczyk []. Compared with CPS-DBN and MIX-DBN, the PR-AUC value of PCCs-ED-DBN is improved significantly. On data 1, data 2, and data 3, however, the PR-AUC values of PCCs-ED-DBN are only 2%, 3%, and 4% higher than those of HMM-DBN, respectively, whereas on data 4 the increase is more pronounced, at about 8%.
Figure 12.
RAF pathway.
Figure 13.
Accuracy comparison of different models on four RAF pathway datasets.
4.2.5. Time Overhead
Compared with HMM-DBN, PCCs-ED-DBN has improved network reconstruction accuracy, convergence, and stability, but this inevitably adds additional time overhead. Table 4 gives a comparison of the additional time overhead during the fourth part of the experiment. The simulation platform is ① Processor Intel Core i5-9500, CPU 3.0 GHz. ② Installed memory (RAM) 8 GB. ③ Hard disk: 1 TB. ④ Tool MATLAB R2018b.
Table 4.
Time overhead comparison between HMM-DBN and PCCs-ED-DBN.
5. Conclusions
This paper makes two improvements compared to the HMM-DBN model. First, the changepoint sampling method based on the Euclidean distance of data points proposed in this paper fully mines the prior knowledge between data points. Second, we explore the causal relationship between gene expression data and the Pearson correlation coefficient between genes and apply this relationship to the selection of candidate parent nodes. In addition, the advantages of the PCCs-ED-DBN can be described in detail from the following three aspects.
Network reconstruction accuracy:
On the Saccharomyces cerevisiae dataset, the PR-AUC value of PCCs-ED-DBN is about 15% higher than that of the homogeneous dynamic Bayesian network (HOM-DBN), and 12%, 6%, and 4% higher than those of the other non-homogeneous dynamic Bayesian networks MIX-DBN, CPS-DBN, and HMM-DBN, respectively. On the four datasets of the RAF pathway, the PR-AUC value of PCCs-ED-DBN is more than 10% higher than that of MIX-DBN and CPS-DBN; compared with HMM-DBN, the improvement is only 2%, 3%, and 4% on data_1, data_2, and data_3, and 8% on data_4.
Convergence:
On the Saccharomyces cerevisiae data and the Arabidopsis data, PCCs-ED-DBN converges better than HMM-DBN, and the improvement is most pronounced on the Arabidopsis data: the convergence of HMM-DBN after 200,000 MCMC iterations is essentially the same as that of PCCs-ED-DBN after 50,000 MCMC iterations. Although a single iteration of PCCs-ED-DBN takes longer than one of HMM-DBN, PCCs-ED-DBN still reduces the total time consumption by more than half.
Model stability:
The network reconstruction accuracy (PR-AUC) inferred in multiple independent MCMC simulations is experimentally verified, and the variance of PCCs-ED-DBN is smaller than that of HMM-DBN, which means that the model proposed in this paper is more stable. Finally, the ED-birth move proposed in this paper is applied to the coupled model (Globally Coupled NH-DBNs, EWC NH-DBNs) in the experiment, and the network reconstruction accuracy is also improved, but the improvement effect is not as good as that of the uncoupled model. This is because coupling parameters are added to the coupled model. Through the action of the coupling parameters, the regression parameters in the coupled components can influence each other, thereby adjusting the regression parameters in the components. This means that even if the component assignment deviates from the actual situation, it is still possible to infer regression parameters that are close to the actual situation.
This paper only proposes a method for finding changepoints using the Euclidean distance. In future work, we hope to exploit the underlying prior knowledge in the data more fully when inferring component vectors. The convergence of MCMC sampling is also a topic worthy of study; we hope that the methods explored in the future can improve the convergence of the model and allow its convergence to be proved mathematically.
Author Contributions
Software, Q.Z.; writing (original draft preparation), J.Z.; writing (review and editing), C.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the following grants: National Natural Science Foundation of China (General Program) 61772321, Natural Science Foundation of Hefei 2021035, Hefei University Graduate Innovation and Entrepreneurship Program (21YCXL25,21YCXL18).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Ajmal, H.B.E.; Madden, M.G. Dynamic Bayesian Network Learning to Infer Sparse Models from Time Series Gene Expression Data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021.
- Che, D.; Guo, S.; Jiang, Q.; Chen, L. PFBNet: A Priori-Fused Boosting Method for Gene Regulatory Network Inference. BMC Bioinform. 2020, 21, 308.
- Shafiee Kamalabad, M.; Grzegorczyk, M. A New Bayesian Piecewise Linear Regression Model for Dynamic Network Reconstruction. BMC Bioinform. 2021, 22, 196.
- Timmermann, T.; González, B.; Ruz, G.A. Reconstruction of a Gene Regulatory Network of the Induced Systemic Resistance Defense Response in Arabidopsis Using Boolean Networks. BMC Bioinform. 2020, 21, 142.
- Zhao, M.; He, W.; Tang, J.; Zou, Q.; Guo, F. A Comprehensive Overview and Critical Evaluation of Gene Regulatory Network Inference Technologies. Brief. Bioinform. 2021, 22, bbab009.
- Friedman, N.; Linial, M.; Nachman, I.; Pe’Er, D. Using Bayesian Networks to Analyze Expression Data. J. Comput. Biol. 2000, 7, 601–620.
- Talih, M.; Hengartner, N. Structural Learning with Time-Varying Components: Tracking the Cross-Section of Financial Time Series. J. R. Stat. Soc. Ser. B 2005, 67, 321–341.
- Xuan, X.; Murphy, K. Modeling Changing Dependency Structure in Multivariate Time Series. In Proceedings of the 24th International Conference on Machine Learning, New York, NY, USA, 20–24 June 2007.
- Lebre, S. Stochastic Process Analysis for Genomics and Dynamic Bayesian Networks Inference. Master’s Thesis, Université d’Evry-Val d’Essonne, Évry-Courcouronnes, France, 2007.
- Robinson, J.; Hartemink, A. Non-stationary Dynamic Bayesian Networks. In Advances in Neural Information Processing Systems 21 (NIPS 2008); Curran Associates Inc.: Red Hook, NY, USA, 2008.
- Robinson, J.W.; Hartemink, A.J.; Ghahramani, Z. Learning Non-Stationary Dynamic Bayesian Networks. J. Mach. Learn. Res. 2010, 11, 3647–3680.
- Kolar, M.; Song, L.; Xing, E. Sparsistent Learning of Varying-Coefficient Models with Structural Changes. Adv. Neural Inf. Processing Syst. 2009, 22, 1006–1014.
- Aderhold, A.; Husmeier, D.; Grzegorczyk, M. Statistical Inference of Regulatory Networks for Circadian Regulation. Stat. Appl. Genet. Mol. Biol. 2014, 13, 227–273.
- Shafiee Kamalabad, M.; Heberle, A.M.; Thedieck, K.; Grzegorczyk, M. Partially Non-Homogeneous Dynamic Bayesian Networks Based on Bayesian Regression Models with Partitioned Design Matrices. Bioinformatics 2019, 35, 2108–2117.
- Ahmed, A.; Xing, E.P. Recovering Time-Varying Networks of Dependencies in Social and Biological Studies. Proc. Natl. Acad. Sci. USA 2009, 106, 11878–11883.
- Dong, M.; He, D. A Segmental Hidden Semi-Markov Model (HSMM)-Based Diagnostics and Prognostics Framework and Methodology. Mech. Syst. Signal Processing 2007, 21, 2248–2266.
- Dondelinger, F.; Lebre, S.; Husmeier, D. Heterogeneous Continuous Dynamic Bayesian Networks with Flexible Structure and Inter-Time Segment Information Sharing. In International Conference on Machine Learning (ICML); Furnkranz, J., Joachims, T., Eds.; Omnipress: Haifa, Israel, 2010; pp. 303–310.
- Dondelinger, F.; Lèbre, S.; Husmeier, D. Non-Homogeneous Dynamic Bayesian Networks with Bayesian Regularization for Inferring Gene Regulatory Networks with Gradually Time-Varying Structure. Mach. Learn. 2013, 90, 191–230.
- Lèbre, S.; Becq, J.; Devaux, F.; Stumpf, M.P.; Lelandais, G. Statistical Inference of the Time-Varying Structure of Gene-Regulation Networks. BMC Syst. Biol. 2010, 4, 130.
- Grzegorczyk, M.; Husmeier, D. Non-Homogeneous Dynamic Bayesian Networks for Continuous Data. Mach. Learn. 2011, 83, 355–419.
- Grzegorczyk, M.; Husmeier, D. Regularization of Non-Homogeneous Dynamic Bayesian Networks with Global Information-Coupling Based on Hierarchical Bayesian Models. Mach. Learn. 2013, 91, 105–154.
- Grzegorczyk, M.; Husmeier, D. A Non-Homogeneous Dynamic Bayesian Network with Sequentially Coupled Interaction Parameters for Applications in Systems and Synthetic Biology. Stat. Appl. Genet. Mol. Biol. 2012, 11, 1–62.
- Grzegorczyk, M.; Husmeier, D.; Edwards, K.D.; Ghazal, P.; Millar, A.J. Modelling Non-stationary Gene Regulatory Processes with a Non-homogeneous Bayesian Network and the Allocation Sampler. Bioinformatics 2008, 24, 2071–2078. [Google Scholar] [CrossRef]
- Grzegorczyk, M.; Husmeier, D. Modelling Non-stationary Gene Regulatory Processes with a Non-homogeneous Dynamic Bayesian Network and the Change Point Process. In Proceedings of the 6th International Workshop on Computational Systems Biology, Aarhus, Denmark, 10–12 June 2009. [Google Scholar]
- Grzegorczyk, M. A Non-homogeneous Dynamic Bayesian Network with a Hidden Markov Model Dependency Structure among the Temporal Data Points. Mach. Learn. 2016, 102, 155–207. [Google Scholar] [CrossRef]
- Grzegorczyk, M.; Husmeier, D. Non-stationary Continuous Dynamic Bayesian Networks. In Advances in Neural Information Processing Systems 22 (NIPS 2009); Curran Associates Inc.: Red Hook, NY, USA, 2009. [Google Scholar]
- Cohen, I.; Juang, Y.; Chen, J.; Benesty, J. Pearson Correlation Coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4. [Google Scholar]
- Davis, J.; Goadrich, M. The Relationship between Precision-Recall and Roc Curves. In Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, 25–29 June 2006. [Google Scholar]
- Cantone, I.; Marucci, L.; Iorio, F.; Ricci, M.A.; Belcastro, V.; Bansal, M.; Santini, S.; di Bernardo, M.; di Bernardo, D.; Cosma, M.P. A Yeast Synthetic Network for In Vivo Assessment of Reverse-Engineering and Modeling Approaches. Cell 2009, 137, 172–181. [Google Scholar] [CrossRef] [PubMed]
- Shafiee Kamalabad, M.; Grzegorczyk, M. Non-Homogeneous Dynamic Bayesian Networks with Edge-Wise Sequentially Coupled Parameters. Bioinformatics 2020, 36, 1198–1207. [Google Scholar] [CrossRef] [PubMed]
- Aluru, M.; Shrivastava, H.; Chockalingam, S.P.; Shivakumar, S.; Aluru, S. EnGRaiN: A Supervised Ensemble Learning Method for Recovery of Large-Scale Gene Regulatory Networks. Bioinformatics 2021, 38, 1312–1319. [Google Scholar] [CrossRef] [PubMed]
- Dávila-Velderrain, J.; Caldú-Primo, J.L.; Martínez-García, J.C.; Álvarez-Buylla Roces, M.A. Gene Regulatory Network Dynamical Logical Models for Plant Development. In Plant Systems Biology; Springer: Berlin/Heidelberg, Germany, 2022; Volume 2395, pp. 59–77. [Google Scholar]
- Monga, I.; Randhawa, V.; Dhanda, S.K. Connecting the Dots: Using Machine Learning to Forge Gene Regulatory Networks from Large Biological Datasets. At the Intersection of GRNs: Where System Biology Meets Machine Learning. In Machine Learning and Systems Biology in Genomics and Health; Springer: Berlin/Heidelberg, Germany, 2022; pp. 103–121. [Google Scholar] [CrossRef]
- Miwa, K.; Serikawa, M.; Suzuki, S.; Kondo, T.; Oyama, T. Conserved Expression Profiles of Circadian Clock-related Genes in Two Lemna Species Showing Long-day and Short-day Photoperiodic Flowering Responses. Plant Cell Physiol. 2006, 47, 601–612. [Google Scholar] [CrossRef]
- Dixon, L.E.; Knox, K.; Kozma-Bognar, L.; Southern, M.M.; Pokhilko, A.; Millar, A.J. Temporal Repression of Core Circadian Genes Is Mediated through EARLY FLOWERING 3 in Arabidopsis. Curr. Biol. 2011, 21, 120–125. [Google Scholar] [CrossRef]
- Chow, B.Y.; Helfer, A.; Nusinow, D.A.; Kay, S.A. ELF3 Recruitment to the PRR9 Promoter Requires Other Evening Complex Members in the Arabidopsis Circadian Clock. Plant Signal. Behav. 2012, 7, 170–173. [Google Scholar] [CrossRef]
- Locke, J.C.W.; Kozma-Bognár, L.; Gould, P.D.; Fehér, B.; Kevei, É.; Nagy, F.; Turner, M.S.; Hall, A.; Millar, A.J. Experimental Validation of a Predicted Feedback Loop in the Multi-Oscillator Clock of Arabidopsis Thaliana. Mol. Syst. Biol. 2006, 2, 59. [Google Scholar] [CrossRef]
- Herrero, E.; Kolmos, E.; Bujdoso, N.; Yuan, Y.; Wang, M.; Berns, M.C.; Uhlworm, H.; Coupland, G.; Saini, R.; Jaskolski, M.; et al. EARLY FLOWERING4 Recruitment of EARLY FLOWERING3 in the Nucleus Sustains the Arabidopsis Circadian Clock. Plant Cell 2012, 24, 428–443. [Google Scholar] [CrossRef]
- Sachs, K.; Perez, O.; Pe’Er, D.; Lauffenburger, D.A.; Nolan, G.P. Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Science 2005, 308, 523–529. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).