A Non-Binary Approach to Super-Enhancer Identification and Clustering: A Dataset for Tumor- and Treatment-Associated Dynamics in Mouse Tissues
Abstract
1. Summary
2. Data Description
2.1. Subset 1: SE Locus Consolidation Information
- Locus ID (se_locus_id): A unique identifier assigned to each consolidated SE locus.
- Genomic Coordinates (chr, start, end): The chromosomal position (chromosome, start, end) of the consolidated SE locus.
- Original SEs (se_id_list): A comma-separated list of SE identifiers that contributed to the consolidated locus, along with their originating samples.
- Sample-Specific Presence Indicators (SE_11_0077, SE_12_0118, SE_12_0449, SE_12_0450): Binary values (0 or 1) indicating whether the SE locus is present (1) or absent (0) in each sample.
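The columns above can be illustrated with a minimal sketch of how overlapping SEs from several samples might be merged into consolidated loci with per-sample presence flags. This is not the authors' pipeline: the merge rule (overlapping or directly adjacent intervals on the same chromosome), the `locus_N` identifier scheme, the function name, and the toy coordinates are all assumptions made for illustration, and `se_id_list` is simplified to hold SE identifiers only.

```python
from collections import defaultdict

def consolidate_se_loci(se_records, samples):
    """Merge overlapping SEs per chromosome into consolidated loci.

    se_records: iterable of (se_id, sample, chrom, start, end).
    Returns a list of dicts that mirror the Subset 1 columns.
    """
    by_chrom = defaultdict(list)
    for se_id, sample, chrom, start, end in se_records:
        by_chrom[chrom].append((start, end, se_id, sample))

    loci = []
    for chrom in sorted(by_chrom):
        current = None
        for start, end, se_id, sample in sorted(by_chrom[chrom]):
            if current is not None and start <= current["end"]:
                # Overlaps (or touches) the open locus: extend it.
                current["end"] = max(current["end"], end)
            else:
                # Gap found: open a new consolidated locus.
                current = {"chr": chrom, "start": start, "end": end,
                           "members": []}
                loci.append(current)
            current["members"].append((se_id, sample))

    rows = []
    for i, locus in enumerate(loci, start=1):
        present = {sample for _, sample in locus["members"]}
        row = {
            "se_locus_id": f"locus_{i}",  # hypothetical ID scheme
            "chr": locus["chr"],
            "start": locus["start"],
            "end": locus["end"],
            "se_id_list": ",".join(se_id for se_id, _ in locus["members"]),
        }
        # One binary presence column per sample (1 = present, 0 = absent).
        row.update({s: int(s in present) for s in samples})
        rows.append(row)
    return rows

# Toy input with made-up SE identifiers and coordinates.
samples = ["SE_11_0077", "SE_12_0118", "SE_12_0449", "SE_12_0450"]
records = [
    ("SE_A1", "SE_11_0077", "chr1", 100, 500),
    ("SE_B1", "SE_12_0118", "chr1", 400, 900),
    ("SE_C1", "SE_12_0450", "chr2", 50, 200),
]
rows = consolidate_se_loci(records, samples)
```

In this toy input the two chr1 SEs overlap and collapse into one locus that is flagged as present in two samples, while the chr2 SE forms a locus of its own.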
2.2. Subset 2: Presence of Super-Enhancers and Typical Enhancers in Consolidated SE Loci
- Locus ID (se_locus_id): The identifier of the consolidated SE locus where the enhancer is located.
- Sample ID (cell_id): The originating sample in which the enhancer is detected.
- Enhancer ID (ste_id): A unique identifier for the SE/TE within the sample.
- Genomic Coordinates (ste_chr, ste_start, ste_end): The chromosomal position (chromosome, start, end) of the enhancer.
- Rank (ste_rank): The ranking of the enhancer within the consolidated SE locus, where a lower rank typically indicates higher activity.
- ChIP-seq Signal (avg_rpm_diff): The average signal intensity of the enhancer, normalized against the control, representing enhancer activity.
- Overlap (overlap): The extent of overlap between the enhancer and the consolidated SE locus.
- Weight Within Locus (ste_weight_within_locus): The relative contribution of the enhancer to the weighted average signal intensity of the consolidated locus, calculated as the ratio of the enhancer’s overlap with the locus to the total sum of overlaps between the locus and all enhancers in the sample.
- From Sample_12_0450: one super-enhancer (SE_12_045000567).
- From Sample_12_0449: three typical enhancers (TE_12_044906558, TE_12_044906059, TE_12_044900944).
- From Sample_12_0118: three typical enhancers (TE_12_011809209, TE_12_011810352, TE_12_011800952).
- From Sample_11_0077: no enhancer elements were detected within this SE locus.
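The definition of `ste_weight_within_locus` given above (an enhancer's overlap with the locus divided by the summed overlaps of all enhancers in that sample) can be sketched as follows. The sketch assumes that overlap is measured in base pairs on half-open coordinates; the function names, enhancer identifiers, and coordinates are hypothetical.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Length of the intersection of two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def weights_within_locus(locus_start, locus_end, enhancers):
    """enhancers: dict of enhancer ID -> (start, end) for one sample.

    Returns a dict of enhancer ID -> overlap / sum of overlaps.
    """
    overlaps = {
        ste_id: overlap_bp(locus_start, locus_end, start, end)
        for ste_id, (start, end) in enhancers.items()
    }
    total = sum(overlaps.values())
    if total == 0:
        # No enhancer overlaps the locus in this sample.
        return {ste_id: 0.0 for ste_id in overlaps}
    return {ste_id: ov / total for ste_id, ov in overlaps.items()}

# Toy locus spanning 1000-5000 with three made-up enhancers.
weights = weights_within_locus(
    1000, 5000,
    {"TE_a": (500, 2000), "TE_b": (3000, 4000), "TE_c": (4500, 7000)},
)
```

Here the overlaps are 1000, 1000 and 500 bp, so the weights are 0.4, 0.4 and 0.2 and sum to 1 within the sample.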
2.3. Subset 3: Features of Consolidated Super-Enhancer Loci
- Locus ID (se_locus_id): The identifier of the consolidated SE locus.
- Sample ID (cell_id): The originating sample in which the SE locus is analyzed.
- Max ChIP-seq Signal (avg_rpm_diff__max): The maximum signal intensity (avgRPM) observed within the SE locus in the given sample.
- Weighted Mean ChIP-seq Signal (avg_rpm_diff__weighted): The weighted average ChIP-seq signal (avgRPM) across all enhancers within the SE locus, where weights are determined by the enhancer-locus overlap.
- Max Enhancer Rank (max_rank): The highest (worst) rank among all enhancers within the SE locus in the given sample.
- Min Enhancer Rank (min_rank): The lowest (best) rank among all enhancers within the SE locus in the given sample.
- Binary SE Presence Indicator (active_SE): A binary value (1 or 0) indicating whether the SE locus contains at least one element classified as a super-enhancer by the ROSE algorithm in the given sample.
- Active SE Count (active_SE_count): The number of super-enhancers detected within the SE locus in the given sample.
- Active TE Count (active_TE_count): The number of typical enhancers identified within the SE locus in the given sample.
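As a rough illustration, the Subset 3 features for one locus in one sample could be derived from the Subset 2 rows roughly as below. The sketch assumes that SEs and TEs can be told apart by the `SE_` / `TE_` prefix of `ste_id` (as in the example identifiers listed for Subset 2) and that a locus with no enhancers receives empty signal and rank fields; the function name and the numeric values are hypothetical.

```python
def locus_features(enhancers):
    """Aggregate enhancer rows for one (locus, sample) pair.

    enhancers: list of dicts with keys ste_id, ste_rank, avg_rpm_diff
    and ste_weight_within_locus.
    """
    n_se = sum(e["ste_id"].startswith("SE_") for e in enhancers)
    n_te = sum(e["ste_id"].startswith("TE_") for e in enhancers)
    features = {
        "avg_rpm_diff__max": None,
        "avg_rpm_diff__weighted": None,
        "max_rank": None,
        "min_rank": None,
        "active_SE": int(n_se > 0),
        "active_SE_count": n_se,
        "active_TE_count": n_te,
    }
    if not enhancers:
        # Assumption: no enhancers means no signal or rank values.
        return features

    weight_sum = sum(e["ste_weight_within_locus"] for e in enhancers)
    weighted = sum(e["ste_weight_within_locus"] * e["avg_rpm_diff"]
                   for e in enhancers)
    ranks = [e["ste_rank"] for e in enhancers]
    features.update({
        "avg_rpm_diff__max": max(e["avg_rpm_diff"] for e in enhancers),
        # Dividing by the weight sum keeps this a mean even if the
        # weights do not add up to exactly 1.
        "avg_rpm_diff__weighted": weighted / weight_sum,
        "max_rank": max(ranks),  # worst rank
        "min_rank": min(ranks),  # best rank
    })
    return features

# Toy rows with made-up identifiers, ranks and signals.
example = locus_features([
    {"ste_id": "SE_x", "ste_rank": 3, "avg_rpm_diff": 8.0,
     "ste_weight_within_locus": 0.5},
    {"ste_id": "TE_y", "ste_rank": 120, "avg_rpm_diff": 4.0,
     "ste_weight_within_locus": 0.25},
    {"ste_id": "TE_z", "ste_rank": 450, "avg_rpm_diff": 2.0,
     "ste_weight_within_locus": 0.25},
])
empty = locus_features([])
```

The non-binary character of the dataset shows up here: a locus whose `active_SE` flag is 0 in a sample can still carry a non-zero weighted signal from typical enhancers.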
2.4. Subset 4: Preprocessing for Clustering
- Locus ID (se_locus_id): The identifier of the consolidated SE locus.
- Binary Presence Indicators (SE_11_0077_is, SE_12_0118_is, SE_12_0449_is, SE_12_0450_is): Binary values (0 or 1) indicating whether the SE locus is present (1) or absent (0) in each sample.
- Raw ChIP-seq Signal (SE_11_0077, SE_12_0118, SE_12_0449, SE_12_0450): The original activity values (avgRPM) of the SE locus across the given samples.
- Median-Normalized Signal (SE_11_0077_medianNormalized, SE_12_0118_medianNormalized, SE_12_0449_medianNormalized, SE_12_0450_medianNormalized): The activity values after median normalization, which adjusts the distribution to reduce sample-specific biases.
- Imputed Median-Normalized Signal (SE_11_0077_medianNormalized_imputed, SE_12_0118_medianNormalized_imputed, SE_12_0449_medianNormalized_imputed, SE_12_0450_medianNormalized_imputed): Median-normalized values with imputed data to replace missing values (if any).
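- Log-Transformed Median-Normalized Signal (SE_11_0077_medianNormalized_imputed_log1p, SE_12_0118_medianNormalized_imputed_log1p, SE_12_0449_medianNormalized_imputed_log1p, SE_12_0450_medianNormalized_imputed_log1p): The imputed median-normalized values after a log1p transformation, log(1 + x), which compresses the dynamic range of the signal.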
- Z-Scaled Log-Transformed Median-Normalized Signal (SE_11_0077_medianNormalized_log1p_zscaled, SE_12_0118_medianNormalized_log1p_zscaled, SE_12_0449_medianNormalized_log1p_zscaled, SE_12_0450_medianNormalized_log1p_zscaled): The log-transformed, imputed, median-normalized values further standardized by Z-score normalization, so that each sample has a mean of 0 and a standard deviation of 1 for comparability across datasets.
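The chain of columns in this subset (raw, median-normalized, imputed, log1p, Z-scaled) can be sketched for a single sample as below. Several details are assumptions, since the column descriptions do not fix them: median normalization is taken to mean division by the sample's median observed value, missing values are represented as `None` and imputed with a constant `fill_value`, and the Z-score uses the population standard deviation. The function name and input values are hypothetical.

```python
import math
import statistics

def preprocess_sample(values, fill_value=0.0):
    """Run one sample's locus signals through the preprocessing chain.

    values: list of raw signals, with None marking a missing value.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)  # assumed to be non-zero

    # Assumption: median normalization = divide by the sample median.
    normalized = [None if v is None else v / median for v in values]
    # Assumption: missing values are replaced by a constant.
    imputed = [fill_value if v is None else v for v in normalized]
    # log1p(x) = log(1 + x), defined for x > -1.
    logged = [math.log1p(v) for v in imputed]

    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # population SD; assumed non-zero
    zscaled = [(v - mean) / sd for v in logged]

    return {
        "medianNormalized": normalized,
        "medianNormalized_imputed": imputed,
        "medianNormalized_imputed_log1p": logged,
        "medianNormalized_log1p_zscaled": zscaled,
    }

# Toy sample with four loci, one of them missing.
result = preprocess_sample([2.0, None, 4.0, 8.0])
```

After the last step the column has mean 0 and standard deviation 1 for this sample, which is the property the Z-scaled columns are described as having.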
2.5. Clustering and Gene Associations
- Locus ID (se_locus_id): A unique identifier assigned to each consolidated SE locus.
- Cluster Assignment (louvain_module): The cluster number assigned to the SE locus using the Louvain community detection algorithm, which groups SE loci based on shared enhancer activity patterns.
- Degree Centrality (degree_centrality): The proportion of other loci that the SE locus is directly connected to within the network.
- Betweenness Centrality (betweenness_centrality): A measure of how often the SE locus lies on the shortest path between other loci, indicating its importance in network connectivity.
- Closeness Centrality (closeness_centrality): The inverse of the average shortest path distance from the SE locus to all other loci, representing how central it is within the network.
- Gene Associations (SE_11_0077__gene_closest_active, SE_12_0118__gene_closest_active, SE_12_0449__gene_closest_active, SE_12_0450__gene_closest_active): For each sample in which an SE is associated with the locus, the closest active gene to that SE.
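To make the degree and closeness definitions above concrete, here is a small standard-library sketch on a toy network. It assumes an undirected, unweighted, connected graph given as an adjacency dictionary; how edges between SE loci are actually constructed is not specified in this section, so the graph and node names are hypothetical. NetworkX offers `degree_centrality`, `closeness_centrality`, and `betweenness_centrality` functions for real analyses; betweenness is omitted here for brevity.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

def closeness_centrality(adj):
    """Inverse of the mean shortest-path distance to all other nodes.

    Assumes a connected graph; unreachable nodes yield 0.0 in this sketch.
    """
    n = len(adj)
    result = {}
    for source in adj:
        # Breadth-first search gives shortest hop counts from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        if len(dist) == n and total > 0:
            result[source] = (n - 1) / total
        else:
            result[source] = 0.0
    return result

# Toy path graph A - B - C - D (node names are placeholders for loci).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

In the path graph the inner nodes B and C score higher on both measures than the end nodes A and D, matching the intuition that they sit closer to the rest of the network.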
3. Methods
3.1. Raw Data Collection
3.2. Consolidation of Super-Enhancer Loci Across Sample Groups
3.3. Identification of Enhancer-Based Elements Within Consolidated Super-Enhancer Loci
3.4. Feature Matrix Construction and Dimensionality Reduction
3.5. Clustering and Gene Association
3.6. Functional Analysis of Protein Interaction Network
3.7. Software Environment
4. User Notes
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Sample Set | Sample Type | Tissue Type | Species | SEdb 2.0 ID | Experimental Condition |
---|---|---|---|---|---|
Set 1 | Tissue | Mammary gland | Mouse | Sample_12_0136 | Normal mammary gland (untreated) |
Set 1 | Tissue | Mammary gland | Mouse | Sample_12_0137 | Mammary tumor (untreated) |
Set 1 | Tissue | Mammary gland | Mouse | Sample_12_0138 | Mammary tumor (DMSO-treated) |
Set 1 | Tissue | Mammary gland | Mouse | Sample_12_0139 | Mammary tumor (C646-treated) |
Set 2 | Tissue | Lung | Mouse | Sample_11_0077 | Lung postnatal (day 0) |
Set 2 | Tissue | Lung | Mouse | Sample_12_0118 | Lung tumor |
Set 2 | Tissue | Lung | Mouse | Sample_12_0449 | Lung adenocarcinoma cell (Nkx2-1-positive) |
Set 2 | Tissue | Lung | Mouse | Sample_12_0450 | Lung adenocarcinoma cell (Nkx2-1-negative) |
Set 3 | Cell line | NMuMG cells | Mouse | Sample_12_0763 | Untreated |
Set 3 | Cell line | NMuMG cells | Mouse | Sample_12_0764 | TGF-β-treated (4 h) |
Set 3 | Cell line | NMuMG cells | Mouse | Sample_12_0765 | TGF-β-treated (24 h) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Osintseva, E.D.; Ashniev, G.A.; Orlov, A.V.; Nikitin, P.I.; Zaitseva, Z.G.; Volkov, V.V.; Orlova, N.N. A Non-Binary Approach to Super-Enhancer Identification and Clustering: A Dataset for Tumor- and Treatment-Associated Dynamics in Mouse Tissues. Data 2025, 10, 74. https://doi.org/10.3390/data10050074