Mining Proteome Research Reports: A Bird’s Eye View
Abstract
1. Introduction
2. Materials and Methods
2.1. Collection of Data
2.2. Preprocessing and Scientometric Analysis
2.3. Keyword Mining
2.4. Mining of Bioconcepts
3. Results
4. Discussion
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest

Parameter | Value |
---|---|
Year range | 1992–2021 |
Highest publication year | 2014 |
Lowest publication year | 1992 |
Total number of journals | 1745 |
Most frequent journal | Proteomics |
Highest no. of publications in a journal | 2500 |
Top publishing country | United States |

Sl. No. | Term | Frequency | Frequency (%) |
---|---|---|---|
1 | proteome | 32,273 | 10.228 |
2 | humans | 15,686 | 4.971 |
3 | proteomics | 13,016 | 4.125 |
4 | animals | 11,804 | 3.741 |
5 | electrophoresis, gel, two-dimensional | 5159 | 1.635 |
6 | mass spectrometry | 4803 | 1.522 |
7 | male | 4654 | 1.475 |
8 | female | 4474 | 1.418 |
9 | tandem mass spectrometry | 3615 | 1.146 |
10 | mice | 3419 | 1.084 |
11 | gene expression profiling | 3105 | 0.984 |
12 | spectrometry, mass, matrix-assisted laser desorption-ionization | 2984 | 0.946 |
13 | chromatography, liquid | 2978 | 0.944 |
14 | amino acid sequence | 2969 | 0.941 |
15 | signal transduction | 2947 | 0.934 |
16 | bacterial proteins | 2816 | 0.892 |
17 | molecular sequence data | 2770 | 0.878 |
18 | computational biology | 2372 | 0.752 |
19 | biomarkers | 2262 | 0.717 |
20 | proteins | 2219 | 0.703 |
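
The percentage column in the term table above can be reproduced from raw counts. The following is a minimal sketch, assuming each percentage is a term's count divided by the total number of term occurrences in the corpus; the table does not state the denominator, but 32,273 occurrences at 10.228% would imply a total of roughly 315,500. The function name `term_frequency_table` and the toy records are hypothetical and are not taken from the article.

```python
from collections import Counter


def term_frequency_table(records):
    """Count each term across all records and report its share of all term occurrences.

    `records` is an iterable of term lists, one list per publication.
    Terms are lower-cased so that "Proteome" and "proteome" are counted together.
    """
    counts = Counter(term.lower() for terms in records for term in terms)
    total = sum(counts.values())
    # most_common() sorts by descending count, matching the ranking in the table.
    return [(term, n, round(100 * n / total, 3)) for term, n in counts.most_common()]


# Toy input (hypothetical): three publications with their indexed terms.
records = [
    ["Proteome", "Humans", "Proteomics"],
    ["Proteome", "Animals"],
    ["Proteome", "Humans", "Mass Spectrometry"],
]

table = term_frequency_table(records)
for rank, (term, freq, pct) in enumerate(table, start=1):
    print(f"{rank} | {term} | {freq} | {pct}")
```

Because the denominator is the sum over all terms, the percentages of the full list add up to 100; a top-20 excerpt such as the one above covers only part of that total.
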

Bioconcept Class | Total No. of Bioconcepts | No. of Unique Bioconcept Annotations | No. of Bioconcepts with No ID |
---|---|---|---|
CellLine | 318 | 74 | 35 |
Chemical | 80,803 | 3476 | 12,062 |
Disease | 76,456 | 2672 | 3530 |
DNAMutation | 118 | 55 | 0 |
Gene | 65,533 | 12,133 | 0 |
Genus | 33 | 16 | 0 |
ProteinMutation | 377 | 197 | 0 |
SNP | 40 | 36 | 0 |
Species | 98,340 | 4350 | 0 |
Strain | 8 | 3 | 0 |

Sl. No. | ID | Name | Weighted Degree | Degree | Most Interacting Node ID | Most Interacting Node Name | Frequency of Most Interacting Node |
---|---|---|---|---|---|---|---|
1 | 7157 | p53 | 1414 | 425 | 4609 | MYC | 52 |
2 | 7431 | Vimentin | 1174 | 398 | 3315 | HSP27 | 22 |
3 | 207 | AKT | 1118 | 325 | 2475 | mTOR | 64 |
4 | 3569 | IL-6 | 972 | 289 | 3576 | IL-8 | 42 |
5 | 7124 | TNF-alpha | 932 | 314 | 3569 | IL-6 | 42 |
6 | 1956 | EGFR | 916 | 303 | 2064 | HER2 | 28 |
7 | 335 | Apo A-I | 878 | 244 | 3240 | Haptoglobin | 32 |
8 | 2335 | Fibronectin | 862 | 333 | 7431 | Vimentin | 14 |
9 | 3315 | HSP27 | 860 | 281 | 7431 | Vimentin | 22 |
10 | 2475 | mTOR | 820 | 277 | 207 | AKT | 64 |
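
The degree and weighted-degree columns in the network table above can be illustrated with a small co-occurrence sketch. It assumes that an edge links two genes annotated in the same publication, that the edge weight is the number of publications in which the pair co-occurs, that degree is the number of distinct partners, and that weighted degree is the sum of a node's edge weights. These are the common definitions, but the table itself does not spell them out. The function names and the toy documents are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_weights(docs):
    """Return a Counter mapping each unordered gene pair to its co-occurrence count."""
    weights = Counter()
    for genes in docs:
        # set() ignores repeated mentions within one document; sorted() fixes pair order.
        for a, b in combinations(sorted(set(genes)), 2):
            weights[(a, b)] += 1
    return weights


def node_stats(weights):
    """Compute degree, weighted degree, and the strongest partner for every node."""
    partners = defaultdict(Counter)
    for (a, b), w in weights.items():
        partners[a][b] = w
        partners[b][a] = w
    return {
        node: {
            "degree": len(nbrs),                    # number of distinct partners
            "weighted_degree": sum(nbrs.values()),  # sum of edge weights
            "top_partner": nbrs.most_common(1)[0],  # (partner, co-occurrence count)
        }
        for node, nbrs in partners.items()
    }


# Toy corpus (hypothetical): gene annotations for four publications.
docs = [
    ["AKT", "mTOR"],
    ["AKT", "mTOR", "p53"],
    ["p53", "MYC"],
    ["AKT", "mTOR"],
]

stats = node_stats(cooccurrence_weights(docs))
for node, s in sorted(stats.items(), key=lambda kv: -kv[1]["weighted_degree"]):
    print(node, s)
```

Under these definitions the strongest-partner relation is symmetric in its count, which matches the table: AKT lists mTOR with frequency 64, and mTOR lists AKT with the same value.
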

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sahu, J. Mining Proteome Research Reports: A Bird’s Eye View. Proteomes 2021, 9, 29. https://doi.org/10.3390/proteomes9020029