Metabolically Active Microbial Communities in Oilfields: A Systematic Review and Synthesis of RNA Preservation, Extraction, and Sequencing Methods

Abstract: Characterizing metabolically active microorganisms using RNA-based methods is a crucial tool for monitoring and mitigating operational issues, such as oil biodegradation and biocorrosion of pipelines in the oil and gas industry. Our review, a pioneering study, addresses the main methods used to preserve, isolate, and sequence RNA from oilfield samples and describes the most abundant metabolically active genera studied. Using the MEDLINE/PubMed, PubMed Central, Scopus, and Web of Science databases, 2561 potentially eligible records were identified. After screening, 20 studies were included in our review, underscoring the scarcity of studies related to the subject. Data were extracted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). These studies evaluated different samples, including produced water (PW), injection water (IW), solid deposits (SD), oil (OIL), and oily sludge (OS) collected from oilfields located in Australia, China, India, Mexico, and the United Arab Emirates. Environmental samples accounted for 55% of the studies, while enriched cultures and microbial consortia represented 35% and 15% of studies, respectively. PW was the most frequently studied sample, comprising 72% of all samples. Filtration and centrifugation were the only processes employed to concentrate the biomass present in samples. For RNA preservation, the most used method was a solution composed of 95:5 v/v ethanol/TRIzol, while for RNA isolation, the TRIzol reagent was the most cited. The Sanger sequencing method was used in all studies evaluating functional genes (alkB, dsrA, aprA, assA, and mcrA), and the Next-Generation Sequencing (NGS) method was employed in studies for sequencing transcripts of the 16S rRNA gene and metatranscriptomes. Pseudomonas (16S rRNA = PW: 2%; IW: 8%; metatranscriptome = PW: 20%) and Acinetobacter (16S rRNA = PW: 1%; IW: 4%; metatranscriptome = PW: 17%) were the most abundant genera.
This study outlined the primary methods employed in researching metabolically active microorganisms. These data provide a foundation for future research. However, it is essential to note that we cannot yet determine the most effective method. We hope that this study will inspire further research related to the standardization of RNA preservation, extraction, and sequencing methods and significantly contribute to our understanding of active microbial communities in oilfields.


Introduction
In the oil production process, the fluid that reaches the surface from the subsurface is triphasic, containing oil, water, and gas phases [1]. During primary oil recovery, the water phase is composed exclusively of formation water that occurs naturally within the pores of reservoir rocks, whereas in secondary recovery, waterflooding consists of injecting water (or gas) into reservoirs to repressurize the environment and displace the oil toward the producing wells [2–5]. As oil production continues, the amount of produced water increases relative to the oil and gas phases and, as a consequence, produced water represents the largest waste stream by volume in oil and gas production operations [2].
The identification and characterization of microbial communities, including hydrocarbon-degrading microorganisms (HDM) [14–16] and corrosion-influencing microorganisms (CIM) of pipelines [1,10,17,18], are crucial for developing strategies that minimize biological impacts on oil quality and on the facilities used to transport and store fluids resulting from oil production [1,10]. Traditionally, culture-dependent methods were used to isolate and identify microbial groups from oil reservoirs [4,11,19,20]. However, the so-called "culturable" strains usually represent only about 1–5% of the total species present in a given environmental sample, casting doubt on the representativeness of the results [21]. To overcome these limitations, culture-independent molecular methods have been increasingly used to characterize microbial communities and identify unculturable and rare species present in complex environmental samples from the oil industry [21,22].
RNA transcript sequencing is limited by difficulties in preserving and isolating high-quality RNA from environmental samples, as RNA is an extremely unstable molecule that is susceptible to degradation by RNase enzymes [22]. Moreover, an adequate amount of RNA is critical to successfully carry out all stages of the analytical process, which involves the synthesis of complementary DNA (cDNA), construction of cDNA libraries, sequencing, and bioinformatic processing of the obtained sequences [29]. Consequently, there is a lack of studies reporting the microbial diversity and composition in oil industry samples based on RNA transcripts [22].
Standardized and established protocols for nucleic acid preservation ensure that samples collected in remote areas can be successfully and consistently transported to laboratories with suitable infrastructure for RNA extraction without significantly compromising their integrity [30–32]. Such protocols therefore allow for the characterization of metabolically active microbial communities, as well as the identification of rare, less abundant species [32].
In this context, the present review aimed to describe the methods applied in studies that evaluated metabolically active microbial communities (based on RNA) from oil industry samples. Additionally, the dominant genera of active microorganisms (bacteria and archaea) in the analyzed samples were identified. No previous reviews focused on active microorganisms in oil reservoirs or oil industry facilities were identified in the literature. Therefore, this review is innovative in analyzing methodological data on the preservation, extraction, and sequencing of RNA from samples related to the oil industry.

Materials and Methods
This systematic review was designed and carried out according to the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [33]. The study consisted of the following steps: (1) identification of records in databases; (2) automated and manual screening of records; (3) assessment of document eligibility and inclusion of selected studies; and (4) synthesis and analysis of data. The PRISMA-S checklist [34] helped to describe the items applicable to these four stages (Supplementary Table S1).

Identification of Records in Databases
The search for records was carried out in the MEDLINE/PubMed (via National Library of Medicine), PubMed Central, Scopus, and Web of Science (Core Collection) databases. Records were identified by searching for combined terms (keywords) using the Boolean operators "OR" and "AND" (Supplementary Table S2). These terms were carefully defined to characterize the sampling points, types of samples, and analyses. The search was carried out in the title or abstract (MEDLINE/PubMed and PubMed Central) and in the title, abstract, or keywords (Scopus and Web of Science) of publications. There were no restrictions on document type, language, or publication date, to avoid pre-excluding relevant records. The databases were last accessed on 20 January 2023.
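The way terms are combined with Boolean operators can be illustrated with a minimal Python sketch. The term groups below are hypothetical placeholders (the actual keywords are listed in Supplementary Table S2), and the function name `build_query` is an assumption introduced only for illustration; it is not part of the scripts used in this review.

```python
def build_query(term_groups):
    """Join synonyms within a group with OR and join the groups with AND."""
    clauses = []
    for group in term_groups:
        # Quote multi-word terms so they are searched as exact phrases.
        quoted = [f'"{term}"' if " " in term else term for term in group]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical term groups: sampling point, sample type, and analysis.
groups = [
    ["oilfield", "oil reservoir"],
    ["produced water", "injection water"],
    ["RNA", "metatranscriptome"],
]
query = build_query(groups)
print(query)
```

Grouping synonyms with "OR" broadens each concept, while "AND" between groups requires that a record address every concept at once.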

Automated and Manual Screening of Records
Data from identified records were exported from the databases (MEDLINE/PubMed, PubMed Central, Scopus, and Web of Science) in .csv file format (Supplementary Table S3). These were converted to the .xlsx format using the format_input.py script (https://github.com/lbmcf/format-input) (Supplementary Table S4). The format_input.py script also identified and removed records without a DOI and those with identical titles or DOIs.
Using the remove_duplicates.py script (https://github.com/lbmcf/remove-duplicates), the .xlsx files from the databases were unified and duplicate data were removed (Supplementary Table S4). Documents corresponding to the records listed in the unified file were downloaded in PDF format using an automated program that downloads scientific articles based on their DOI. This program is restricted to the network of the Federal University of Minas Gerais (UFMG), and inaccessible documents (closed access) were categorized as not available (NA).
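The record-cleaning logic described above (setting aside records without a DOI, then removing duplicates by title or DOI) can be sketched as follows. This is not the code of format_input.py or remove_duplicates.py; the field names `title` and `doi`, the normalization rules, and the example records are assumptions made for illustration.

```python
def clean_records(records):
    """Return (kept, no_doi) after removing duplicates by DOI or by title."""
    kept, no_doi = [], []
    seen_dois, seen_titles = set(), set()
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        # Normalize case and whitespace so trivially different titles match.
        title = " ".join((rec.get("title") or "").lower().split())
        if not doi:
            # Set aside for manual retrieval instead of discarding.
            no_doi.append(rec)
            continue
        if doi in seen_dois or (title and title in seen_titles):
            continue  # duplicate by DOI or by normalized title
        seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        kept.append(rec)
    return kept, no_doi


# Hypothetical records, for illustration only.
records = [
    {"title": "Active microbes in produced water", "doi": "10.1000/a1"},
    {"title": "Active Microbes in  Produced Water", "doi": "10.1000/A1"},
    {"title": "Sulfate reducers in pipelines", "doi": ""},
    {"title": "Methanogens in oily sludge", "doi": "10.1000/b2"},
]
kept, no_doi = clean_records(records)
print(len(kept), len(no_doi))
```

Keeping the records without a DOI in a separate list mirrors the review workflow, in which those records were later recovered and screened manually.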

Assessment of Eligibility and Inclusion of Studies
The downloaded PDF documents were converted to TXT format using the pdf2txt.py script (https://github.com/lbmcf/pdf2txt). Using the search_keywords.py script (https://github.com/lbmcf/search-keywords), an automated screening was conducted on the methodology section of the TXT documents (Supplementary Table S4). For selection, terms analogous to those used in the database record search were applied (Supplementary Table S2).
The selected documents were independently reviewed by two reviewers (R.F.G. and J.d.C.F.D.). In cases of disagreement, a third reviewer (M.S.C.) was consulted to reach a consensus. According to the eligibility criteria (Table 1), the screening was conducted by checking the title and methodology of the documents. Records without a DOI, previously removed by the format_input.py script, were recovered, and the corresponding documents were manually downloaded (in PDF format). These records were reviewed by title and methodology (Supplementary Table S4). According to the adopted eligibility criteria, studies that evaluated metabolically active microbial communities based on RNA sequencing from samples collected from reservoirs, pipelines, and tanks in the oil industry were included (Table 1). Studies were excluded if: (1) they only presented genomic DNA sequencing data; (2) they analyzed samples of soil, fauna, and flora (marine and terrestrial) contaminated with oil; (3) they evaluated samples of oil-refined products; (4) they performed experiments with commercial strains or strains isolated from oilfields for which the isolation conditions and molecular analysis were not specified; and (5) they were literature reviews, book chapters, conference papers, or similar documents.
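A keyword screening restricted to the methodology section could look like the sketch below. It is not the search_keywords.py script; the heading patterns used to locate the methods section, the helper names, and the example keywords and document are assumptions for illustration.

```python
import re


def extract_methods(text):
    """Return the text between a methods heading and the results heading.

    Falls back to the whole document when no methods heading is found.
    """
    flags = re.IGNORECASE | re.MULTILINE
    start = re.search(r"^\s*(materials and methods|methods|methodology)\b",
                      text, flags=flags)
    if not start:
        return text
    rest = text[start.end():]
    end = re.search(r"^\s*results\b", rest, flags=flags)
    return rest[:end.start()] if end else rest


def matches_keywords(text, keywords):
    """List the keywords found as whole words in the methods section."""
    methods = extract_methods(text)
    return [kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", methods,
                         flags=re.IGNORECASE)]


# Hypothetical document and keywords, for illustration only.
doc = """Introduction
RNA is unstable.
Materials and Methods
Total RNA was extracted with TRIzol and cDNA was synthesized.
Results
Pseudomonas dominated."""
hits = matches_keywords(doc, ["cDNA", "metatranscriptome", "RNA"])
print(hits)
```

A report would be retained for manual review when at least one keyword is found, which corresponds to the exclusion rule applied to reports lacking all of the determined terms.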

Synthesis and Analysis of Data of Included Studies
The methods used in the different stages of the studies included in this review were evaluated. Data were collected on (1) sampling, (2) preservation and isolation of RNA, (3) RNA sequencing, and (4) identification of metabolically active microorganisms by analysis of 16S rRNA gene transcripts and metatranscriptomes.

Analysis of Data from 16S rRNA Gene Transcripts and Metatranscriptome
All published data on the relative abundance of metabolically active microorganisms (Bacteria and Archaea) were compiled from the studies. These data were obtained from the amplicon analyses of 16S rRNA gene transcripts (Supplementary Table S6). To normalize the data, only relative abundances greater than zero (>0) and with valid scientific nomenclature were considered (Supplementary Table S6).
The analysis of metatranscriptomic data was carried out using the SqueezeMeta pipeline, which performs a complete analysis from quality control to taxonomic and functional annotation [36]; default parameters were used in the co-assembly mode. The pipeline uses Trimmomatic [37] for quality control (trimming and filtering). Assembly was done using Megahit [38], followed by Prodigal [39] for open reading frame (ORF) prediction. The Diamond tool was used for the taxonomic classification of ORFs against the GenBank NR database [40]. Using the R package SQMtools v1.6.0 [41], a tabulated file containing the absolute taxonomic abundances was exported.
The 16S rRNA transcript data from the studies were grouped and analyzed by sample type, while the metatranscriptomic data were evaluated by study. For the analysis of phyla and genera abundance, the ggplot2 package (v3.3.6) was used [42]. To compare the phyla and genera shared among samples, a Venn diagram was prepared with the venn package (v1.11) (https://github.com/dusadrian/venn).
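The filtering step can be sketched in Python as follows. The rule used here for "valid scientific nomenclature" (a single capitalized Latin word that is not a placeholder label) is an assumption, as are the names `VALID_NAME`, `PLACEHOLDERS`, and `filter_abundances` and the example values; the criteria actually applied are those documented in Supplementary Table S6.

```python
import re

# Assumed pattern for a valid genus name: one capitalized word, letters only.
VALID_NAME = re.compile(r"^[A-Z][a-z]+$")
# Assumed placeholder labels that do not correspond to a named taxon.
PLACEHOLDERS = {"Unclassified", "Uncultured", "Unknown", "Other"}


def filter_abundances(rows):
    """Keep (taxon, relative abundance) pairs with abundance > 0 and a valid name."""
    kept = []
    for taxon, abundance in rows:
        name = taxon.strip()
        if abundance > 0 and VALID_NAME.match(name) and name not in PLACEHOLDERS:
            kept.append((name, abundance))
    return kept


# Hypothetical values, for illustration only.
rows = [
    ("Pseudomonas", 2.0),
    ("Acinetobacter", 0.0),          # dropped: abundance is zero
    ("uncultured bacterium", 1.5),   # dropped: not a valid genus name
    ("Unclassified", 3.0),           # dropped: placeholder label
    ("Methanothermobacter", 0.4),
]
filtered = filter_abundances(rows)
print(filtered)
```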

Identification and Selection of Studies
Potentially eligible records were identified in the four selected databases: MEDLINE/PubMed (n = 154), PubMed Central (n = 60), Scopus (n = 1650), and Web of Science (n = 697), totaling 2561 records (Figure 1). Duplicates by title or DOI identified in the Scopus database (n = 7) and records without a DOI detected in the MEDLINE/PubMed (n = 10), Scopus (n = 156), and Web of Science (n = 30) databases were removed (Figure 1) using the format_input.py script. After removal, 2358 records were screened (Figure 1). These were unified into a single file, and duplicate records (n = 624) were removed (Figure 1) using the remove_duplicates.py script. After duplicate removal, 1734 reports were sought for retrieval (Figure 1). Using software with permissions to access UFMG's internal network, 1648 reports were retrieved in PDF format (Figure 1). The remaining 86 reports were not available due to access restrictions (Figure 1). In the methodology section of the retrieved reports, a keyword screening (Supplementary Table S2) was performed using the pdf2txt.py and search_keywords.py scripts. A total of 1457 reports (Figure 1) that did not contain at least one of the determined terms were excluded. Following the eligibility (Table 1) and exclusion criteria, the remaining 191 reports were manually reviewed by title and methodology (Figure 1). Of these, 20 original articles were included in this systematic review for data synthesis and analysis (Figure 1; Supplementary Table S5). It is important to note that the reports corresponding to records without a DOI (n = 196) removed at the identification stage were downloaded and reviewed. However, none of these reports were considered eligible for inclusion (Figure 1).
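The counts reported for each stage of the selection flow can be checked for internal consistency with a short Python sketch. All numbers are taken from the paragraph above; the variable names are introduced here for illustration.

```python
# Records identified per database.
identified = {
    "MEDLINE/PubMed": 154,
    "PubMed Central": 60,
    "Scopus": 1650,
    "Web of Science": 697,
}
total = sum(identified.values())

dup_within_db = 7            # Scopus duplicates by title or DOI
no_doi = 10 + 156 + 30       # MEDLINE/PubMed, Scopus, Web of Science
after_format = total - dup_within_db - no_doi

after_dedup = after_format - 624   # duplicates across databases
retrieved = after_dedup - 86       # reports not available
manual = retrieved - 1457          # reports excluded by keyword screening

print(total, after_format, after_dedup, retrieved, manual)
# → 2561 2358 1734 1648 191
```

Each subtraction reproduces the figure given in the text, so the reported flow from 2561 records to 191 manually reviewed reports is arithmetically consistent.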
Appl. Microbiol. 2023, 3, FOR PEER REVIEW 5
Figure 1. PRISMA flow diagram [33] of identification, screening, and inclusion of studies in the systematic review. Script1 format_input.py; Script2 remove_duplicates.py; Script3 pdf2txt.py; Script4 search_keywords.py (Supplementary Table S4). (a) Software based on permissions of the UFMG internal network. (b) Keywords (Supplementary Table S2). (c) Eligibility and exclusion criteria (Methodology Section 2.3). (d) Reports screened manually.

Included Studies
The included studies were published between 2011 and 2023 (Figure 2). These studies evaluated metabolically active microorganisms capable of degrading hydrocarbons and organic compounds and/or influencing corrosion processes in oil industry facilities. In 2011, only one study was published on the subject (Figure 2). However, since 2016, there has been a considerable increase in the number of publications, ranging from one to three articles per year, except in 2020, when seven studies were identified (Figure 2).
The studies evaluated samples collected from oilfields located in Australia, China, India, Mexico, and the United Arab Emirates (UAE) (Table 2). Notably, most of these studies (n = 13; 65%) were carried out with samples from Chinese fields, followed by Australia (n = 4), Mexico (n = 1), India (n = 1), and UAE (n = 1) (Table 2). Different types of environmental samples were evaluated, including produced water (PW), injection water (IW), solid deposits (SD), oil (OIL), and oily sludge (OS) (Table 2). The PW, IW, and OIL samples were mostly collected from production and injection wells and from surface pipelines [1,3,4,15,17–19,43–46]. The OS samples were obtained from storage tanks and fluid passages originating from oil reservoirs [13,14], while the SD samples were obtained by scraping the inside of pipelines [1]. Salgar-Chaparro and Machuca (2019) [1] were the only authors to evaluate both the planktonic microbiota circulating in fluids (PW and IW) and the sessile microbiota adhered to the inner walls of the pipelines (SD) (Table 2).
(b) Description of CIM groups (Supplementary Table S7). (c) Sample type (inoculum) not available in the study, no data (ND).
The filtration of water samples (PW and IW) was described in only four studies, three of which chose field preprocessing with sample filtration immediately after collection (Table 3). For biomass concentration, these studies used filter membranes with pore sizes of 0.1 µm [1,17,45]. In two studies, the microbial cells retained on filters were preserved with a 95:5 v/v ethanol/TRIzol solution, and in one study with RNAprotect Bacteria Reagent (Table 3). In the study by Nazina et al. (2017) [3], the IW sample was preserved with ethanol during collection and, unlike in the other three articles, filtration through a membrane filter with a 0.22 µm pore size took place in the laboratory (Table 3).
Produced water (PW); injection water (IW); solid deposits (SD); oil (OIL); oily sludge (OS); corrosion-influencing microorganisms (CIM); hydrocarbon-degrading microorganisms (HDM); organic compounds degrading microorganisms (OCDM); sulfate-reducing bacteria (SRB); methanogenic hydrocarbon-degrading microorganisms (MET-HDM). Preprocessing: (a) before preserving the samples; (b) after preserving or not preserving the samples. (c) Description of CIM groups (Supplementary Table S7). Manufacturer of RNA extraction reagent and kits: (d) Thermo Fisher Scientific/Invitrogen; (e) QIAGEN; (f) Roche; (g) Mobi; (h) Not declared. (i) Sample type (inoculum) not available in the study, no data (ND).
In the laboratory, 15 studies used centrifugation as preprocessing to concentrate microbial cells, of which six evaluated environmental samples (PW), six evaluated microbial cultures (PW.HDM, IW.HDM, PW.SRB, OS.MET-HDM, and OS.OCDM), and three evaluated consortia (PW.CIM and ND.CIM) (Table 3). To preserve the pellets generated from PW samples, a 95:5 v/v ethanol/TRIzol solution was used (Table 3). On the other hand, studies of the PW.SRB, OS.OCDM, and OS.MET-HDM cultures used different methods to preserve the pellets: RNAprotect Bacteria Reagent, 20% glycerol, and liquid nitrogen (N2), respectively (Table 3). It is noteworthy that eight studies did not specify the use of RNA preservatives (Table 3). Furthermore, two studies [19,46] did not specify either preprocessing or nucleic acid preservative agents (Table 3).
SD samples [1] and OS samples [13] may have characteristics that make filtration and/or centrifugation difficult. As reported by Salgar-Chaparro and Machuca (2019) [1], deposit samples were collected from pipelines covered by approximately 3 cm of schmoo material. Zhou et al. (2022) [14] mention that the OS sample evaluated in their study was a mixture of water (27–46% w/w), crude oil (35–59% w/w), and sand (13–19% w/w).
For RNA extraction, studies used different methods, including commercial kits, TRIzol reagent, and acid phenol chloroform/isoamyl alcohol reagent (Table 3). Most studies (n = 11; 55%) opted for a kit, with seven types described (Table 3). According to information provided by the manufacturers, these kits are specific for extracting RNA from bacteria and from samples of water, soil, biofilm/microbiomes, and plants (Table 3). TRIzol reagent was used in four environmental sample studies (PW and IW) and four culture studies (PW.HDM, IW.HDM, OIL.HDM, and OS.OCDM) (Table 3). Zhou et al. (2022) [14] was the only study that used acid phenol chloroform/isoamyl alcohol reagent for RNA isolation, from OS.MET-HDM samples (Table 3).
After RNA extraction, five studies did not specify the method used to remove residual DNA [13,16,19,35,47]. Depletion of ribosomal RNA (rRNA) for enrichment of messenger RNA (mRNA) was described in six studies [4,13,14,16,19,35]. The synthesis of complementary DNA (cDNA) from total RNA or from mRNA was carried out in all studies. Liu et al. (2020) [47] was the only study that did not specify the preprocessing of RNA samples before sequencing.

Methods for Amplification and Sequencing RNA
In line with the eligibility criteria, the included studies carried out amplicon sequencing of 16S rRNA and functional gene transcripts, as well as transcriptome and metatranscriptome sequencing of the studied samples (Table 4). Amplicon sequencing of 16S rRNA transcripts was the most applied approach (n = 8; 40%). This sequencing was used to identify metabolically active microorganisms (bacteria and archaea) from environmental samples (PW and IW), cultures (PW.HDM and IW.HDM), and CIM consortia (PW.CIM and ND.CIM) (Table 4).
In the study by Shestakova et al. (2011) [15], alkB gene transcripts were analyzed to investigate the microbial diversity and identify microorganisms responsible for hydrocarbon degradation in cultures enriched with PW (PW.HDM) and IW (IW.HDM) (Table 4). In contrast, two PW studies evaluated the sequencing of mcrA gene transcripts to identify methanogens (Table 4). In addition, the biomarker assA was assessed in three studies to detect the microorganisms responsible for the activation of alkanes during anaerobic degradation processes.
(b) Description of CIM groups (Supplementary Table S7). (c) Metatranscriptomic data were not disclosed. (d) Isolated RNA was used for quantitative PCR (qPCR). (e) Sample type (inoculum) not available in the study, no data (ND).
Two studies investigated sulfate-reducing microorganisms based on dsrA and aprA transcripts (Table 4). In one of these studies, conducted by Zapata-Peñasco et al. (2016) [4], the dsrA gene was amplified using the primer pair DSRAVibF (5′-CGGCGTTATCGGCCGTTACTG-3′) and DSRAVibR (5′-GA[A/G]CCCGAACCGCCGAGGTCGG-3′), designed specifically to recover sequences from the order Desulfovibrionales, obtained from SRB cultures (PW.SRB) (Table 4). In the other study, conducted by Li et al. (2017) [17], the microbial diversity and composition of the sulfate-reducing community were analyzed using transcripts of the aprA and dsrA genes from PW samples (Table 4).
After normalization, data from the environmental samples, cultures, and consortia were grouped as follows: (1) PW (PW, PW.HDM, and PW.CIM); (2) IW (IW and IW.HDM); and (3) ND.CIM (Figure 3). This approach was adopted because, in cultures, PW and IW samples were used as an inoculum to maximize the growth of specific microbial groups. Also, different cultures enriched with PW and ND were used to form CIM consortia (Supplementary Table S7). It is important to mention that the ND sample was not grouped with other samples, as it cannot be confirmed whether the fluid was PW, OIL, or a PW/OIL mixture.
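The bracketed position in the reverse primer denotes a degenerate base (A or G), so the primer stands for two concrete sequences. A minimal Python sketch of this expansion is shown below; the function name `expand_primer` is introduced for illustration, and the reverse primer sequence is the one given in the text, written without spaces.

```python
import re
from itertools import product


def expand_primer(primer):
    """Expand bracketed alternatives such as [A/G] into all concrete sequences."""
    # Split into literal stretches and bracketed groups: "GA", "[A/G]", "CCC...".
    parts = re.findall(r"\[[^\]]+\]|[^\[\]]+", primer)
    options = []
    for part in parts:
        if part.startswith("["):
            options.append(part[1:-1].split("/"))  # alternatives at this position
        else:
            options.append([part])                 # fixed stretch
    return ["".join(combo) for combo in product(*options)]


reverse = "GA[A/G]CCCGAACCGCCGAGGTCGG"
variants = expand_primer(reverse)
print(variants)
```

With one two-fold degenerate position, the expansion yields two 22-nucleotide sequences that differ only at the third base.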
The aggregated data from PW, IW, and ND (Figure 3, Supplementary Table S6) reveal patterns of sharing and exclusivity, and a Venn diagram analysis was performed to highlight the number of shared and exclusive phyla and genera of metabolically active microorganisms.
The PW sample had a larger number of phyla (n = 20) compared to the IW (n = 7) and ND (n = 3) samples (Figure 3). In the comparative analysis, it was observed that PW and IW samples had 13 and two exclusive phyla, respectively. On the other hand, PW and IW shared five bacterial phyla (Bacteroidetes, Deferribacteres, Firmicutes, Ignavibacteriae, and Proteobacteria) (Figures 3 and 4, Supplementary Table S6). Also, the ND sample shares three phyla with PW, two bacterial (Synergistetes and Firmicutes) and one archaeal (Euryarchaeota) (Figures 3 and 4, Supplementary Table S6). No shared phyla were observed between ND and IW (Figure 3).
As mentioned earlier, PW samples presented a larger number of phyla. Consequently, a larger number of genera was identified in PW (n = 186), followed by IW (n = 25) and ND (n = 3) (Figure 3). PW and IW share 12 bacterial genera (Figures 3 and 4). On the other hand, PW and ND share three genera, two belonging to the Bacteria domain (Thermoanaerobacter and Thermovirga) and one belonging to the Archaea domain (Methanothermobacter) (Figures 3 and 4, Supplementary Table S6). Unique genera were also identified: 171 in PW and 13 in IW (Figure 3, Supplementary Table S6). On the other hand, no exclusive genera were observed in the ND sample (Figure 3).
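The shared and exclusive counts behind a Venn diagram reduce to set operations, as in the Python sketch below. The three genera shared by PW and ND are those named in the text, and Pseudomonas and Acinetobacter are placed in PW and IW because abundances are reported for both sample types; `ExamplePW` and `ExampleIW` are hypothetical placeholders, and `venn_regions` is a name introduced for illustration (the review itself used the R venn package).

```python
def venn_regions(groups):
    """Map each combination of group names to the taxa found in exactly those groups."""
    regions = {}
    all_taxa = set().union(*groups.values())
    for taxon in all_taxa:
        members = tuple(sorted(name for name, taxa in groups.items()
                               if taxon in taxa))
        regions.setdefault(members, set()).add(taxon)
    return regions


groups = {
    "PW": {"Pseudomonas", "Acinetobacter", "Thermoanaerobacter",
           "Thermovirga", "Methanothermobacter", "ExamplePW"},
    "IW": {"Pseudomonas", "Acinetobacter", "ExampleIW"},
    "ND": {"Thermoanaerobacter", "Thermovirga", "Methanothermobacter"},
}
regions = venn_regions(groups)
for members in sorted(regions):
    print(members, sorted(regions[members]))

# Consistency of the genus counts reported in the text:
# exclusive + shared with the other groups = total per sample type.
assert 171 + 12 + 3 == 186   # PW
assert 13 + 12 == 25         # IW
```

In this toy example, ND has no exclusive region and no overlap with IW, which matches the pattern described for the genus-level data.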
Among them, the genus Methanothermobacter is the only representative of the Archaea domain (Figure 4).
Metatranscriptomic data were analyzed in three studies.These evaluated environmental samples (PW) [16] and cultures (PW.HDM and OS.MET-HDM) [14,35].Metatranscriptome data were downloaded from the NCBI [16,35] and NODE databases [14].The composition of the metabolically active community was evaluated separately in each study since raw data were used (Figure 5, Supplementary Table S6).
Metatranscriptomic data were analyzed in three studies.These evaluated environmental samples (PW) [16] and cultures (PW.HDM and OS.MET-HDM) [35,14].Metatranscriptome data were downloaded from the NCBI [16,35] and NODE databases [14].The composition of the metabolically active community was evaluated separately in each study since raw data were used (Figure 5, Supplementary Table S6).Using metatranscriptomic data, a Venn diagram was made to show the number of shared and exclusive phyla and genera of metabolically active microorganisms detected in PW, PW.HDM, and OS.MET-HDM samples (Figure 5, Supplementary Table S6).A total of 65,119, and 164 phyla were identified in the analysis of PW, PW.HDM, and OS.MET-HDM samples, respectively.Of the 65 phyla of PW, 64 are shared with PW.HDM (Figure 5, Supplementary Table S6).The OS.MET-HDM sample presented 44 exclusive phyla.However, it shared 65 phyla with PW and 119 phyla with PW.HDM (Figure 5, Supplementary Table S6).At the genus level, a total of 1864 genera were observed (Figure 5, Supplementary Table S6).Of these, ion in PW, 327 in PW.HDM and 615 in OS.MET-HDM samples were exclusive.
The relative abundance of phyla identified in PW, PW.HDM, and OS.MET-HDM samples was also evaluated using metatranscriptome data.The Proteobacteria phylum, belonging to the Bacteria domain, had the highest abundance in the PW sample (59%), while the Euryarchaeota phylum, belonging to the Archaea domain, was more abundant Using metatranscriptomic data, a Venn diagram was made to show the number of shared and exclusive phyla and genera of metabolically active microorganisms detected in PW, PW.HDM, and OS.MET-HDM samples (Figure 5, Supplementary Table S6).A total of 65,119, and 164 phyla were identified in the analysis of PW, PW.HDM, and OS.MET-HDM samples, respectively.Of the 65 phyla of PW, 64 are shared with PW.HDM (Figure 5, Supplementary Table S6).The OS.MET-HDM sample presented 44 exclusive phyla.However, it shared 65 phyla with PW and 119 phyla with PW.HDM (Figure 5, Supplementary Table S6).At the genus level, a total of 1864 genera were observed (Figure 5, Supplementary Table S6).Of these, ion in PW, 327 in PW.HDM and 615 in OS.MET-HDM samples were exclusive.
The relative abundance of phyla identified in the PW, PW.HDM, and OS.MET-HDM samples was also evaluated using metatranscriptome data. The Proteobacteria phylum, belonging to the Bacteria domain, had the highest abundance in the PW sample (59%), while the Euryarchaeota phylum, belonging to the Archaea domain, was more abundant in the PW.HDM (46%) and OS.MET-HDM (27%) samples (Figure 6, Supplementary Table S6).
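Relative abundance, as used here, is the share of reads assigned to a taxon out of all assigned reads in a sample. The following is a minimal Python sketch of that calculation. The function name `relative_abundance` and the read counts are hypothetical and chosen only for illustration; they are not values from the reviewed studies.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into percentages of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}

# Hypothetical read counts (illustrative only)
reads = {"Proteobacteria": 600, "Euryarchaeota": 250, "Firmicutes": 150}
abundance = relative_abundance(reads)

# Taxon with the highest relative abundance in this sample
dominant = max(abundance, key=abundance.get)
```

Because the percentages are normalized within each sample, they describe composition only and cannot be compared as absolute quantities between samples.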

Identification and Selection of Studies
The present systematic review was based on the PRISMA guidelines and aimed to evaluate the methods applied in RNA-based studies that analyzed metabolically active microbial communities from oilfields. In the four selected databases (MEDLINE/PubMed, PubMed Central, Scopus, and Web of Science), 2561 potentially eligible records were identified. After automated and manual screening, only 20 eligible original articles were included, highlighting the scarcity of studies directed at the analysis of 16S rRNA transcripts and functional genes, as well as the transcriptome and metatranscriptome, of samples from reservoirs and oil industry facilities. It is noteworthy that, in recent decades, several studies have investigated microbial communities from the oil industry [10,23–28,30,31]. However, most of these analyses focused primarily on DNA sequences.
The limited number of studies directed at active microorganisms is probably attributable to the challenges associated with preserving and isolating high-quality RNA [22]. RNA is an inherently unstable molecule and is highly vulnerable to degradation by RNases [30]. This instability represents a significant challenge in obtaining reliable results for the characterization of active microbial communities.

Studies Included
The studies analyzed environmental samples (PW, IW, OIL, and SD) collected from oilfields in Australia, China, India, Mexico, and the United Arab Emirates. The majority of evaluated samples were PW, which is the most abundant waste stream of the oil industry and, consequently, one of the most available sources of samples in oilfield systems [2,31]. Salgar-Chaparro and Machuca (2019) [1] were the only authors who investigated both the planktonic microorganisms in circulating fluids and the sessile microbiota adhered to the inner walls of the pipelines. The comprehensive analysis of both fluids (PW, IW, and OIL) and solids (sediments, sludge, biofilm, pig residue, or similar) provides complementary insights into microbial activities primarily involved in hydrocarbon biodegradation and pipeline biocorrosion processes [21].
Therefore, the analysis of metabolically active microbial communities using RNA sequencing provides valuable insights into identifying microorganisms involved in hydrocarbon biodegradation and corrosion in oil industry facilities [16,35,43,44,47]. High-quality RNA sequencing allows for comprehensive coverage and characterization of active microorganisms [32], but the successful application of these methods depends primarily on effective protocols to preserve RNA integrity [30,32].

RNA Preprocessing, Preservation, and Extraction Method
Even where effective protocols for sample collection, transportation, and temporary storage are in place, nucleic acid preservatives are increasingly used, particularly for samples collected from remote areas such as offshore petroleum platforms [22]. Various preservation methods were identified in the studies included in the review. The 95:5 v/v ethanol/TRIzol solution was mentioned in eight of the 20 studies. However, information regarding the impact of preservatives on the RNA integrity of oilfield samples remains limited.
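A 95:5 v/v ratio fixes the volume of each component once a total volume is chosen. The following is a minimal Python sketch of that arithmetic. The function name `preservative_volumes` and the 50 mL total are assumptions made for illustration; the ratio of sample to preservative is not stated in the text and is not modeled here.

```python
def preservative_volumes(total_ml, ethanol_parts=95, trizol_parts=5):
    """Split a total volume (mL) into ethanol and TRIzol volumes for a v/v ratio."""
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    parts = ethanol_parts + trizol_parts
    ethanol_ml = total_ml * ethanol_parts / parts
    trizol_ml = total_ml - ethanol_ml
    return ethanol_ml, trizol_ml

# Hypothetical batch of 50 mL of preservation solution
ethanol_ml, trizol_ml = preservative_volumes(50.0)
```

For the assumed 50 mL batch, this corresponds to 47.5 mL of ethanol and 2.5 mL of TRIzol.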
The evaluated studies employed various methods for RNA isolation, including commercial kits, TRIzol reagent, and acid phenol chloroform/isoamyl alcohol reagent; the TRIzol reagent was used in eight studies. While nucleic acid extraction protocols should be tailored to the specific characteristics of each sample [32], the TRIzol method stands out as a rapid, accessible, and cost-effective protocol [50]. These attributes likely contribute to its widespread use for RNA isolation across diverse sample types, such as medicinal plants [51], human visceral adipose tissue [50], the SARS-CoV-2 virus [52], and fungi [53].
It is worth noting that filtration and centrifugation were mentioned in four and 14 studies, respectively. While aqueous fluids such as PW and IW are commonly sampled in petroleum systems, these samples typically exhibit low biomass and diversity [31,54]. To address this limitation, larger sample volumes are collected and subjected to preprocessing steps to concentrate biomass and facilitate the isolation of nucleic acids [1,10,17,31,45,54].

RNA Amplification and Sequencing Method
Among the molecular methods mentioned in the studies, sequencing of 16S rRNA gene transcripts was the most implemented approach for studying the active microbiota. However, interpreting the data is challenging because the number of copies of the 16S rRNA gene varies among species, ranging from one to 15 in Bacteria and from one to four in Archaea [55,56]. Some studies suggest that the number of ribosomes increases in actively growing cells, and that ribosomal RNA analysis can be used to identify metabolically active cell forms [55,56]. However, evidence indicates that the use of rRNA as an indicator of microbial activity has limitations.
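The copy-number effect mentioned above can be made concrete with a small worked example. The following Python sketch assumes two hypothetical taxa present at equal cell numbers, one carrying a single 16S rRNA gene copy and the other 15 (the extremes of the bacterial range cited above). The function name, taxon labels, and cell counts are illustrative assumptions, and the sketch deals with gene copies only, not with transcript levels.

```python
def apparent_shares(cells, copies_per_cell):
    """Percentage of total 16S rRNA gene copies contributed by each taxon."""
    gene_copies = {t: cells[t] * copies_per_cell[t] for t in cells}
    total = sum(gene_copies.values())
    return {t: 100.0 * c / total for t, c in gene_copies.items()}

# Hypothetical community: equal cell numbers, different gene copy numbers
cells = {"taxon_low": 100, "taxon_high": 100}
copies = {"taxon_low": 1, "taxon_high": 15}
shares = apparent_shares(cells, copies)
```

Although both taxa make up half of the cells, the low-copy taxon accounts for only 6.25% of the gene copies and the high-copy taxon for 93.75%, which illustrates why raw 16S rRNA-based proportions need cautious interpretation.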
Blazewicz et al. (2013) [57] reviewed several studies and concluded that (a) rRNA concentration and growth rate are not always correlated, (b) the relationship between rRNA concentration and growth rate can vary significantly among taxa, and (c) dormant cells can contain a large number of ribosomes. While there are inherent uncertainties in 16S rRNA gene sequencing, and DNA-based analysis cannot differentiate active species among all those present in the environment, the comparison of DNA and RNA libraries provides a more comprehensive characterization of microorganisms in petroleum environments [1,3,17,22,48,49].
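One simple way to compare DNA and RNA libraries is a per-taxon ratio of relative abundance in the RNA library to that in the DNA library. The following Python sketch illustrates the idea under stated assumptions: the function name `rna_dna_ratio`, the genus labels, and the percentages are hypothetical, and this is not presented as the method used in the reviewed studies.

```python
def rna_dna_ratio(rna_pct, dna_pct):
    """Per-taxon ratio of RNA-library to DNA-library relative abundance.

    Taxa missing from either library, or with zero DNA abundance, are skipped.
    """
    ratios = {}
    for taxon in rna_pct.keys() & dna_pct.keys():
        if dna_pct[taxon] > 0:
            ratios[taxon] = rna_pct[taxon] / dna_pct[taxon]
    return ratios

# Hypothetical relative abundances in percent (illustrative only)
rna = {"genus_a": 20.0, "genus_b": 2.0, "genus_c": 5.0}
dna = {"genus_a": 5.0, "genus_b": 8.0}
ratios = rna_dna_ratio(rna, dna)
```

A ratio above 1 is sometimes read as a sign of relatively higher activity, but given the caveats summarized above (variable rRNA-growth relationships among taxa and ribosome-rich dormant cells), such ratios are better treated as descriptive than as a direct measure of activity.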
One study in the review sequenced the functional gene alkB, which codes for the enzyme alkane hydroxylase, to investigate microorganisms involved in the aerobic degradation of hydrocarbons [15]. In contrast, three studies sequenced the functional genes assA and mcrA to analyze microorganisms associated with the anaerobic degradation of hydrocarbons and with methanogenesis, respectively. The assA gene has been employed as a biomarker for detecting microorganisms responsible for the initial activation of alkanes during the anaerobic degradation process [48,49]. Meanwhile, the mcrA functional gene is widely recognized as a biomarker for identifying methanogens [45,48,49].
HDM can use hydrocarbons, preferably alkanes and cyclic and aromatic compounds, as a source of energy and carbon [58,59]; by metabolizing these organic compounds, they drastically reduce oil quality [45,60]. With increasing levels of biodegradation, the content of asphaltenes and resins, the acidity, and the viscosity of the oil increase, while the content of saturated and aromatic hydrocarbons decreases [60,61]. These alterations negatively affect oil production by reducing flow rates from reservoirs, and they also affect refining operations and increase process costs [61,62].
In turn, CIM represent another significant concern for the oil industry, since biocorrosion processes involve the degradation of metallic materials caused by the presence and activity of various microorganisms that are in direct or close contact with the metal surface [43,63]. It is estimated that microbiologically influenced corrosion (MIC) contributes to nearly 40% of internal corrosion problems and 20–30% of external pipeline corrosion [63]. Corroborating the data obtained in this review, sulfate-reducing bacteria (SRB) have been widely cited in studies related to pipeline corrosion [4,10]. Functional genes such as aprA and dsrA, encoding the enzymes adenosine-5′-phosphosulfate reductase and dissimilatory sulfite reductase, respectively, are used to track SRB [4,17].
Metatranscriptomic analysis was performed with data from three studies. This analysis is considered a powerful technique that enables examination of the gene expression profile of the microorganisms present in a sample, providing valuable insights into their metabolic activity in a given environment [64]. Despite advances in microbiome studies of oilfield samples, studies analyzing the metatranscriptome remain notably scarce. This can be attributed to several challenges, including the extraction of high-quality RNA, the presence of inhibitors, limited quantities of genetic material in the samples, and rapid RNA degradation. Additionally, the collection and proper preservation of samples for metatranscriptomic analysis can be complex due to the demanding conditions of the petroleum environment, such as high pressure and temperature, as well as the presence of toxic compounds.
Analysis of the metatranscriptomes from culture samples showed that the phylum Euryarchaeota had the greatest abundance (PW.HDM: 46%; OS.MET-HDM: 27%). This phylum, affiliated with the Archaea domain, harbors hydrogenotrophic and acetotrophic methanogenic microorganisms [69]. In oil reservoirs, under methanogenic conditions, hydrocarbon degradation occurs when communities of bacteria and archaea cooperate syntrophically to produce methane through thermodynamically favorable pathways [70].
At the genus level, the metatranscriptomic data showed that the most abundant genera in the PW.HDM cultures were Thermococcus (29%) and Methanothermobacter (11%), whereas in the study that evaluated OS.MET-HDM cultures, the most abundant genera were "Candidatus Methanoliparum" (22%) and Methanothrix (18%). The genera Thermococcus, Methanothermobacter, and Methanothrix belong to the phylum Euryarchaeota and are often detected in oil reservoirs [45].

Conclusions
This systematic review reveals the limited research on RNA-based analysis of metabolically active microbial communities in oilfields. Despite substantial prior research on microbial communities in the oil industry, there is a significant gap in the analysis of 16S rRNA transcripts, functional genes, transcriptomes, and metatranscriptomes. It is believed that challenges related to RNA preservation and isolation have hindered progress in this area. The selected studies predominantly focus on produced water samples, highlighting the need for a broader exploration of various sample types within oilfield systems. Additionally, metatranscriptomic analysis remains underutilized due to its technical complexities. The predominance of Proteobacteria and specific genera like Pseudomonas and Acinetobacter underscores the importance of studying these microorganisms' roles in hydrocarbon degradation. Furthermore, the presence of Euryarchaeota points to the significance of methanogenic processes in oil reservoirs. This review serves as a call to action for further research in this area, emphasizing the importance of standardizing RNA preservation, extraction, and sequencing methods. Advancements in RNA-based analysis have the potential to significantly enhance our understanding of active microbial communities in oilfields and their impacts on the industry.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/applmicrobiol3040079/s1. Table S1: PRISMA-S checklist. Table S2: Terms used in the search for records in databases and selection of documents by automation tools. Table S3: Records identified in the databases. Table S4: Automated and manual screening of records. Table S5: References, title, and DOI of the studies included in the present systematic review. Table S5: Composition of the metabolically active microorganisms. Table S6: References, title, and DOI of the original articles included in the present systematic review. Table S7: Description of the microbial groups of corrosion-influencing microorganisms.

Figure 2. Number of publications per year of studies that evaluated metabolically active microorganisms in oilfields.

Figure 3. Venn diagram presenting the number of shared and exclusive (A) phyla and (B) genera of metabolically active microorganisms, based on RNA from the analyses of 16S rRNA gene transcript amplicons. Sample type: produced water (PW) and injection water (IW). ND: no data (not available in the study).

Figure 4. Relative abundance of metabolically active microorganisms, based on RNA, at the (A) phylum and (B) genus levels, obtained from 16S rRNA gene transcript amplicon data. Sample type: produced water (PW) and injection water (IW). ND: no data (not available in the study).

Figure 5. Venn diagram showing the number of shared and exclusive (A) phyla and (B) genera of metabolically active microorganisms, based on RNA obtained from metatranscriptomic data. Produced water (PW); PW hydrocarbon-degrading microorganisms (PW.HDM); and oily sludge (OS) methanogenic hydrocarbon-degrading microorganisms (OS.MET-HDM).

Table 1. Eligibility criteria for the inclusion of articles in the systematic review.

Table 2. Reference, country, and samples studied in the articles included in this systematic review.

Table 3. Methods of preprocessing, preservation, and extraction of RNA from the studied samples in the articles included in this systematic review.

Table 4. Methods for amplification and sequencing of RNA transcripts from the samples studied in the articles included in this systematic review.
(b)Description of CIM groups (Supplementary Table