Data Descriptor

Database of Metagenomes of Sediments from Estuarine Aquaculture Farms in Portugal—AquaRAM Project Collection

INIAV—National Institute for Agrarian and Veterinary Research, IP, 2780-157 Oeiras, Portugal
cE3c—Center for Ecology, Evolution and Environmental Changes & CHANGE—Global Change and Sustainability Institute, 1749-016 Lisbon, Portugal
CIBIO—Centro de Investigação em Biodiversidade e Recursos Genéticos, 4485-661 Vairão, Portugal
Author to whom correspondence should be addressed.
Data 2022, 7(11), 167
Submission received: 4 October 2022 / Revised: 21 October 2022 / Accepted: 27 October 2022 / Published: 20 November 2022


Aquaculture farms and estuarine environments close to human activities play a critical role in the interaction between aquatic and terrestrial surroundings and animal and human health. The AquaRAM project aimed to study estuarine aquaculture farms in Portugal as reservoirs of antibiotic resistance genes and the potential for their spread through mobile genetic elements. We have assembled a collection of metagenomic data from 30 sediment samples from oyster, mussel, and gilt-head sea bream aquaculture farms. This collection includes samples from the estuarine environments of three rivers and one lagoon, located from the north to the south of Portugal: the Lima River in Viana do Castelo, the Aveiro Lagoon in Aveiro, the Tagus River in Alcochete, and the Sado River in Setúbal. Statistical data from the raw metagenome files, as well as the sizes of the assembled nucleotide and protein sequence files, are also presented, together with the links to the statistics and download pages for all the metagenomes.
Dataset License: CC-BY-NC

1. Summary

In Portugal, more than 80 aquaculture projects are underway through the March 2020 operational program (accessed on 25 September 2022). These projects aim to increase the productivity and diversification of seafood products, namely sea bass, sea bream, oysters, clams, and mussels, and to mitigate their diseases. The AquaRAM (Antimicrobial Resistance Determinants in Aquaculture Environments) project (accessed on 25 September 2022), by contrast, is an exploratory study with different funding and purposes: it aims to identify reservoirs of antimicrobial resistance by characterizing the antimicrobial resistance profile (resistome) and the mobilome (the set of mobile genetic elements) of aquaculture environments. Antibiotics are often used to mitigate diseases in aquaculture, but they can also increase antimicrobial resistance in existing bacteria, which can be transmitted to the environment and to humans through food consumption. Thus, the AquaRAM project provides information on the consequences of the overuse of antibiotics in aquaculture to control diseases that are increasing in food production, complementing the March 2020 projects. To our knowledge, AquaRAM is the first study on this subject in Portugal, whereas many researchers have addressed this issue in China and the Scandinavian countries.
This project focused on the extensive and semi-intensive aquaculture of shellfish (oysters and mussels) and gilt-head sea bream located in several Portuguese river estuaries to assess the possible risk to public health [1]. Since aquaculture environments sit at the interface between animal and human health, these data are crucial for implementing new strategies to control multi-resistant bacteria in aquaculture environments.
During the development of this project, a collection of metagenomes was generated, and its data are compiled in this publication. The collection has been used to study the role of estuarine aquacultures in the spread of antibiotic resistance in animal production [1]. It could also be used for comparative metagenomics studies with other metagenomes from aquatic environments, through phylogenetic, functional, and metabolic diversity analyses.

2. Data Description

2.1. Metagenomes Metadata

Figure 1 shows the geographical location of the sediment samples that were used in this work. These were collected from Portuguese aquaculture farms located in the estuaries of the Lima River in Viana do Castelo, the Aveiro Lagoon in Aveiro, the Tagus River in Alcochete, and the Sado River in Setúbal.
Table 1 summarizes the information on the collection of samples.

2.2. Metagenomes Statistics

All samples described in this work were sequenced by Illumina whole genome sequencing, and the metagenomes were processed and assembled on the MG-RAST (metagenomic rapid annotations using subsystems technology) server, which performs an automatic phylogenetic and functional analysis of metagenomes. All generated files are stored on the MG-RAST server [1]. Table 2 summarizes the statistics of each metagenome, along with the link to its download page.
The Shannon alpha diversity index summarizes the phylogenetic abundances in each metagenome in terms of their richness (the number of species/clades) and evenness (how equally the abundances are distributed among species).
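As an illustration of how richness and evenness enter the index, the following minimal sketch computes the Shannon index H = −Σ p_i ln p_i from a list of per-species abundance counts, together with Pielou's evenness H / ln S. The function names and example counts are hypothetical, and the sketch does not reproduce the MG-RAST calculation; whether the values reported in Table 2 are H itself or a transformation of it is not stated here.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over species with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this species
            h -= p * math.log(p)   # natural logarithm
    return h


def pielou_evenness(counts):
    """Evenness J = H / ln(S), where S is the number of observed species."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is not informative for a single species
    return shannon_index(counts) / math.log(richness)


# Hypothetical abundance profiles with the same richness (4 species)
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(shannon_index(even_sample), pielou_evenness(even_sample))
print(shannon_index(skewed_sample), pielou_evenness(skewed_sample))
```

Both hypothetical samples have the same richness, but the skewed one yields a lower index because a single species dominates the community.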
The average G/C ratio, base pair count, and sequence count of the raw sequence reads are also presented. The duplicate read inferred sequencing error estimation (DRISEE) score of the raw sequences gives an estimate of the sequencing error [2].
The columns “Bp count assembled” and “protein count 90%id” present two other measures of the size of the assembled metagenomes, namely, the number of base pairs in the assembled metagenomes and the number of proteins after clustering at 90% identity.

2.3. Metagenomes Database

Table 3 brings together the information on all the identifiers of each metagenome, as well as the hyperlink to the page with the link to all the downloadable MG-RAST pipeline output files.

3. Methods

At each site, samples were taken from the top 5 cm layer, stored in sterile polypropylene centrifuge tubes (10 mL), and immediately transported refrigerated to the laboratory, where they were kept at −20 °C until analysis. The total DNA was extracted from the frozen sediment samples using the DNeasy PowerSoil Pro Kit (accessed on 25 September 2022). DNA was sent in frozen isothermal boxes to CeGaT (accessed on 25 September 2022) in Germany. Libraries for shotgun metagenome sequencing were prepared using an Illumina DNA (M) Tagmentation Library Prep kit from Illumina, followed by 2 × 100 bp sequencing on a NovaSeq 6000. The demultiplexing of the sequencing reads was performed with Illumina bcl2fastq (2.20), and adapters were trimmed with Skewer (version 0.2.2) [3]. The quality of the FASTQ files was analyzed with FastQC (version 0.11.5-cegat) [4]. The Illumina raw sequence data file pairs in the FASTQ format were assembled on the MG-RAST metagenomic analysis server [5]. The pipeline options chosen were the removal of artificial replicate sequences, of any host-specific sequences (Homo sapiens, NCBI v36), and of low-quality sequences. The lowest Phred score counted as a high-quality base was set to 15, and sequences were trimmed to contain at most 5 bases below that score. All the resulting files can be accessed at (accessed on 25 September 2022).
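The quality-filtering settings described above (a minimum Phred score of 15 and at most 5 low-quality bases per read) can be illustrated with a minimal sketch. It assumes Phred+33 encoded quality strings and a simple left-to-right scan that cuts the read just before the sixth low-quality base; this is one plausible reading of the option, not the MG-RAST implementation, and the function name and example read are hypothetical.

```python
def trim_read(seq, qual, min_phred=15, max_low=5, offset=33):
    """Trim a read so that it contains at most `max_low` bases below `min_phred`.

    Assumes `qual` is a Phred+33 encoded quality string of the same length as `seq`.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    low = 0
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_phred:
            low += 1
            if low > max_low:
                # Cut just before the base that would exceed the allowance
                return seq[:i], qual[:i]
    return seq, qual


# Hypothetical read: 10 high-quality bases ('I' = Phred 40), then 8 low ones ('#' = Phred 2)
seq = "ACGTACGTAC" + "GGGGGGGG"
qual = "I" * 10 + "#" * 8
trimmed_seq, trimmed_qual = trim_read(seq, qual)
print(len(trimmed_seq), trimmed_seq)
```

In this hypothetical read, the first five low-quality bases are tolerated and everything from the sixth onward is discarded.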
Programmatic access to the files containing the assembled DNA contigs and protein sequences, and the computation of the size of each metagenome, were performed through bash scripting in a Linux environment.
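The size computation mentioned above amounts to counting records and residues in a FASTA file. The authors used bash; the Python sketch below is only an equivalent illustration under that assumption, not their script, and the function name and example contigs are hypothetical.

```python
def fasta_stats(lines):
    """Return (sequence_count, total_residues) for an iterable of FASTA lines."""
    n_seqs = 0
    n_residues = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new record
        else:
            n_residues += len(line)  # sequence lines may wrap over several rows
    return n_seqs, n_residues


# Hypothetical FASTA content with two contigs
example = """>contig_1
ACGTACGT
ACGT
>contig_2
GGCC
""".splitlines()
print(fasta_stats(example))  # (2, 16)
```

Because the function accepts any iterable of lines, it can also be given an open file handle, so a downloaded contig or protein file does not need to be loaded into memory at once.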

4. User Notes

The MG-RAST server enables programmatic access to the AquaRAM collection using the following paths: <Metagenome id>?file=299.1 (accessed on 25 September 2022), for the FASTA format files containing the assembled DNA contigs; and <Metagenome id>?file=550.2 (accessed on 25 September 2022), for the FASTA format files containing the protein sequences clustered at 90% identity. The string <Metagenome id> is listed in the first column of Table 2.
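As a minimal sketch of how these paths could be assembled for a set of metagenomes, the snippet below joins a path prefix, a metagenome identifier, and the file codes 299.1 and 550.2 given above. The prefix is not reproduced in this text, so it is left as a placeholder argument that the reader must supply; `build_download_paths` and `PREFIX` are hypothetical names, and no network request is made.

```python
# File codes taken from the paths described above
CONTIGS_FILE = "299.1"   # assembled DNA contigs (FASTA)
PROTEINS_FILE = "550.2"  # protein sequences clustered at 90% identity (FASTA)


def build_download_paths(prefix, metagenome_id):
    """Build the two download paths for one metagenome.

    `prefix` is a placeholder for the server path that precedes the metagenome
    identifier; it must be supplied by the reader, including any trailing slash.
    """
    return {
        "contigs": f"{prefix}{metagenome_id}?file={CONTIGS_FILE}",
        "proteins": f"{prefix}{metagenome_id}?file={PROTEINS_FILE}",
    }


# Two identifiers from Table 1; PREFIX stands in for the real server path
PREFIX = "<server path>/"
for mg_id in ["mgm4949363.3", "mgm4954830.3"]:
    for kind, path in build_download_paths(PREFIX, mg_id).items():
        print(kind, path)
```

The resulting strings can then be passed to any download tool, for example in a loop over all 30 identifiers of the collection.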

Author Contributions

Conceptualization, T.N.; methodology, T.N.; investigation, T.N., D.G.S. and S.L.; resources, A.B.; data curation, D.G.S. and T.N.; writing—original draft preparation, T.N.; writing—review and editing, D.G.S.; supervision, T.N.; project administration, T.N. and A.B.; funding acquisition, A.B. All authors have read and agreed to the published version of the manuscript.


Funding

This research and the APC were funded by Fundação para a Ciência e a Tecnologia, grant numbers ALG-01-0145-FEDER-028824 and PTDC/BIA-MIC/28824/2017.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset generated in this study is publicly archived in the MG-RAST repository: (accessed on 25 September 2022).


Acknowledgments

The authors would like to thank the team members of the AquaRAM project at INIAV (Cristina Ferreira, Sandra Cavaco, Leonor Orge, and Andreia Freitas), and at IPMA (Florbela Soares, Ana Maulvault, and Patrícia Anacleto) for providing, collecting, and preparing the sediment samples used in this work.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.


References

  1. Silva, D.G.; Domingues, C.P.F.; Figueiredo, J.F.; Dionisio, F.; Botelho, A.; Nogueira, T. Estuarine Aquacultures at the Crossroads of Animal Production and Antibacterial Resistance: A Metagenomic Approach to the Resistome. Biology 2022, 11, 1681.
  2. Keegan, K.P.; Trimble, W.L.; Wilkening, J.; Wilke, A.; Harrison, T.; D’Souza, M.; Meyer, F. A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE. PLoS Comput. Biol. 2012, 8, e1002541.
  3. Jiang, H.; Lei, R.; Ding, S.-W.; Zhu, S. Skewer: A Fast and Accurate Adapter Trimmer for next-Generation Sequencing Paired-End Reads. BMC Bioinform. 2014, 15, 182.
  4. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data; Babraham Bioinformatics, Babraham Institute: Cambridge, UK, 2010.
  5. Meyer, F.; Paarmann, D.; D’Souza, M.; Olson, R.; Glass, E.; Kubal, M.; Paczian, T.; Rodriguez, A.; Stevens, R.; Wilke, A.; et al. The Metagenomics RAST Server–a Public Resource for the Automatic Phylogenetic and Functional Analysis of Metagenomes. BMC Bioinform. 2008, 9, 386.
Figure 1. Map of mainland Portugal showing the geographical regions where the sediment samples from the aquaculture farms were collected. The colored regions mark the estuaries of the Lima River at Viana do Castelo (orange), the Aveiro Lagoon at Aveiro (salmon), the Tagus River at Alcochete (dark blue), and the Sado River at Setúbal (turquoise).
Table 1. Descriptors of the metagenome samples.
Metagenome ID | Location      | Region           | Aquaculture Farm
mgm4949363.3  | Aveiro Lagoon | Aveiro           | Oyster
mgm4949369.3  | Aveiro Lagoon | Aveiro           | Oyster
mgm4949371.3  | Aveiro Lagoon | Aveiro           | Oyster
mgm4949372.3  | Aveiro Lagoon | Aveiro           | Oyster
mgm4949374.3  | Lima River    | Viana do Castelo | Oyster
mgm4949362.3  | Lima River    | Viana do Castelo | Oyster
mgm4949373.3  | Lima River    | Viana do Castelo | Oyster
mgm4949367.3  | Tagus River   | Alcochete        | Mussel
mgm4949365.3  | Tagus River   | Alcochete        | Mussel
mgm4949364.3  | Tagus River   | Alcochete        | Mussel
mgm4949366.3  | Tagus River   | Alcochete        | Mussel
mgm4949368.3  | Tagus River   | Alcochete        | Mussel
mgm4949370.3  | Tagus River   | Alcochete        | Mussel
mgm4954830.3  | Sado River    | Setúbal          | Oyster
mgm4954821.3  | Sado River    | Setúbal          | Oyster
mgm4954835.3  | Sado River    | Setúbal          | Gilt-head bream
mgm4954834.3  | Sado River    | Setúbal          | Gilt-head bream
mgm4954833.3  | Sado River    | Setúbal          | Gilt-head bream
mgm4954831.3  | Sado River    | Setúbal          | Gilt-head bream
mgm4954819.3  | Sado River    | Setúbal          | Oyster
mgm4954820.3  | Sado River    | Setúbal          | Oyster
mgm4954822.3  | Sado River    | Setúbal          | Oyster
mgm4954818.3  | Sado River    | Setúbal          | Oyster
mgm4954814.3  | Sado River    | Setúbal          | Oyster
mgm4954832.3  | Sado River    | Setúbal          | Oyster
mgm4954828.3  | Sado River    | Setúbal          | Oyster
mgm4954836.3  | Sado River    | Setúbal          | Oyster
mgm4954815.3  | Sado River    | Setúbal          | Oyster
mgm4965131.3  | Sado River    | Setúbal          | Oyster
mgm4954827.3  | Sado River    | Setúbal          | Oyster
Table 2. Statistics of the metagenomes.
Metagenome ID | Alpha Diversity Shannon | Average G/C Ratio Raw | Sequence Count Raw | DRISEE Score Raw | Bp Count Raw | Bp Count Assembled | Protein Count 90%ID | Links (Accessed on 1 October 2022)
Table 3. Identifiers and links to the metagenomes’ files.
Metagenome ID | Library ID | Library Name | Sample ID | Links (Accessed on 1 October 2022)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Citation

Nogueira, T.; Silva, D.G.; Lopes, S.; Botelho, A. Database of Metagenomes of Sediments from Estuarine Aquaculture Farms in Portugal—AquaRAM Project Collection. Data 2022, 7, 167.
