Metagenomics data processing and analysis for microbiome
PURL: https://gxy.io/GTN:P00008
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.
This learning path teaches you the basics of Galaxy and of metagenomics data analysis. You will learn how to use Galaxy for analysis, and you will be guided through the common steps of microbiome data analysis: quality control, taxonomic profiling, taxonomic binning, assembly, and functional profiling, followed by some applications.
Module 1: Introduction to metagenomics
Why study the microbiome? What are the different approaches for metagenomics? This module will give you a short introduction to metagenomics.
Time estimation:
Learning Objectives
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Introduction to Microbiome Analysis | | | |
Module 2: Introduction to Galaxy
Get a first look at the Galaxy platform for data analysis. We start with a short introduction (video, slides, and practical) to familiarize you with the Galaxy interface, and then proceed with a slightly longer introductory tutorial in which you perform a first, very simple analysis.
Time estimation: 1 hour 40 minutes
Learning Objectives
- Learn how to upload a file
- Learn how to use a tool
- Learn how to view results
- Learn how to view histories
- Learn how to extract and run a workflow
- Learn how to share a history
- Familiarize yourself with the basics of Galaxy
- Learn how to obtain data from external sources
- Learn how to run tools
- Learn how histories work
- Learn how to create a workflow
- Learn how to share your work
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
A short introduction to Galaxy | | | |
Galaxy Basics for genomics | | | |
Module 3: Quality control
When analysing sequencing data, you should always start with a quality control step to clean your data and make sure your data is good enough to answer your research question.
Time estimation: 1 hour 30 minutes
Learning Objectives
- Assess short reads FASTQ quality using FASTQE 🧬😎 and FastQC
- Assess long reads FASTQ quality using NanoPlot and PycoQC
- Perform quality correction with Cutadapt (short reads)
- Summarise quality metrics with MultiQC
- Process single-end and paired-end data
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Quality Control | | | |
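As a small illustration of what the quality control tools in this module measure, the sketch below decodes the quality line of a FASTQ record into Phred scores and converts a score into an error probability. It assumes the common Phred+33 encoding; the function names are hypothetical and are not part of FastQC, NanoPlot, or any other tool named above.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33), which is an
    assumption of this sketch, not something stated in the module.
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line):
    """Average Phred score over all bases of one read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores)


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Hypothetical four-line FASTQ record: header, sequence, separator, qualities.
    record = ["@read1", "ACG", "+", "I5!"]
    print(phred_scores(record[3]))
    print(mean_quality(record[3]))
    print(error_probability(20))
```

A Phred score of 20 corresponds to a 1 in 100 chance of a wrong base call, which is why trimming tools often take a quality threshold in this range.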
Module 4: Community taxonomic profiling
This module covers the following questions:
- Which species (or genera, families, …) are present in my sample?
- What are the different approaches and tools to get the community profile of my sample?
- How can we visualize and compare community profiles?
This module will cover taxonomic profiling in theory and also with an example tutorial.
Time estimation: 3 hours
Learning Objectives
- Explain what taxonomic assignment is
- Explain how taxonomic assignment works
- Apply Kraken and MetaPhlAn to assign taxonomic labels
- Apply Krona and Pavian to visualize results of assignment and understand the output
- Identify the taxonomic classification tool that best fits your data
- Inspect metagenomics data
- Run metagenomics tools
- Identify yeast species contained in a sequenced beer sample using DNA
- Visualize the microbiome community of a beer sample
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Taxonomic Profiling and Visualization of Metagenomic Data | |||
Identification of the micro-organisms in a beer using Nanopore sequencing |
Module 5: Community diversity
This module covers the following questions:
- How many different taxa are present in my sample? How do I additionally take their relative abundance into account?
- How similar or how dissimilar are my samples in terms of taxonomic diversity?
- What are the different metrics used to calculate the taxonomic diversity of my samples?
Time estimation: 20 minutes
Learning Objectives
- Explain what taxonomic diversity is
- Explain different metrics to calculate α and β diversity
- Apply KrakenTools to calculate α and β diversity and understand the output
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Calculating α and β diversity from microbiome taxonomic data |
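To make the two kinds of diversity concrete, here is a minimal sketch of one α metric (the Shannon index, computed within a single sample) and one β metric (Bray-Curtis dissimilarity, computed between two samples). It is an illustration only: it is not the KrakenTools implementation, and the taxon names and counts are invented.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index H = -sum(p_i * ln(p_i)).

    `counts` holds the read counts per taxon in one sample.
    Taxa with zero reads are skipped, since they contribute nothing.
    """
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(sample_a, sample_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Each sample is a dict mapping taxon name to read count.
    Returns 0 for identical samples and 1 for samples sharing no taxa.
    """
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    return 1 - 2 * shared / total


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    gut = {"Bacteroides": 6, "Escherichia": 4}
    soil = {"Bacteroides": 2, "Streptomyces": 8}
    print(shannon_index(list(gut.values())))
    print(bray_curtis(gut, soil))
```

The Shannon index rises both with the number of taxa and with how evenly reads are spread across them, which is how it takes relative abundance into account.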
Module 6: Assembly
This module covers the following questions:
- Why should metagenomic data be assembled?
- What is the difference between co-assembly and individual assembly?
- What is the difference between reads, contigs and scaffolds?
- How do tools based on De Bruijn graphs work?
- How can the quality of a metagenomic assembly be assessed?
Time estimation: 2 hours
Learning Objectives
- Describe what an assembly is
- Describe what de-replication is
- Explain the difference between co-assembly and individual assembly
- Explain the difference between reads, contigs and scaffolds
- Explain how tools based on De Bruijn graphs work
- Apply appropriate tools for analyzing the quality of metagenomic data
- Construct and apply simple assembly pipelines on short read data
- Apply appropriate tools for analyzing the quality of metagenomic assembly
- Evaluate the quality of the assembly with QUAST, Bowtie2, and CoverM-Genome
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Assembly of metagenomic sequencing data |
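Two ideas from this module can be sketched in a few lines: how a De Bruijn graph is built from reads, and how N50, a contiguity metric commonly reported by assembly evaluation tools, is computed from contig lengths. This is a toy illustration under simplifying assumptions (error-free reads, a single strand, no graph simplification) and does not reflect the internals of any particular assembler.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph from reads.

    Every k-mer in a read becomes an edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Repeated k-mers yield repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def n50(contig_lengths):
    """Length of the contig at which the running total of contig lengths,
    sorted from longest to shortest, first reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical overlapping reads and contig lengths.
    print(de_bruijn_graph(["ACGT", "CGTA"], k=3))
    print(n50([2, 3, 4, 5, 6]))
```

An assembler reconstructs contigs by walking paths through such a graph; overlapping reads contribute the same edges, so overlaps never have to be computed pairwise.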
Module 7: Taxonomic binning
This module covers the process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.
Time estimation: 2 hours
Learning Objectives
- Describe what metagenomics binning is
- Describe common problems in metagenomics binning
- List the software tools available for metagenomics binning
- Bin contigs into metagenome-assembled genomes (MAGs) using MetaBAT 2
- Evaluate MAG quality and completeness using CheckM
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Binning of metagenomic sequencing data |
Module 8: Applying concepts to metatranscriptomics data
This module covers the following questions:
- How to analyze metatranscriptomics data?
- What information can be extracted from metatranscriptomics data?
- How to assign taxa and function to the identified sequences?
Time estimation: 5 hours
Learning Objectives
- Choose the best approach to analyze metatranscriptomics data
- Understand the functional microbiome characterization using metatranscriptomic results
- Understand where metatranscriptomics fits in 'multi-omic' analysis of microbiomes
- Visualise a community structure
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Metatranscriptomics analysis using microbiome RNA-seq data |
Recommended follow-up tutorials
Time estimation: 5 hours
Learning Objectives
- Perform metagenomics read mapping against a mobile genetic element database.
- Evaluate the distribution of mapping scores to identify high-quality alignments.
- Evaluate plasmid coverage to determine effective filtering thresholds.
- Filter alignments based on plasmid coverage and mapping quality.
- Justify the filtering thresholds chosen for identifying plasmid sequences.
- Generate a curated table of plasmid sequences and convert it into a FASTA file for further analysis.
- Use tools to process sequences, ensuring data is sorted, deduplicated, and formatted correctly.
- Annotate features on the identified plasmids using mobile genetic element database annotations.
- Construct a final annotated dataset integrating genetic element information for downstream applications.
- Check quality reports generated by FastQC and NanoPlot for metagenomics Nanopore data
- Preprocess the sequencing data to remove adapters, poor quality base content and host/contaminating reads
- Perform taxonomy profiling, identifying and visualizing the taxa in the samples down to species level
- Identify pathogens based on the found virulence factor gene products via assembly, identify strains and indicate all antimicrobial resistance genes in samples
- Identify pathogens via SNP calling and build the consensus genome of the samples
- Relate all samples' pathogenic genes for tracking pathogens via phylogenetic trees and heatmaps
Editorial Board
This material is reviewed by our Editorial Board:
Bérénice Batut
Funding
These individuals or organisations provided funding support for the development of this resource