Metagenomics data processing and analysis for microbiome

PURL: https://gxy.io/GTN:P00008
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

This learning path aims to teach you the basics of Galaxy and the analysis of metagenomics data. You will learn how to use Galaxy for analysis and will be guided through the common steps of microbiome data analysis: quality control, taxonomic profiling, taxonomic binning, assembly, and functional profiling, as well as some applications.

Module 1: Introduction to metagenomics

Why study the microbiome? What are the different approaches for metagenomics? This module will give you a short introduction to metagenomics.

Time estimation:

Learning Objectives
Lesson Slides Hands-on Recordings
Introduction to Microbiome Analysis

Module 2: Introduction to Galaxy

Get a first look at the Galaxy platform for data analysis. We start with a short introduction (video slides and a practical) to familiarize you with the Galaxy interface, and then proceed with a slightly longer introductory tutorial in which you perform a first, very simple analysis.

Time estimation: 1 hour 40 minutes

Learning Objectives
  • Learn how to upload a file
  • Learn how to use a tool
  • Learn how to view results
  • Learn how to view histories
  • Learn how to extract and run a workflow
  • Learn how to share a history
  • Familiarize yourself with the basics of Galaxy
  • Learn how to obtain data from external sources
  • Learn how to run tools
  • Learn how histories work
  • Learn how to create a workflow
  • Learn how to share your work
Lesson Slides Hands-on Recordings
A short introduction to Galaxy
Galaxy Basics for genomics

Module 3: Quality control

When analysing sequencing data, you should always start with a quality control step to clean your data and make sure your data is good enough to answer your research question.

Time estimation: 1 hour 30 minutes

Learning Objectives
  • Assess short reads FASTQ quality using FASTQE 🧬😎 and FastQC
  • Assess long reads FASTQ quality using Nanoplot and PycoQC
  • Perform quality correction with Cutadapt (short reads)
  • Summarise quality metrics with MultiQC
  • Process single-end and paired-end data
Lesson Slides Hands-on Recordings
Quality Control
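
As a rough illustration of what the quality control tools in this module measure, the sketch below decodes the quality line of a FASTQ record into Phred scores and converts them to error probabilities. It assumes the common Phred+33 encoding; the function names and the example record are made up for illustration and are not part of FastQC, Cutadapt or any other tool listed above.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (offset=33), which is an assumption here;
    older data may use a different offset.
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_error_probability(scores):
    """Average per-base error probability, using P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


# Hypothetical four-line FASTQ record: header, sequence, separator, qualities.
record = ["@read1", "GATTACA", "+", "IIII##I"]
scores = phred_scores(record[3])
print(scores)
print(mean_error_probability(scores))
```

A trimming tool would typically cut or discard the bases whose score falls below a chosen threshold; the threshold itself depends on the data and the research question.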

Module 4: Community taxonomic profiling

This module covers the following questions:

This module will cover taxonomic profiling in theory and also with an example tutorial.

Time estimation: 3 hours

Learning Objectives
  • Explain what taxonomic assignment is
  • Explain how taxonomic assignment works
  • Apply Kraken and MetaPhlAn to assign taxonomic labels
  • Apply Krona and Pavian to visualize results of assignment and understand the output
  • Identify the taxonomic classification tool that best fits the data
  • Inspect metagenomics data
  • Run metagenomics tools
  • Identify yeast species contained in a sequenced beer sample using DNA
  • Visualize the microbiome community of a beer sample
Lesson Slides Hands-on Recordings
Taxonomic Profiling and Visualization of Metagenomic Data
Identification of the micro-organisms in a beer using Nanopore sequencing

Module 5: Community diversity

This module covers the following questions:

Time estimation: 20 minutes

Learning Objectives
  • Explain what taxonomic diversity is
  • Explain different metrics to calculate α and β diversity
  • Apply Krakentools to calculate α and β diversity and understand the output
Lesson Slides Hands-on Recordings
Calculating α and β diversity from microbiome taxonomic data
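
To make the α and β diversity metrics mentioned above more concrete, here is a minimal sketch of one metric of each kind: the Shannon index (α diversity, within one sample) and the Bray-Curtis dissimilarity (β diversity, between two samples). The function names and the toy counts are illustrative assumptions; this is not the KrakenTools implementation, which offers several metrics and reads taxonomic profile files rather than plain lists.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Both lists must give counts for the same taxa in the same order.
    Returns 0 for identical samples and 1 for samples sharing no taxa.
    """
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))


# Hypothetical read counts for three taxa in two samples.
sample_a = [50, 30, 20]
sample_b = [10, 10, 80]
print(shannon_index(sample_a), shannon_index(sample_b))
print(bray_curtis(sample_a, sample_b))
```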

Module 6: Assembly

This module covers the following questions:

Time estimation: 2 hours

Learning Objectives
  • Describe what an assembly is
  • Describe what de-replication is
  • Explain the difference between co-assembly and individual assembly
  • Explain the difference between reads, contigs and scaffolds
  • Explain how tools based on De Bruijn graph work
  • Apply appropriate tools for analyzing the quality of metagenomic data
  • Construct and apply simple assembly pipelines on short read data
  • Apply appropriate tools for analyzing the quality of metagenomic assembly
  • Evaluate the quality of the assembly with QUAST, Bowtie2, and CoverM-Genome
Lesson Slides Hands-on Recordings
Assembly of metagenomic sequencing data
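
The idea behind De Bruijn graph assemblers, listed in the objectives above, can be sketched in a few lines: reads are cut into k-mers, and each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name, the reads and the value of k below are illustrative assumptions; real assemblers add error correction, graph simplification and contig extraction on top of this.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph as {prefix: [suffix, ...]} from all k-mers.

    Each k-mer contributes one edge from its first k-1 bases to its
    last k-1 bases. Repeated k-mers yield repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


# Two hypothetical overlapping reads.
reads = ["ACGTA", "GTACC"]
for node, successors in de_bruijn_graph(reads, k=3).items():
    print(node, "->", successors)
```

Following edges through the graph spells out longer sequences (contigs) that span the overlaps between reads.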

Module 7: Taxonomic binning

This module covers the process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.

Time estimation: 2 hours

Learning Objectives
  • Describe what metagenomics binning is
  • Describe common problems in metagenomics binning
  • Identify the software tools available for metagenomics binning
  • Bin contigs into metagenome-assembled genomes (MAGs) using MetaBAT 2
  • Evaluate MAG quality and completeness using CheckM
Lesson Slides Hands-on Recordings
Binning of metagenomic sequencing data

Module 8: Applying concepts to metatranscriptomics data

This module covers the following questions:

Time estimation: 5 hours

Learning Objectives
  • Choose the best approach to analyze metatranscriptomics data
  • Understand the functional microbiome characterization using metatranscriptomic results
  • Understand where metatranscriptomics fits in 'multi-omic' analysis of microbiomes
  • Visualise a community structure
Lesson Slides Hands-on Recordings
Metatranscriptomics analysis using microbiome RNA-seq data

Time estimation: 5 hours

Learning Objectives
  • Perform metagenomics read mapping against a mobile genetic element database.
  • Evaluate the distribution of mapping scores to identify high-quality alignments.
  • Evaluate plasmid coverage to determine effective filtering thresholds.
  • Filter alignments based on plasmid coverage and mapping quality.
  • Justify the filtering thresholds chosen for identifying plasmid sequences.
  • Generate a curated table of plasmid sequences and convert it into a FASTA file for further analysis.
  • Use tools to process sequences, ensuring data is sorted, deduplicated, and formatted correctly.
  • Annotate features on the identified plasmids using mobile genetic element database annotations.
  • Construct a final annotated dataset integrating genetic element information for downstream applications.
  • Check quality reports generated by FastQC and NanoPlot for metagenomics Nanopore data
  • Preprocess the sequencing data to remove adapters, poor quality base content and host/contaminating reads
  • Perform taxonomic profiling and visualize the results down to species level in the samples
  • Identify pathogens based on the found virulence factor gene products via assembly, identify strains and indicate all antimicrobial resistance genes in samples
  • Identify pathogens via SNP calling and build the consensus genome of the samples
  • Relate all samples' pathogenic genes for tracking pathogens via phylogenetic trees and heatmaps
Lesson Slides Hands-on Recordings
Query an annotated mobile genetic element database to identify and annotate genetic elements (e.g. plasmids) in metagenomics data
Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition
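
The filtering step described in the objectives above (keeping alignments by plasmid coverage and mapping quality) might look roughly like the following sketch. The `Alignment` record, its field names and the threshold values are all hypothetical and chosen only for illustration; in the tutorial, thresholds are picked after inspecting the score and coverage distributions.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one read-to-plasmid alignment."""
    read_id: str
    plasmid_id: str
    mapq: int                # mapping quality reported by the aligner
    plasmid_coverage: float  # fraction of the plasmid covered, 0.0 to 1.0


def filter_alignments(alignments, min_mapq, min_coverage):
    """Keep alignments that meet both the quality and the coverage threshold."""
    return [
        a for a in alignments
        if a.mapq >= min_mapq and a.plasmid_coverage >= min_coverage
    ]


# Made-up example data and thresholds.
alignments = [
    Alignment("read1", "plasmidA", mapq=60, plasmid_coverage=0.92),
    Alignment("read2", "plasmidA", mapq=12, plasmid_coverage=0.95),
    Alignment("read3", "plasmidB", mapq=55, plasmid_coverage=0.40),
]
kept = filter_alignments(alignments, min_mapq=30, min_coverage=0.8)
print([a.read_id for a in kept])
```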

Editorial Board

This material is reviewed by our Editorial Board:

Bérénice Batut

Funding

These individuals or organisations provided funding support for the development of this resource