Training

Please vote on the training topics you would like to see at BCC2020.

Training will be offered

  • Saturday, July 18: Dedicated training day, with multiple sessions and concurrent tracks. Requires a separate registration.
  • Sunday-Monday, July 19-20: One training session with concurrent topics on each of the first two days of the meeting.

Training topics have been nominated by the community. Please take a few minutes to review the topics listed below, and then vote on the topics you would like to see offered at BCC2020.

The nominations are split into three sections: Tools and Platforms, Domains, and Community.

The distinction between Tools and Platforms and Domains is a bit fuzzy. Be sure to check both.

Nominated Topics

Tools and Platforms
Getting started in Git using GitHub Desktop

Git doesn't need to be tricky, and you don't need to use a terminal to do it. In this 2.5-hour session, we will walk through the basics of version control, covering:

  • why version control is useful,
  • how to create your first git repository,
  • the basics of markdown,
  • what a pull request is,
  • and why open source is important in science.

Instead of focusing on code in a specific programming language, we will focus on a common neutral ground, Markdown, which will also give participants the ability to create their own personal or lab website on GitHub Pages.

Prerequisites

  • A laptop capable of running GitHub Desktop (e.g. a Linux, macOS, or Windows laptop, but not a Chromebook or tablet).

Duration

1 session (2.5 hours)

R / Bioconductor in the Cloud

Bioconductor provides more than 1800 R packages for the analysis and comprehension of high-throughput genomic data. Most users install and run Bioconductor on a personal computer or perhaps use an academic cluster. Cloud-based solutions are increasingly appealing, removing the headaches of local installation while providing access to (a) better, scalable computing resources; and (b) large-scale 'consortium' and other reference data sets. This session introduces the AnVIL cloud computing environment. We cover the use of the cloud as:

  • a replacement for desktop-style computing;
  • a way to integrate workflows for 'upstream' processing of large data resources with interactive 'downstream' analysis and comprehension, using Human Cell Atlas single-cell datasets as an example; and
  • a means of querying cloud-based consortium data for integration with a user's own data sets.

Prerequisites

  • Participants should be comfortable working with R and RStudio.
  • Some familiarity with Bioconductor is helpful but not required.
  • No prior cloud-based experience is necessary.
  • A wifi enabled laptop with RStudio installed.

Duration

2 sessions (5 hours)

Introduction to Using Galaxy

This workshop will introduce the Galaxy user interface and how it can be used for reproducible data analysis. We will cover the basic features of Galaxy, including where to find tools, how to import and use your data, and an introduction to workflows. This session is recommended for anyone who has not used Galaxy, or only rarely uses it.

Prerequisites

  • Little or no experience using Galaxy
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

From stars to constellations: Scaling analyses in Galaxy

Everyone knows how to run an analysis on a single dataset, but now it’s the Big Data era and data is pouring in faster than you can process it! We will show you how to import hundreds or thousands of samples, process them in batch, and scale analyses to hundreds or thousands of datasets with complex experimental designs. You’ll learn about the new rule-based uploader, how to attach metadata to datasets in bulk, how to manage sample collections in workflows, and how to scale your processing to meet your demands.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

How to use Reactome data, tools and web services

Reactome stakeholders span the informatics, clinical and basic research communities, and present us with a broad set of user requirements, from casual browsing of online pathway information to network analysis and modeling. During the BCC2020 training session, we will introduce the Reactome graph database, web site, web services, Docker image, and downloadable data sets. We will demonstrate how Reactome is useful to bioinformaticians and data integrators who are interested in finding, organizing, and utilizing biological information to support data visualization, integration and analysis. We will address the following:

  • Different use cases for the web portal (analysis tool, curated content, content service, download files).
  • What data/bioinformatics questions Reactome can help answer.
  • How to use Reactome’s Content Service and Analysis Service web interfaces and APIs.
  • How to do basic queries using Reactome’s Graph Database (Neo4j and Cypher).
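
As a small preview of the Content Service portion, here is a stdlib-only sketch of building a query URL and handling a response payload. The endpoint path follows Reactome's public ContentService; the sample JSON is illustrative and is not fetched live.

```python
import json
from urllib.parse import quote

# Hedged sketch (standard library only): build a Reactome Content Service
# query URL for a stable identifier, then parse a sample JSON payload of
# the shape the service returns. Nothing is fetched over the network here.
BASE = "https://reactome.org/ContentService"

def content_query_url(stable_id: str) -> str:
    """URL of the Content Service 'query' endpoint for a stable identifier."""
    return f"{BASE}/data/query/{quote(stable_id)}"

# A response for a pathway stable ID looks roughly like this:
sample = '{"stId": "R-HSA-68886", "displayName": "M Phase"}'
entry = json.loads(sample)
url = content_query_url(entry["stId"])
```

In the session itself, the same kind of request will be made live with curl or a browser against the real service.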

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.
  • Basic knowledge of how to navigate a system and run commands from the command line (curl, grep, etc.)
  • A robust text editor.
  • Optional: A laptop capable of running Docker (installation instructions will be provided).

Duration

1 session (2.5 hours)

Handling integrated biological data using Python, Jupyter, and InterMine

This tutorial will guide you through loading and analyzing integrated biological data (generally genomic or proteomic data) using InterMine, either via the UI or via an API in Python. Topics covered will include automatically generating code to perform queries, customising the code to meet your needs, and automated analysis of sets, e.g. gene sets, including enrichment statistics. Skills gained can be re-used in any of the dozens of InterMines available, covering a broad range of organisms and dedicated purposes, from model organisms to plants, drug targets, and mitochondrial DNA.

Users will also be shown how to import and export their gene and protein lists to and from Jupyter notebooks hosted on https://jupyter.org/.
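
To give a flavour of what the generated code does: the InterMine client libraries wrap REST calls against a mine's web service, in which a query is an XML document sent to the query results endpoint. The sketch below builds such a call with the standard library only; the mine URL and the view/constraint paths are illustrative assumptions, and the tutorial's generated code uses the `intermine` package instead.

```python
from urllib.parse import urlencode

# Hedged, stdlib-only sketch of the REST request underlying an InterMine
# query. The mine URL and the model paths in the XML are assumptions for
# illustration; nothing is sent over the network here.
MINE = "https://www.flymine.org/flymine/service"

query_xml = (
    '<query model="genomic" view="Gene.symbol Gene.length">'
    '<constraint path="Gene.organism.shortName" op="=" value="D. melanogaster"/>'
    '</query>'
)

def results_url(mine: str, xml: str, fmt: str = "json") -> str:
    """Build the GET URL for the query results endpoint."""
    return f"{mine}/query/results?{urlencode({'query': xml, 'format': fmt})}"

url = results_url(MINE, query_xml)
```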

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.

Duration

1 session (2.5 hours)

Introduction to Galaxy Administration

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:

  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, and running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • managing users, groups, and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on the uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips and examples

Prerequisites

  • Knowledge of and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep, and so on do, you will struggle.

Duration

3 sessions (7.5 hours)

Scaling genomic analysis with Glow and Apache Spark

In this session you will learn how to use Glow to:

  • Read/write genomic file formats
  • Write results to Delta tables
  • Perform quality control and GWAS analyses
  • Parallelize command line tools
  • Serve results to Shiny apps or BI tools

Glow makes genomic data work with Apache Spark, the leading engine for working with large structured datasets. It fits natively into the ecosystem of tools that have enabled thousands of organizations to scale their workflows to petabytes of data. Glow bridges the gap between bioinformatics and the Spark ecosystem by working with datasets in common file formats like VCF, BGEN, and PLINK as well as high-performance big data standards. You can write queries using the native Spark SQL APIs in Python, SQL, R, Java, and Scala. The same APIs allow you to bring your genomic data together with other datasets such as electronic health records, real world evidence, and medical images. Glow makes it easy to parallelize existing tools and libraries implemented as command line tools or Pandas functions.
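
Glow itself runs on an Apache Spark cluster, which is beyond a self-contained snippet, but the core idea of its VCF reader can be illustrated in plain Python: each VCF data line is flattened into a structured record that a SQL engine can query. The field names below mirror Glow's variant schema as an assumption; this is a conceptual sketch, not the Glow API.

```python
# Hedged, plain-Python illustration (not Glow) of flattening one VCF data
# line into a structured record, the kind of row a Spark SQL table holds.

def parse_vcf_line(line: str) -> dict:
    """Turn one tab-separated VCF data line into a dict of typed fields."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.split("\t")[:8]
    return {
        "contigName": chrom,
        "start": int(pos) - 1,  # VCF is 1-based; many engines use 0-based starts
        "names": [] if vid == "." else vid.split(";"),
        "referenceAllele": ref,
        "alternateAlleles": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filters": filt.split(";"),
        "info": dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv),
    }

row = parse_vcf_line("chr1\t100\trs1\tA\tG,T\t50\tPASS\tDP=20;AF=0.5")
```

In Glow proper, the equivalent is a one-line Spark read of the VCF source, after which the resulting DataFrame is queryable with Spark SQL.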

Prerequisites

  • Basic Python
  • Some exposure to Spark useful but not necessary

Duration

1 session (2.5 hours)

Embedding JBrowse 2 in your website

JBrowse 2 is the next generation of the JBrowse genome browser, with an all-new pluggable technology platform based on React, mobx-state-tree, and web workers. The embedded version, "JBrowse 2 Embedded", is designed to be self-contained and easily embedded in any website without iframes, CSS hacking, or any specific JavaScript framework.

We will show you several ways of embedding JBrowse 2 Embedded in your web-based tool or website.

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.

Duration

1 session (2.5 hours)

How to write a JBrowse 2 plugin

JBrowse 2 is a new genome browser built using React. It has new features for structural variant visualization and comparative genomics, such as split views, synteny views, Circos views, and more. We will demonstrate how to set up JBrowse 2 and show how new plugins can add custom views, custom tracks, or custom data adapters.

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.
  • Experience with JavaScript is a plus but not necessary.

Duration

1 session (2.5 hours)

Enabling GA4GH htsget API server

The htsget API is a protocol for securely streaming read and variant genomic data. Its benefits over whole-file transfer are parallel access and smaller transfers, since users download only the subsections of the genome they are interested in. Retrieval with htsget is built into htslib and samtools, and it is in use at large genomic dataset providers such as the European Genome-phenome Archive, CanDIG, Genomics England, and the Australian Genomics Health Alliance.

This session will consist of a talk introducing htsget, followed by a practical session in which participants will learn how to set up an htsget server over a collection of BAM/CRAM files, run a variant calling pipeline that accesses data over htsget, and then set up an htsget server over the resulting VCF files.
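
The request/response shape of the protocol can be sketched offline: a client asks for a genomic region and receives a JSON "ticket" listing URLs of data blocks to fetch and concatenate. The base URL below is a placeholder; the parameter names follow the GA4GH htsget specification.

```python
import json
from urllib.parse import urlencode

# Hedged sketch of an htsget exchange. The endpoint is a placeholder and
# nothing is fetched over the network; the ticket JSON is illustrative.
BASE = "https://htsget.example.org"  # placeholder endpoint

def reads_url(sample_id: str, reference: str, start: int, end: int) -> str:
    """Build an htsget reads request restricted to one genomic region."""
    params = urlencode({"format": "BAM", "referenceName": reference,
                        "start": start, "end": end})
    return f"{BASE}/reads/{sample_id}?{params}"

# The server answers with a JSON "ticket"; the client then fetches each URL
# in order and concatenates the bodies to obtain a valid BAM stream.
ticket = json.loads(
    '{"htsget": {"format": "BAM", "urls": ['
    '{"url": "https://htsget.example.org/data/block1"},'
    '{"url": "https://htsget.example.org/data/block2"}]}}'
)
block_urls = [u["url"] for u in ticket["htsget"]["urls"]]
```

With htslib-enabled tools, the same exchange happens transparently, e.g. when samtools is pointed at an htsget URL.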

Prerequisites

  • A wi-fi enabled laptop
  • Ability to use command line tools

Duration

1 session (2.5 hours)

Galaxy Code Architecture

How is the Galaxy code structured? What do the various other projects related to Galaxy do? What happens when I start Galaxy?

Please join us to explore various aspects of the Galaxy codebase, understand the various top-level files and modules in Galaxy, understand how dependencies work in Galaxy's frontend and backend, and a whole lot more.

Prerequisites

  • A Wi-Fi-enabled laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Duration

1 session (2.5 hours)

Scripting Galaxy with BioBlend

Galaxy has an ever-growing API that allows external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover various approaches to accessing the API, in particular using the BioBlend Python library.
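
As a taste of what the session covers, the sketch below builds, with the standard library only, the kind of REST request that a BioBlend call such as `gi.histories.get_histories()` issues under the hood. The server URL and API key are placeholders, and the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hedged, stdlib-only sketch of the GET request behind listing histories
# via Galaxy's /api/histories endpoint, authenticated with an API key.
# Placeholders throughout; nothing is sent over the network.
GALAXY_URL = "https://usegalaxy.example"  # placeholder server
API_KEY = "your-api-key"                  # placeholder credential

def histories_request(base: str, key: str) -> Request:
    """Build (but do not send) the list-histories request."""
    url = f"{base}/api/histories?{urlencode({'key': key})}"
    return Request(url, headers={"Accept": "application/json"})

req = histories_request(GALAXY_URL, API_KEY)
```

With BioBlend itself, the equivalent is roughly `GalaxyInstance(url=..., key=...).histories.get_histories()`, which handles authentication, retries, and JSON decoding for you.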

Prerequisites

  • Basic understanding of Galaxy from a developer point of view.
  • Python programming.
  • A Wi-Fi-enabled laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Duration

1 session (2.5 hours)

Reference data with CVMFS and remote jobs with Pulsar

Learn to use CVMFS for easy access to ready-to-go terabytes of reference data in Galaxy. Then find out how to send jobs to the ends of the universe with Pulsar!

Prerequisites

  • Basic understanding of Galaxy from a developer point of view.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox, and Safari will work best

Duration

1 session (2.5 hours)

Getting started with reproducible analysis using Dockstore

This will be a hands-on workshop to train a beginner on the fundamental technologies used to create portable and reproducible workflows. Attendees will learn how to use Docker for packaging software into containers, how to write analytical workflows in a descriptor language (CWL, WDL, or Nextflow), and how to publish these workflows on Dockstore for sharing with others. We will cover basic Dockstore features such as running workflows using the Dockstore command-line interface and end with an overview of more advanced topics like best practices for workflows, publishing using DOIs, and sharing collections of workflows through organizations.

Prerequisites

  • Basic command line and scripting knowledge
  • A wi-fi enabled laptop with a modern web browser.

Duration

1 session (2.5 hours)

How to do reproducible analyses in the cloud

This training will lead users through the steps of performing reproducible analysis at scale in the cloud. Attendees will learn how to find workflows on Dockstore and how to export them to Terra’s interoperable cloud compute platform. We will give a brief tutorial of the Terra platform, show users how to find and import datasets, and walk through an example use case for genomic analysis. Along the way we’ll give you tips and tricks for scaling analyses on the Terra environment and introduce some of the more advanced features like using Jupyter Notebooks for producing and exploring results.

Prerequisites

  • Preferably, a Google account set up with Terra; instructions will be provided ahead of time.
  • A wi-fi enabled laptop with a modern web browser.

Duration

1 session (2.5 hours)

Domains
Epigenetics Analysis with Galaxy

This workshop showcases the powerful capabilities of Galaxy to handle NGS data. We will perform a complete workflow, from FASTQ files to publication-ready figures, using an ATAC-seq dataset. The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a method to investigate the accessibility of chromatin, and thus a method to determine regulatory mechanisms of gene expression. The method helps identify accessible promoter regions and potential enhancers.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • Familiarity with NGS and genomic data
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

Import, handle, visualize and analyze biodiversity data in Galaxy

This Ecology-focused session will introduce using Galaxy to import (from external sources such as GBIF, iNaturalist, Atlas of Living Australia, or Zenodo repositories), handle (filter, rename fields, search/replace text patterns), visualize (stacked histograms), and analyze (calculate species abundance, phenology, and trends) biodiversity data.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

LC-MS/MS metabolomics data analysis with Galaxy

Metabolomics data analysis is a complex, multistep process, which is constantly evolving with the development of new analytical technologies, mathematical methods, and bioinformatics tools and databases. The Workflow4Metabolomics (W4M) project aims to develop full LC/MS, GC/MS, FIA/MS and NMR pipelines using the Galaxy framework for data analysis, including preprocessing, normalization, quality control, statistical analysis and annotation steps. This workshop will introduce W4M and how to use it for metabolomics data analysis.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

2 sessions (5 hours)

Real-time, collaborative genome assembly and annotation with Galaxy, Maker, and Apollo

To demonstrate collaborative assembly and annotation, we will assemble a small bacterial genome in Galaxy, annotate it with Prokka, and publish the result to Apollo, where collaborative structural annotation is done. Choosing a few completed structural annotations from Apollo, we will guide students through pushing those annotations into Galaxy for functional analyses, visualising the final result in Apollo.

We will then demonstrate how the same system can be used for eukaryotic genomes with an example of an annotation performed using MAKER.

This will be done on existing public infrastructure for free, demonstrating how small groups can easily get started with genome annotation, allowing them to focus on the bioinformatics. For larger groups we will show some examples of how you can reproduce our infrastructure and pipelines.

Prerequisites

  • A Wi-Fi-enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Duration

2 sessions (5 hours)

RNA-Seq analysis with AskOmics Interactive Tool

AskOmics is web software for data integration and querying using Semantic Web technologies. It helps users convert multiple data sources (CSV/TSV files, GFF and BED annotations) into RDF triples, and perform complex queries against these files as well as distant SPARQL endpoints. AskOmics provides a user-friendly interface to build the queries, so users don't have to learn the SPARQL language.

AskOmics is useful for cross-referencing result datasets with various reference data. For example, in RNA-Seq studies we often need to filter the results on fold change and p-value to get the most significant differentially expressed genes. These genes then need to be linked to the reference genome to obtain more information about their location. Next, we may need to determine whether these genes are part of a QTL associated with a phenotype of interest. Finally, we can access distant endpoints to find diseases linked to our genes, or relevant publications.

AskOmics offers a solution to 1) automatically convert the multiple data formats to RDF, 2) use a user-friendly interface to perform complex SPARQL queries on the RDF datasets to find the genes you are interested in, and 3) cross-reference local datasets with distant databases (NeXtProt, for example).

During this training session, we will use the Galaxy AskOmics Interactive Tool to integrate Galaxy datasets into an AskOmics instance. Then we will perform complex queries against these data and a distant SPARQL endpoint (NeXtProt) to answer biological questions.
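
To make the fold-change and p-value filtering concrete, here is the kind of SPARQL query the visual query builder assembles for you. The `ex:` predicates are invented for this sketch; real graphs use the vocabulary of the integrated data.

```python
# Hedged illustration: a SPARQL query (held as a Python string) selecting
# differentially expressed genes by fold change and p-value thresholds.
# Predicate names are made up for the sketch.
query = """
PREFIX ex: <http://example.org/rnaseq#>
SELECT ?gene ?fc ?p WHERE {
  ?gene ex:foldChange ?fc ;
        ex:pvalue ?p .
  FILTER (?fc >= 2 && ?p <= 0.05)
}
"""
```

In AskOmics, the same filter is expressed by clicking through the graph of integrated entities rather than writing the query by hand.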

Prerequisites

  • Basic knowledge about RNA-seq
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

Processing of Single Cell RNA-Seq Data with Galaxy

Participants will learn about the processing, mapping and quantification of single cell RNA-Seq data generated using the 10x Genomics platform.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • Familiarity with bulk RNA-Seq
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

Clustering of Single Cell RNA-seq Data with Galaxy

Participants will learn about clustering and annotation of single-cell data generated on the 10x Genomics platform, using the Scanpy downstream analysis workflow.

Prerequisites

  • Processing of Single Cell RNA-Seq Data or equivalent experience
  • Introduction to Using Galaxy or equivalent experience
  • Familiarity with bulk RNA-Seq
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

Getting your hands on Climate data

Training on accessing and analyzing climate data in Galaxy. During this session you will learn how to use climate data to develop a simple adaptation case study with the Galaxy Climate workbench. We will first explain the difference between climate and weather data, then show how to visualize climate data on a map with Galaxy, and finally how to create a workflow framing a simple adaptation case study.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Duration

1 session (2.5 hours)

Community
Building communities with open source + open science

Many journals require that scientific/research code be open source in order to be published, but simply sharing source code alone usually isn’t enough to draw in new users and contributors. This session will teach researchers and coders the basics of how to make their open source scientific code repositories inclusive and welcoming to contributors. Experienced community managers are also welcome to attend and help pass their knowledge on to others. This session will be run by the Open Life Science team, who collectively have experience working openly, mentoring, and training others in open practice.

Prerequisites

  • An interest in open science
  • A wi-fi enabled laptop

Duration

1 session (2.5 hours)

Train the Galaxy Trainer

This workshop will introduce:

  • Using Galaxy as a training tool
  • Determining aim and audience
    • e.g. a single topic, or a string of related topics
    • e.g. a response to a specific request for training, or general upskilling of people in Galaxy bioinformatics
  • Setting up appropriate infrastructure
    • usegalaxy.* resources
    • TIaaS
    • your own
  • The available materials
    • GTN tutorials
    • and/or writing your own, including how to contribute it to the GTN
    • customising materials for your needs (slides, language, etc.)
  • Distributed workshops
    • in practice
    • local facilitators vs. lead trainers
    • using Zoom / Skype / other video conferencing software
  • Practising setting up your own workshop
    • e.g. choose a topic from the GTN
    • check that it runs on the Galaxy server of your choice
    • time it and modify if need be (e.g. cut down the data set further?)
    • create a schedule, e.g. Google Doc → publish → tinyurl
  • Getting good feedback!

Prerequisites

  • An interest in bioinformatics training and Galaxy

Duration

1 session (2.5 hours)