Training

Training is offered on a dedicated training day before the conference and on the first two days of the conference.

Training topics were nominated and then selected by the community. Please take a few minutes to review the topics listed below before registering for the conference.

Topics

Getting started in Git using GitHub Desktop

When

Training Day: Saturday, July 18, Session 1, 9-11:30am

Instructors

Yo Yehudi, University of Cambridge

Description

Git doesn't need to be tricky, and you don't need to use a terminal to do it. In a 2.5-hour session, we will talk through the basics of version control, covering:

  • why version control is useful,
  • how to create your first git repository,
  • the basics of markdown,
  • what a pull request is,
  • and why open source is important in science.

Instead of focusing on code in a specific programming language, we will focus on a common neutral ground, markdown, which will also give participants the ability to create their own personal or lab website on GitHub Pages.

Prerequisites

  • A laptop capable of running GitHub Desktop (e.g. a Linux, Mac, or Windows laptop, but not a Chromebook or tablet).

Introduction to Using Galaxy

When

Training Day: Saturday, July 18, Session 1, 9-11:30am

Instructors

TBD, Galaxy Project

Description

This workshop will introduce the Galaxy user interface and how it can be used for reproducible data analysis. We will cover the basic features of Galaxy, including where to find tools, how to import and use your data, and an introduction to workflows. This session is recommended for anyone who has not used, or only rarely uses, Galaxy.

Prerequisites

  • Little or no experience using Galaxy
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

How to use Reactome data, tools and web services

When

Training Day: Saturday, July 18, Session 1, 9-11:30am

Instructors

Robin Haw, Ontario Institute for Cancer Research (OICR)

Description

Reactome stakeholders span the informatics, clinical and basic research communities, and present us with a broad set of user requirements, from casual browsing of online pathway information to network analysis and modeling. During the BCC2020 training session, we will introduce the Reactome graph database, web site, web services, Docker image, and downloadable data sets. We will demonstrate how Reactome is useful to bioinformaticians and data integrators who are interested in finding, organizing, and utilizing biological information to support data visualization, integration and analysis. We will address the following:

  • Different use cases for using the web portal (analysis tool, curated content, content service, download files).
  • What data/bioinformatics questions Reactome can help answer.
  • How to use Reactome’s Content Service and Analysis Service web interfaces and APIs.
  • How to do basic queries using Reactome’s Graph Database (Neo4J and Cypher).

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.
  • Basic knowledge of how to navigate a system and run commands from the command line (curl, grep, etc…)
  • A robust text editor.
  • Optional: A laptop capable of running Docker (see the Docker installation instructions).

Getting started with reproducible analysis using Dockstore

When

Training Day: Saturday, July 18, Session 1, 9-11:30am

Instructors

Denis Yuen, Ontario Institute for Cancer Research; Louise Cabansay, University of California Santa Cruz; Andrew Duncan, Ontario Institute for Cancer Research

Description

This will be a hands-on workshop to train a beginner on the fundamental technologies used to create portable and reproducible workflows. Attendees will learn how to use Docker for packaging software into containers, how to write analytical workflows in a descriptor language (CWL, WDL, or Nextflow), and how to publish these workflows on Dockstore for sharing with others. We will cover basic Dockstore features such as running workflows using the Dockstore command-line interface and end with an overview of more advanced topics like best practices for workflows, publishing using DOIs, and sharing collections of workflows through organizations.

Prerequisites

  • Basic command line and scripting knowledge
  • A wi-fi enabled laptop with a modern web browser.

Introduction to Galaxy Administration

When

Training Day: Saturday, July 18, Sessions 1, 2 & 3, 9am-6pm

Instructors

Helena Rasche, Independent; Simon Gladman, University of Melbourne; Martin Cech, Penn State University; Nicola Soranzo, Earlham Institute

Description

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:

  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • users and groups and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips & examples

Prerequisites

  • Knowledge of and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle.

R / Bioconductor in the Cloud

When

Training Day: Saturday, July 18, Sessions 2 & 3, 12:30-6pm

Instructors

Martin Morgan, Roswell Park Comprehensive Cancer Center; Nitesh Turaga, Roswell Park Comprehensive Cancer Center; Lori Shepherd, Roswell Park Comprehensive Cancer Center

Description

Bioconductor provides more than 1800 R packages for the analysis and comprehension of high-throughput genomic data. Most users install and run Bioconductor on a personal computer or perhaps use an academic cluster. Cloud-based solutions are increasingly appealing, removing the headaches of local installation while providing access to (a) better, scalable computing resources; and (b) large-scale 'consortium' and other reference data sets. This session introduces the AnVIL cloud computing environment. We cover use of the cloud for

  • replacing desktop-style computing;
  • integrating workflows for 'upstream' processing of large data resources with interactive 'downstream' analysis and comprehension, using Human Cell Atlas single-cell datasets as an example; and
  • querying cloud-based consortium data for integration with a user's own data sets.

Prerequisites

  • Participants should be comfortable working with R and RStudio.
  • Some familiarity with Bioconductor is helpful but not required.
  • No prior cloud-based experience is necessary.
  • A wi-fi enabled laptop with RStudio installed.

From stars to constellations: Scaling analyses in Galaxy

When

Training Day: Saturday, July 18, Session 2, 12:30-3pm

Instructors

Helena Rasche, Independent; Marius van den Beek, Penn State University; Saskia Hiltemann, Erasmus MC

Description

Everyone knows how to do their analysis on a single dataset, but now it’s the Big Data era and data is pouring in faster than you can process it! We will show you how to manage importing hundreds and thousands of samples, processing these in batch, and scaling analyses to hundreds and thousands of datasets with complex experimental designs. You’ll learn about the new rule-based uploader and how to attach metadata to datasets in bulk, how to manage sample collections in workflows, and how to scale your processing to meet your demands.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Handling integrated biological data using Python, Jupyter, and InterMine

When

Training Day: Saturday, July 18, Session 2, 12:30-3pm

Instructors

Yo Yehudi, University of Cambridge

Description

This tutorial will guide you through loading and analyzing integrated biological data (generally genomic or proteomic data) using InterMine, either via the UI or via an API in Python. Topics covered will include automatically generating code to perform queries, customising the code to meet your needs, and automated analysis of sets, e.g. gene sets, including enrichment statistics. Skills gained can be re-used in any of the dozens of InterMines available, covering a broad range of organisms and dedicated purposes, from model organisms to plants, drug targets, and mitochondrial DNA.

Users will also learn how to import and export their gene and protein lists to and from Jupyter notebooks hosted on https://jupyter.org/.

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.

Scaling genomic analysis with Glow and Apache Spark

When

Training Day: Saturday, July 18, Session 2, 12:30-3pm

Instructors

Henry Davidge, Databricks

Description

How to use Glow to:

  • Read/write genomic file formats
  • Write results to Delta tables
  • Perform quality control and GWAS analyses
  • Parallelize command line tools
  • Serve results to shiny apps or BI tools

Glow makes genomic data work with Apache Spark, the leading engine for working with large structured datasets. It fits natively into the ecosystem of tools that have enabled thousands of organizations to scale their workflows to petabytes of data. Glow bridges the gap between bioinformatics and the Spark ecosystem by working with datasets in common file formats like VCF, BGEN, and Plink as well as high-performance big data standards. You can write queries using the native Spark SQL APIs in Python, SQL, R, Java, and Scala. The same APIs allow you to bring your genomic data together with other datasets such as electronic health records, real world evidence, and medical images. Glow makes it easy to parallelize existing tools and libraries implemented as command line tools or Pandas functions.

Prerequisites

  • Basic Python
  • Some exposure to Spark useful but not necessary

Real-time, collaborative genome assembly and annotation with Galaxy, Prokka, and Apollo

When

Training Day: Saturday, July 18, Session 3, 3:30-6pm

Instructors

Helena Rasche, Independent; Anthony Bretaudeau, INRAE; Nathan Dunn, Lawrence Berkeley National Laboratory; Simon Gladman, University of Melbourne

Description

To demonstrate collaborative assembly and annotation, we will assemble a small bacterial genome in Galaxy, annotate it with Prokka, and publish the result to Apollo, where collaborative structural annotation is done. Choosing a few completed structural annotations from Apollo, we will guide students through pushing those annotations into Galaxy for functional analyses and visualising the final result in Apollo.

This will be done on existing public infrastructure for free, demonstrating how small groups can easily get started with genome annotation, allowing them to focus on the bioinformatics. For larger groups we will demonstrate some examples of how you can reproduce our infrastructure.

Prerequisites

  • A Wi-Fi-enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Building communities with open source + open science

When

Training Day: Saturday, July 18, Session 3, 3:30-6pm, and Meeting Day 2: Monday, July 20, 1:50-4:10pm

Instructors

Yo Yehudi, University of Cambridge; Malvika Sharan, The Turing Way

Description

Many journals require scientific and research code to be open source in order to be published, but sharing source code alone isn’t usually enough to draw in new users and contributors. This session will teach researchers and coders the basics of how to make their open source scientific code repositories inclusive and welcoming to contributors. Experienced community managers are also welcome to attend and help pass their knowledge on to others. This session will be run by the Open Life Science team, who collectively have experience working openly, mentoring, and training others in open practice.

Prerequisites

  • An interest in open science
  • A wi-fi enabled laptop

Import, handle, visualize and analyze biodiversity data in Galaxy

When

Training Day: Saturday, July 18, Session 3, 3:30-6pm

Instructors

Yvan Le Bras, Muséum National d’Histoire Naturelle

Description

This ecology-focused session will introduce using Galaxy to import (from external sources such as GBIF, iNaturalist, Atlas of Living Australia or Zenodo repositories), handle (filter, rename fields, search/replace text patterns), visualize (stacked histograms) and analyze (calculate species abundance, phenology and trends) biodiversity data.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Embedding JBrowse 2 in your website

When

Meeting Day 1: Sunday, July 19, 1:40-4:10pm

Instructors

Robert Buels, University of California Berkeley; Colin Diesh, University of California Berkeley; Scott Cain, Ontario Institute for Cancer Research

Description

JBrowse 2 is the next generation of JBrowse genome browsers, with an all-new pluggable technology platform based on React, mobx-state-tree, and web workers. The embedded version, JBrowse 2 Embedded, is designed to be self-contained and easily embedded in any website without requiring any iframes or CSS hacking and without requiring any specific JavaScript frameworks.

We will show you several ways of embedding JBrowse 2 Embedded in your web-based tool or website.

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.

Galaxy Code Architecture

When

Meeting Day 1: Sunday, July 19, 1:40-4:10pm

Instructors

John Chilton; Nate Coraor

Description

How is the Galaxy code structured? What do the various other projects related to Galaxy do? What happens when I start Galaxy?

Please join us to explore various aspects of the Galaxy codebase, understand the various top-level files and modules in Galaxy, understand how dependencies work in Galaxy's frontend and backend, and a whole lot more.

Prerequisites

  • A Wi-Fi-enabled laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

How to do reproducible analyses in the cloud

When

Meeting Day 1: Sunday, July 19, 1:40-4:10pm

Instructors

Louise Cabansay, University of California Santa Cruz; Beth Sheets, University of California Santa Cruz

Description

This training will lead users through the steps of performing reproducible analysis at scale in the cloud. Attendees will learn how to find workflows on Dockstore and how to export them to Terra’s interoperable cloud compute platform. We will give a brief tutorial of the Terra platform, show users how to find and import datasets, and walk through an example use case for genomic analysis. Along the way we’ll give you tips and tricks for scaling analyses on the Terra environment and introduce some of the more advanced features like using Jupyter Notebooks for producing and exploring results.

Prerequisites

  • Preferably, a Google account set up with Terra; instructions will be provided ahead of time.
  • A wi-fi enabled laptop with a modern web browser.

RNA-Seq analysis with AskOmics Interactive Tool

When

Meeting Day 1: Sunday, July 19, 1:40-4:10pm

Instructors

Anthony Bretaudeau, INRAE

Description

AskOmics is web software for data integration and querying using Semantic Web technologies. It helps users convert multiple data sources (CSV/TSV files, GFF and BED annotation) into RDF triples and perform complex queries against these files as well as against distant SPARQL endpoints. AskOmics provides a user-friendly interface for building queries, so users don't have to learn the SPARQL language.

AskOmics is useful for cross-referencing result datasets with various reference data. For example, in RNA-Seq studies, we often need to filter the results on the fold change and the p-value to get the most significant differentially expressed genes. These genes often need to be linked to the reference genome to obtain more information about their location. Then, we may need to determine whether these genes are part of a QTL associated with a phenotype of interest. Finally, we can access distant endpoints to retrieve diseases or publications linked to our genes.

AskOmics offers a solution to 1) automatically convert multiple data formats to RDF, 2) use a user-friendly interface to perform complex SPARQL queries on the RDF datasets to find the genes you are interested in, and 3) cross-reference local datasets with distant databases (NeXtProt, for example).

During this training session, we will use the Galaxy AskOmics Interactive Tool to integrate Galaxy datasets into an AskOmics instance. Then we will perform complex queries against this data and a distant SPARQL endpoint, NeXtProt, to answer a biological question.

Prerequisites

  • Basic knowledge about RNA-seq
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Train the Galaxy Trainer

When

Meeting Day 1: Sunday, July 19, 1:40-4:10pm

Instructors

Helena Rasche, Independent; Simon Gladman, University of Melbourne; Saskia Hiltemann, Erasmus Medical Center; Delphine Lariviere, Penn State University

Description

This workshop will introduce:

  • using Galaxy as a training tool
  • Determining aim and audience
    • e.g. single topic; string of related topics;
    • e.g. response to specific request for training; or general upskilling people in Galaxy bioinformatics
  • setting up appropriate infrastructure
    • usegalaxy.* resources
    • TIaaS
    • Your own
  • The available materials
    • GTN tutorials
    • and/or write your own; including how to contribute it to GTN
    • Customising materials for your needs (Slides, language etc.)
  • Distributed workshops
    • In practice
    • Local facilitators vs lead trainers
    • Using Zoom / Skype / other video conferencing software
  • Practise setting up your own workshop
    • e.g. choose a topic from GTN
    • check that it runs on the Galaxy server of your choice
    • time it and modify it if need be (e.g. cut down the data set further)
    • create a schedule, e.g. Google Doc → publish → tinyurl
  • Getting good feedback!

Prerequisites

  • An interest in bioinformatics training and Galaxy

How to write a JBrowse 2 plugin

When

Meeting Day 2: Monday, July 20, 1:40-4:10pm

Instructors

Robert Buels, University of California Berkeley; Colin Diesh, University of California Berkeley; Scott Cain, Ontario Institute for Cancer Research

Description

JBrowse 2 is a new genome browser built using ReactJS. It has new features for structural variant visualization and comparative genomics, such as split views, synteny views, Circos views and more. We will demonstrate how to set up JBrowse 2 and show how new plugins can create custom views, custom tracks, or custom data adapters in JBrowse 2.

Prerequisites

  • A wi-fi enabled laptop with a modern web browser.
  • Experience with JavaScript is a plus but not necessary.

Scripting Galaxy with BioBlend

When

Meeting Day 2: Monday, July 20, 1:40-4:10pm

Instructors

Dannon Baker, Johns Hopkins University; Marius van den Beek, Penn State University; Nicola Soranzo, Earlham Institute

Description

Galaxy has an always-growing API that allows for external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover various approaches to access the API, in particular using the BioBlend Python library.

Prerequisites

  • Basic understanding of Galaxy from a developer point of view.
  • Python programming.
  • A Wi-Fi-enabled laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Reference data with CVMFS and remote jobs with Pulsar

When

Meeting Day 2: Monday, July 20, 1:40-4:10pm

Instructors

Helena Rasche, Independent; Simon Gladman, University of Melbourne; Gianmauro Cuccuru, University of Freiburg; Nate Coraor, Penn State University

Description

Learn to use CVMFS for easy access to ready-to-go terabytes of reference data in Galaxy. Then find out how to send jobs to the ends of the universe with Pulsar!

Prerequisites

  • Basic understanding of Galaxy from a developer point of view.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox, and Safari will work best

Getting your hands on Climate data

When

Meeting Day 2: Monday, July 20, 1:40-4:10pm

Instructors

Anne Fouilloux, University of Oslo

Description

Training on accessing and analyzing climate data in Galaxy. During this session you will learn how to use climate data to develop a simple adaptation case study using the Galaxy Climate workbench. We will first explain the difference between climate and weather data, then show how to visualize climate data on a map with Galaxy, and finally show how to create a simple workflow for framing a very simple adaptation case study.

Prerequisites

  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best