Galaxy for virologist training Exercise 1: Introduction to Galaxy

Title Galaxy
Training dataset: None
Questions:
  • How do I create a fasta reference for Crimea Congo?
  • How many nucleotides does each fragment of the Crimea Congo genome have?
Objectives:
  • Become familiar with the Galaxy website
  • Understand Galaxy's history
  • Learn how to upload data in Galaxy
  • Learn how to visualize data in Galaxy
  • Learn how to run tools in Galaxy
Estimated time: 1h 15 min

A bioinformatic analysis that uses a reference genome needs a single reference file. The problem with segmented genomes, such as Crimea Congo's, is that the databases hold a separate file for each fragment. Here we are going to learn how to load the different segments of a genome in Galaxy and concatenate them into a single fasta file that can be used for further analyses. We are also going to learn how to count the number of sequences in a multifasta file and the number of nucleotides in each sequence of a fasta file.

1. Galaxy website

First of all, go to the Galaxy Web Server in Europe, where you will see a display such as this one:

Website

Where you have 4 different elements:

  1. The first one in yellow is the Title panel with the buttons:
    • Home (house): To go to the home page in Spanish
    • Workflows: To go to the workflow manager
    • Visualize: Displays the visualization manager and options
    • Share Data: Displays the sharing options
    • Help: Displays all the help menu available
    • Login or Register
    • Galaxy Training Materials (graduation cap): Displays the Galaxy training list
    • Enable/Disable scratchbook (9 squares)
  2. The left side panel in blue with all the tools in this Galaxy mirror
  3. Central panel in red, which will let you run analyses and view outputs
  4. Right panel in green, with the history record.

Sign up/Login:

The first thing to do is sign up, so that you can save your history. To do that, follow these steps:

  1. Select Login or Register in the header panel
  2. Select Register here.
  3. Fill in the registration information. ⚠️ Use an email you can access now, because you will be asked to confirm your e-mail address.
  4. Log into your e-mail, and verify your Galaxy account.
  5. Log in with your credentials.

Login 1

Login 2

Login 3

2. Galaxy's history

Now select the Home button and return to the home page. We are going to learn how to manage the history, which is in the right panel. To do this, we will follow these steps:

  1. Click the new-history (+) icon at the top of the history panel.
    • If the new-history is missing:
      • Click on the galaxy-gear icon (History options) on the top of the history panel
      • Select the option Create New from the menu
  2. Click once on Unnamed history, which is the title of your history, and type a new meaningful name for it. In our case a good name would be Crimea Congo Reference Genome. Then press Enter on the keyboard and the new name will be set.

History 1History 2History 3

3. Loading data:

Now we are going to load the data. In this case we are going to use the Crimea Congo reference genome. Crimea Congo's genome is composed of 3 segments, each with its own code:

  • S segment: DQ133507
  • M segment: EU037902
  • L segment: EU044832

In order to load these fragments in Galaxy we have to follow these steps:

  1. In the left side panel, select Upload Data
  2. In the new panel select Paste/Fetch Data
  3. Then copy the following block of text:
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/S_DQ133507.fasta
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/M_EU037902.fasta
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/L_EU044832.fasta
  4. In the box labeled "Download data from the web by entering URLs (one per line) or directly paste content.", paste the text you just copied
  5. Select Start
  6. When everything on the screen is green, select Close
Upload 1

Upload 2

With this, our data is loading into Galaxy. You can see that each job is given a different number, so you can keep track of the order of your jobs with it.
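Outside Galaxy, the same three files could be fetched with a short script. The following is only a sketch in Python: the URLs and accession codes come from the upload step above, while the helper names `segment_urls` and `fetch_fasta` are made up for this illustration and are not part of Galaxy.

```python
from urllib.request import urlopen

# Base URL and accession codes are taken from the upload step above.
BASE_URL = (
    "https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/"
    "one_week_4day_format/exercises/data/"
)
SEGMENTS = {"S": "DQ133507", "M": "EU037902", "L": "EU044832"}


def segment_urls():
    """Return one download URL per segment, keyed by segment name (hypothetical helper)."""
    return {seg: f"{BASE_URL}{seg}_{acc}.fasta" for seg, acc in SEGMENTS.items()}


def fetch_fasta(url, timeout=30):
    """Download one fasta file and return its text (hypothetical helper, needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta(segment_urls()["S"])` would return the text of the S segment file, assuming the URL is still reachable.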

The jobs can have three different states:

  1. Waiting: Your jobs will have a grey color and a clock on their left side. In this state your jobs are queued, waiting to start on the Galaxy server.
  2. Running: Your jobs will have an orange color and rotating dots on their left side. In this state your jobs are running on the Galaxy server.
  3. Done: Your jobs will have a green color. Your data is ready to be used.

waitingrunningDone

5. Edit and Visualize your data:

Visualization

Now we can start using our data. First of all, we are going to see what these fasta files look like. There are different ways to do this:

  1. Select the 👁️ icon to the right of the file name. For the first time, our center panel has changed, and now it displays the content of the fasta file.

visualize_fastqvisualization

  2. Another way is to select the name of the file to see the first five lines of the file.

name_selectshort_visualization

When we display this file summary, we obtain additional options to process this file:

  • Save: Allows you to save your files locally

save

  • Copy link: copies the link of the data to your clipboard.

copy_link

  • View details: Shows a new window in the center panel with additional information about the sample.

details

data_fetch

  • Visualize this data: As covered in the theory session, the visualization panel lists all the visualization options available in Galaxy, but not all of them fit your data. With this button, you can see which visualization options suit your type of data.

visualize

visualize_options

  • Help: Displays help about the tool used to generate the data.

help

Note: If you select the file name again, the summary disappears

Editing

Now we are going to rename all the fasta files we uploaded to Galaxy. To do this, we have to click on the pencil icon that appears next to each file name. This will display a new central window with the different editing options for each file:

edit_name_fastq

edit_1

This screen allows you to perform several actions. Starting from the right:

  • Set permissions: Allows you to manage access to and permissions for the selected file for the different registered users.
  • Datatype: Allows you to change the datatype of the existing dataset, but not modify its contents. Use this if Galaxy has incorrectly guessed the type of your dataset.
  • Convert: Allows you to create a new dataset with the contents of this dataset, converted to a new format.
  • Change the attributes: Allows you to rename the file, and add some additional information.

⚠️ Select Save button to save the changes.

We are going to rename the files as shown here:

rename

6. Run tools

Now we are going to use the fasta files uploaded to Galaxy to run tools. To run tools we have to:

Search

  1. Search for the tool in the search bar. We want to concatenate the fasta files, so we are going to type concatenate in the bar.
  2. Select the tool we want to use. In this case Concatenate datasets tail-to-head (cat).

concatenate_tool

Run tools

When we select the tool, its options appear in the center panel, together with different pieces of information about the tool we want to run. ⚠️ These options are tool specific. This means each tool has its own options.

  1. Tool name, version and options to save and share the tool
  2. The input dataset options:
    • We can select data from the history
    • Upload data from a collection
    • Upload a dataset (the upload dataset pop up will appear)
    • Browse a dataset (you can browse datasets from the history)
  3. Insert new dataset blocks (no need in our case)
  4. Execute button
  5. Tool information:
    • ⚠️
    • What it does
    • Examples
    • Citation

To concatenate the samples, we will follow these steps:

  1. In Datasets to concatenate:
    • Press and hold the Ctrl key on your keyboard
    • Select the three fasta files while still holding the Ctrl key.
  2. Press Execute

select_samples
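Conceptually, concatenating tail-to-head just appends each file to the end of the previous one, in the order selected. The sketch below illustrates that idea in Python; it is not Galaxy's implementation, the function name `concatenate_fasta` is hypothetical, and the sequences are toy data rather than the real segments. Adding a missing trailing newline is a precaution chosen for this sketch so that a header never ends up glued to the previous sequence.

```python
def concatenate_fasta(texts):
    """Join fasta texts in order, making sure each one ends with a newline."""
    parts = []
    for text in texts:
        if not text:
            continue  # skip empty inputs
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)


# Toy sequences for illustration only; they are not the real segments.
s_seg = ">S_toy\nACGT\n"
m_seg = ">M_toy\nGGCC"  # deliberately missing its trailing newline
l_seg = ">L_toy\nTTAA\n"

merged = concatenate_fasta([s_seg, m_seg, l_seg])
print(merged)
```

The result is one multifasta text with three headers, one per input file, in the same order as the inputs.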

Running jobs

Once we have pressed Execute, a new window will appear in the central panel and our job will be queued:

  1. At the top of the panel (blue) you have a summary of what we've just run. In our case 3 input datasets are involved in a single process, with a single output.
  2. At the bottom of the panel (red) you have some recommendations from Galaxy on how to process your data after the process we have just run.
  3. In the history (yellow) we have now a new entry, which is the number 4, with the results of our job. Galaxy names jobs according to the used tool and the input dataset.

job_output

Visualize results

Once our job is green, we can see the results by clicking on the 👁️ icon. Now we can see the three sequences for the segments, headers included, in a single fasta file.

visualize_ref_genome

Now we are going to rename the fasta file as follows:

  1. Click on the 📝 icon
  2. Write Crimea Congo Ref Genome in the Name square
  3. Press Save

rename_ref_genome

First Question Answer

How do I create a fasta reference for the fragmented Crimea Congo genome?
By concatenating the different fragments of the genome into a single fasta file

7. Further process your data

Now that we have our concatenated fasta file, we can check that everything is fine by scrolling down the genome, and checking that the three fragments are fine, or we can use another tool to count the number of sequences in a fasta file, and the number of nucleotides in each sequence.

To do this, we are going to:

  1. Search for fasta in the tool search box.
  2. Select Fasta Statistics Display summary statistics for a fasta file
  3. In fasta or multifasta file, select the multiple datasets option
  4. With the Ctrl key pressed, select the 3 fragments and the multifasta file
  5. Press the Start button.

fasta_statistics_tool

select_fasta_statistics_sample

Now we have 4 jobs running, because this tool will run one statistics process for each fasta file we selected.

fasta_statistics_output

Results visualization

Now we are going to see the statistics summary for each fasta file. To do this we have to select the 👁️ icon in each of the Fasta Statistics outputs.

For the S fragment, we are going to see the number of sequences inside the fasta file, and the number of nucleotides. We are going to:

  1. Select the 👁️ icon in the job with the name Fasta Statistics on data 1: Fasta summary stats
  2. See the num_bp row, which corresponds to the number of nucleotides in the fasta file, 1673 in this case.
  3. Check num_seq, corresponding to the number of sequences in the fasta file.

S_fragment_stats
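The two values read above can also be computed by hand: num_seq is the number of header lines (those starting with `>`), and num_bp is the total number of sequence characters. The Python sketch below shows this counting on toy data. It is an illustration under those assumptions, not the code behind the Fasta Statistics tool, and the function name `fasta_stats` is hypothetical.

```python
def fasta_stats(text):
    """Return (num_seq, num_bp, records) for fasta-formatted text.

    records is a list of [header, length] pairs, one per sequence.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append([line[1:].strip(), 0])  # new sequence starts here
        elif records:
            records[-1][1] += len(line)  # add this line to the current sequence
    num_seq = len(records)
    num_bp = sum(length for _, length in records)
    return num_seq, num_bp, records


# Toy multifasta for illustration only; not the real Crimea Congo data.
toy = ">seg1 toy\nACGT\nAC\n>seg2 toy\nGGG\n"
num_seq, num_bp, records = fasta_stats(toy)
print(num_seq, num_bp, records)
```

Because the per-sequence lengths are kept in `records`, the same function covers both questions of this exercise: how many sequences a multifasta file holds and how long each one is.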

Now we are going to repeat this process for the rest of the fasta files:

M fragment

How many nucleotides are in the M fragment?

5364 nt

M_fragment_stats

L fragment

How many nucleotides are in the L fragment?

12150 nt

L_fragment_stats

Crimea Congo Genome

How many sequences and nucleotides are in the Crimea Congo reference genome?
3 sequences (3 fragments)

19187 nt

ccongo_genome_stats

Now we can answer the second question.

Second Question Answer

How many nucleotides does each fragment of the Crimea Congo genome have?
1673 the S fragment
5364 the M fragment
12150 the L fragment
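A quick sanity check on the concatenation is that the three fragment lengths add up to the length reported for the whole reference genome. A minimal Python check, using only the numbers obtained in this exercise:

```python
# Fragment lengths reported by Fasta Statistics in this exercise.
segment_lengths = {"S": 1673, "M": 5364, "L": 12150}

# The concatenated reference should contain exactly the sum of its parts.
total = sum(segment_lengths.values())
print(total)  # 19187, the value reported for the concatenated reference
```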

Share results

Now that we know the reference genome for the whole Crimea Congo virus has been built correctly, we can use it as the reference genome for further analyses in this same history, or save it to use on our own computer. To do so:

  1. Select the name of the fasta you want to download: 4: Crimea Congo Ref Genome
  2. Select the Save button in the panel that opens.

save_fasta_ref

8. History management

Now, we are going to learn how to manage the history. In this case, we created a new history record and, while we were doing our analysis, the steps we followed were recorded.

This history is saved in your account, so you can create a new one for a new analysis and access previous analyses later.

  1. To create a new history, select the + button in the history panel.
  2. Then, rename your new history to: History TEST

new_histotry

Now we have a clean history, but the previous history with the Crimea Congo results is no longer displayed. To see the previous history, we have to open the history manager:

history_magaer

Now we can check out the previous history, with all the Crimea Congo results. We are going to remove the TEST history and go back to the Crimea Congo Ref Genome history to share it.

  1. Select the dropdown icon. ⚠️ Be sure to select the dropdown of the history you want to delete, not the one you want to keep.
  2. Select Delete
  3. Press Switch to in the Crimea Congo history
  4. Select the HOME icon

remove_swithc_hist

Once we are finished, we can save our history in order to access these results later, or to share them with other lab members. To do this, we are going to:

  1. Select the gear icon in the history
  2. Select Share or publish
  3. Select the option Make History accessible

engin_historyshare_history_1

share_history_2

Now everyone with the link can access the history.

Note: