Title | Galaxy |
---|---|
Training dataset: | None |
Questions: |
|
Objectives: |
|
Estimated time: | 1h 15 min |
When we run a bioinformatic analysis against a reference genome, we usually need to provide a single reference file. The problem with segmented genomes, such as Crimea Congo's, is that the databases hold a separate file for each segment. Here we are going to learn how to load the different segments of a genome in Galaxy and concatenate them to create a single fasta file that can be used for further analyses. We are also going to learn how to count the number of sequences in a multifasta file, and the number of nucleotides in each sequence in a fasta file.
First of all, go to the Galaxy Web Server in Europe and you will see a screen like this one:
The screen has 4 different elements:
- The first one, in yellow, is the Title panel with the buttons:
  - Home (house): To go to the home page in Spanish
  - Workflows: To go to the workflow manager
  - Visualize: Displays the visualization manager and options
  - Share Data: Displays the sharing options
  - Help: Displays all the help menus available
  - Login or Register
  - Galaxy Training Materials (graduation cap): Displays the Galaxy Trainings list
  - Enable/Disable scratchbook (9 squares)
- The left side panel, in blue, with all the tools in this Galaxy mirror
- The central panel, in red, which will let you run analyses and view outputs
- The right panel, in green, with the history record.
The first thing to do is sign up, so that you can save your history. To do that, follow these steps:
- Select Login or Register in the header panel
- Select Register here.
- Fill in the registration information.
  ⚠️ Use an email you can access now, because Galaxy will ask you to confirm your e-mail address.
- Log into your e-mail, and verify your Galaxy account.
- Log in with your credentials.
Now select the Home button and return to the home page. We are going to learn how to manage the history, which is in the right panel. To do this, we will follow these steps:
- Click the new-history (+) icon at the top of the history panel.
  - If the new-history icon is missing:
    - Click on the galaxy-gear icon (History options) at the top of the history panel
    - Select the option Create New from the menu
- Click once on Unnamed history, which is the title of your history, and type a new meaningful name for it. In our case a good name would be Crimea Congo Reference Genome. Then press Enter on the keyboard and the new name will be set.
Now we are going to load the data. In this case we are going to use the Crimea Congo reference genome. Crimea Congo's genome is composed of 3 segments, each with its own GenBank accession number:
- S segment: DQ133507
- M segment: EU037902
- L segment: EU044832
In order to load these fragments in Galaxy we have to follow these steps:
- In the left side panel, select Upload Data
- In the new panel select Paste/Fetch Data
- Then copy the following block of text:
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/S_DQ133507.fasta
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/M_EU037902.fasta
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/L_EU044832.fasta
- Now paste the text you copied into the box labeled "Download data from the web by entering URLs (one per line) or directly paste content."
- Select Start
- When everything is green on the screen, select Close
With this, our data is loading into Galaxy. You can see that each job is given a different number, so you can keep track of the order of your jobs with it.
The jobs can have three different states:
- Waiting: Your jobs will have a grey color and a clock on their left side. In this state your jobs are queued, waiting to run on the Galaxy server.
- Running: Your jobs will have an orange color and rotating dots on their left side. In this state your jobs are running on the Galaxy server.
- Done: Your jobs will have a green color. Your data is ready to be used.
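Outside Galaxy, the same three files can also be fetched with a short script, for example to keep a local copy for checking results later. The following is only a minimal sketch: the URLs are the ones listed above, while the function names `segment_urls` and `download_segments` are hypothetical helpers introduced here for illustration, and the download step assumes network access.

```python
from pathlib import Path
from urllib.request import urlopen

# Base URL and file names copied from the list of links in this exercise.
BASE_URL = (
    "https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/"
    "one_week_4day_format/exercises/data/"
)
SEGMENT_FILES = {
    "S": "S_DQ133507.fasta",
    "M": "M_EU037902.fasta",
    "L": "L_EU044832.fasta",
}


def segment_urls():
    """Return a dict mapping each segment name to its download URL."""
    return {seg: BASE_URL + name for seg, name in SEGMENT_FILES.items()}


def download_segments(dest_dir="."):
    """Download the three segment files into dest_dir (hypothetical helper).

    Needs network access; returns the list of local paths that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for seg, url in segment_urls().items():
        target = dest / SEGMENT_FILES[seg]
        with urlopen(url) as response:
            target.write_bytes(response.read())
        written.append(target)
    return written
```

Calling `download_segments("data")` would write the three fasta files into a local `data` folder. This does not replace the Galaxy upload; it only mirrors it locally.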
Now we can start using our data. First of all, we are going to see what these fasta files look like. There are different ways to do this:
- Select the 👁️ icon to the right of the file name. For the first time, our center panel has changed, and now it displays the content of the fasta file.
- Another way is to select the name of the file to see the first five lines of the file.
When we display this file summary, we obtain additional options to process this file:
- Save: Allows you to save your files locally
- Copy link: copies the link of the data to your clipboard.
- View details: Shows a new window in the center panel with additional information about the sample.
- Visualize this data: As we said before in the theory, the visualization panel lists all the visualization options available in Galaxy, but not all of them fit your data. With this button, you can see which visualization options suit your type of data.
- Help: Displays help about the tool used to generate the data.
Note: If you click on the file name again, the summary disappears
Now we are going to rename all the fasta files we uploaded to Galaxy. To do this, we have to click on the pencil icon that appears next to each file name. This will display a new central window with the different editing options for each file:
This screen allows you to do several things. Starting from the right:
- Set permissions: Allows you to manage the access and permissions of the selected file, for the different users registered.
- Datatype: Allows you to change the datatype of the existing dataset, but not modify its contents. Use this if Galaxy has incorrectly guessed the type of your dataset.
- Convert: Allows you to create a new dataset with the contents of this dataset, converted to a new format.
- Change the attributes: Allows you to rename the file, and add some additional information.
We are going to rename the files as shown here:
Now we are going to use the fasta files uploaded to Galaxy to run tools. To run tools we have to:
- Search for the tool in the search bar. We want to concatenate the fasta files, so we are going to search for concatenate in the bar.
- Select the tool we want to use. In this case Concatenate datasets tail-to-head (cat).
When we select the tool, its options and some information about it appear in the center panel:
- Tool name, version and options to save and share the tool
- The input dataset options:
  - We can select data from the history
  - Upload data from a collection
  - Upload a dataset (the upload dataset pop-up will appear)
  - Browse a dataset (you can browse datasets from the history)
  - Insert new dataset blocks (no need in our case)
- Execute button
- Tool information:
  - What it does
  - Examples
  - Citation
To concatenate the samples, we will follow these steps:
- In Datasets to concatenate:
  - Hold down the Ctrl key on your keyboard
  - Select the three fasta files while still holding the Ctrl key.
- Press Execute
Once we have pressed Execute, a new central panel window will appear and our job will be queued:
- At the top of the panel (blue) you have a summary of what we've just run. In our case 3 input datasets are involved in a single process, with a single output.
- At the bottom of the panel (red) you have some recommendations from Galaxy on how to process your data after the process we have just run.
- In the history (yellow) we have now a new entry, which is the number 4, with the results of our job. Galaxy names jobs according to the used tool and the input dataset.
Once our job is green, we can see the results by clicking on the 👁️ icon. Now we can see the three sequences for the segments, headers included, in a single fasta file.
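Conceptually, this concatenation just writes the files one after another. The sketch below illustrates the idea in plain Python. It is not Galaxy's implementation: `concatenate_fasta` is a hypothetical helper, and adding a newline after a file that lacks one is a design choice made here so that the next header always starts on its own line.

```python
from pathlib import Path


def concatenate_fasta(input_paths, output_path):
    """Write the input files one after another into output_path.

    A newline is added after any file that does not end with one, so the
    header of the next segment always starts on its own line.
    """
    with open(output_path, "w") as out:
        for path in input_paths:
            text = Path(path).read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")
    return output_path
```

For example, `concatenate_fasta(["S_DQ133507.fasta", "M_EU037902.fasta", "L_EU044832.fasta"], "reference.fasta")` would produce one multifasta file with the three segments in that order, assuming local copies of the files exist under those names.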
Now we are going to rename the fasta file as follows:
- Click on the 📝 icon
- Write Crimea Congo Ref Genome in the Name field
- Press Save
First Question | Answer |
---|---|
How do I create a fasta reference for the fragmented Crimea Congo genome? | By concatenating the different fragments of the genome |
Now that we have our concatenated fasta file, we can check that everything is fine by scrolling through the file and verifying that the three fragments are present. Alternatively, we can use another tool to count the number of sequences in a fasta file and the number of nucleotides in each sequence.
To do this, we are going to:
- Search for fasta in the tool search box.
- Select Fasta Statistics (Display summary statistics for a fasta file)
- In fasta or multifasta file, select the multiple datasets option
- With the Ctrl key pressed, select the 3 fragments and the multifasta file
- Press the Start button.
Now we have 4 jobs running, because this tool will run one statistics process for each fasta file we selected.
Now we are going to see the statistics summary for each fasta file. To do this we have to select the 👁️ icon in each of the Fasta Statistics outputs.
For the S fragment, we are going to see the number of sequences inside the fasta file, and the number of nucleotides. We are going to:
- Select the 👁️ icon in the job with the name Fasta Statistics on data 1: Fasta summary stats
- See the num_bp row, which corresponds to the number of nucleotides in the fasta file, 1673 in this case.
- Check num_seq, corresponding to the number of sequences in the fasta file.
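To make clear what these two values mean, the sketch below computes them for a local fasta file. It is an illustration under stated assumptions, not the code behind the Fasta Statistics tool: `fasta_stats` is a hypothetical helper, it assumes sequence IDs are unique within the file, and it counts every character on a sequence line as a nucleotide.

```python
def fasta_stats(path):
    """Return (num_seq, num_bp, lengths) for a FASTA or multi-FASTA file.

    lengths maps each sequence ID (first word of the header) to its length.
    """
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # New record: use the first word of the header as its ID,
                # or a generated name if the header is empty.
                parts = line[1:].split()
                current = parts[0] if parts else f"seq{len(lengths) + 1}"
                lengths[current] = 0
            elif current is not None:
                # Sequence line: add its length to the current record.
                lengths[current] += len(line)
    return len(lengths), sum(lengths.values()), lengths
```

Here the first returned value plays the role of num_seq and the second the role of num_bp; the third gives the length of each sequence, which is useful for the concatenated file.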
Now we are going to repeat this process for the rest of the fasta files:
M fragment
L fragment
Crimea Congo Genome
How many sequences and nucleotides are in the Crimea Congo reference genome?
3 sequences (3 fragments)
19187 nt
Now we can answer the second question.
Second Question | Answer |
---|---|
How many nucleotides does each fragment of the Crimea Congo genome have? | 1673 the S fragment, 5364 the M fragment, 12150 the L fragment |
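As a quick consistency check, the three fragment lengths should add up to the total reported for the concatenated genome. A minimal sketch, using only the numbers reported in this exercise (the variable names are illustrative):

```python
# Segment lengths in nucleotides, as reported by Fasta Statistics above.
segment_lengths = {"S": 1673, "M": 5364, "L": 12150}

# The concatenated reference should contain exactly the sum of the segments.
total = sum(segment_lengths.values())
print(total)  # 19187
```

The result matches the 19187 nt reported for the Crimea Congo reference genome, which confirms that no fragment was lost or duplicated during concatenation.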
Now that we know that the reference genome for the whole Crimea Congo virus has been built correctly, we can use it as the reference genome for further analysis in this same history, or save it to use it on our computer. To do so:
- Select the name of the fasta you want to download: 4: Crimea Congo Ref Genome
- Select the Save button in the panel that expands.
Now, we are going to learn how to manage the history. In this case, we created a new history record and, while we were doing our analysis, the steps we followed were recorded.
This history is saved in your account, so you can create a new one for a new analysis and access previous analyses later.
- To create a new history, select the + button in the history panel.
- Then, rename your new history to: History TEST
Now we have a clean history, and the previous history with the Crimea Congo results is no longer displayed. To see the previous history, we have to access the history manager:
Now we can check out the previous history, with all the Crimea Congo results. We are going to remove the TEST history and go back to the Crimea Congo Ref Genome history to share it.
- Select the dropdown icon
  ⚠️ Be sure to select the dropdown in the history you want to delete, not in the good one.
- Select Delete
- Press Switch to in the Crimea Congo history
- Select the HOME icon
Once we are finished, we can save our history in order to access these results later, or to share them with other lab members. To do this, we are going to:
- Select the gear icon in the history
- Select Share or publish
- Select the option Make History accessible
Now everyone with the link can access the history.
Note:
- This hands-on history URL: https://usegalaxy.eu/u/svarona/h/crimea-congo-reference-genome
- This hands-on workflow URL: https://usegalaxy.eu/u/svarona/w/concat-frags-reference-genome