Adding analyst tutorial markdown and jupyter notebook #143
base: main
… there is an onboarding dataset
onboarding_documentation/technical_documentation/analyst_tutorial/analyst_tutorial.md
```
--access-level full \
scripts/create_test_subset.py --project bioheart --samples XPG280371 XPG280389 XPG280397 XPG280405 XPG280413 --skip-ped
```
**FOR REFERENCE: The above was taken from [this](https://centrepopgen.slack.com/archives/C03FA2M1MR9/p1700020527448029?thread_ts=1699935103.776929&cid=C03FA2M1MR9) Slack thread**
Referencing is good in code, but I don’t think it is necessary in this README.
Agreed, this was for personal referencing so I could keep track!
#### Task: Write a config file
- If we are wanting to run the `large cohort` pipeline on the `bioheart-test` dataset, we will need to create a config file that is capable of doing this.
- Have a go at writing your own config file capable of running the `large cohort` pipeline on `bioheart-test` up until the `Combiner` stage.
The person reading this has no chance of successfully completing this; there is simply not enough information provided.
I agree that asking the reader to recreate a config file from scratch might be a bit overwhelming, especially without a comprehensive guide. I'll revise this section to provide a step-by-step walkthrough of the parameters that need to be changed, emphasising that this is not exhaustive and the parameters needing to be changed will vary based on the requirements of the analysis.
#### Task: Write a config file
- If we are wanting to run the `large cohort` pipeline on the `bioheart-test` dataset, we will need to create a config file that is capable of doing this.
- Have a go at writing your own config file capable of running the `large cohort` pipeline on `bioheart-test` up until the `Combiner` stage.
- You can use the default config file as a starting point, and then override the necessary parameters to run the pipeline on `bioheart-test`.
> You can use the default config file as a starting point

I personally think the default is very confusing and missing lots of entries. Plus, most of it is not explained well. It would be very hard for the individual going through this pipeline to know what they need to keep and omit without any background.
Is there a full list of parameters available?
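To make the "start from the default, then override" idea in this thread concrete, here is a minimal sketch of a recursive merge between a default config and a small override. It is only an illustration: the section and key names (`workflow`, `dataset`, `access_level`, `sequencing_type`, `last_stages`) and their values are placeholders assumed for the example, not the pipeline's confirmed parameter list, and the real config files may be loaded and merged differently.

```python
# Hypothetical default config; the keys and values are placeholders.
DEFAULT_CONFIG = {
    "workflow": {
        "dataset": "example-dataset",
        "access_level": "standard",
        "sequencing_type": "genome",
    },
}

# Hypothetical override: only the entries that differ from the default.
OVERRIDE_CONFIG = {
    "workflow": {
        "dataset": "bioheart-test",
        "access_level": "test",
        "last_stages": ["Combiner"],  # assumed name for "stop after Combiner"
    },
}


def deep_merge(base, override):
    """Return a new dict where values in `override` win over `base`.

    Nested dicts are merged key by key; anything else is replaced outright.
    Neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


config = deep_merge(DEFAULT_CONFIG, OVERRIDE_CONFIG)
print(config["workflow"])
```

The point of the sketch is that a reader only needs to write the handful of entries that change, which is why a documented list of available parameters would make the task much more approachable.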
Instructions on setting up a Jupyter Notebook in the cloud can be found [here](https://github.com/populationgenomics/team-docs/blob/main/notebooks.md)

Please continue this tutorial once you have a Jupyter Notebook running in the cloud, good luck!
> good luck!

Again, I think a task-based tutorial with very little context and supporting help doesn't seem like the best way to help a new hire.
Fundamentally, I don't think production pipelines is a good place to have a tutorial for an analyst. Rather, it should be a dataset in
For the notebook, I think your two sections (the content taken from the production-pipelines README in the first half, and the PCA example at the bottom) are confusing. They both lack an explanation of what is happening (one of the benefits of having a notebook), context, and referencing of where the material came from.
… Also improved descriptions of tools and improved clarity
…e clarity and ease of access to newcomers
Created a tutorial for any new analysts to come on board. A couple of points:
- … `bioheart` dataset at this stage, as there is no dataset to be used by those onboarding. Obviously this should change, and it would be best to use publicly available genomes.
- `Final_analyst_tutorial.ipynb` cannot be completed because I have not included an `annotations.txt` file, as it references CPG IDs and I did not want them visible.
- … `1kg.mt` and the tutorial's `mt`. At this stage the `1kg.mt` is built using `GRCh37` and the `Matrix Table` from our pipeline uses `GRCh38`. If anyone has any ideas please let me know!
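One possible idea for the `GRCh37` / `GRCh38` mismatch, offered as a sketch rather than a tested fix: Hail can lift loci from one reference genome to another given a chain file, so the `1kg.mt` rows could be re-keyed onto `GRCh38` before combining with the pipeline's Matrix Table. The function name `liftover_rows_to_grch38` and the `chain_path` argument are hypothetical, and the chain file location must be supplied by whoever runs it. Note that this only remaps positions; alleles on flipped strands would need separate handling.

```python
def liftover_rows_to_grch38(mt, chain_path):
    """Re-key a GRCh37 MatrixTable by GRCh38 loci, dropping unmapped rows.

    `mt` is assumed to be a Hail MatrixTable keyed by (locus, alleles) on
    GRCh37; `chain_path` is assumed to point at a GRCh37-to-GRCh38 chain file.
    """
    # Imported lazily so this sketch can be loaded without Hail installed.
    import hail as hl

    rg37 = hl.get_reference("GRCh37")
    rg38 = hl.get_reference("GRCh38")

    # Register the chain file once; adding it twice raises an error.
    if not rg37.has_liftover("GRCh38"):
        rg37.add_liftover(chain_path, rg38)

    # Compute the lifted locus, drop rows that fail to map, then re-key.
    mt = mt.annotate_rows(new_locus=hl.liftover(mt.locus, "GRCh38"))
    mt = mt.filter_rows(hl.is_defined(mt.new_locus))
    mt = mt.key_rows_by(locus=mt.new_locus, alleles=mt.alleles)
    return mt


# Example usage (paths are placeholders, not files from this PR):
# mt_1kg = hl.read_matrix_table("1kg.mt")
# mt_1kg_38 = liftover_rows_to_grch38(mt_1kg, "path/to/grch37_to_grch38.over.chain.gz")
```

Whether lifting over is preferable to rebuilding `1kg.mt` directly on `GRCh38` depends on how many variants fail to map, which would need checking on the actual data.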