-
Notifications
You must be signed in to change notification settings - Fork 0
/
02-Learning-R.Rmd
68 lines (39 loc) · 9.39 KB
/
02-Learning-R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
# Learning R {#learning-r}
`R` is used by anyone in the Ryten lab looking to do bioinformatics analyses. Chances are, if you've joined to do bioinformatics you've come across or will come across using `R`. There's a steep learning curve to start programming and it's easy to become stumped before you become familiar, and miss out on all of the joy programming with `R` can give. This page is dedicated to help you get you started by pointing you to the resources that we've found useful no matter what stage of learning `R` you fall into.
## Why R?
This [page](https://adv-r.hadley.nz/introduction.html#why-r) from Hadley Wickham's book lays out what makes `R` unique and why many are fond of `R`. Below, are the 3 key reasons we think `R` is great for our bioinformatics use-case:
1. `R` is **free** and **open source**. This makes your analyses more reproducible if you use `R` and means there's a huge community of `R` users. Importantly, a large proportion of this community is addressing similar questions you are, relating to genetics/bioinformatics. [Bioconductor](https://www.bioconductor.org/), for example, is specifically aimed at developing tools applied to all things biology. This makes our lives easier as if we run into questions that we can't answer in the lab or via google, there might already be an [R package](https://bioconductor.org/packages/devel/bioc/) that addresses it and if not, there are forums to [ask for help](#asking-for-help).
2. `R` is a high-level, functional programming language. These properties make it suited for **data science** and day-to-day bioinformatics analysis, where most often the leading metric is speed from data to results, rather than developing robust software. This aspect of `R` has inspired the development of set of packages termed the [tidyverse](https://www.tidyverse.org/), which makes mastering data wrangling, analysis and plotting significantly easier.
3. `R` comes with one of the most easy-to-use integrated development environments (IDEs) [RStudio](https://rstudio.com/) and it's own language for creating reproducible analyses `Rmarkdown`. RStudio makes the entry point into getting started with `R` significantly lower with the ability to manage projects, inspect objects and version control with `git` (+ many more). Together with `Rmarkdown`, a language that allows you combine `R` code and prose in the same document, this facilitates the creation of reproducible analyses.
## R resources
Whether you've never touched a programming language before, or you're an expert in a different programming language, or you're confident with conducting analyses in `R`, we hope that you find something useful below.
### Beginner {#beginner-r}
- Not technically a resource, but one which arguably is more important. Learning `R` is typically easiest when you have a project to work on from the outset. This helps to keep you motivated, means you'll cover "real" data issues and learn how to debug. It's likely that you'll already have an application in mind if you're reading this, but if not, we'd encourage you to find one. One great place to start could be to ask Mina or other lab members whether any of us has something we need doing.
- [DataCamp](https://www.datacamp.com/?utm_source=adwords_ppc&utm_campaignid=805200711&utm_adgroupid=39268379982&utm_device=c&utm_keyword=datacamp&utm_matchtype=e&utm_network=g&utm_adpostion=&utm_creative=340731356548&utm_targetid=kwd-297372810188&utm_loc_interest_ms=&utm_loc_physical_ms=9045906&gclid=Cj0KCQiA7qP9BRCLARIsABDaZziaBxYZlLNWXXZmKSeK_0wWjAPg_JVUIy49dwAfACQMIInjumNYUCYaAr4ZEALw_wcB) and [codeacademy](https://www.codecademy.com/) host online, interactive courses to learn `R` from the basics. These have been used by other's in the Ryten Lab in the past and are a fun way to take you through the basic foundations of `R`. 3 months free access to DataCamp is included via the [GitHub Student Developer Pack](https://education.github.com/pack), which can be claimed by any UCL student.
- [learnr](https://rstudio.github.io/learnr/) is an interactive `R` package designed to get you started with `R`. As far as I'm aware, I don't think any of us has started `R` through this route, however does look to be a solid resource.
### Intermediate
- [R for data science](https://r4ds.had.co.nz/) has pretty much all you need to know in terms of the `tidyverse` set of tools.
- [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/) is written by one of the authors of `R Markdown`, Yihui Xie, and is a good place to find the comprehensive overview of the functionality of `R Markdown` or look up if there's something you'd like to do but aren't sure how or whether it's possible.
- [Happy Git with R](https://happygitwithr.com/) is a nicely written book on how to interface into `git` and GitHub from RStudio. If you haven't heard of or used `git` for version controlling your scripts before, this is highly recommended and has all you need to know.
### Advanced
- [Advanced R](https://adv-r.hadley.nz/) is for those wondering what goes on behind the scenes in `R`. It covers topics from how to use object orientated programming in `R`, how to run benchmarks on your code to find time-limiting steps and how to speed these up using `R`'s interface into `C++` ?
- [R packages](https://r-pkgs.org/) details the good practices for creating an `R` package. Packages are the best way to allow others to use your `R` code. You may consider packaging up your code if you want to release your functions to be used by others (either others in the lab or more widely).
- Workshops on how to make Bioconductor packages can be found [here](https://github.com/dzhang32/biocthis_workshop) and [here](https://saskiafreytag.github.io/making_bioconductor_pkg/articles/workshop.html). If you have developed a method/package associated with a publication, you may want to submit it to Bioconductor.
## Asking for help {#asking-for-help}
The world of coding can be tough at times and [admitting that fact by asking for help](https://lcolladotor.github.io/2018/11/12/asking-for-help-is-challenging-but-is-typically-worth-it/#.X6k_X1P7RhE) can be even harder. But trust us, we've all been there. The 3 hour debug before noticing the misplaced ",", the package that never installs, the frustrating `git` conflict - all too familiar. Whichever stage of your coding journey you're at, you will always run into bugs and knowing how to efficiently find solutions is a major part of the battle. Here, we list to most common places we would ask for help from the trusty Google companion to more specific forums where you will reach experts or `R` package maintainers.
### Where to ask for help?
1. Google is often the best place to start and many issues you face won't be a first and will be answered by forums such as `Stack Overflow`.
2. The Ryten lab has a Slack workspace (ask to be added if not already) and either the channels `#programming` or `#general` are great places to reach us. If that's intimidating, also feel free to message any of us directly.
3. If neither Google or our lab has provided an answer to your issue, it might be that your problem is unique or specific. For these sorts of question it's best to try and direct your query to the experts in the relevant field. For example, if your issue is targeted at a usage of an `R` package, then the issues page on the associated GitHub repository is the best place to get in touch with the maintainers of that package. Or, if it's more of a broad question (e.g. what covariates should I correct for when run a differential splicing analysis) the [Bioconductor support site](https://support.bioconductor.org/) provides a catalog of questions surrounding how to use it's packages.
### How to ask for help?
Given that the people which you are asking (as we all) are busy and have limited time, it's important to help make it as easy as possible to answer your query. Below, in order of importance, are the steps we'd recommend thinking about before reaching out:
1. If it's appropriate (i.e. not a general question) then include, at minimum, the **error message** and **command** you have run as part of your question. It's most likely, the code will be more effective at directing your query than a text based explanation.
2. If the question is about a bug or quite a specific situation, it's best if you include a minimum reproducible example (reprex). This consists of setting up a small section of code that can be used to replicate your error including a `sessionInfo()` call (that details the versions of `R` packages in your environment. The [reprex](https://reprex.tidyverse.org/reference/reprex.html) package provides a great way to capture both the code and it's output in different formats e.g. an R script, markdown or html.
## Teaching R
If you've reached a point where you're comfortable with `R`, you might consider teaching it. Here are some reasons we've taught in the past:
- It's **rewarding** as usually students are really grateful.
- It looks good on your **CV**.
- **Networking** with companies and academic groups that you may be interested in working with/for in the future.
- You can earn some extra **£££**.
- It's a great **refresher** of the basics of `R`.
The Ryten lab has a existing connection with [Clinician Coders](https://github.com/ClinicianCoders/ClinicianCoders), a group at UCL that aims at teaching `R` to clinicians. If this is of interest to you, then let Mina know and she'll put you in touch with the right lab members involved at the time.