R is a data scientist's dream but a programmer's nightmare.
Here I'll describe the R programming principles and practices that @leobrueggeman and I will try to adhere to for this project. We are big believers in the Hadleyverse — a philosophy of R programming, data analysis, and visualization — spearheaded by Hadley Wickham. I'll describe the basics as well as our modifications below.
We follow the Hadley style guide, which builds off the older Google style guide. When calling functions from a package (any non-base function), use double colons to clarify function provenance (e.g. dplyr::filter() rather than just filter()).
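To illustrate why explicit provenance matters, here is a small sketch using the built-in mtcars dataset (my choice for illustration, not a project dataset). Both stats and dplyr define a filter() function, and the double-colon form makes clear which one is meant.

```r
# Built-in example data; the variable names here are illustrative only.
cars_df <- datasets::mtcars

# dplyr::filter() keeps the rows that satisfy a condition.
efficient_df <- dplyr::filter(cars_df, mpg > 30)

# stats::filter() is unrelated: it applies a linear filter to a series.
# Here it computes a centered 3-point moving average.
smoothed <- stats::filter(1:10, rep(1 / 3, 3))
```

Without the prefix, which filter() runs depends on which packages happen to be attached and in what order.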
The preferred storage format for tabular data is tab-separated with the .tsv extension. Column names should always be included. For compression, use gzip with a .gz extension.
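A minimal sketch of this storage convention, assuming readr is installed. The data frame and file name are made up for illustration. readr picks up the .gz extension and compresses or decompresses automatically, and it writes column names by default.

```r
# Hypothetical example table.
samples_df <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  expression = c(0.5, 1.25, 3),
  stringsAsFactors = FALSE
)

# Write a gzip-compressed, tab-separated file with a header row.
path <- file.path(tempdir(), "samples.tsv.gz")
readr::write_tsv(samples_df, path)

# Read it back; decompression is inferred from the .gz extension.
roundtrip_df <- readr::read_tsv(path)
```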
Tabular data in R should be in data frames. Avoid relying on row names by making a dedicated column for the attribute. options(stringsAsFactors = FALSE) is imperative but may not need to be explicitly called if reading data with readr and manipulating data with dplyr and tidyr. Create data frames using dplyr::data_frame().
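As a sketch of the row-name advice, the built-in mtcars dataset stores car models as row names. One way to move them into a dedicated column, using only base R (the column name model is my own choice):

```r
cars_df <- datasets::mtcars

# Copy the row names into a regular character column, then reset the
# row names to the default integer sequence.
cars_df <- data.frame(
  model = rownames(cars_df),
  cars_df,
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

After this, the model is an ordinary column that survives writing to a .tsv file and can be used in joins and filters like any other attribute.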
Consecutive commands should be chained together using the magrittr pipe (%>%) when possible. Piping improves readability and avoids unnecessary variables cluttering the workspace.
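For example, a short pipeline on the built-in mtcars dataset might look like the sketch below (the dataset and column choices are illustrative). Each step feeds its result to the next, so no intermediate variables are needed.

```r
library(magrittr)  # provides the %>% pipe

summary_df <- datasets::mtcars %>%
  dplyr::filter(cyl %in% c(4, 6)) %>%        # keep 4- and 6-cylinder cars
  dplyr::group_by(cyl) %>%                   # one group per cylinder count
  dplyr::summarize(mean_mpg = mean(mpg)) %>% # average fuel economy per group
  dplyr::arrange(cyl)                        # sort by cylinder count
```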
ggplot2 is the preferred plotting package [2, 3]. Avoid the shortcut function ggplot2::qplot(). Consider cowplot or ggplot2::theme_bw() to avoid the ugly default theme. Make sure all text is readable. If text is too small to read, remove it. Exporting to vector images (PDF and SVG) is preferred unless intensive rendering requires PNG.
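A rough sketch of these plotting preferences, again using mtcars as stand-in data. The base_size, dimensions, and file name are arbitrary choices for illustration.

```r
# Build the plot with the full ggplot() interface rather than qplot().
plot_obj <- ggplot2::ggplot(datasets::mtcars, ggplot2::aes(x = wt, y = mpg)) +
  ggplot2::geom_point() +
  ggplot2::theme_bw(base_size = 14) +  # larger base font keeps text readable
  ggplot2::labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Export as a vector image; the format is inferred from the .pdf extension.
plot_path <- file.path(tempdir(), "mpg-vs-weight.pdf")
ggplot2::ggsave(plot_path, plot_obj, width = 5, height = 4)
```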
Antoine Lizee: RStudio is getting better with every iteration and has improved significantly since this post. Many features (smart completion, tight git integration, ...) make it a better IDE today than the Jupyter notebooks in my opinion.
Conda for R
Conda is an awesome package manager that we've been using for Python. In most cases, conda alleviates the horror of installation errors and dependencies.
Now conda is available for R. In other words, you can install R using the following command:
conda install --channel r r
I found many of my favorite R packages were available in conda's r channel, usually with r- prepended to their lowercased name. For example, I installed r-ggplot2, r-tidyr, r-dplyr, r-caret, and r-glmnet. For notebook support, I installed rpy2 and r-irkernel.
Installing R packages that are not included in conda's channel became more difficult after the switch to conda management. For example, devtools::install_github() was failing, and when doing traditional package installation, I had to specify the repos argument because the GUI popup was broken: