.libPaths() can conflict with user/system installed R/RStudio #37

Open
ebolyen opened this issue Mar 9, 2018 · 31 comments

Comments

@ebolyen

ebolyen commented Mar 9, 2018

Hello,

I think I've finally found the issue and the right place to post it, but apologies if I'm mistaken.

It appears that when you have R installed from conda-forge and a system R (say RStudio), the conda-installed R will include the user-package directory used by the system R in its search path (.libPaths()).

This means if you have an R package installed with conda in your environment and the same package installed in RStudio, your conda environment will use the RStudio package (from your user directory) instead of the conda package. If there were any built extensions in that package (which usually there are) you'll end up with a cryptic segfault or memory not mapped error as the conda R is not binary compatible with the system R.
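
As a rough illustration of why the user copy wins: the lookup takes the first library directory that contains the package. The sketch below mimics that first-match-wins behaviour with plain directories. It does not call R, and the function name `first_lib_with` and the directory layout are made up for the example.

```shell
# Print the first directory in a colon-separated library path that contains
# the named package, mimicking a first-match-wins lookup.
first_lib_with() {
    pkg="$1"
    old_ifs="$IFS"
    IFS=':'
    for dir in $2; do
        if [ -d "$dir/$pkg" ]; then
            IFS="$old_ifs"
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

# Throwaway directories standing in for a user library and a conda library.
demo="$(mktemp -d)"
user_lib="$demo/home/R/library"
conda_lib="$demo/conda-env/lib/R/library"
mkdir -p "$user_lib/jsonlite" "$conda_lib/jsonlite" "$conda_lib/Rcpp"

# The user library comes first, as it does in the situation described above.
search_path="$user_lib:$conda_lib"
jsonlite_from="$(first_lib_with jsonlite "$search_path")"
rcpp_from="$(first_lib_with Rcpp "$search_path")"
echo "jsonlite resolved from: $jsonlite_from"
echo "Rcpp resolved from:     $rcpp_from"
```

Here jsonlite exists in both places and is picked up from the user directory, while Rcpp, which only exists in the conda library, is found there.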

The reason the user-directory is included in .libPaths() is because of this section of the Renviron file.

If you install r-base and then comment out that section in ~/.conda/envs/<whatever>/lib/R/etc/Renviron the issue with .libPaths() disappears and your library() calls will only ever see your conda environment's packages, which I believe is kind of the entire point of conda.

I think this repo is where one would provide a patch file to drop that section of the Renviron file to make R installations isolated under conda. Assuming I'm not missing something important, would it make sense to provide a PR to fix this for everyone?

Cross referencing this issue: qiime2/q2-dada2#68 (where we have been trying to figure this out for a while).

@jdblischak
Member

Thanks for starting this discussion @ebolyen and sharing your research.

I also get frustrated that the conda R does not ignore the user's personal R directory. To me this defeats one of the main reasons I use conda, which is to create reproducible computational environments that I can share with colleagues via an environment.yaml file. Even if my colleagues install and activate a given conda environment, I can't guarantee the setup is correct because I don't know what packages they have installed in their user directory.

Also, this has bitten me before on Ubuntu, e.g. this bug where the user directory caused conda packages to be installed in the wrong location.

However, I don't think we want to make the change unilaterally here. conda-forge is the community collection of conda packages, and I don't think we would want R to behave differently whether it was installed from the conda-forge channel or the defaults channels (the defaults recipe is hosted at AnacondaRecipes/r-base-feedstock).

@mingwandroid What is your opinion on modifying the R installed by conda to ignore the user's local package directory?

@mingwandroid
Contributor

I am in two minds about this.

Making such changes can annoy upstreams and lead to claims that we're fragmenting the ecosystem. Also, the upstream documentation would no longer be valid for our builds.

But while we're at it how about re-routing install.packages() through conda in the first instance? I've been tempted to try to do this for a while. I'd probably hide it behind an env var that's off by default though.

With R upstream providing binaries for Windows and now macOS it makes us no longer fully compatible. Will the same thing happen on Linux?

I think I'd like there to be a manylinux2 style approach for binaries there but I'd like it to be based on our compilers.

@ebolyen
Author

ebolyen commented Mar 13, 2018

To me this defeats one of the main reasons I use conda, which is to create reproducible computational environments that I can share with colleagues via an environment.yaml file.

Exactly! We distribute a bioinformatics framework called QIIME 2, and our installation mechanism is a conda environment which has worked amazingly for the most part. The only exception is R, as many bioinformaticians use RStudio and have some of the very same libraries we use, so we run into this reasonably often.

However, I don't think we want to make the change unilaterally here. conda-forge is the community collection of conda packages, and I don't think we would want R to behave differently whether it was installed from the conda-forge channel or the defaults channels (the defaults recipe is hosted at AnacondaRecipes/r-base-feedstock).

That makes sense, and it got me thinking, "well if Python is virtualized by conda, why shouldn't R be as well? Maybe this should go in defaults!"

So I double checked how conda treated Python's sys.path, and I learned today that Python has the same concept of a user site-packages as R. Much to my dismay I do see the user site-packages in my sys.path within an active conda environment (and even at a higher precedence than my env site-packages!).

So I suppose conda isn't treating R any differently than Python. You can escape the environment in either language. It's just that a lot more people use RStudio which uses the user-package directory than there are people that know about that user site-packages for Python.
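
A quick way to check this from a shell, assuming `python3` is on the PATH. `site.ENABLE_USER_SITE` and the `PYTHONNOUSERSITE` environment variable are standard Python features. The shell variable names are only for illustration.

```shell
# Where would Python look for per-user packages? The exit status of this
# command also encodes whether the directory is enabled, so it is ignored here.
user_site_dir="$(python3 -m site --user-site || true)"
echo "user site-packages: $user_site_dir"

# Is the user site directory enabled by default in this interpreter?
default_enabled="$(python3 -c 'import site; print(site.ENABLE_USER_SITE)')"

# Same question with PYTHONNOUSERSITE set for that one command only.
isolated_enabled="$(PYTHONNOUSERSITE=1 python3 -c 'import site; print(site.ENABLE_USER_SITE)')"

echo "enabled by default:            $default_enabled"
echo "enabled with PYTHONNOUSERSITE: $isolated_enabled"
```

With the variable set, the interpreter reports the user site directory as disabled, so it is left off sys.path for that process.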

But while we're at it how about re-routing install.packages() through conda in the first instance? I've been tempted to try to do this for a while. I'd probably hide it behind an env var that's off by default though.

That would be pretty incredible.

With R upstream providing binaries for Windows and now macOS it makes us no longer fully compatible. Will the same thing happen on Linux?

Doesn't this already happen with libc? I know we're stuck using CentOS 5 for the older libc, but it's probably only a matter of time before the forwards-compatibility breaks in a way that matters.


I wonder if maybe the right way to go about this would be to create a patch-only package which alters the way conda integrates with R. Then if you want to ship a reproducible R environment you can include that package in your environment list. Otherwise R acts like it "should" and will prefer your user-packages, binary incompatible though they may be. That would work well for our purposes, and would make it easy for anyone to opt-in to the same behavior.

Technically the same thing could be done for Python as well (though I've really never seen the user site-packages until I looked for them today).

@jdblischak
Member

Making such changes can annoy upstreams and lead to claims that we're fragmenting the ecosystem. Also documentation is no longer valid.

@mingwandroid I agree this is a big concern. However, I still think it makes sense to do this. The entire reason for the user directory is that the user needs a location where they have write access. One of the benefits of using conda is that it's a local installation. In that sense, when an R user installs R via conda, they are already breaking from the traditional ecosystem. Using the system .Renviron settings from a traditional ecosystem only serves to interfere with the conda setup.

But while we're at it how about re-routing install.packages() through conda in the first instance? I've been tempted to try to do this for a while. I'd probably hide it behind an env var that's off by default though.

That would be a cool feature, but I think it is less urgent than fixing the current issue with the user directory.

I wonder if maybe the right way to go about this would be to create a patch-only package which alters the way conda integrates with R. Then if you want to ship a reproducible R environment you can include that package in your environment list. Otherwise R acts like it "should" and will prefer your user-packages, binary incompatible though they may be. That would work well for our purposes, and would make it easy for anyone to opt-in to the same behavior.

@ebolyen That's an interesting proposal. I'd be willing to take that compromise. My worry though is that new conda users would be unlikely to know about this alternative R conda package.

This comes down to what the purpose of installing R via conda is. When I just want to use R to do some exploratory analysis, I use the system version of R and R packages installed via install.packages (and on Ubuntu I make heavy use of the APT binary packages available from cran2deb4ubuntu). It's only once I have a mature project (that often involves collaborators), that I want to start isolating my environment. For my use cases, I never want to mix the conda R environment with the system R environment.

as many bioinformaticians use RStudio and have some of the very same libraries we use

@ebolyen Also, I wanted to note that this isn't an RStudio feature. As you noted in your original post, the location of the user directory is determined by the configuration files shipped with R.

@ebolyen
Author

ebolyen commented Mar 13, 2018

Not to complicate things further, but there's also the root environment, which I've always thought of as a replacement for a system-install. In the root env you could argue that user-package directories make sense again (but why even use conda??).

@jdblischak
Member

In the root env you could argue that user-package directories make sense again (but why even use conda??).

@ebolyen I'd argue that mixing conda and system packages is a bad idea even in the root environment. In the long run, mixing and matching always ends in frustration. So many users get frustrated with conda because they'll install a few compiled packages via conda, but then also set a custom LD_LIBRARY_PATH for compiling packages from source, which always ends in disaster. To me this issue with the R library paths is analogous to the LD_LIBRARY_PATH issue. conda works with a conda setup, not mix-and-match.

@jdblischak
Member

@mingwandroid Another complication with trying to override install.packages is that RStudio also intercepts any call to install.packages (it offers to restart the session first if the package to be installed is currently loaded).

# from within RStudio
> install.packages
function (...) 
.rs.callAs(name, hook, original, ...)
<environment: 0x1041c0840>

@ebolyen
Author

ebolyen commented Mar 14, 2018

To me this issue with the R library paths is analogous to the LD_LIBRARY_PATH issue. conda works with a conda setup, not mix-and-match.

I like this line of argument. So then should the Anaconda distribution of Python also be patching sys.path to not include $HOME/.local/lib/python3.X/site-packages? That is the most direct analogue of R's .libPaths() I think (with the caveat that the use of R's user-directory is very common in contrast to Python's).

@jakirkham
Member

We did sort out some issues early on to ignore site-packages from the system IIRC. Not sure if that would solve .local based paths or not. Could you please test and report the findings over at the python feedstock (even if they are affirmative)?

@ebolyen
Author

ebolyen commented Mar 14, 2018

Sure thing! I've already tested that the .local is indeed included, but I'll create a little write-up demonstrating that.

@jdblischak
Member

I like this line of argument. So then should the Anaconda distribution of Python also be patching sys.path to not include $HOME/.local/lib/python3.X/site-packages?

@ebolyen I agree. I'm surprised that this also affects local Python packages.

Here's the test I performed to complement your test:

# Start a Docker container running Ubuntu
$ sudo docker run --rm -ti ubuntu bash
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

# Install system Python
$ apt update
$ apt install -y python python-pip wget

# Install requests in user library
$ pip install --user requests
$ python -c 'import sys; print sys.path'
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/root/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']

# Install Miniconda2
$ wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
$ bash Miniconda2-latest-Linux-x86_64.sh
$ source ~/.bashrc

# Confirm that user library still included on path
$ python -c 'import sys; print sys.path'
['', '/root/miniconda2/lib/python27.zip', '/root/miniconda2/lib/python2.7', '/root/miniconda2/lib/python2.7/plat-linux2', '/root/miniconda2/lib/python2.7/lib-tk', '/root/miniconda2/lib/python2.7/lib-old', '/root/miniconda2/lib/python2.7/lib-dynload', '/root/.local/lib/python2.7/site-packages', '/root/miniconda2/lib/python2.7/site-packages']
$ python -c 'import requests'

@ebolyen
Author

ebolyen commented Mar 14, 2018

I had no idea pip had a --user flag. That's much easier than what I did!

@ebolyen
Author

ebolyen commented Apr 9, 2018

I think I'm going to be working on a "patch" package since there doesn't seem to be a lot of traction over on conda's issue tracker. My current plan is to use post-link/pre-unlink scripts.

They do say not to do what I'm about to do:

Post-link and pre-unlink scripts should:
- Be avoided whenever possible.
- Not touch anything other than the files being installed.
- Not write anything to stdout or stderr, unless an error occurs.
- Not depend on any installed or to be installed conda packages.
- Depend only on simple system tools such as rm, cp, mv and ln.

But I don't see a better option here (short of changing how the language installs itself in an environment).

Does anyone have a better way to make this conda package?

@jdblischak
Member

My current plan is to use post-link/pre-unlink scripts.

@ebolyen Could you please elaborate? How are you planning to use these scripts to modify the library paths?

My idea would be to include a patch that deletes the R_LIBS_USER definitions from /etc/Renviron.in:

https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/etc/Renviron.in#L43

This would cover the most common use case, where a user is installing local R packages to the default R_LIBS_USER for their OS. It wouldn't be able to prevent the situation in which a user has defined a custom R_LIBS_USER in their ~/.Renviron, but presumably a user with a custom user library would have an easier time debugging any issues that arose from this.

There may be a better way to do this (e.g. passing an option to one of the commands in build.sh), but I couldn't find an obvious solution after skimming R Installation and Administration.

@ebolyen
Author

ebolyen commented Apr 9, 2018

Ah, yeah that is essentially my plan as well. Using Renviron.in looks like the best way to do this.

My plan was to avoid creating a different version of R by instead making a recipe that modified your conda environment directly (instead of the R package before installation into the conda-environment).

YAML:

...
requirements:
  run:
    - r-base  # already installed
...

/bin/.isolate-r.post-link.sh

#!/bin/bash
# Remove the R_LIBS_USER definitions from the conda environment's Renviron
sed -i '/R_LIBS_USER/d' "$CONDA_PREFIX/lib/R/etc/Renviron"

/bin/.isolate-r.pre-unlink

# undo that somehow, maybe there's a backup made
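
One possible shape for the undo step, as a sketch only: keep a copy of Renviron before editing it and move the copy back on uninstall. The helper names `isolate_r` and `restore_r` and the `.isolate-backup` suffix are hypothetical. In a real package the two bodies would go into the post-link and pre-unlink scripts, and would take the target environment from the `PREFIX` variable that conda exports to link scripts rather than from an argument.

```shell
# Hypothetical post-link body: back up Renviron, then drop R_LIBS_USER lines.
isolate_r() {
    renviron="$1/lib/R/etc/Renviron"
    # Keep a pristine copy so the change can be undone on uninstall.
    cp "$renviron" "$renviron.isolate-backup"
    # Write through the backup instead of using sed -i, whose syntax
    # differs between GNU and BSD sed.
    sed '/R_LIBS_USER/d' "$renviron.isolate-backup" > "$renviron"
}

# Hypothetical pre-unlink body: put the original file back if a backup exists.
restore_r() {
    renviron="$1/lib/R/etc/Renviron"
    if [ -f "$renviron.isolate-backup" ]; then
        mv "$renviron.isolate-backup" "$renviron"
    fi
}

# Demo on a throwaway prefix with an illustrative two-line Renviron.
demo_prefix="$(mktemp -d)"
mkdir -p "$demo_prefix/lib/R/etc"
cat > "$demo_prefix/lib/R/etc/Renviron" <<'EOF'
R_PLATFORM=${R_PLATFORM-'x86_64-pc-linux-gnu'}
R_LIBS_USER=${R_LIBS_USER-'~/R/x86_64-pc-linux-gnu-library/3.4'}
EOF
cp "$demo_prefix/lib/R/etc/Renviron" "$demo_prefix/original"

isolate_r "$demo_prefix"
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining="$(grep -c 'R_LIBS_USER' "$demo_prefix/lib/R/etc/Renviron" || true)"
echo "R_LIBS_USER lines after isolate_r: $remaining"

restore_r "$demo_prefix"
```

This still touches a file owned by another package, which is exactly what the guidelines quoted earlier advise against, so it only makes the modification reversible, not sanctioned.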

@ebolyen
Author

ebolyen commented Apr 9, 2018

Basically, there isn't actually any source to install, the package would just be a vehicle for executing the environment modifications (similar to distributing shared environment variables with post-activate hooks).

@jdblischak
Member

My plan was to avoid creating a different version of R by instead making a recipe that modified your conda environment directly (instead of the R package before installation into the conda-environment).

OK. That makes sense. Please let me know when you have a prototype that I can test out.

Basically, there isn't actually any source to install, the package would just be a vehicle for executing the environment modifications (similar to distributing shared environment variables with post-activate hooks).

What were you planning on calling this new package?

@ebolyen
Author

ebolyen commented Apr 9, 2018

OK. That makes sense. Please let me know when you have a prototype that I can test out.

Will do!

What were you planning on calling this new package?

¯\_(ツ)_/¯ conda-lang-isolation maybe? It would be easy to roll in the same changes for Python as well.

@claczny

claczny commented Oct 18, 2018

I was wondering why conda's Renviron file defines R_LIBS_USER and not R_LIBS?

The documentation at https://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html says the following:

The library search path is initialized at startup from the environment variable R_LIBS (which should be a colon-separated list of directories at which R library trees are rooted) followed by those in environment variable R_LIBS_USER. Only directories which exist at the time will be included.

As such, if a user has a local Renviron file (e.g., ~/.Renviron) which defines R_LIBS, this will always be given precedence over the conda settings.

While for the sake of true installation isolation I would not suggest doing so, conda's Renviron could prepend the conda-specific path to R_LIBS, thus enabling access to both the user's local and the environment-based R packages.

UPDATE:
Upon looking into this further, it seems that the preferred user-level configuration is to use R_LIBS_USER (which makes sense when thinking about it) and not R_LIBS, hence I assume the decision to use R_LIBS_USER.

However, this might still create problems if a user has set R_LIBS, which always gets precedence over R_LIBS_USER (see above).
Nevertheless, might it not already be a simple step in the "isolation"-direction if conda's R_LIBS_USER would be prepended so that it would be searched and used first?
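
To make the documented ordering concrete, here is a small sketch that assembles a search path the way the quoted help text describes: entries from R_LIBS first, then entries from R_LIBS_USER, keeping only directories that exist. It works on plain directories and does not call R. The function name `build_search_path` and the demo paths are invented for the example.

```shell
# Emit library directories in the documented order: entries from the first
# argument (R_LIBS), then from the second (R_LIBS_USER), skipping any
# directory that does not exist.
build_search_path() {
    old_ifs="$IFS"
    IFS=':'
    for dir in $1 $2; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    IFS="$old_ifs"
}

# Throwaway directories; "missing" is deliberately never created.
demo="$(mktemp -d)"
mkdir -p "$demo/site" "$demo/user"
r_libs="$demo/site:$demo/missing"
r_libs_user="$demo/user"

search_path="$(build_search_path "$r_libs" "$r_libs_user")"
echo "$search_path"
```

The R_LIBS entry that exists is listed ahead of the R_LIBS_USER entry, and the missing directory is dropped, which is why a user-level R_LIBS setting would still take precedence over anything conda puts in R_LIBS_USER.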

@W-L

W-L commented Feb 22, 2019

Has there been any progress with this?
I'm also maintaining a conda environment for a bioinformatics application and was pretty frustrated when I found out that R (within conda) does not prioritize the packages installed in the conda environment.
I thought I could share this little bodge to prioritize the conda packages, in case anybody might find it useful. I place it at the beginning of my R scripts to change the order of .libPaths():

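prioritize_conda <- function(lib_tree) {
  # Library paths that belong to a conda installation (matched by name)
  cpath <- grep("conda", lib_tree, value = TRUE, ignore.case = TRUE)
  # Put the conda paths first and keep the remaining paths in their original order
  c(cpath, setdiff(lib_tree, cpath))
}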

new_tree <- prioritize_conda(lib_tree=.libPaths())
.libPaths(new_tree)

@ebolyen
Author

ebolyen commented Feb 22, 2019

Our solution was to just set the env vars in a post-activate hook: qiime2/qiime2#395
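
For readers who want to try the same idea, here is a minimal sketch of such a hook, not necessarily what the linked PR does. It relies on conda sourcing every `*.sh` file in `etc/conda/activate.d` and `etc/conda/deactivate.d` under the environment prefix. The file name `isolate-user-libs.sh`, the choice of variables, and pointing R_LIBS_USER at the environment's own library are assumptions made for illustration.

```shell
# Stand-in for an environment prefix; a real hook lives under $CONDA_PREFIX.
env_prefix="$(mktemp -d)"
mkdir -p "$env_prefix/etc/conda/activate.d" "$env_prefix/etc/conda/deactivate.d"

# Hypothetical activation hook.
cat > "$env_prefix/etc/conda/activate.d/isolate-user-libs.sh" <<'EOF'
# Remember any previous value so deactivation can restore it.
export _ISOLATE_OLD_R_LIBS_USER="${R_LIBS_USER-}"
# Point R's user library at the environment's own library (assumption).
export R_LIBS_USER="$CONDA_PREFIX/lib/R/library"
# Tell Python to skip the per-user site-packages directory.
export PYTHONNOUSERSITE=1
EOF

# Hypothetical deactivation hook that undoes the above.
cat > "$env_prefix/etc/conda/deactivate.d/isolate-user-libs.sh" <<'EOF'
if [ -n "${_ISOLATE_OLD_R_LIBS_USER-}" ]; then
    export R_LIBS_USER="$_ISOLATE_OLD_R_LIBS_USER"
else
    unset R_LIBS_USER
fi
unset PYTHONNOUSERSITE _ISOLATE_OLD_R_LIBS_USER
EOF

# Simulate activation in a subshell so the current session is left untouched.
activated_value="$(
    CONDA_PREFIX="$env_prefix"
    . "$env_prefix/etc/conda/activate.d/isolate-user-libs.sh"
    printf '%s' "$R_LIBS_USER"
)"
echo "R_LIBS_USER while active: $activated_value"
```

Because the Renviron default only applies when R_LIBS_USER is unset, exporting it from the hook overrides the home-directory location for as long as the environment is active.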

We haven't had an issue with this since, but it would be very nice if upstream considered changing the default behavior w.r.t. user-packages.

@jdblischak
Member

jdblischak commented Feb 23, 2019 via email

@blaiseli

For what it's worth: I may have a similar issue with Singularity. My container has an R installation, but if the user of the container also has local packages, R run in the container will try to load things from the user's local packages. It's quite annoying that merely having local packages makes the container unusable.

@izahn

izahn commented May 12, 2021

I've read this thread several times, but I still don't fully understand what the problem is. Can someone write up a reproducible example and explain specifically what the problem is?

@jdblischak
Member

I've read this thread several times, but I still don't fully understand what the problem is.

conda-installed R will use the R packages installed in the user's personal directory instead of the versions installed via conda because the user directory is listed first in .libPaths(). Therefore the conda environment is not "isolated". The user may be using completely different versions of R packages than what was specified, e.g. in an environment.yml file.

Can someone write up a reproducible example and explain specifically what the problem is?

This can be tricky because it depends on the setup on your machine. Here is an example I ran on Ubuntu. I have R3 installed, and I installed R4 with conda. If you have R4 installed on your machine, install R3 with conda to achieve the same effect.

Rscript --version
export R_LIBS_USER=/tmp/rlibs/
mkdir -p $R_LIBS_USER
Rscript -e '.libPaths()'
Rscript -e 'install.packages("jsonlite")'
mamba create --yes -n reprex r-base=4 r-jsonlite
conda activate reprex
Rscript --version
Rscript -e 'library("jsonlite")'

Here's what I see:

$ Rscript --version
R scripting front-end version 3.6.3 (2020-02-29)
$ export R_LIBS_USER=/tmp/rlibs/
$ mkdir -p $R_LIBS_USER
$ Rscript -e '.libPaths()'
[1] "/tmp/rlibs"                    "/usr/local/lib/R/site-library"
[3] "/usr/lib/R/site-library"       "/usr/lib/R/library"
$ Rscript -e 'install.packages("jsonlite")'
Installing package into ‘/tmp/rlibs’
...
$ mamba create --yes -n reprex r-base=4 r-jsonlite
$ conda activate reprex
$ Rscript --version
R scripting front-end version 4.0.5 (2021-03-31)
$ Rscript -e 'library("jsonlite")'
Error: package or namespace load failed for ‘jsonlite’:
 package ‘jsonlite’ was installed before R 4.0.0: please re-install it
Execution halted

The above demonstrates the issue when the user has explicitly set a value for R_LIBS_USER. It's even more insidious when the user is using R's default OS-specific location for the user directory. If I have time, I'll try to create an example using a Docker image.

Please see PR #65 for additional discussion and a proposed solution.

@izahn

izahn commented May 12, 2021

For me as a long-time R user this is the expected and appropriate behavior. You specified R_LIBS_USER and so R looked there for packages, as documented. Any other behavior would be a bug from my perspective.

If you don't set R_LIBS_USER then running .libPaths() at the end of the example returns

"~/.conda/envs/reprex/lib/R/library"

That is a problem for the reason I explain in #169 , but that doesn't seem to be the thing people are objecting to here.

?.libPaths says

By default ‘R_LIBS’ is unset, and ‘R_LIBS_USER’ is set to
directory ‘R/R.version$platform-library/x.y’ of the home directory
(or ‘Library/R/x.y/library’ for CRAN macOS builds), for R x.y.z.

and indeed

$ Rscript -e 'Sys.getenv("R_LIBS_USER")'  
[1] "~/R/x86_64-conda-linux-gnu-library/4.0"

That isn't environment-specific, but it is conda and R version specific, which will avoid the version incompatibility in your example, and that directory won't be used by default anyway.

@jdblischak
Member

Any other behavior would be a bug from my perspective.

I agree that R is doing what it is documented to do in regards to the user library. I and others use conda environments to create isolated computational environments. I don't want them to be affected by what I happen to have installed in my user library for use with my system-wide installation of R. When I give a collaborator an environment.yml file, I want them to run the code with the R package versions I specified in that file, not whichever versions are installed in their user library.

That isn't environment-specific, but it is conda and R version specific, which will avoid the version incompatibility in your example, and that directory won't be used by default anyway.

That is a new development. That wasn't the case when this issue was originally created. This fixes the default situation on Linux, as I had noted in #65 (comment).

But conda-installed R is still not isolated if 1) the user has manually specified R_LIBS_USER, or 2) they are using Windows (and maybe macOS, not sure).

@izahn

izahn commented May 12, 2021

That is a new development. That wasn't the case when this issue was originally created. This fixes the default situation on Linux, as I had noted in #65 (comment)

I think that is the key fact I missed, makes much more sense now! Thanks for helping me understand that.

But conda-installed R is still not isolated if 1) the user has manually specified R_LIBS_USER,

It is isolated if the user sets R_LIBS_USER to a conda environment specific location :-) Users setting R_LIBS_USER are presumably doing so because they want R to look for libraries there, and that should be respected.

or 2) they are using Windows (and maybe macOS, not sure).

I just checked on Windows, and indeed R installed in conda environments shares the same R_LIBS_USER environment variable as R installed directly from CRAN. I agree that is a problem, and I'll look into fixing it the same way as was done for Linux. I don't have a Mac to test on, but possibly the same thing should be done there as well.

@kevinpauli

It's a problem on Mac. Much googling has led me here. :). What is the recommended solution for a mac user?

@dpryan79
Contributor

@kevinpauli Assuming you want R to ignore non-conda packages, put this in your scripts before any library() or require(): .libPaths(R.home("library"))

@jdblischak
Member

Another potential solution: you can force conda-installed R to ignore an explicitly set R_LIBS_USER by installing the conda-forge package conda-ecosystem-user-package-isolation. Thanks to @mfansler for the tip in #65 (comment)

Warning though: this will not save you from user-installed packages installed in the default user-library on Windows and macOS.
