-
-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Isolate conda R from system user library. #65
base: main
Are you sure you want to change the base?
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
Thanks so much for doing this @jdblischak and sorry for the slow review. Do you (or any of @conda-forge/r) think we should add an env var around all of this as an escape hatch to bring back the old behaviour? |
An env solution to invoke the old behavior would be welcome, but not a deal breaker (I've lost track of the number of people who have been bitten by libraries in their home directories). |
@mingwandroid I think that's fine. However, I think it would only work if the user has explicitly set A few follow-up questions:
And a general question, what is the purpose of the subsection |
Yeah, I propose adding env() checks to your patch. I'll try to schedule some time when I next update default's R to build on top of this PR.
We do not want any difference between R in conda-forge and the R in defaults.
Not sure! I guess conda-forge should document it and we should also document it.
This specifies which files (from the recipe, and not from ${SRC_DIR}, those are specified via |
@mingwandroid That'd be great. I'm not familiar with env() checks.
100% agree.
Yeah, that is an Issue. Especially for conda-forge since its docs are about the channel, not how to use any given package that it distributes. The Anaconda docs has a section on using R, so that could be a potential options. https://docs.anaconda.com/anaconda/user-guide/tasks/use-r-language/
OK. Makes sense. Thanks for the explanation! |
@mingwandroid I was just reminded about this PR from a colleague whose code broke because the R packages were being loaded from his user library instead of the conda environment. I think this isolation is very important for conda R users. If the |
I've added support for an environment variable |
@halldc I am updating
|
@msarahan @jjhelmus @katietz Could you please review this PR to r-base? I am attempting to isolate R installed in a conda environment from the user's local R library (which always takes precedence when searching for R packages). Any feedback would be much appreciated. If you approve of the approach, I will start working on support for Windows. Thanks! |
Sounds fine to me. Sorry not a detailed review in the least! Pinging @katietz |
@jdblischak This is used to evaluate whether the parameter is set. ${var+x} is a parameter expansion which evaluates to nothing if var is unset, and substitutes the string x otherwise. See this Stackoverflow post for details. |
@halldc Thanks for the explanation and link. The key for me from that SO answer was that using |
Just got the email from the R Project: "R 3.6.1 scheduled for July 5". It would be nice if we could get this feature into the conda-forge release of r-base 3.6.1. |
For any users that find this thread and need a short-term solution, @dpryan79 provided a good solution in conda-forge/r-rjags-feedstock#6 (comment):
This will replace your default user library with the system library path (which is where conda R packages are installed). To automate this, you could put it inside of an |
Can this be retried now that 3.6.1 is out? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that review took a bit. Change looks sensible to me.
As 3.6.1 is out now, is that patch applying to new version too?
Btw in my opionion we should update soon to 3.6.1 version
@h-vetinari I rebased the PR against R 3.6.1. Could you please test it out? |
@katietz Thanks for the review! Can I interpret this as you being open to the same patch to the r-base recipe in the defaults channel? I'd prefer if the two behaved the same in regards to the user library. |
I can build on my personal laptop, but in the environment where I actually need it, I can't pull stuff from github (but we have a proxy for conda-forge)... There's some complicated ways around it, but even then I'd have to wait until conda-forge supports an r=3.6.1 install. That migration is in progress, at least... |
Yes. I am openminded to put patch also in r-base default channel for 3.6.1 |
@h-vetinari The migration is effectively finished if you can try things out. |
@dpryan79 In a fresh conda install, doing
fails with
Removing
|
@dpryan79 *Or so I thought; actually, all the packages that make up |
I also used the same strategy to handle the environment variable |
-1 to this. I think we should only warn that |
@isuruf Thanks for the feedback. Could you please elaborate on what you mean by "upstream behaviour"? Are you referring to r-base in the defaults channel or R itself? |
I meant R itself. |
I'm not sure I understand then. The environment variables like As an R user, the packages I have installed in my user directory are for use with the R installed on my system. When I activate a conda environment that has R installed, it is because I am specifically trying to isolate my analysis from whatever I happened to have installed on that computer (which likely differs from my other computers, and definitely differs from the computers of my collaborators). If the activate script only warns about the presence of a user directory, then that leaves me to try to come up with some way to get conda R to ignore it (and then I have to help my collaborators do the same). |
That's not necessarily true. People can use a conda installation as system if they want. (Maybe in /opt/conda)
That maybe your use case, but it might just be that a user just wants to easily install R.
The script can give specific instructions on how to fix this. |
Hi @isuruf,
I am very skeptical that this is a use-case that is worth optimizing for. I would argue that many R users simply follow RStudios instructions which incidentally does not play very well with non-default installation paths (such as what a base conda install would create without trickery). In the event the installer of R via conda is able to use trickery, then it seems that creating a simple workaround to "enable" user-local packages would be an appropriate escape hatch, but again I am skeptical of this use case. In contrast, I have met many people who have been interested in systems which package R libraries as part of a larger conda environment, and these same people often have a system install of R (with RStudio) and some of the same packages. By default, the local-packages are loaded first, which causes very inscrutable binary incompatibilities. This is not only intimidating, but it breaks their environment in a manner which is not obvious. Perhaps a compromise here would be to load user-local packages last, but again this would deviate from upstream, and you would be unable to override "system" (conda) installed packages as is often the intent of user-local. Python has a very similar problem here, using essentially the same package management strategy, however they also now have a built-in environment management tool, which is specifically specced to ignore user-local packages (providing a site-flag to load them if desired). The major difference is that R installs into user-local packages by default, whereas the default for Python has been, for a long time, to install into the site-local packages. So this incompatibility with conda was generally unnoticed. That said, Of course, R is not Python, but I really do see their package management systems as very similar. Sorry for the long reply, and thanks @jdblischak for trying to move this forward. I still think it's a great idea, but I think this is the same kind of resistance I saw on the miniconda issue tracker from years ago when it was first discussed. I've resorted to creating pre/post-activation hooks which just do the following:
The approach in this PR is much more polite than that. |
I'm glad you brought up python.
This is not true. With python, libraries in |
I was making that same point, I think we just disagree on what is or is not a "problem". Python's environment manager ignores user-local, so in this way conda deviates from upstream (if you consider it an environment manager). |
I don't understand this. How does conda deviate from upstream? |
This section of PEP 405 defines the behavior of EDIT: there was a better subsection to link to |
I acknowledge that as both a package manager and environment manager, it may be impossible for conda to perfectly handle both of these situations as there isn't a strong distinction between a "root" environment and a normal environment. So it is difficult to really disentangle these competing situations. |
I've needed to use both cases for R. For example, I configured the rstudio server at my old institute to use various versions of conda-installed r-base. This worked quite well and was much more convenient than compiling and installing various versions manually. Users then used the normal methods to install packages in their home directories (the default locations). At the same time our pipelines used r-base in preconfigured envs and then just had the following in their Rscripts:
Then users could run scripts and everything in their home directories would be ignored. So you don't need an environment variable, if you want only-conda libraries loaded just add that one line. |
@isuruf They could use
Thus I'd prefer the default behavior to work well with the recommended installation guidelines, and then provide workarounds for the non-standard system installation of conda (especially since a systems administrator installing R for a group of users should be comfortable with the extra steps).
@isuruf Do you have suggestions for the workaround? Here is a potential workaround I came up with. Users could put the following in their
@dpryan79 While this works, I don't like that this solution requires R to be installed with conda. I don't think an R script should be concerned with implementation details like whether or not R was installed via conda or not. |
That sounds like a good idea and maybe a conda-forge package just for that might fix this. We can warn in the activate script here and ask the user to install that package. |
The following workaround doesn't work on Windows, because
|
@jamesmyatt Thanks for reporting this shortcoming with the proposed solution. Below is an updated version that should work on both Windows and Unix-alike operating systems. I tested it on Windows 10.
You're correct that |
A quick update on this PR: As seen above, there is a debate about whether this "isolation" feature should be implemented. The main issue is that because conda is used in so many different ways, it is difficult to find the right behavior that will be intuitive for all of the use cases. I've abandoned this effort for now because the situation is actually fixed well enough for me on Linux. I don't know why or how the change was made, but now the conda recipe changes the value of the "platform" variable when installing R on Linux.
Because the default path to the user library uses this platform variable, when I activate a conda environment in Ubuntu, it ignores my user packages. Thus it's properly isolated. Unfortunately conda R is still not isolated in the following situations:
My macbook died, so I don't know what the current behavior is on macOS. I'd appreciate it if someone could run |
Might be worth mentioning that the |
@mfansler Yes! That is very relevant. It's a shame that it's been around for over 3 years, and I had no idea about it. Unfortunately I assume the only way to really learn about it is through word of mouth. Thanks for sharing! To clarify though, this solves the problem in the situation where a user has explicitly set Lastly, for the benefits of others that like me were not aware of |
I currently have access to a macOS (Catalina 10.15.7) machine through work. I checked the platform, and unfortunately "conda" is not inserted into the name. However, you could get lucky if your version of macOS differs from the version in which the r-base binary was built. In my case, the user-installed packages wouldn't conflict, but they very easily could on someone else's machine. # homebrew-installed R
# R 4.1.3
> R.version$platform
[1] "x86_64-apple-darwin19.6.0"
# conda-installed R
# r-base==4.1.2=h2b051ba_0
> R.version$platform
[1] "x86_64-apple-darwin13.4.0" |
A few updates relevant to this PR:
https://github.com/wch/r-source/blob/tags/R-4-2-0/doc/NEWS.Rd#L324
I'm not sure what to do with this PR. On the one hand, I'm aware at this point that it is going on almost 4 years, and has zero chance of being merged. Thus I'm not planning on updating it. On the other hand, I think it serves as very useful documentation. Users of conda-installed R are typically surprised when they find out that their conda envrionments are using their locally installed packages (and are thus not at all isolated or reproducible). I know I was. Furthermore, this PR makes it clear that it's not enough to simply temporarily clear |
Checklist
0
(if the version changed)conda-smithy
(Use the phrase@conda-forge-admin, please rerender
in a comment in this PR for automated rerendering)It's a constant frustration for me that conda R uses the R packages installed in the R user library. This is also a major limitation when trying to convince new conda users about the reproducibility benefits of conda.
This PR attempts to do the following:
Note to future self: I created the patch by 1) cloning r-source, 2) editing and committing etc/Renviron.in, and 3) running
git format-patch HEAD~1
.This issue has previously been discussed in #37.
cc: @mingwandroid @ebolyen @bgruening @isuruf