
improve conda variant handling #448

Closed

Conversation

wolfv (Contributor) commented May 30, 2021

There are a couple of formatting issues here but I am looking for some early feedback on this PR:

  • I am adding a disfavor map entry for packages whose track_features count is non-zero
  • I am returning multiple package versions for prune_to_best_version to create multiple solve branches. With conda the package alternatives have the same name, but different build strings. So in a first step the best package is selected, and in a second step the alternatives are added back in. Alternatives need to have the same version and build number as the "best" package (e.g. for scipy pypy and cpython builds):
    • scipy-1.6.3-py37h29e03ee_0
    • scipy-1.6.3-py37ha768fb6_0
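A rough Python sketch of the two-step selection described above (the names, the version encoding, and the data are made up for illustration; this is not the actual libsolv code):

```python
# Hypothetical sketch: pick the best candidate first (fewest track
# features, then highest version), then re-add the alternatives that
# share its version and build number but differ in build string.
from collections import namedtuple

Pkg = namedtuple("Pkg", "name version build_number build_string track_features")

def prune_to_best_version_with_branches(candidates):
    # Step 1: select the "best" package.
    best = max(candidates, key=lambda p: (-len(p.track_features), p.version))
    # Step 2: add back alternatives with the same version and build number.
    branches = [p for p in candidates
                if p.version == best.version and p.build_number == best.build_number]
    # Keep the best candidate first so the solver tries it before alternatives.
    branches.sort(key=lambda p: p is not best)
    return branches

pkgs = [
    Pkg("scipy", (1, 6, 3), 0, "py37h29e03ee_0", ()),
    Pkg("scipy", (1, 6, 3), 0, "py37ha768fb6_0", ("pypy",)),
    Pkg("scipy", (1, 6, 2), 0, "py37habc1234_0", ()),
]
print([p.build_string for p in prune_to_best_version_with_branches(pkgs)])
# → ['py37h29e03ee_0', 'py37ha768fb6_0']
```

The older 1.6.2 build is pruned away, while the two 1.6.3 builds stay as alternative solve branches.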

So far this seems to be working quite well. I am just wondering if I am on the right path with this?

@beckermr

Does this handle more than one feature?

wolfv (Contributor Author) commented May 30, 2021

@beckermr in a way, yes, because in one step we're sorting based on the number of track_features found.

But in the global minimization step we don't take into account how many track_features are added. Would love to get some deeper insight from @mlschroe on how/if we could do that properly.

On the other hand, this entire "track_feature" is a bit of a weird artifact and maybe at some point we can come up with a revamped, better scheme...

@beckermr

I do think track features is very useful since it sits above version number ordering. Whatever we replace it with needs this same property I think.

wolfv force-pushed the improve_conda_variant_handling branch from c734a16 to 316da14 on June 2, 2021 14:16
wolfv (Contributor Author) commented Jun 4, 2021

fixes #447

wolfv (Contributor Author) commented Jun 17, 2021

Unfortunately this patch didn't work as well as I had hoped, especially in combination with the timestamp maximization, which shuffled the order of packages a bit differently :)

The biggest "problem" is that we do not have proper metadata, since "track_features" is not exported in many cases. For example, I might try to install numpy, and there is a numpy x pypy build and a numpy x cpython build. Now numpy itself does not have any de-prioritization (only pypy has).

As far as I understand we have two options:

  • Either we find a way to downweight numpy directly -- for example by inspecting the direct dependencies: if we find a dep that pins a build string, we check whether it only matches packages with track features. Similarly, we could prioritise packages that require higher version numbers of e.g. python (a requirement such as python >3.9,<3.10 prioritised over python >=3.7,<3.8, etc.)
  • We figure out how to do a second back-tracking search during the global minimization phase and try to avoid branches with track features (which I tried to do in this PR but as mentioned, the solutions aren't yet perfect).

Another, third option, would be to figure out how to globally add metadata on variants: we could have a global entry in the repodata with information about what variants exist, what the default choice should be and use that. With that information we might be able to take decisions in a faster way as we can sort the dependencies straight away without searching for potential track features.

@mlschroe if you have some insights into how we could achieve this best, would be greatly appreciated!

@beckermr

How does conda do this? Do they run the solver with each variant fixed and then choose?

wolfv (Contributor Author) commented Jun 17, 2021

They add these clauses for the SAT solver minimization I guess:

https://github.com/conda/conda/blob/7dbd2729e4916446da65dacb444a9d33e6f8f355/conda/resolve.py#L929-L957

I am not yet sure how to properly achieve this with libsolv... but gonna keep trying :)
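If I read that linked code right, the objective is roughly "fewest track features first, then prefer newer versions". A made-up Python cartoon of that ordering (not conda's actual data structures or API):

```python
# Hedged cartoon of conda's global minimization as I understand it:
# among candidate solutions, minimize the total track-feature count,
# then break ties toward newer versions. All names/data are invented.
solutions = [
    {"pkgs": [("numpy", (1, 20), 1)], "desc": "pypy branch"},     # 1 track feature
    {"pkgs": [("numpy", (1, 20), 0)], "desc": "cpython branch"},  # 0 track features
]

def objective(sol):
    n_features = sum(nf for _, _, nf in sol["pkgs"])
    newest = max(v for _, v, _ in sol["pkgs"])
    # fewer features wins; negate version parts so "newer" sorts smaller
    return (n_features, tuple(-x for x in newest))

best = min(solutions, key=objective)
print(best["desc"])
# → cpython branch
```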

@beckermr

Yeah I have no idea what that code does.

wolfv (Contributor Author) commented Jun 17, 2021

As for libsolv / rpm, the way I understand it is: for a variant they produce two distinct packages that both have the same "provides".

E.g. we'd have

numpy-pypy-37, and numpy-cpython-37 which both "provide numpy", and similarly pypy and cpython which would "provide python".

So the numpy package would also have to be a "proper variant", and one of the variants might be the "recommended" variant that would (hopefully) be chosen.

My problem is that I don't have this information on the numpy package itself: on the first level I don't know which one is recommended; only by inspecting the first level of dependencies can I get to that information.

@mlschroe (Member)

Sorry for ignoring you; the last days were a bit too packed with other work. I'll try to look at this next week.

wolfv (Contributor Author) commented Jun 22, 2021

@mlschroe no worries, would be great to get your input!

I think we have two problems that are slightly related.

1. Select the better variant right away

When we have a package like numpy, we have ~5 variants that currently all look equal to libsolv (we are relying on buildstring or Id comparison), so it's almost a random selection. The variants are basically for python=3.6=cpython, python=3.7=cpython, python=3.9=cpython, ..., and python=3.7=pypy.

So first, it would be good to select the variant that has the highest dependencies. My idea was to look at the lower and upper bound of the dependency selectors, and thus to sort to the top the variant that has python >=3.9,<3.10 over the one that has python >=3.6,<3.7.
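A small Python sketch of that bound-based ordering (the `lower_bound` helper and the data are made up, not the libsolv implementation):

```python
# Hypothetical sketch: extract the lower bound from a dependency
# selector like "python >=3.9,<3.10" and sort the variants so that
# the one requiring the highest python comes first.
import re

def lower_bound(dep):
    # Returns the version tuple after ">=", or None if no lower bound
    # exists (here every variant has one, so sorting is well-defined).
    m = re.search(r">=\s*([\d.]+)", dep)
    return tuple(int(x) for x in m.group(1).split(".")) if m else None

variants = [
    "python >=3.6,<3.7",
    "python >=3.9,<3.10",
    "python >=3.7,<3.8",
]
variants.sort(key=lower_bound, reverse=True)
print(variants[0])
# → python >=3.9,<3.10
```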

2. Try other branches if we end up with a track feature

This one is probably harder; conda does a global optimisation to "globally minimise the number of track features in the environment". So if we end up with a solution that has a package with an attached track feature, it would be nice to have a way to figure out if there would be another branch where we wouldn't end up with a track feature package. However, if we continue with the example of numpy, it's a bit tough to figure this out straight away (also because of the way the metadata is currently arranged in the conda-forge channel).

For the numpy-1.20-pypy package we have a list of dependencies that looks like

libblas >=3.8.0,<4.0a0
libcblas >=3.8.0,<4.0a0
libgcc-ng >=9.3.0
liblapack >=3.8.0,<4.0a0
pypy3.7 >=7.3.4
python >=3.7,<3.8.0a0
python_abi 3.7 *_pypy37_pp73

However, the package that is down-weighted by a track feature is python=3.7.*=*pypy, so only after we have obtained a full solution will we see that we had to select a python with a track feature.

There are two ways we could change the metadata to make the problem "easier":

  • inherit the track feature de-prioritization to the numpy package so that we know "in" the numpy package that the pypy variant shouldn't be selected by default
  • pin the python package by build string so that it reads something like python >=3.7,<3.8.0a0 *_pypy -- in that case we could search for all packages matching that build string and check if all / any have a track feature attached and thus also mark the numpy package as the non-default one.

However, if we do not change the metadata (which will take time ...) I was thinking that we could intelligently explore alternative branches with libsolv. I checked and for large environments an exhaustive search seemed very slow. However we could note that we have selected a python package with a track-feature, and then check the first branch where we selected a package that had a python requirement and evaluate the other branch to see if we end up with fewer track features. But maybe this idea is too trivial and won't work.
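The second metadata option above (inferring non-default status from a build-string pin) could look roughly like this in Python; the repodata entries and helper name are invented for illustration:

```python
# Hypothetical sketch: given a pin like "python_abi 3.7 *_pypy37_pp73",
# check whether every repodata entry matching that name + build-string
# glob carries a track feature. If so, the depending package (e.g. the
# numpy pypy variant) can be marked as non-default too.
import fnmatch

repodata = {
    "python_abi-3.7-2_pypy37_pp73": {"name": "python_abi",
                                     "build": "2_pypy37_pp73",
                                     "track_features": ["pypy"]},
    "python_abi-3.7-2_cp37m": {"name": "python_abi",
                               "build": "2_cp37m",
                               "track_features": []},
}

def pin_only_matches_tracked(name, build_glob):
    matches = [e for e in repodata.values()
               if e["name"] == name and fnmatch.fnmatch(e["build"], build_glob)]
    # True only if the pin matches something and everything it matches
    # is down-weighted by a track feature.
    return bool(matches) and all(e["track_features"] for e in matches)

print(pin_only_matches_tracked("python_abi", "*_pypy37_pp73"))  # → True
print(pin_only_matches_tracked("python_abi", "*"))              # → False
```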

wolfv (Contributor Author) commented Jun 24, 2021

Just to give a quick update here:

I have some experimental code to extract the lower and upper bound from dependency strings like `>=3.8,…`:
https://github.com/wolfv/libsolv/blob/a7ad64b4181a7e5c9515efc37deb6f2dd79e02b4/src/conda.c#L683
It then relies on a regular EVR comparison to figure out which bound is higher (this code is not completely finished yet).

Also, in conda-forge a repodata change was merged so that we can now determine from the "first-order" dependencies whether a dependency has a track_feature (e.g. numpy depends on python_abi 3.7 *pypy now, and all packages matching that version + build string will have a track_feature, which we can then inherit to de-prioritise that numpy variant).

Still very interested in feedback :)

@beckermr

> Also, in conda-forge a repodata change was merged so that we can now determine from the "first-order" dependencies whether a dependency has a track_feature (e.g. numpy depends on python_abi 3.7 *pypy now, and all packages matching that version + build string will have a track_feature, which we can then inherit to de-prioritise that numpy variant).

We did this for python, but I suspect there may be other features where this is not done.

wolfv (Contributor Author) commented Jun 24, 2021

I think for most other features the pinning will be more "direct". For python it was quite indirect over the python_abi and an explicit dependency to pypy3.7 etc.

@beckermr

I hope so! We will need to keep this in mind when making features in the future though. We also may need to fix up some of the mpi ones for mpich.

@mlschroe (Member)

Regarding track features: IMHO the SAT-wise cleanest implementation would be to add new "trackfeature" rules that disallow the installation of any package that has a track feature (except for the features already installed, I guess). This makes the solver abort when it needs a new tracked feature, which can then be added to the allowed list.

The point of doing this is that it makes the solver backtrack if it needs to install a new tracked feature. I.e. in your case, it will go and choose the non-pypy variant if it comes to the pypy dependency.

This is pretty cheap to implement and somewhat simulates the track_feature minimization of conda.
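A toy simulation of this disallow-then-relax loop in plain Python (not libsolv's actual rule machinery; the `solve` stand-in and the example repo are invented):

```python
# Toy sketch: first try to solve with all track-feature packages
# disallowed; each time the solve fails, permit one more feature and
# retry, so tracked features are only pulled in when no feature-free
# solution exists.
def solve(request, allowed_features, repo):
    # Stand-in for the real solver: pick any candidate whose track
    # features are all within the allowed set.
    for pkg, feats in repo.get(request, []):
        if set(feats) <= allowed_features:
            return pkg
    return None

def solve_minimizing_features(request, repo):
    allowed = set()
    while True:
        result = solve(request, allowed, repo)
        if result is not None:
            return result, allowed
        # Collect features we could still enable; abort if none remain.
        remaining = {f for _, feats in repo.get(request, [])
                     for f in feats if f not in allowed}
        if not remaining:
            return None, allowed
        allowed.add(sorted(remaining)[0])  # enable one more feature, retry

repo = {"clang": [("clang-special", ["xeus-clang-variant"])]}
pkg, used = solve_minimizing_features("clang", repo)
print(pkg, sorted(used))
# → clang-special ['xeus-clang-variant']
```

Here the only candidate carries a track feature, so the first pass fails, the feature gets enabled, and the tracked package is installed after all — mirroring how the xeus-cling case would still resolve.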

@beckermr

Thanks for the note!

We use track features to set the global priority of different solutions. Disallowing if the feature is not there is closer to the deprecated conda behavior that we do not use.

Is there a way to keep a running total of how many features are currently found for the solution and force the solver to backtrack if this increases? The only trick here would be to allow solutions with a non-zero number of features if it cannot find any solution with zero features.

wolfv (Contributor Author) commented Jun 25, 2021

Thanks for getting back to us, @mlschroe :)
It sounds like a good idea (although I am not 100% sure how to achieve this). However, we would also want to cover the case where a user installs e.g. xeus-cling, which requires a special version of clang that comes with a track feature. So disallowing all track features is also not desired. But if we could back-track, and only check branches with track features after having exhaustively searched, that would be perfect.

@mlschroe (Member)

The disallow is just an internal sat-solver mechanism to make it backtrack. It will automatically enable the track feature if this is the only option.

@mlschroe (Member)

(basically like the automatic uninstall works if SOLVER_FLAG_ALLOW_UNINSTALL is set.)

wolfv (Contributor Author) commented Jun 28, 2021

I am closing this PR in favor of #457

The new one is simpler and builds more on top of existing stuff. Still would love to get some feedback on how to write better integrated C code :)

This improves package resolutions quite a bit in several cases. E.g. mamba install numpy resolves to python 3.9 etc.

wolfv closed this Jun 28, 2021