rip backtracks too far back (performance issue) #174

Open
notatallshaw opened this issue Jan 25, 2024 · 11 comments

@notatallshaw

notatallshaw commented Jan 25, 2024

Environment: Linux Python 3.11

Command: cargo r -- apache-airflow[all]==2.8.1

Error:

2024-01-26T16:45:56.613739Z ERROR rattler_installs_packages::index::package_database: Error from source distributions 'apache-beam-2.42.0.zip' skipped: 
 could not build wheel: <string>:28: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
Traceback (most recent call last):
  File "/tmp/.tmpG0tRNU/build_frontend.py", line 124, in <module>
    get_requires_for_build_wheel(backend, work_dir)
  File "/tmp/.tmpG0tRNU/build_frontend.py", line 58, in get_requires_for_build_wheel
    result = f()
             ^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
    self.run_setup()
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 480, in run_setup
    super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 99, in <module>
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 528, in get_distribution
    dist = get_provider(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 400, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 968, in require
    needed = self.resolve(parse_requirements(requirements))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 829, in resolve
    dist = self._resolve_dist(
           ^^^^^^^^^^^^^^^^^^^
  File "/tmp/.tmpG0tRNU/venv/lib/python3.11/site-packages/pkg_resources/__init__.py", line 870, in _resolve_dist
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pip' distribution was not found and is required by the application

  × Could not solve for requested requirements
  ╰─▶ could not find metadata for any sdist or wheel for this package. No metadata could be extracted for the following available artifacts:
        - apache-beam-2.42.0.zip
  help: Probably an error during processing of source distributions. Please check the error message above.

Expected Behavior: The issue isn't the error itself; this package is too old to install in my environment.

Rather, there is some performance characteristic that feels wrong with rip's resolution here. It should not be backtracking so far on apache-beam, and I had a previous case where it was doing this with snowflake-connector-python. Pip does not need to backtrack so far back and is able to solve this requirement.

This might be very tricky to fix; with such complicated resolution graphs it is not easy to pick apart and understand why something happened.

Edit: I've updated the error with the latest output I get on main.

@notatallshaw
Author

notatallshaw commented Jan 27, 2024

I don't have a good understanding of the interaction between resolvo and rip, but if there are multiple packages to check, I would strongly suggest somehow guiding resolvo to try packages which have a valid wheel for the current Python before trying to build sdists. It will result in better performance in a lot of situations.
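
As a rough illustration of the ordering suggested above, here is a minimal Python sketch. It is not rip's or resolvo's code: the `Candidate` type, its `has_compatible_wheel` flag, and the `example-pkg` versions are all made up for the example. It tries versions that ship a wheel for the current interpreter first and leaves sdist-only versions for last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; not a type from rip or resolvo."""
    name: str
    version: tuple[int, ...]
    # Assumed to be precomputed: does this version ship a wheel that
    # is compatible with the current Python and platform?
    has_compatible_wheel: bool


def order_candidates(candidates):
    """Return candidates with wheel-backed versions first, newest first within each group."""
    # First sort newest-to-oldest, then use a stable sort to move
    # sdist-only versions behind every wheel-backed version.
    newest_first = sorted(candidates, key=lambda c: c.version, reverse=True)
    return sorted(newest_first, key=lambda c: not c.has_compatible_wheel)


candidates = [
    Candidate("example-pkg", (2, 42, 0), False),
    Candidate("example-pkg", (2, 53, 0), True),
    Candidate("example-pkg", (2, 45, 0), False),
    Candidate("example-pkg", (2, 44, 0), True),
]
ordered = order_candidates(candidates)
```

With this ordering, 2.44.0 (wheel) is tried before 2.45.0 (sdist only), so a build is attempted only once every wheel-backed version has been ruled out.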

@baszalmstra
Contributor

We have the option to first try all wheels before resorting to sdists for a specific package, but now that I think about it, it would be nice if we could also instruct resolvo to first backtrack further and, only if that fails, to also include sdists. 🤔

@notatallshaw
Author

notatallshaw commented Jan 27, 2024

It's something I've suggested for Pip (pypa/pip#12035), and I now have an idea in my head of how one would implement it for Pip. But I have other resolution improvements I am working on first over there, so I can't give you any real-world data on how much it helps.

Other than that sdists are slow, even when they're not causing build problems.

@tdejager
Contributor

But it's hard not to use them :)

@notatallshaw
Author

notatallshaw commented Jan 29, 2024

But it's hard not to use them :)

Well, it's not that you avoid using them altogether. It's just that often, when backtracking to older versions of a project, if the next older version requires building an sdist instead of a wheel, it likely means the Python version you're on is too new for such an old version of that project.

It hence makes sense to try backtracking on different projects, at least until you've exhausted all other equally likely backtracking opportunities.

At least that's my hypothesis.

@baszalmstra
Contributor

I think this can be done if we allow DependencyProviders control over which requirement to decide on next. We can then first try requirement clauses for packages that would result in a wheel to be picked instead of one that would pick an sdist.
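
To make the idea concrete, a small Python sketch follows. It does not use resolvo's actual DependencyProvider trait: the `DecisionHook` protocol, `WheelFirstHook`, `decision_order`, and the `best_is_wheel` mapping are all hypothetical names invented here. The solver loop asks the hook which pending requirement to decide on next, and the hook prefers requirements whose best candidate is a wheel.

```python
from typing import Protocol, Sequence


class DecisionHook(Protocol):
    """Hypothetical hook; stands in for provider control over decision order."""

    def choose_next(self, pending: Sequence[str]) -> str: ...


class WheelFirstHook:
    def __init__(self, best_is_wheel: dict[str, bool]):
        # Assumed input: for each package, whether the best remaining
        # candidate would be installed from a wheel.
        self.best_is_wheel = best_is_wheel

    def choose_next(self, pending: Sequence[str]) -> str:
        # min() returns the first minimal element, so the original
        # order is kept among requirements of the same kind.
        return min(pending, key=lambda name: not self.best_is_wheel.get(name, False))


def decision_order(pending: Sequence[str], hook: DecisionHook) -> list[str]:
    """Toy stand-in for a solver loop: repeatedly ask the hook what to decide next."""
    remaining = list(pending)
    order = []
    while remaining:
        choice = hook.choose_next(remaining)
        remaining.remove(choice)
        order.append(choice)
    return order


hook = WheelFirstHook({"pkg-a": False, "pkg-b": True, "pkg-c": True})
order = decision_order(["pkg-a", "pkg-b", "pkg-c"], hook)
```

Here `pkg-a`, whose best candidate is an sdist, is decided last. A real solver would also have to re-evaluate the mapping after each decision, which this sketch leaves out.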

@aochagavia
Contributor

I have a hunch that this problem might be caused by a bug in resolvo. I think this line should be a continue instead of a return. Not totally sure, though.

@baszalmstra
Contributor

That is indeed a bug! It means not all clauses for a solvable are added!
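
The linked line is not reproduced in this thread, so the following is only a generic Python illustration of this class of bug, not resolvo's code. The function names and the `clause(...)` strings are made up. It shows how an early `return` inside a loop drops every clause after the first skipped item, while `continue` only skips that one item.

```python
def collect_clauses_buggy(requirements, known):
    clauses = []
    for req in requirements:
        if req not in known:
            # Bug: leaves the whole function, so later requirements
            # never get their clauses added.
            return clauses
        clauses.append(f"clause({req})")
    return clauses


def collect_clauses_fixed(requirements, known):
    clauses = []
    for req in requirements:
        if req not in known:
            # Skip only this requirement and keep processing the rest.
            continue
        clauses.append(f"clause({req})")
    return clauses


requirements = ["a", "x", "b"]
known = {"a", "b"}
buggy = collect_clauses_buggy(requirements, known)
fixed = collect_clauses_fixed(requirements, known)
```

In this toy case the buggy version produces a clause for `a` only, while the fixed version produces clauses for both `a` and `b`.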

@tdejager
Contributor

Might be a bug, but I had some discussion with @aochagavia offline, and changing it does not change the backtracking behavior. That makes sense, because it only gets into the Unknown clause when it's already at the apache-beam-2.42.0.zip version, by which point the extensive backtracking has already occurred.

@tdejager
Contributor

But it's hard not to use them :)

Well, it's not that you avoid using them altogether. It's just that often, when backtracking to older versions of a project, if the next older version requires building an sdist instead of a wheel, it likely means the Python version you're on is too new for such an old version of that project.

It hence makes sense to try backtracking on different projects, at least until you've exhausted all other equally likely backtracking opportunities.

At least that's my hypothesis.

I think that this idea,

I think this can be done if we allow DependencyProviders control over which requirement to decide on next. We can then first try requirement clauses for packages that would result in a wheel to be picked instead of one that would pick an sdist.

with this implementation, still seems the way to go. I wonder what ripple effect such a change would have, but I think we can only know by trying.

@notatallshaw
Author

I wonder what ripple effect such a change would have, but I think we can only know by trying.

Yeah, this is a very opinionated optimization I'm suggesting, and the specific details of the implementation are likely to make a big difference when resolving certain requirements.

I do think, though, that the PyPI packaging ecosystem does require opinionated optimizations to get "good" resolution behavior.

4 participants