Pip install is unpredictable and often breaks Colab usage #620

Closed
sdenton4 opened this issue Mar 1, 2024 · 6 comments

Comments

@sdenton4
Collaborator

sdenton4 commented Mar 1, 2024

I discovered that pip install doesn't actually make use of the poetry lock file, and essentially makes up the dependency tree on the fly from the pyproject.toml file. This means that it's pretty easy to get into a weird state when we do the Colab pip install: the lock file gives us a specific tested combination of dependency versions, with CI tests, but we don't have any real way to test what's going on with the pip-installed version.

Ideally, we should have pip install the exact set of dependency versions specified in the lock file, to ensure that our CI testing actually tells us that the Colab notebooks are working.

There's some pretty extensive discussion here of the problem:
python-poetry/poetry#2778 (comment)
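
As a rough illustration of the "install exactly what the lock file says" idea, the sketch below reads poetry.lock contents and emits pinned requirement lines. It assumes the lock file's usual layout (a list of `[[package]]` tables with `name` and `version`, plus an optional `source` table for git dependencies). `pins_from_lock` is a hypothetical helper, not part of poetry or perch, and the commit hash in the example data is a made-up placeholder.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    # Assumption: on older interpreters the 'tomli' backport is installed.
    import tomli as tomllib


def pins_from_lock(lock_text: str) -> list[str]:
    """Turn poetry.lock contents into pip-style requirement lines."""
    lock = tomllib.loads(lock_text)
    pins = []
    for pkg in lock.get("package", []):
        source = pkg.get("source", {})
        if source.get("type") == "git":
            # Git dependencies have no index version; pin to the locked commit.
            ref = source.get("resolved_reference")
            suffix = f"@{ref}" if ref else ""
            pins.append(f"{pkg['name']} @ git+{source['url']}{suffix}")
        else:
            pins.append(f"{pkg['name']}=={pkg['version']}")
    return sorted(pins)


# Illustrative lock fragment; versions and the commit hash are placeholders.
EXAMPLE_LOCK = """
[[package]]
name = "jax"
version = "0.4.25"

[[package]]
name = "scenic"
version = "0.0.1"

[package.source]
type = "git"
url = "https://github.com/google-research/scenic.git"
resolved_reference = "0123abc"
"""

if __name__ == "__main__":
    for line in pins_from_lock(EXAMPLE_LOCK):
        print(line)
```

In practice `poetry export` is the supported route to such a file; the sketch only shows what "pinned to the lock file" means for index packages versus git dependencies.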

@sdenton4
Collaborator Author

sdenton4 commented Mar 2, 2024

Sounds like the answer may be some combination of publishing a pre-built wheel and exporting a requirements.txt with poetry...

python-poetry/poetry#2778 (comment)

all of this moves us in the direction of a 'real' release process, which... fair.

@sdenton4
Collaborator Author

sdenton4 commented Mar 5, 2024

I spent a good chunk of time experimenting with pip install in Colab, using various permutations of requirements.txt files. Short version is I don't think that simply publishing a requirements file with locked dependencies is a workable solution:

# !pip install git+https://github.com/google-research/perch.git@8cc4468afaac730e77d84ac447f0874f09d10a25

# add requirements.txt - installs jax 0.4.23
# !pip install git+https://github.com/google-research/perch.git@08eb6b62605a5db436f6ee36c27c7963ab831369

# jax 0.4.25 is now the selected version in requirements.txt
# Fails to install because no pyproject.toml - so the pyproject file is required.
# !pip install git+https://github.com/google-research/perch.git@0b696dd69144a7550fd2e3d697467250d22f382f

# Put the pyproject.toml back in place. Pip ignores the requirements.txt file:
# Requirement already satisfied: jax<0.5.0,>=0.4.16 in /usr/local/lib/python3.10/dist-packages (from chirp==0.1.0) (0.4.23)
# !pip install git+https://github.com/google-research/perch.git@6007556871af08347396e52355cac0ec7d3d5100

# Try installing directly from the requirements.txt file?
# Hashes can't be verified
# !pip install -r https://raw.githubusercontent.com/google-research/perch/6007556871af08347396e52355cac0ec7d3d5100/chirp/requirements.txt

# Try installing directly from the requirements.txt file again,
# this time after exporting with --without-hashes.
# Successfully picks up jax 0.4.25.
# However, pip is still doing some work to resolve dependencies for some reason:
# INFO: pip is looking at multiple versions of scenic to determine which version is compatible with other requirements. This could take a while.
# And fails with a resolution conflict:
# The conflict is caused by:
#     The user requested optax 0.2.0 (from git+https://github.com/google-deepmind/optax.git@81c50220ba2479d066ec762202e0f627a41e3fef)
#     flax 0.8.1 depends on optax
#     scenic 0.0.1 depends on optax 0.2.0 (from git+https://github.com/google-deepmind/optax.git@main)

!pip install -r https://raw.githubusercontent.com/google-research/perch/c3701fc5e1ff1567e9e6a01000054a089c211b7c/chirp/requirements.txt
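
The resolver activity in the last attempt is expected: even with a fully pinned requirements file, pip still reads each package's own declared dependencies and tries to reconcile them, which appears to be where the optax conflict comes from. One option not tried above is pip's `--no-deps` flag, which installs exactly the listed requirements and skips their declared dependencies. A minimal sketch, assuming the requirements file is complete (anything missing from it will not be installed); `pip_no_deps_command` and `install_pinned` are hypothetical helpers:

```python
import subprocess
import sys


def pip_no_deps_command(requirements: str) -> list[str]:
    """Build a pip command that installs only what the file lists."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-deps",          # skip each package's declared dependencies
        "-r", requirements,   # local path or URL to the pinned file
    ]


def install_pinned(requirements: str, dry_run: bool = True) -> list[str]:
    """Return the command; run it only when dry_run is False."""
    cmd = pip_no_deps_command(requirements)
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    print(" ".join(install_pinned("requirements.txt")))
```

The trade-off is that pip no longer checks consistency at all, so correctness rests entirely on the exported file matching the tested lock.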

@sdenton4
Collaborator Author

sdenton4 commented Mar 5, 2024

Notes on what to try next:

  • Possible (partial) solution: Get scenic to release a fscking numbered version.

    • No numbered versions means we depend on them at head, only, which creates a volatile dependency situation.
  • Possible (partial) solution: Get rid of TF-IO.

    • It forces a strict version of Tensorflow, which reduces flexibility. We currently only use it for resampling in a TFDataset mapping function used during data ingest...
  • Possible (partial) solution: Use optional dependency groups for Jax training and inference sub-sets of the code.

    • This is pretty easy to implement, and could solve lots of problems.
    • Need to figure out how to get Colab pip install to pick up the right stuff.
    • A challenge is that sometimes version constraints (eg, when running poetry lock) are computed via the union of all optional deps, which doesn't solve anything...
  • Possible (partial) solution: Split perch train/inference libraries.

    • Most of our volatile dependencies are Jax libraries.
    • Expect continued random breakages in the train library, but more stability in inference.
  • [MOSTLY DEAD] Possible solution: Publish a requirements.txt file.

    • Seems not to work. After some messing around, got pip to read the requirements.txt file (it strongly prefers the toml file), but it still ended with an error instead of a consistent environment.
  • [MOSTLY IRRELEVANT] Possible (partial) solution: Get rid of poetry and just use pip?

    • I don’t think this actually solves the problem of shifting untested dependencies.
  • [DEAD] Possible solution: Publish a wheel?

    • This creates a binary for the package, but the dependencies are basically just handled by an embedded set of constraints in the METADATA, copying the toml constraints.
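
On the optional-dependency-group idea: pip can select extras when installing straight from git by using a direct reference of the form `name[extra] @ url`. A sketch of building such a requirement string follows. The extra name `jax` is hypothetical (no such group is defined in this thread), `chirp` is the package name pip reports for this repository, and `requirement_with_extras` is an illustrative helper.

```python
def requirement_with_extras(name, url, extras=(), ref=None):
    """Build a 'name[extras] @ git+url@ref' direct reference for pip."""
    extras_part = f"[{','.join(sorted(extras))}]" if extras else ""
    ref_part = f"@{ref}" if ref else ""
    return f"{name}{extras_part} @ git+{url}{ref_part}"


if __name__ == "__main__":
    req = requirement_with_extras(
        "chirp",
        "https://github.com/google-research/perch.git",
        extras=["jax"],  # hypothetical extra name
    )
    # In a Colab cell this string would be passed, quoted, to `!pip install`.
    print(req)
```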

@sdenton4
Collaborator Author

sdenton4 commented Mar 6, 2024

I investigated wheels... Same problem; it's a binary of the current repository, with a description of dependencies. We get a METADATA file which basically includes a version of the toml constraints:

Metadata-Version: 2.1
Name: chirp
Version: 0.1.0
Summary: A bioacoustics research project.
License: Apache 2.0
Author: Chirp Team
Author-email: chirp-bio@google.com
Requires-Python: >=3.10,<3.12
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: SPARQLWrapper (>=2.0.0,<3.0.0)
Requires-Dist: absl-py (>=1.4.0,<2.0.0)
Requires-Dist: apache-beam[gcp] (>=2.50.0,<3.0.0)
Requires-Dist: aqtp (>=0.5.0,<0.6.0)
Requires-Dist: chex (>=0.1.7,<0.2.0)
Requires-Dist: clu (>=0.0.9,<0.0.10)
Requires-Dist: etils[epath] (>=1.5.0,<2.0.0)
Requires-Dist: flax (>=0.8.1,<0.9.0)
Requires-Dist: imageio (>=2.5.0,<3.0.0)
Requires-Dist: jax (>=0.4.16,<0.5.0)
Requires-Dist: matplotlib (>=3.6.1,<4.0.0)
Requires-Dist: ml-collections (>=0.1.1,<0.2.0)
Requires-Dist: notebook (>=7.0.4,<8.0.0)
Requires-Dist: numba (>=0.57,<0.58)
Requires-Dist: optax (>=0.1.7)
Requires-Dist: pandas[gcp] (>=2.1.1,<3.0.0)
Requires-Dist: ratelimiter (>=1.2.0.post0,<2.0.0)
Requires-Dist: scenic @ git+https://github.com/google-research/scenic.git
Requires-Dist: tensorflow (==2.15.0)
Requires-Dist: tensorflow-datasets[dev] (>=4.9.3,<5.0.0)
Requires-Dist: tensorflow-hub (>=0.14.0,<0.15.0)
Requires-Dist: tensorflow-io (==0.36)

Pip (or whatever) then goes looking for versions which satisfy these constraints.
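
To see at a glance which constraints in a METADATA file leave the resolver room to move, the file can be read with the standard library, since it uses email-style headers. A minimal sketch using a few of the lines above; `requires_dist` and `classify` are hypothetical helpers, and the classification is deliberately crude (it only separates exact `==` pins, ranges, and direct URL references).

```python
from email import message_from_string


def requires_dist(metadata_text: str) -> list[str]:
    """Return every Requires-Dist value from wheel METADATA text."""
    return message_from_string(metadata_text).get_all("Requires-Dist") or []


def classify(requirement: str) -> str:
    """Label a requirement as 'direct-url', 'pinned', or 'range'."""
    if " @ " in requirement:
        return "direct-url"  # e.g. a git dependency with no version bound
    spec = requirement.partition("(")[2].rstrip(") ")
    if spec.startswith("==") and "," not in spec:
        return "pinned"
    return "range"


# A subset of the METADATA shown above.
METADATA = "\n".join([
    "Metadata-Version: 2.1",
    "Name: chirp",
    "Requires-Dist: jax (>=0.4.16,<0.5.0)",
    "Requires-Dist: scenic @ git+https://github.com/google-research/scenic.git",
    "Requires-Dist: tensorflow (==2.15.0)",
    "Requires-Dist: tensorflow-io (==0.36)",
])

if __name__ == "__main__":
    for req in requires_dist(METADATA):
        print(f"{classify(req):10s} {req}")
```

Only the two `==` entries are fixed; everything else is resolved at install time against whatever versions are current.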

I'm moving more heavily towards splitting training code into a separate repository, as that will isolate the damage from Scenic. We might also be able to drop the tensorflow-io dependency, which complicates tensorflow by locking us to a specific version.

@sdenton4
Collaborator Author

sdenton4 commented Mar 6, 2024

Ok, fun complication:
Colab relies pretty fundamentally on ipython, jupyter, zmq, oauth, and tornado for basic runtime stuff, and forcing specific versions can cause Extra Chaos:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 0.22.0 requires pandas<2.1.4,>=1.5.0, but you have pandas 2.2.1 which is incompatible.
google-colab 1.0.0 requires notebook==6.5.5, but you have notebook 7.1.1 which is incompatible.
google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.2.1 which is incompatible.
[...]
WARNING: Upgrading ipython, ipykernel, tornado, prompt-toolkit, pyzmq can
cause your runtime to repeatedly crash or behave in unexpected ways and is not
recommended. If your runtime won't connect or execute code, you can reset it
with "Disconnect and delete runtime" from the "Runtime" menu.

This pushes us back in the direction of /wanting/ to support loose dependency specification (so we can just use the Colab defaults as much as possible), which in turn means dealing with the problematic dependencies (TF-IO and Scenic).
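
If pinned requirements were still used for some packages, one way to avoid the warning above would be to drop pins for the packages Colab's runtime depends on before installing. A sketch under that assumption; the protected set is taken from the WARNING text above, the version numbers in the example are placeholders, and `drop_runtime_pins` is a hypothetical helper.

```python
import re

# Packages named in Colab's warning about runtime crashes.
COLAB_RUNTIME_PACKAGES = frozenset(
    {"ipython", "ipykernel", "tornado", "prompt-toolkit", "pyzmq"}
)


def normalize(name: str) -> str:
    """Normalize a project name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> str:
    """Extract the normalized project name at the start of a requirement line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    return normalize(match.group(1)) if match else ""


def drop_runtime_pins(lines, protected=COLAB_RUNTIME_PACKAGES):
    """Keep requirement lines whose package is not in the protected set."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        if requirement_name(stripped) in protected:
            continue  # leave Colab's preinstalled version alone
        kept.append(stripped)
    return kept


if __name__ == "__main__":
    example = ["jax==0.4.25", "tornado==6.4", "Prompt_Toolkit==3.0.43"]
    print(drop_runtime_pins(example))
```

The error output above suggests `notebook` and `pandas` may also belong in the protected set, since google-colab pins both.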

@sdenton4
Collaborator Author

Landed this pull request:
#624
which should fix things nicely for now.
