Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update fmt (to 11.0.2) and spdlog (to 1.14.1), add those libraries to libcuml conda host dependencies #6071

Merged
merged 19 commits into from
Sep 24, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions ci/build_cpp.sh
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,9 @@ source rapids-date-string

export CMAKE_GENERATOR=Ninja

cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../
source ./ci/use_conda_packages_from_prs.sh

rapids-print-env

rapids-logger "Begin cpp build"
Expand Down
3 changes: 3 additions & 0 deletions ci/build_python.sh
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,9 @@ source rapids-date-string

export CMAKE_GENERATOR=Ninja

cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../
source ./ci/use_conda_packages_from_prs.sh

rapids-print-env

rapids-generate-version > ./VERSION
Expand Down
4 changes: 4 additions & 0 deletions ci/build_wheel.sh
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@ RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
# everywhere except in the final wheel name.
PACKAGE_CUDA_SUFFIX="-${RAPIDS_PY_CUDA_SUFFIX}"

cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../
source ./ci/use_wheels_from_prs.sh
export PIP_CONSTRAINT=/tmp/constraints.txt

rapids-generate-version > ./VERSION

cd ${package_dir}
Expand Down
2 changes: 2 additions & 0 deletions ci/test_cpp.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../

. /opt/conda/etc/profile.d/conda.sh

source ./ci/use_conda_packages_from_prs.sh

rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)

Expand Down
1 change: 1 addition & 0 deletions ci/test_notebooks.sh
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
set -euo pipefail

. /opt/conda/etc/profile.d/conda.sh
source ./ci/use_conda_packages_from_prs.sh

rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
Expand Down
1 change: 1 addition & 0 deletions ci/test_python_common.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
set -euo pipefail

. /opt/conda/etc/profile.d/conda.sh
source ./ci/use_conda_packages_from_prs.sh
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not specific to this line, but just choosing a place to start a threaded conversation.

All conda-based tests are failing with issues like this:

Fatal Python error: Aborted
Thread 0x00007f1903eea640 (most recent call first):
  File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 534 in read
  File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 567 in from_io
  File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 1160 in _thread_receiver
  File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 341 in run
  File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn
Current thread 0x00007f1904b0e740 (most recent call first):
  File "/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190 in wrapper
...
Extension modules: cython.cimports.libc.math, Cython.Utils, numpy._core._multiarray_umath, numpy._core._multiarray_tests
...
.[gw13] node down: Not properly terminated

(build link)

I'll investigate this right now. I suspect it might be something related to symbol visibility (like rapidsai/cudf#15483 (comment)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah!! It looks like either I missed some dependencies in the test scripts, or something else is holding back the fmt / spdlog version?

I see that RAPIDS libraries are coming from rapidsai-nightly and not the CI artifacts, and that the older versions of fmt and spdlog are being used.

...
cudf                      24.10.00a369    cuda11_py310_240923_g9b4c4c721c_369    rapidsai-nightly
cuml                      24.10.00a70     cuda11_py310_240923_ga1d76d4da_70    file:///tmp/python_channel
...
distributed-ucxx          0.40.00a        py3.10_240923_ga07a40f_28    rapidsai-nightly
...
fmt                       10.2.1               h00ab1b0_0    conda-forge
...

(build link)

Will try replicating that conda solve locally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing locally, I found that if I added constraints like fmt>=11.0.2, spdlog>=1.14.1 in the conda environment, the solver was able to solve with all of cuml's runtime dependencies... including the versions of cudf, rmm, ucxx, etc. produced from the PRs for rapidsai/build-planning#56.

I think that libcuml conda packages need host: dependencies on spdlog and fmt.

That library's code directly uses both of those:

#include <spdlog/sinks/stdout_color_sinks.h> // NOLINT
#include <spdlog/spdlog.h> // NOLINT

std::string msg_string = fmt::to_string(formatted);

And having those dependencies would prevent these situations during development where the solver is able to choose to fall back to older RAPIDS nightlies from within the same release (but supporting different fmt / spdlog).

cudf is doing the same thing: https://github.com/rapidsai/cudf/blob/9b4c4c721c399bae9e88733da79daa1a10644481/conda/recipes/libcudf/meta.yaml#L69-L71

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the right solution, thanks for investigating.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ended up not being enough 😭

Look at https://github.com/rapidsai/cuml/actions/runs/10998530224/job/30541086357?pr=6071.

The environment is still solving with fmt==10.2.1, spdlog==1.12.0, and older RAPIDS 24.10.* nightlies that support those (build link)... even though it used the cuml / libcuml packages from CI here.

cudf                      24.10.00a371    cuda11_py310_240923_g625590686f_371    rapidsai-nightly
cuml                      24.10.00a71     cuda11_py310_240923_g5b812f0e9_71    file:///tmp/python_channel
...
fmt                       10.2.1               h00ab1b0_0    conda-forge
...
libcudf                   24.10.00a371    cuda11_240923_g625590686f_371    rapidsai-nightly
...
libcuml                   24.10.00a71     cuda11_240923_g5b812f0e9_71    file:///tmp/cpp_channel
libcumlprims              24.10.00a       cuda11_240923_gddfb9dd_8    rapidsai-nightly
...
rmm                       24.10.00a40     cuda11_py310_240923_g58039d3c_40    rapidsai-nightly
...
spdlog                    1.12.0               hd2e6256_2    conda-forge
...

I just pushed another commit explicitly pinning fmt and spdlog in the conda test environments, to force the solver to use the newer packages.

This should be safe to do... as soon as there are librmm packages published to rapids-nightly containing the changes from rapidsai/rmm#1678, every environment containing the latest librmm / rmm will have those same pins in it.

If that works, that's evidence that the fmt / spdlog update is safe and we can start moving forward.

I think both of those changes are worth merging here

  • adding the host: dependencies = to make CPM unnecessary for spdlog and avoid unnecessary vendoring and therefore file clobbering
  • explicitly pinning in test environments = make such fallbacks during development less likely the next time we do an update like this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 I see tests now passing and fmt==11.0.2 + spdlog==1.14.1 getting installed!

Build link: https://github.com/rapidsai/cuml/actions/runs/11000193616/job/30544094053?pr=6071

Another thing that I think happened here.... it's been several days since the cudf CI artifacts being pulled in here were produced, so the rapidsai-nightly channel's versions are now newer.

e.g. rapidsai/cudf#16806 produces cudf==24.10.00a364 but rapidsai-nightly now contains cudf==24.10.00a371. That would also explain why some of the CI artifacts are being ignored, in favor of those from rapidsai-nightly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here: #6071 (comment)


rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
Expand Down
4 changes: 4 additions & 0 deletions ci/test_wheel.sh
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,10 @@

set -euo pipefail

cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../
source ./ci/use_wheels_from_prs.sh
export PIP_CONSTRAINT=/tmp/constraints.txt

mkdir -p ./dist
RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist
Expand Down
28 changes: 28 additions & 0 deletions ci/use_conda_packages_from_prs.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
#!/bin/bash
# Copyright (c) 2024, NVIDIA CORPORATION.

LIBRMM_CHANNEL=$(rapids-get-pr-conda-artifact rmm 1678 cpp)
RMM_CHANNEL=$(rapids-get-pr-conda-artifact rmm 1678 python)

CUDF_CPP_CHANNEL=$(rapids-get-pr-conda-artifact cudf 16806 cpp)
CUDF_PYTHON_CHANNEL=$(rapids-get-pr-conda-artifact cudf 16806 python)

UCXX_CHANNEL=$(rapids-get-pr-conda-artifact ucxx 278 cpp)

LIBRAFT_CHANNEL=$(rapids-get-pr-conda-artifact raft 2433 cpp)
RAFT_CHANNEL=$(rapids-get-pr-conda-artifact raft 2433 python)

# NOTE: cloning private repos with rapids-get-pr-conda-artifact doesn't work,
# so need to explicitly set the SHA to use
CUMLPRIMS_CHANNEL=$(
RAPIDS_SHA=6f9f474 rapids-get-pr-conda-artifact cumlprims_mg 211 cpp 6f9f474
)

conda config --system --add channels "${LIBRMM_CHANNEL}"
conda config --system --add channels "${RMM_CHANNEL}"
conda config --system --add channels "${CUDF_CPP_CHANNEL}"
conda config --system --add channels "${CUDF_PYTHON_CHANNEL}"
conda config --system --add channels "${UCXX_CHANNEL}"
conda config --system --add channels "${LIBRAFT_CHANNEL}"
conda config --system --add channels "${RAFT_CHANNEL}"
conda config --system --add channels "${CUMLPRIMS_CHANNEL}"
59 changes: 59 additions & 0 deletions ci/use_wheels_from_prs.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
#!/bin/bash
# Copyright (c) 2024, NVIDIA CORPORATION.

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

LIBRMM_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=rmm_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact rmm 1678 cpp
)
RMM_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=rmm_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact rmm 1678 python
)

UCXX_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=ucxx_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact ucxx 278 python
)
LIBUCXX_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=libucxx_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact ucxx 278 cpp
)
DISTRIBUTED_UCXX_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=distributed_ucxx_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact ucxx 278 python
)

CUDF_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=cudf_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact cudf 16806 python
)
LIBCUDF_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=libcudf_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact cudf 16806 cpp
)
PYLIBCUDF_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=pylibcudf_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact cudf 16806 python
)
DASK_CUDF_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=dask_cudf_${RAPIDS_PY_CUDA_SUFFIX} \
RAPIDS_PY_WHEEL_PURE=1 \
rapids-get-pr-wheel-artifact cudf 16806 python
)

RAFT_DASK_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=raft_dask_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact raft 2433 python
)
PYLIBRAFT_CHANNEL=$(
RAPIDS_PY_WHEEL_NAME=pylibraft_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact raft 2433 python
)

cat > /tmp/constraints.txt <<EOF
librmm-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBRMM_CHANNEL}/librmm_*.whl)
rmm-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${RMM_CHANNEL}/rmm_*.whl)
cudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${CUDF_CHANNEL}/cudf_*.whl)
libcudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBCUDF_CHANNEL}/libcudf_*.whl)
pylibcudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${PYLIBCUDF_CHANNEL}/pylibcudf_*.whl)
dask-cudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${DASK_CUDF_CHANNEL}/dask_cudf_*.whl)
ucxx-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${UCXX_CHANNEL}/ucxx_*.whl)
libucxx-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBUCXX_CHANNEL}/libucxx_*.whl)
distributed-ucxx-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${DISTRIBUTED_UCXX_CHANNEL}/distributed_ucxx_*.whl)
raft-dask-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${RAFT_DASK_CHANNEL}/raft_dask_*.whl)
pylibraft-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${PYLIBRAFT_CHANNEL}/pylibraft_*.whl)
EOF

export PIP_CONSTRAINT=/tmp/constraints.txt
2 changes: 2 additions & 0 deletions rapids_config.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ else()
)
endif()

set(rapids-cmake-repo jameslamb/rapids-cmake)
set(rapids-cmake-branch fmt-and-spdlog)
if(NOT EXISTS "${CMAKE_CURRENT_BINARY_DIR}/CUML_RAPIDS-${RAPIDS_VERSION_MAJOR_MINOR}.cmake")
file(
DOWNLOAD
Expand Down
Loading