Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Python 3.12 #276

Merged
merged 3 commits into from
Sep 9, 2024
Merged

Conversation

jameslamb
Copy link
Member

Description

Contributes to rapidsai/build-planning#40

This PR adds support for Python 3.12.

Notes for Reviewers

This is part of ongoing work to add Python 3.12 support across RAPIDS.
It temporarily introduces a build/test matrix including Python 3.12, from rapidsai/shared-workflows#213.

A follow-up PR will revert back to pointing at the branch-24.10 branch of shared-workflows once all
RAPIDS repos have added Python 3.12 support.

This will fail until all dependencies have been updates to Python 3.12

CI here is expected to fail until all of this project's upstream dependencies support Python 3.12.

This can be merged whenever all CI jobs are passing.

@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Sep 5, 2024
@bdice bdice marked this pull request as ready for review September 6, 2024 12:21
@bdice bdice requested review from a team as code owners September 6, 2024 12:21
@bdice bdice requested a review from AyodeAwe September 6, 2024 12:21
@bdice
Copy link
Contributor

bdice commented Sep 6, 2024

@jameslamb @pentschev It looks like there are segfaults (from libucx?) in all the Python 3.12 tests (on multiple OSes, CUDA versions, amd64/arm64, both pip and conda). I don't have a good explanation for this. Can we investigate further?

Segfault logs
[239dc3a82a70:7206 :0:7206] Caught signal 11 (Segmentation fault: tkill(2) or tgkill(2) at address 0x1c26)
==== backtrace (tid:   7206) ====
 0  /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/../../../.././libucs.so.0(ucs_handle_error+0x2fd) [0x7f9178ce3c1d]
 1  /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/../../../.././libucs.so.0(+0x2de11) [0x7f9178ce3e11]
 2  /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/../../../.././libucs.so.0(+0x2dfda) [0x7f9178ce3fda]
 3  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f917ae57520]
 4  /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f917aeab9fc]
 5  /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f917ae57476]
 6  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f917ae57520]
 7  python(_PyArg_UnpackKeywords+0x47) [0x55a0471fc7e7]
 8  /opt/conda/envs/test/lib/python3.12/lib-dynload/_asyncio.cpython-312-x86_64-linux-gnu.so(+0x6f7d) [0x7f9179740f7d]
 9  /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/../../../../libucxx_python.so(_ZN4ucxx6python17future_set_resultEP7_objectS2_+0x27) [0x7f917958dee7]
10  /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/../../../../libucxx_python.so(_ZN4ucxx6python8Notifier18runRequestNotifierEv+0xcd) [0x7f917958f26d]
11  /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/libucxx.cpython-312-x86_64-linux-gnu.so(+0x3d214) [0x7f91795d6214]
12  python(PyObject_Vectorcall+0x4f) [0x55a0471aafcf]
13  python(+0x113f65) [0x55a04709cf65]
14  python(+0x2de852) [0x55a047267852]
15  /opt/conda/envs/test/lib/python3.12/lib-dynload/_asyncio.cpython-312-x86_64-linux-gnu.so(+0x8200) [0x7f9179742200]
16  /opt/conda/envs/test/lib/python3.12/lib-dynload/_asyncio.cpython-312-x86_64-linux-gnu.so(+0x89b4) [0x7f91797429b4]
17  python(_PyObject_MakeTpCall+0x2eb) [0x55a04719609b]
18  python(+0x1ca7b3) [0x55a0471537b3]
19  python(+0x222246) [0x55a0471ab246]
20  python(+0x114ca6) [0x55a04709dca6]
21  python(_PyObject_FastCallDictTstate+0x291) [0x55a047198cb1]
22  python(_PyObject_Call_Prepend+0x69) [0x55a0471c4819]
23  python(+0x311693) [0x55a04729a693]
24  python(_PyObject_MakeTpCall+0x2eb) [0x55a04719609b]
25  python(+0x113f65) [0x55a04709cf65]
26  python(_PyObject_FastCallDictTstate+0x291) [0x55a047198cb1]
27  python(_PyObject_Call_Prepend+0x69) [0x55a0471c4819]
28  python(+0x311693) [0x55a04729a693]
29  python(_PyObject_Call+0xb5) [0x55a0471c72e5]
30  python(+0x114ca6) [0x55a04709dca6]
31  python(_PyObject_FastCallDictTstate+0x291) [0x55a047198cb1]
32  python(_PyObject_Call_Prepend+0x69) [0x55a0471c4819]
33  python(+0x311693) [0x55a04729a693]
34  python(_PyObject_MakeTpCall+0x2eb) [0x55a04719609b]
35  python(+0x113f65) [0x55a04709cf65]
36  python(_PyObject_FastCallDictTstate+0x291) [0x55a047198cb1]
37  python(_PyObject_Call_Prepend+0x69) [0x55a0471c4819]
38  python(+0x311693) [0x55a04729a693]
39  python(_PyObject_MakeTpCall+0x2eb) [0x55a04719609b]
40  python(+0x113f65) [0x55a04709cf65]
41  python(_PyObject_FastCallDictTstate+0x291) [0x55a047198cb1]
42  python(_PyObject_Call_Prepend+0x69) [0x55a0471c4819]
43  python(+0x311693) [0x55a04729a693]
44  python(_PyObject_MakeTpCall+0x2eb) [0x55a04719609b]
45  python(+0x113f65) [0x55a04709cf65]
46  python(PyEval_EvalCode+0xae) [0x55a047250ece]
47  python(+0x2e3edd) [0x55a04726cedd]
48  python(+0x222246) [0x55a0471ab246]
49  python(PyObject_Vectorcall+0x4f) [0x55a0471aafcf]
50  python(+0x113f65) [0x55a04709cf65]
51  python(+0x2f94b8) [0x55a0472824b8]
52  python(Py_RunMain+0x3c5) [0x55a047282015]
53  python(Py_BytesMain+0x37) [0x55a0472397e7]
54  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f917ae3ed90]
55  /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f917ae3ee40]
56  python(+0x2b0681) [0x55a047239681]
=================================
timeout: the monitored command dumped core
ci/test_common.sh: line 125:  7205 Segmentation fault      UCXPY_PROGRESS_MODE=${PROGRESS_MODE} UCXPY_ENABLE_DELAYED_SUBMISSION=${ENABLE_DELAYED_SUBMISSION} UCXPY_ENABLE_PYTHON_FUTURE=${ENABLE_PYTHON_FUTURE} timeout 20m python -m pytest -vs python/ucxx/ucxx/_lib_async/tests/ --durations=50
python/ucxx/ucxx/_lib_async/tests/test_custom_send_recv.py::test_send_recv_cudf[<lambda>0] 

@jameslamb
Copy link
Member Author

jameslamb commented Sep 6, 2024

It looks like there are segfaults (from libucx?) in all the Python 3.12 tests (on multiple OSes, CUDA versions, amd64/arm64, both pip and conda).

Worth noting that last night, I restarted only the failing jobs here (which means only the Python 3.12 ones).

It's possible that some change happened in some dependency, between when those other jobs passed and these were re-run, which would cause these segfaults on even more jobs. To check that, I just restarted all CI here.

(edited): yes it looks like all Python 3.12 jobs, and only Python 3.12 jobs, are failing deterministically

@jameslamb jameslamb changed the title WIP: Add support for Python 3.12 Add support for Python 3.12 Sep 6, 2024
@jameslamb
Copy link
Member Author

jameslamb commented Sep 6, 2024

Ok, think I found the problem! Or at least, one problem.

Narrowed this down to the failing test case

code to do that (click me)

On an x86_64 machine with CUDA 12.2, this reproduces the segfault:

docker run \
    --rm \
    --gpus 1 \
    --env RAPIDS_BUILD_TYPE="pull-request" \
    --env RAPIDS_REF_NAME=pull-request/276 \
    --env RAPIDS_REPOSITORY=rapidsai/ucxx \
    --env RAPIDS_SHA=7b9f38614547684cd15efbb9c5bdff209ced45a3 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/ci-conda:cuda12.5.1-ubuntu22.04-py3.12 \
    bash

. /opt/conda/etc/profile.d/conda.sh

rapids-dependency-file-generator \
  --output conda \
  --file-key test_python \
  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee env.yaml

rapids-mamba-retry env create --yes -f env.yaml -n test
conda activate test

CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)

rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  libucxx ucxx

UCXPY_PROGRESS_MODE=thread \
UCXPY_ENABLE_PYTHON_FUTURE=1 \
    python -m pytest \
        -vs \
        'python/ucxx/ucxx/_lib_async/tests/test_custom_send_recv.py[<lambda>0]'

Ran that with gdb

UCXPY_PROGRESS_MODE=thread \
UCXPY_ENABLE_PYTHON_FUTURE=1 \
    gdb --args \
    python -m pytest \
        -vs \
        'python/ucxx/ucxx/_lib_async/tests/test_custom_send_recv.py[<lambda>0]'

# gdb commands:
#
# $ run
# $ bt
UCXPY_PROGRESS_MODE=thread \
UCXPY_ENABLE_PYTHON_FUTURE=1 \
    python -m pytest \
        -vs \
        'python/ucxx/ucxx/_lib_async/tests/test_custom_send_recv.py[<lambda>0]'

and was able to get this more informative trace:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x000055e5b8a1d7e7 in _PyArg_UnpackKeywords (args=0x7f14fb884ae0 <__pthread_keys>, nargs=1, kwargs=kwargs@entry=0x0, 
    kwnames=<unknown at remote 0x55e5b8e49058>, parser=parser@entry=0x7f14f9f9dc20 <_parser.17>, 
    minpos=minpos@entry=1, maxpos=1, minkw=0, buf=0x7ffc1a7fe860)
    at /usr/local/src/conda/python-3.12.5/Python/getargs.c:2374
warning: 2374   /usr/local/src/conda/python-3.12.5/Python/getargs.c: No such file or directory
(gdb) bt
#0  0x000055e5b8a1d7e7 in _PyArg_UnpackKeywords (args=0x7f14fb884ae0 <__pthread_keys>, nargs=1, 
    kwargs=kwargs@entry=0x0, kwnames=<unknown at remote 0x55e5b8e49058>, 
    parser=parser@entry=0x7f14f9f9dc20 <_parser.17>, minpos=minpos@entry=1, maxpos=1, minkw=0, buf=0x7ffc1a7fe860)
    at /usr/local/src/conda/python-3.12.5/Python/getargs.c:2374
#1  0x00007f14f9f93f7d in _asyncio_Future_set_result (self=0x7f1208f885e0, cls=0x55e5b8dd8920 <_Py_TrueStruct>, 
    args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.12.5/Modules/clinic/_asynciomodule.c.h:166
#2  0x00007f14f9de0ee7 in ucxx::python::future_set_result(_object*, _object*) ()
   from /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/../../../../libucxx_python.so
#3  0x00007f14f9de226d in ucxx::python::Notifier::runRequestNotifier() ()
   from /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/../../../../libucxx_python.so
#4  0x00007f14f9e29214 in ?? ()
   from /opt/conda/envs/test/lib/python3.12/site-packages/ucxx/_lib/libucxx.cpython-312-x86_64-linux-gnu.so

It looks to me like the signature of _asyncio_Future_set_result() changed, and ucxx is now not passing enough arguments to it.

Relevant code in ucxx:

PyObject* future_set_result(PyObject* future, PyObject* value)
{
PyObject* result = NULL;
PyGILState_STATE state = PyGILState_Ensure();
PyCFunction f = get_future_method("set_result");
result = f(future, value);

The signature for that call in asyncio changed from Python 3.11 to Python 3.12.

Python 3.11 (code link):

static PyObject *
future_set_result(FutureObj *fut, PyObject *res)

Python 3.12 (code link)

static PyObject *
future_set_result(asyncio_state *state, FutureObj *fut, PyObject *res)

Looks like that happened around https://github.com/python/cpython/pull/99122/files#diff-6bd9e39980b88a721d902bcd915bbb3f24762f7f253430c45e52c42a2c5afd01R578

I think ucxx will need to be updated to pass in that asyncio_state argument to such calls as well.

@pentschev
Copy link
Member

Thanks @bdice for reporting and @jameslamb for investigating. It looks like you're right, I'll work on a proper fix for this today.

@pentschev pentschev requested a review from a team as a code owner September 9, 2024 12:38
@pentschev
Copy link
Member

Thanks again @jameslamb for investigating, it looks like Future.set_result() C implementation previously used a PyCFunction signature which was safe to call as a regular C function, but in Python 3.12 it was moved to implement a PyCMethod signature which must be called with PyObject_CallMethod (or its variants). In any case, we should have always have used the PyObject_CallMethod API which is the safer way (and simpler for the use case in UCXX) to implement the Python call protocol, so switching to that should be safe for all Python 3.x versions.

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests passing now, thanks @jameslamb !

@jameslamb
Copy link
Member Author

Ahhhh awesome!!! Thanks so much for the fix and the thorough explanation @pentschev , much appreciated!

@jameslamb
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit bb30a22 into rapidsai:branch-0.40 Sep 9, 2024
67 checks passed
@jameslamb jameslamb deleted the python-3.12 branch September 9, 2024 14:14
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Sep 9, 2024
Follow-up to #1380.

Now that both `cudf` (rapidsai/cudf#16745) and `ucxx` (rapidsai/ucxx#276) have Python 3.12 wheels available, it should be possible to test `dask-cuda` against Python 3.12 in CI.

This proposes that.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1382
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants