diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml new file mode 100644 index 0000000..379a03f --- /dev/null +++ b/.github/workflows/deploy.yml @@ -0,0 +1,52 @@ +name: Deploy to GitHub Pages + +on: + push: + branches: + - main + paths: + - '.github/workflows/deploy.yml' + - 'docs/**' + - 'mkdocs.yml' + - 'requirements.txt' + pull_request: + branches: + - main + paths: + - '.github/workflows/deploy.yml' + - 'docs/**' + - 'mkdocs.yml' + - 'requirements.txt' + +jobs: + deploy: + runs-on: ubuntu-22.04 + permissions: + contents: write + concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + + steps: + - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + with: + fetch-depth: 0 + + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5.1.0 + with: + python-version: 3.12 + + - name: Install MkDocs & co + run: pip install -r requirements.txt + + - name: Build site + run: mkdocs build + + - name: Deploy + uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0 + if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request' + with: + personal_token: ${{ secrets.PYFREETHREADING_DEPLOY_KEY }} + external_repository: py-free-threading/py-free-threading.github.io + publish_dir: ./site + user_name: 'github-actions[bot]' + user_email: 'github-actions[bot]@users.noreply.github.com' diff --git a/README.md b/README.md index cc6baac..c9539a4 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,3 @@ -## Improving Ecosystem Compatibility with Free-Threaded Python - -Quansight Labs is working with the Python runtime team at Meta and stakeholders -across the ecosystem to jumpstart work on converting the libraries that make up -the scientific Python and AI/ML stacks to work with the free-threaded (nogil) -build of CPython 3.13. Additionally, we will look at libraries like PyO3 -that are needed to interface with CPython from other languages. 
- -Our initial goal is to ensure libraries at the bottom of the stack like -NumPy, pybind11, and Cython are usable with free-threaded CPython. We will also -be updating packaging tools like meson-python needed to support building wheels -for free-threaded CPython. Once those tools and libraries are in a stable -enough state, we will begin looking at libraries higher in the stack. - ### What is this repository? This repository is for coordinating ecosystem-wide work. We will use @@ -20,381 +6,7 @@ dealing with issues that we find are common across many libraries. Issues that are specific to a project should be reported in that project's issue tracker. -### Building Free-Threaded CPython - -Currently we suggest building CPython from source using the latest version of -the CPython `main` branch. See [the -build -instructions](https://devguide.python.org/getting-started/setup-building/index.html) -in the CPython developer guide. You will need to install [needed third-party -dependencies](https://devguide.python.org/getting-started/setup-building/index.html#install-dependencies) -before building. To build the free-threaded version of CPython, pass -`--disable-gil` to the `configure` script: - -```bash -./configure --with-pydebug --disable-gil -``` - -If you will be switching Python versions often, it may make sense to -build CPython using [pyenv](https://github.com/pyenv/pyenv). The easiest way -to select a free-threaded build is to install a python version with a `t` at -the end of the name. For example: - -```bash -pyenv install --debug 3.13.0b3t -``` - -Will install a free-threaded build of Python 3.13.0b3 that includes debug -symbols. See the pyenv docs for more information about using pyenv and -switching between python versions. - -### Running Python With the GIL disabled - -Much of the material in this subsection is also covered in the Python 3.13 -[release -notes](https://docs.python.org/3.13/whatsnew/3.13.html#free-threaded-cpython). 
- -You can verify your build has the GIL disabled with the following incantation: - -```bash -python -VV -``` - -If you are using Python 3.13b1 or newer, you should see a message like: - -```bash -Python 3.13.0b1+ experimental free-threading build (heads/3.13:d93c4f9, May 21 2024, 10:54:14) [Clang 15.0.0 (clang-1500.1.0.2.5)] -``` - -Verify that the GIL is disabled at runtime with the following incantation: - -```bash -python -c "import sys; print(sys._is_gil_enabled())" -``` - -Extension modules need to explicitly indicate they support running with the GIL -disabled, otherwise a warning is printed and the GIL is re-enabled at -runtime after importing a module that does not support the GIL. In order to do so, -extension modules that support multi-phase initialization can specify the -[`Py_mod_gil`](https://docs.python.org/3.13/c-api/module.html#c.Py_mod_gil) -module slot like this (the slot has no effect in the non-free-threaded build): - -```c -static PyModuleDef_Slot module_slots[] = { - ... -#ifdef Py_GIL_DISABLED - {Py_mod_gil, Py_MOD_GIL_NOT_USED}, -#endif - {0, NULL} -}; -``` - -Extensions that use single-phase initialization need to call -[`PyUnstable_Module_SetGIL()`](https://docs.python.org/3.13/c-api/module.html#c.PyUnstable_Module_SetGIL) -in the module's initialization function: - -```c -PyMODINIT_FUNC -PyInit__module(void) -{ - PyObject *mod = PyModule_Create(&module); - if (mod == NULL) { - return NULL; - } - -#ifdef Py_GIL_DISABLED - PyUnstable_Module_SetGIL(mod, Py_MOD_GIL_NOT_USED); -#endif -} -``` - -To force Python to keep the GIL disabled even after importing a module -that does not support running without it, use the `PYTHON_GIL` environment -variable or the `-X gil` command line option: - -```bash -# these are equivalent -PYTHON_GIL=0 python -python -Xgil=0 -``` - -### Porting Extension Modules to Support Free-Threading - -Many Python extension modules are not thread-safe in the free-threaded build as -of mid-2024. 
Up until now, the GIL has added implicit locking around any -operation in Python or C that holds the GIL and the GIL must be explicitly -dropped before many thread-safety issues become problematic. Also, because of -the GIL, attempting to parallelize many workflows using the Python -[threading](https://docs.python.org/3/library/threading.html) module will not -produce any speedups, so thread-safety issues that are possible even with the -GIL are not hit often since users do not make use of threading as much as other -parallelization strategies. This means many codebases have threading bugs that -up-until-now have only been theoretical or present in niche use-cases. With -free-threading, many more users will want to use Python threads. - -This means we must analyze the codebases of extension modules to identify -thread-safety issues and make changes to thread-unsafe low-level code, -including C, C++, and Cython code exposed to Python. - -#### Suggested Plan of Attack - -Put priority on thread-safety issues surfaced by real-world testing. Especially -if there is a lot of low-level code (e.g. most of the C code in NumPy, and -Cython code inside a `with nogil` block) then it's likely that assumptions -about the GIL have introduced thread-safety issues in any real-world code. - -The CPython C API exposes the `Py_GIL_DISABLED` macro, which is defined in the -free-threaded build. You can use it to enable code that only runs under the -free-threaded build, isolating possibly performance-impacting changes to the -free-threaded build. - -We suggest focusing on safety over single-threaded performance. For example, if -adding locking to a global cache would be more trouble than just disabling it -for a small performance hit, consider doing the simpler thing and disabling the -cache in the free-threaded build. 
Single-threaded performance can always be -improved later, once you've established free-threaded support and hopefully -improved test coverage for multithreaded workflows. - -Definitely run your existing test suite with the GIL disabled, but unless your -tests make heavy use of the `threading` module, you will likely not hit many -issues, so also consider constructing multithreaded tests to expose bugs based -on workflows you want to support. Issues found in these tests are the issues -your users will most likely hit first. The -[`concurrent.futures.ThreadPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) -class is a lightweight way to create multithreaded tests where many threads -repeatedly call a function simultaneously. You can also use the `threading` -module directly. Adding a `threading.Barrier` before your test code is a good -way to synchronize workers and encourage a race condition. - -If you initialize extensions using the C API directly, plan to eventually -support the free-threaded build, and want to encourage people to test it, we -suggest marking that your extensions support disabling the GIL with the -[`Py_mod_gil` -slot](https://docs.python.org/3.13/c-api/module.html#c.Py_mod_gil) for -extensions using multi-phase initialization and -[`PyUnstable_Module_SetGIL()`](https://docs.python.org/3.13/c-api/module.html#c.PyUnstable_Module_SetGIL) -for extensions using single-phase intialization, even if you have not fully -completed porting and testing with the GIL disabled. Cython and `pybind11` do -not yet do this, but currently the plan is for tools wrapping CPython to handle -setting these flags for users, possibly with an override via a configuration -flag. - -We are generally assuming users will not do pathological things like resizing an -array while another thread is reading from or writing to it. 
Eventually we will -need to add locking around data structures to avoid issues like this, but in -this early stage of porting we are not focusing on adding locking on every -operation exposed to users that mutates data. - -##### Locking and Synchronization Primitives - -If your extension is written in C++, Rust, or another modern language that -exposes locking primitives in the standard library, you should use the locking -primitives provided by your language or framework to add locks when needed. - -C does have threading primitives in the standard library, but C compilers do -not uniformly provide `threads.h`. The CPython C API exposes -`PyThread_type_lock`, which provides a portable low-level locking primitive in -the C API. Internally, CPython uses `PyMutex`, a substantially more performant -and memory-efficient mutex, that may be exposed publicly in the future. If that -happens then uses of `PyThread_type_lock` can be replaced with -`PyMutex`. Consider hiding the locking and unlocking details behind a macro or -static inline function to abstract away the underlying locking implementation. - -#### Global state - -##### Global Settings - -* [`threading.local`](https://docs.python.org/3/library/threading.html#thread-local-data). -* [`Py_tss -API`](https://docs.python.org/3/c-api/init.html#thread-specific-storage-tss-api), -also see [PEP 539](https://peps.python.org/pep-0539). -* `thread_local` in C++ or platform-specific equivalent in C - -##### Caches - -* Disable -* Global locks for single-initializaton -* Per-cache locks if there might be contention -* Atomic initialization flag - -#### Shared state - -##### Adding thread-safety to data structures - -* Shared mutable state is unsafe unless updating the state requires acquiring a - lock and stopping all other reads and writes while the update happens. - * Easiest: make more things immutable - * Locking - * More sophisticated locking like RW locks. 
- -##### Dealing with thread-unsafe libraries - -* Add locking around library usage - - * Re-entrant: Can have a lock per low-level data structure - * Non-reentrant: Must have a global lock guarding calling the library - -#### Cython thread-safety - -If your extension is written in Cython, you can generally assume that -"Python-level" code that compiles to CPython C API operations on Python objects -is thread safe, but "C-level" code (e.g. code that will compile inside a `with -nogil` block) may have thread-safety issues. Note that not all code outside -`with nogil` blocks is thread safe. For example, a Python wrapper for a -thread-unsafe C library is thread-unsafe if the GIL is disabled unless there is -locking around uses of the thread-unsafe library. Another example: using -thread-unsafe C-level constructs like a global variable is also thread-unsafe -if the GIL is disabled. - -#### CPython C API uses - -In the free-threaded build it is possible for the reference count of an object -to change "underneath" a running thread when it is mutated by another -thread. This means that many APIs that assume reference counts cannot be -updated by another thread while it is running are no longer thread safe. In -particular, C code returning "borrowed" references to Python objects in mutable -containers like lists may introduce thread-safety issues. A borrowed reference -happens when a C API function does not increment the reference count of a -Python object before returning the object to the caller. "New" references are -safe to use until the owning thread releases the reference, as in non -free-threaded code. - -Most direct uses of the CPython C API are thread safe. There is no need to add -locking for scenarios that should be bugs in CPython. You can assume, for -example, that the initializer for a Python object can only be called by one -thread and the C-level implementation of a Python function can only be called on -one thread. 
Accessing the arguments of a Python function is thread safe no -matter what C API constructs are used and no matter whether the reference is -borrowed or owned because two threads can't simultaneously call the same -function with the same arguments from the same Python-level context. Of course -it's possible to implement argument parsing in a thread-unsafe manner using -thread-unsafe C or C++ constructs, but it's not possible to do so using the -CPython C API. - -##### Unsafe APIs returning borrowed references - -The `PyDict` and `PyList` APIs contain many functions returning borrowed -references to items in dicts and lists. Since these containers are mutable, -it's possible for another thread to delete the item from the container, leading -to the item being de-allocated while the borrowed reference is still -"alive". Even code like this: - -```C -PyObject *item = Py_NewRef(PyList_GetItem(list_object, 0)) -``` - -Is not thread safe, because in principle it's possible for the list item to be -de-allocated before `Py_NewRef` gets a chance to increment the reference count. - -For that reason, you should inspect Python C API code to look for patterns -where a borrowed reference is returned to a shared, mutable data structure, and -replace uses of APIs like `PyList_GetItem` with APIs exposed by the CPython C -API returning strong references like `PyList_GetItemRef`. Not all usages are -problematic (see above) and we do not currently suggest converting all usages of -possibly unsafe APIs returning borrowed references to return new reference. This -would introduce unnecessary reference count churn in situations that are -thread-safe by construction and also likely introduce new reference counting -bugs in C or C++ code using the C API directly. However, many usages *are* -unsafe, and maintaining a borrowed reference to an objects that could be exposed -to another thread is unsafe. 
- -##### Adopt `pythoncapi-compat` to use new C API functions - -Rather than maintaining compatibility shims to use functions added to the C API -for Python 3.13 like `PyList_GetItemRef` while maintaining compatibility with -earlier Python versions, we suggest adopting the -[`pythoncapi-compat`](https://github.com/python/pythoncapi-compat) project as a -build-time dependency. This is a header-only library that can be vendored as -e.g. a git submodule and included to expose shims for C API functions on older -versions of Python that do not have implementations. - -##### Some low-level APIs don't enforce locking - -Some low-level functions like `PyList_SET_ITEM` and `PyTuple_SET_ITEM` do not -do any internal locking and should only be used to build newly created -values. Do *not* use them to modify existing containers in the free-threaded -build. - -##### Limited API support - -The free-threaded build does not support the limited CPython C API. If you -currently use the limited API you will not be able to use it while shipping -binaries for the free-threaded build. This also means that code inside `#ifdef -Py_GIL_DISABLED` checks can use C API constructs outside the limited API if you -would like to do that, although these uses will need to be removed once the -free-threaded build gains support for compiling with the limited API. - -### Continuous Integration - -Currently the `setup-python` GitHub Action [does not -support](https://github.com/actions/setup-python/issues/771) installing a -free-threaded build. For now, the easiest way to get a free-threaded Python -build on a CI runner is with the `deadsnakes` Ubuntu PPA and the -`deadsnakes-action` GitHub Action: - -```yaml -jobs: - free-threaded: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@... - - uses: deadsnakes/action@... - with: - python-version: '3.13.0b2' - nogil: true -``` - -You should replace the ellipses with versions for the actions. 
If there is a -newer CPython 3.13 release available since this document was written or -updated, use that version instead. - -The [cibuildwheel](https://cibuildwheel.pypa.io/en/stable/) project has support -for building free-threaded wheels on all platforms. If your project releases -nightly wheels, we suggest configuring cibuildwheel to build nightly -free-threaded wheels. - -You will need to specify the following variables in the environment for the -cibuildwheel action: - -``` - - name: Build wheels - uses: pypa/cibuildwheel@... - env: - CIBW_PRERELEASE_PYTHONS: True - CIBW_FREE_THREADED_SUPPORT: True - CIBW_BUILD: cp313t-${{ matrix.buildplat }} - # TODO: remove along with installing build deps in - # cibw_before_build.sh when a released cython can build numpy - CIBW_BUILD_FRONTEND: "pip; args: --no-build-isolation" -``` - -As above, replace the ellipses with a `cibuildwheel` version. - -If your project depends on Cython, you will need to install a Cython nightly -wheel in the build, as the newest stable release of Cython cannot generate code -that will compile under the free-threaded build. You likely do not need to -disable pip build isolation if your project does not depend on Cython. - -The newest pip release does not support installing free-threaded wheels, you -will need to update to pip 24.1b1 or newer to install free-threaded wheels: - -``` -pip install -U --pre pip -``` - -You can install nightly wheels for Cython, NumPy, and SciPy using the following -command: - -``` -pip install -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple cython -``` - -Note that nightly wheels may not be available on all platforms yet. Windows -wheels, in particular, are not currently available for NumPy or SciPy. 
+### Documentation -You will also likely need to manually pass `-Xgil=0` or set `PYTHON_GIL=0` in -your shell environment while running tests to ensure the GIL is actually -disabled during tests, at least until you can register that your extension -modules support disabling the GIL via `Py_mod_gil` and all of your runtime test -dependencies do the same. It is not currently possible to mark that a Cython -module supports running without the GIL. +You can find documentation for various free-threading topics +[on py-free-threading.github.io](https://py-free-threading.github.io). diff --git a/docs/ci.md b/docs/ci.md new file mode 100644 index 0000000..dd8b0b1 --- /dev/null +++ b/docs/ci.md @@ -0,0 +1,68 @@ +# Setting up CI + +Currently the `setup-python` GitHub Action [does not +support](https://github.com/actions/setup-python/issues/771) installing a +free-threaded build. For now, the easiest way to get a free-threaded Python +build on a CI runner is with the `deadsnakes` Ubuntu PPA and the +`deadsnakes-action` GitHub Action: + +```yaml +jobs: + free-threaded: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@... + - uses: deadsnakes/action@... + with: + python-version: '3.13.0b2' + nogil: true +``` + +You should replace the ellipses with versions for the actions. If there is a +newer CPython 3.13 release available since this document was written or +updated, use that version instead. + +[cibuildwheel](https://cibuildwheel.pypa.io/en/stable/) has support +for building free-threaded wheels on all platforms. If your project releases +nightly wheels, we suggest configuring `cibuildwheel` to build nightly +free-threaded wheels. + +You will need to specify the following variables in the environment for the +cibuildwheel action: + +```yaml + - name: Build wheels + uses: pypa/cibuildwheel@... 
+ env: + CIBW_PRERELEASE_PYTHONS: True + CIBW_FREE_THREADED_SUPPORT: True + CIBW_BUILD: cp313t-${{ matrix.buildplat }} + # TODO: remove along with installing build deps in + # cibw_before_build.sh when a released cython can build numpy + CIBW_BUILD_FRONTEND: "pip; args: --no-build-isolation" +``` + +As above, replace the ellipses with a `cibuildwheel` version. + +If your project depends on Cython, you will need to install a Cython nightly +wheel in the build, as the newest stable release of Cython cannot generate code +that will compile under the free-threaded build. You likely do not need to +disable `pip`'s build isolation if your project does not depend on Cython. + +You can install nightly wheels for Cython and NumPy using the following +install command: + +``` +pip install -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple cython numpy +``` + +Note that nightly wheels may not be available on all platforms yet. Windows +wheels, in particular, are not currently available for NumPy or projects that +depend on NumPy (e.g., SciPy). + +You will also likely need to manually pass `-Xgil=0` or set `PYTHON_GIL=0` in +your shell environment while running tests to ensure the GIL is actually +disabled during tests, at least until you can register that your extension +modules support disabling the GIL via `Py_mod_gil` and all of your runtime test +dependencies do the same. It is not currently possible to mark that a Cython +module supports running without the GIL. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..3e49bf6 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,45 @@ +--- +description: py-free-threading is a centralized collection of documentation and trackers around compatibility with free-threaded CPython for the Python open source ecosystem +--- + +# Introduction + +Free-threaded CPython is coming! 
:material-language-python: :thread: + +After the [acceptance by the Python Steering Council](https://discuss.python.org/t/a-steering-council-notice-about-pep-703-making-the-global-interpreter-lock-optional-in-cpython/30474) +of, and the [gradual rollout strategy](https://discuss.python.org/t/pep-703-making-the-global-interpreter-lock-optional-in-cpython-acceptance/37075) for, +[PEP 703 - Making the Global Interpreter Lock Optional in CPython](https://peps.python.org/pep-0703/), +a lot of work is happening both in CPython itself and across the Python ecosystem. + +This website aims to serve as a centralized resource both for Python package +maintainers and end users interested in supporting or experimenting with +free-threaded Python. An overview of the compatibility status of various Python +libraries is maintained in: + +- [Compatibility status tracking](tracking.md) + +This website also provides documentation and porting guidance - with a focus on +extension modules using the Python C API, because that's where most of the work +will be. The following resources should get you started: + +- [Installing free-threaded CPython](installing_cpython.md) +- [Running Python with the GIL disabled](running-gil-disabled.md) +- [Porting extension modules to support free-threading](porting.md) +- [Setting up CI](ci.md) + + + +## About this site + +Any contributions are very much welcome - please open issues or pull requests +[on this repo](https://github.com/Quansight-Labs/free-threaded-compatibility) +for anything that seems in scope for this site or for tracking issues related +to support for free-threaded Python across the ecosystem. + +This site is maintained primarily by Quansight Labs, where a team is working +together with the Python runtime team at Meta and stakeholders across the +ecosystem to jumpstart work on converting the libraries that make up the +scientific Python and AI/ML stacks to work with the free-threaded build of +CPython 3.13. 
Additionally, that effort will look at libraries like PyO3 that +are needed to interface with CPython from other languages. + diff --git a/docs/installing_cpython.md b/docs/installing_cpython.md new file mode 100644 index 0000000..2aa6f3b --- /dev/null +++ b/docs/installing_cpython.md @@ -0,0 +1,84 @@ +# Installing a free-threaded Python + +To install a free-threaded CPython interpreter, you can either use a pre-built +binary or build CPython from source. The former is quickest to get started +with. Building from source is not too difficult either though, and in case you +hit a bug that may involve CPython itself then you may want to build from +source. + + +## Binary install options + +There are a growing number of options to install a free-threaded interpreter, +from the python.org installers to Linux distro and Conda package managers. + +!!! note + + For any of these options, please check after the install succeeds that you + have a `pip` version that is recent enough (`>=24.1`), and upgrade it if + that isn't the case. Older `pip` versions will select wheels with the + `cp313` tag (binary-incompatible) rather than the `cp313t` tag. + + +### python.org installers + +The [python.org downloads page](https://www.python.org/download/pre-releases/) +provides macOS and Windows installers that have experimental support. Note +that you have to customize the install - e.g., for Windows there is a +_Download free-threaded binaries_ checkbox under "Advanced Options". +See also the [Using Python on Windows](https://docs.python.org/3.13/using/windows.html#installing-free-threaded-binaries) +section of the Python 3.13 docs. + + +### Linux distros + +=== "Fedora" + + Fedora ships a packaged version, which you can install with: + ``` + sudo dnf install python3.13-freethreading + ``` + This will install the interpreter at `/usr/bin/python3.13t`. 
+ +=== "Ubuntu" + + For Ubuntu you can use the [deadsnakes PPA](https://launchpad.net/%7Edeadsnakes/+archive/ubuntu/ppa/+packages) + by adding it to your repositories and then installing `python3.13-nogil`: + ``` + sudo add-apt-repository ppa:deadsnakes + sudo apt-get update + sudo apt-get install python3.13-nogil + ``` + + +### Conda + +Conda packages are currently available for macOS arm64 and Linux x86-64 under a +label in the `ad-testing` (`ad` means "anaconda distribution") channel: + +``` +conda create -n nogil -c defaults -c ad-testing/label/py313_nogil python=3.13 +``` + + +## Building from source + +Currently we suggest building CPython from source using the latest version of +the CPython `main` branch. See +[the build instructions](https://devguide.python.org/getting-started/setup-building/index.html) +in the CPython developer guide. You will need to install [needed third-party +dependencies](https://devguide.python.org/getting-started/setup-building/index.html#install-dependencies) +before building. To build the free-threaded version of CPython, pass +`--disable-gil` to the `configure` script: + +```bash +./configure --with-pydebug --disable-gil +``` + +If you will be switching Python versions often, it may make sense to +build CPython using [pyenv](https://github.com/pyenv/pyenv). In order to +do that, you can use the following: + +```bash +pyenv install --debug 3.13t-dev +``` diff --git a/docs/porting.md b/docs/porting.md new file mode 100644 index 0000000..1ffa6ce --- /dev/null +++ b/docs/porting.md @@ -0,0 +1,444 @@ +# Porting Extension Modules to Support Free-Threading + +Many Python extension modules are not thread-safe in the free-threaded build as +of mid-2024. Up until now, the GIL has added implicit locking around any +operation in Python or C that holds the GIL, and the GIL must be explicitly +dropped before many thread-safety issues become problematic. 
Also, because of +the GIL, attempting to parallelize many workflows using the Python +[threading](https://docs.python.org/3/library/threading.html) module will not +produce any speedups, so thread-safety issues that are possible even with the +GIL are not hit often since users do not make use of threading as much as other +parallelization strategies. This means many codebases have threading bugs that +up-until-now have only been theoretical or present in niche use cases. With +free-threading, many more users will want to use Python threads. + +This means we must analyze the codebases of extension modules to identify +thread-safety issues and make changes to thread-unsafe low-level code, +including C, C++, and Cython code exposed to Python. + +### Declaring free-threaded support + +Extension modules need to explicitly indicate they support running with the GIL +disabled, otherwise a warning is printed and the GIL is re-enabled at runtime +after importing a module that does not support the GIL. + +!!! note + + Currently it is not possible for extensions written in Cython to declare + they support running without the GIL. Work is under way to add support + (see [cython#6242](https://github.com/cython/cython/pull/6242)). + +C++ extension modules making use of `pybind11` can easily declare support for +running with the GIL disabled via the +[`gil_not_used`](https://pybind11.readthedocs.io/en/stable/reference.html#_CPPv4N7module_23create_extension_moduleEPKcPKcP10module_def16mod_gil_not_used) +argument to `create_extension_module`. + +C or C++ extension modules using multi-phase initialization can specify the +[`Py_mod_gil`](https://docs.python.org/3.13/c-api/module.html#c.Py_mod_gil) +module slot like this: + +```c +static PyModuleDef_Slot module_slots[] = { + ... +#ifdef Py_GIL_DISABLED + {Py_mod_gil, Py_MOD_GIL_NOT_USED}, +#endif + {0, NULL} +}; +``` + +The `Py_mod_gil` slot has no effect in the non-free-threaded build. 
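
Because an unsupported extension re-enables the GIL at import time, it can help to check the interpreter state after your imports have run. The following is a minimal sketch rather than an official recipe: `gil_status` is a hypothetical helper name, `sys._is_gil_enabled()` is only available on CPython 3.13 and newer (hence the guard), and the standard-library `json` import is just a placeholder for your own extension.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled_now) for this interpreter."""
    # Py_GIL_DISABLED is set in the build configuration of free-threaded builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in CPython 3.13; older versions always
    # run with the GIL, so fall back to True there.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = check() if check is not None else True
    return free_threaded_build, gil_enabled_now


if __name__ == "__main__":
    # Placeholder import: replace with the extension module you are porting.
    import json  # noqa: F401

    build, enabled = gil_status()
    print(f"free-threaded build: {build}, GIL enabled: {enabled}")
```

On a free-threaded build, seeing the GIL reported as enabled after the import suggests that some imported module has not declared support yet.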
+ +Extensions that use single-phase initialization need to call +[`PyUnstable_Module_SetGIL()`](https://docs.python.org/3.13/c-api/module.html#c.PyUnstable_Module_SetGIL) +in the module's initialization function: + +```c +PyMODINIT_FUNC +PyInit__module(void) +{ + PyObject *mod = PyModule_Create(&module); + if (mod == NULL) { + return NULL; + } + +#ifdef Py_GIL_DISABLED + PyUnstable_Module_SetGIL(mod, Py_MOD_GIL_NOT_USED); +#endif + return mod; +} +``` + +If you publish binaries and have downstream libraries that depend on your +library, we suggest adding the `Py_mod_gil` slot and uploading nightly wheels +as soon as basic support for the free-threaded build is established in the +development branch. This will ease the work of libraries that depend on yours +to also add support for the free-threaded build. + + +## Suggested Plan of Attack + +Put priority on thread-safety issues surfaced by real-world testing. Run the +test suite for your project and fix any failures that occur only with the GIL +disabled. Some issues may be due to changes in Python 3.13 that are not +specific to the free-threaded build. + +Definitely run your existing test suite with the GIL disabled, but unless your +tests make heavy use of the `threading` module, you will likely not hit many +issues, so also consider constructing multithreaded tests to expose bugs based +on workflows you want to support. Issues found in these tests are the issues +your users will most likely hit first. The +[`concurrent.futures.ThreadPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) +class is a lightweight way to create multithreaded tests where many threads +repeatedly call a function simultaneously. You can also use the `threading` +module directly. Adding a `threading.Barrier` before your test code is a good +way to synchronize workers and encourage a race condition. 
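
The testing approach described above can be sketched as follows. This is an illustration under assumptions, not a prescribed harness: `run_concurrently` and `function_under_test` are hypothetical names, and `function_under_test` stands in for a call into your own extension.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(func, n_threads=8, n_calls=100):
    """Call ``func`` n_calls times from each of n_threads threads at once."""
    barrier = threading.Barrier(n_threads)

    def worker():
        # Block until every worker is ready so the calls overlap as much as
        # possible, which makes latent races more likely to show up.
        barrier.wait()
        return [func() for _ in range(n_calls)]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(worker) for _ in range(n_threads)]
        # .result() re-raises any exception raised inside a worker thread.
        return [f.result() for f in futures]


def function_under_test():
    # Hypothetical stand-in for a call into your extension module.
    return sum(range(10))


def test_function_is_thread_safe():
    results = run_concurrently(function_under_test)
    assert all(r == 45 for per_thread in results for r in per_thread)
```

The pool is sized to match the barrier so that every worker can reach `barrier.wait()`. With fewer pool threads than barrier parties, the test would hang.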
+ +Many C and C++ extensions assume the GIL serializes access to state shared +between threads. With the GIL disabled, that assumption no longer holds, which +introduces the possibility of data races and race conditions that were +previously impossible. + +Cython code can also be thread-unsafe and exhibit undefined behavior due to +data races just like any other C or C++ code. However, code operating on Python +objects should not exhibit any low-level data races or undefined behavior due +to Python-level semantics. If you find such a case, it may be a Cython or +CPython bug and should be reported as such. That said, race conditions are +allowed in Python and therefore Cython as well, so you will still need to add +locking or synchronization where appropriate to ensure reproducible results +when running a multithreaded algorithm on shared mutable data. + +The CPython C API exposes the `Py_GIL_DISABLED` macro in the free-threaded +build. You can use it to enable low-level code that only runs under the +free-threaded build, isolating possibly performance-impacting changes to the +free-threaded build: + +```c +#ifdef Py_GIL_DISABLED +// free-threaded specific code goes here +#endif + +#ifndef Py_GIL_DISABLED +// code for gil-enabled builds goes here +#endif +``` + +We suggest focusing on safety over single-threaded performance. For example, if +adding a lock to a global cache would harm multithreaded scaling, and turning +off the cache implies a small performance hit, consider doing the simpler +thing and disabling the cache in the free-threaded build. Single-threaded +performance can always be improved later, once you've established free-threaded +support and hopefully improved test coverage for multithreaded workflows. + +For NumPy, we are generally assuming users will not do pathological things like +resizing an array while another thread is reading from or writing to it and do +not explicitly account for this. 
Eventually we will need to add locking around +data structures to avoid races caused by issues like this, but in this early +stage of porting we are not planning to add locking on every operation exposed +to users that mutates data. Locking will likely need to be added in the future, +but that should be done carefully and with experience informed by real-world +multithreaded scaling. + +For your libraries, we suggest a similar approach for now. Focus on thread +safety issues that only occur with the GIL disabled. Any non-critical +pre-existing thread safety issues can be dealt with later once the +free-threaded build is used more. The goal for now should be to enable further +refinement and experimentation by fixing issues that prevent using the library +at all. + +### Locking and Synchronization Primitives + +If your extension is written in C++, Rust, or another modern language that +exposes locking primitives in the standard library, you should consider using +the locking primitives provided by your language or framework to add locks when +needed. + +For C code or C-like C++ code, the CPython 3.13 C API exposes +[`PyMutex`](https://docs.python.org/3.13/c-api/init.html#c.PyMutex), a +high-performance locking primitive that supports static allocation. As of +CPython 3.13, the mutex requires only one byte for storage, but future versions +of CPython may change that, so you should not rely on the size of `PyMutex` in +your code. + +## Global state + +Many CPython C extensions make strong assumptions about the GIL. For example, +before NumPy 2.1.0, the C code in NumPy made extensive use of C static global +variables for storing settings, state, and caches. With the GIL, it is possible +for Python threads to produce non-deterministic results from a calculation, but +it is not possible for two C threads to simultaneously see the state of the C +global variables, so no data races are possible. 
+ +In free-threaded Python, global state like this is no longer safe against data +races and undefined behavior in C code. A cache of `PyObject`s stored +in a C global pointer array can be overwritten simultaneously by multiple +Python threads, leading to memory corruption and segfaults. + +### Converting to thread local state + +Often the easiest way to fix data races due to global state is to convert the +global state to thread local state. + +Python and Cython code can make use of +[`threading.local`](https://docs.python.org/3/library/threading.html#thread-local-data) +to declare a thread-local Python object. C and C++ code can also use the +[`Py_tss +API`](https://docs.python.org/3/c-api/init.html#thread-specific-storage-tss-api) +to store thread-local Python object references. [PEP +539](https://peps.python.org/pep-0539) has more details about the `Py_tss` API. + +Low-level C or C++ code can make use of the +[`thread_local`](https://en.cppreference.com/w/c/thread/thread_local) storage +specified by recent standard versions. Note that standardization of +thread-local storage in C has been slower than C++, so you may need to use +platform-specific definitions to declare variables with thread-local +storage. Also note that thread-local storage on MSVC has +[caveats](https://learn.microsoft.com/en-us/cpp/parallel/thread-local-storage-tls?view=msvc-170#rules-and-limitations), +and you should not use thread-local storage for anything besides statically +defined integers and pointers. + +NumPy has a [`NPY_TLS` +macro](https://github.com/numpy/numpy/blob/b77d2c6cc214cdcde567f356688ebddb2a5e7c8c/numpy/_core/include/numpy/npy_common.h#L116-L128) +in the `numpy/npy_common.h` header. While you can include the numpy header and +use `NPY_TLS` directly on NumPy 2.1 or newer, you can also add the definition +to your own codebase, along with some build configuration tests to test for the +correct definition to use. 
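As a Python-level illustration of the thread-local approach described in this section, here is a minimal sketch using `threading.local`. The `_state` object and the `get_scratch_buffer` and `record` helpers are hypothetical names, not part of any library.

```python
import threading

# Module-level thread-local storage: each thread sees its own attributes.
_state = threading.local()


def get_scratch_buffer():
    """Return a per-thread scratch list, creating it on first use."""
    buf = getattr(_state, "buffer", None)
    if buf is None:
        buf = _state.buffer = []
    return buf


def record(value):
    # Only the calling thread's buffer is mutated, so no lock is needed.
    get_scratch_buffer().append(value)
    return list(get_scratch_buffer())
```

Because every thread lazily creates its own buffer, two threads calling `record` at the same time never touch the same list. The trade-off is that state is no longer shared, so this only fits data that does not need to be visible across threads.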
+ +### Caches + +Global caches are also a common source of thread safety issues. For example, if +a function requires an expensive intermediate result that only needs to be +calculated once, many C extensions store the result in a global variable. This +can lead to data races and memory corruption if more than one thread +simultaneously tries to fill the cache. + +If the cache is not critical for performance, consider simply disabling the +cache in the free-threaded build: + +```c +static int *cache = NULL; + +int my_function_with_a_cache(void) { + int *my_cache = NULL; +#ifndef Py_GIL_DISABLED + if (cache == NULL) { + cache = get_expensive_result(); + } + my_cache = cache; +#else + my_cache = get_expensive_result(); +#endif + // use the cache +} +``` + +If the cache is set up at import time during module initialization, then you +can assume that module initialization is guaranteed to only happen on one +thread, so you can initialize static globals safely during module +initialization. + +```c +static int *cache = NULL; + +PyMODINIT_FUNC +PyInit__module(void) +{ + PyObject *mod = PyModule_Create(&module); + if (mod == NULL) { + return NULL; + } + + // don't need to lock or do anything special + cache = setup_cache(); + + // do rest of initialization + + return mod; +} +``` + +If the cache is critical for performance, cannot be generated at import time, +but generally gets filled quickly after a program begins, then you will need to +use a single-initialization API to ensure the cache is only ever initialized +once. In C++, use +[`std::call_once`](https://en.cppreference.com/w/cpp/thread/call_once) together +with a [`std::once_flag`](https://en.cppreference.com/w/cpp/thread/once_flag). + +C does not have an equivalent portable API for single initialization. If you +need that, take a look at [this NumPy +PR](https://github.com/numpy/numpy/pull/26780) for an example using atomic +operations and a global mutex. 
+ +If the cache is in the form of a data container, then you can lock access to +the container, like in the following example: + +```c + +#ifdef Py_GIL_DISABLED +static PyMutex cache_lock = {0}; +#define LOCK() PyMutex_Lock(&cache_lock) +#define UNLOCK() PyMutex_Unlock(&cache_lock) +#else +#define LOCK() +#define UNLOCK() +#endif + +static int *cache = NULL; +static PyObject *global_table = NULL; + +int initialize_table(void) { + // called during module initialization + global_table = PyDict_New(); + // report failure to the caller if the dict could not be created + return global_table == NULL ? -1 : 0; +} + +int function_accessing_the_cache(void) { + LOCK(); + // use the cache + + UNLOCK(); +} + +``` + +### Dealing with thread-unsafe libraries + +Many C, C++, and Fortran libraries are not written in a thread-safe manner. It +is still possible to call these libraries from free-threaded Python, but +wrappers must add appropriate locks to prevent undefined behavior. + +There are two kinds of thread-unsafe libraries: reentrant and non-reentrant. A +reentrant library generally will expose state as a struct that must be passed +to library functions. So long as the state struct is not shared between +threads, functions in the library can be safely executed simultaneously. + +Wrapping reentrant libraries requires adding locking whenever the state struct +is accessed. + +```c +typedef struct lib_state_struct { + low_level_library_state *state; + PyMutex lock; +} lib_state_struct; + +int call_library_function(lib_state_struct *lib_state) { + PyMutex_Lock(&lib_state->lock); + library_function(lib_state->state); + PyMutex_Unlock(&lib_state->lock); +} + +int call_another_library_function(lib_state_struct *lib_state) { + PyMutex_Lock(&lib_state->lock); + another_library_function(lib_state->state); + PyMutex_Unlock(&lib_state->lock); +} +``` + +With this setup, if two threads call `call_library_function` and +`call_another_library_function` simultaneously with the same `lib_state`, one +thread will block until the other thread finishes, preventing concurrent access +to `lib_state->state`. 
+ +Non-reentrant libraries provide an even weaker guarantee: threads cannot +call library functions simultaneously without causing undefined +behavior. Generally this is due to use of global static state in the +library. This means that non-reentrant libraries require a global lock: + +```c + +static PyMutex global_lock = {0}; + +int call_library_function(int *argument) { + PyMutex_Lock(&global_lock); + library_function(argument); + PyMutex_Unlock(&global_lock); +} +``` + +Any other wrapped function needs similar locking around each call into the +library. + +## Cython thread-safety + +If your extension is written in Cython, you can generally assume that +"Python-level" code that compiles to CPython C API operations on Python objects +is thread safe, but "C-level" code (e.g. code that will compile inside a `with +nogil` block) may have thread-safety issues. Note that not all code outside +`with nogil` blocks is thread safe. For example, a Python wrapper for a +thread-unsafe C library is thread-unsafe if the GIL is disabled unless there is +locking around uses of the thread-unsafe library. Another example: using +thread-unsafe C-level constructs like a global variable is also thread-unsafe +if the GIL is disabled. + +## CPython C API usage + +In the free-threaded build it is possible for the reference count of an object +to change "underneath" a running thread when it is mutated by another +thread. This means that many APIs that assume reference counts cannot be +updated by another thread while it is running are no longer thread safe. In +particular, C code returning "borrowed" references to Python objects in mutable +containers like lists may introduce thread-safety issues. A borrowed reference +happens when a C API function does not increment the reference count of a +Python object before returning the object to the caller. "New" references are +safe to use until the owning thread releases the reference, as in +non-free-threaded code. 
+ +Most direct uses of the CPython C API are thread safe. There is no need to add +locking for scenarios that should be bugs in CPython. You can assume, for +example, that the initializer for a Python object can only be called by one +thread and the C-level implementation of a Python function can only be called on +one thread. Accessing the arguments of a Python function is thread safe no +matter what C API constructs are used and no matter whether the reference is +borrowed or owned because two threads can't simultaneously call the same +function with the same arguments from the same Python-level context. Of course +it's possible to implement argument parsing in a thread-unsafe manner using +thread-unsafe C or C++ constructs, but it's not possible to do so using the +CPython C API. + +### Unsafe APIs returning borrowed references + +The `PyDict` and `PyList` APIs contain many functions returning borrowed +references to items in dicts and lists. Since these containers are mutable, +it's possible for another thread to delete the item from the container, leading +to the item being de-allocated while the borrowed reference is still +"alive". Even code like this: + +```C +PyObject *item = Py_NewRef(PyList_GetItem(list_object, 0)); +``` + +is not thread safe, because in principle it's possible for the list item to be +de-allocated before `Py_NewRef` gets a chance to increment the reference count. + +For that reason, you should inspect Python C API code to look for patterns +where a borrowed reference is returned to a shared, mutable data structure, and +replace uses of APIs like `PyList_GetItem` with APIs exposed by the CPython C +API returning strong references like `PyList_GetItemRef`. Not all usages are +problematic (see above) and we do not currently suggest converting all usages of +possibly unsafe APIs returning borrowed references to return new references. 
This +would introduce unnecessary reference count churn in situations that are +thread-safe by construction and also likely introduce new reference counting +bugs in C or C++ code using the C API directly. However, many usages *are* +unsafe, and maintaining a borrowed reference to an object that could be exposed +to another thread is unsafe. + +### Adopt `pythoncapi-compat` to use new C API functions + +Rather than maintaining compatibility shims to use functions added to the C API +for Python 3.13 like `PyList_GetItemRef` while maintaining compatibility with +earlier Python versions, we suggest adopting the +[`pythoncapi-compat`](https://github.com/python/pythoncapi-compat) project as a +build-time dependency. This is a header-only library that can be vendored as +e.g. a git submodule and included to expose shims for C API functions on older +versions of Python that do not have implementations. + +### Some low-level APIs don't enforce locking + +Some low-level functions like `PyList_SET_ITEM` and `PyTuple_SET_ITEM` do not +do any internal locking and should only be used to build newly created +values. Do *not* use them to modify existing containers in the free-threaded +build. + +### Limited API support + +The free-threaded build does not support the limited CPython C API. If you +currently use the limited API you will not be able to use it while shipping +binaries for the free-threaded build. This also means that code inside `#ifdef +Py_GIL_DISABLED` checks can use C API constructs outside the limited API if you +would like to do that, although these uses will need to be removed once the +free-threaded build gains support for compiling with the limited API. diff --git a/docs/running-gil-disabled.md b/docs/running-gil-disabled.md new file mode 100644 index 0000000..b2ff6b6 --- /dev/null +++ b/docs/running-gil-disabled.md @@ -0,0 +1,35 @@ +# Running with the GIL disabled + +!!! 
info + + Most of the content on this page is also covered in the Python 3.13 + [release notes](https://docs.python.org/3.13/whatsnew/3.13.html#free-threaded-cpython). + +You can verify your build of CPython itself has the GIL disabled with the +following incantation: + +```bash +python -VV +``` + +If you are using Python 3.13b1 or newer, you should see a message like: + +```bash +Python 3.13.0b1+ experimental free-threading build (heads/3.13:d93c4f9, May 21 2024, 10:54:14) [Clang 15.0.0 (clang-1500.1.0.2.5)] +``` + +Verify that the GIL is disabled at runtime with the following incantation: + +```bash +python -c "import sys; print(sys._is_gil_enabled())" +``` + +To force Python to keep the GIL disabled even after importing a module +that does not support running without it, use the `PYTHON_GIL` environment +variable or the `-X gil` command line option: + +```bash +# these are equivalent +PYTHON_GIL=0 python +python -Xgil=0 +``` diff --git a/docs/tracking.md b/docs/tracking.md new file mode 100644 index 0000000..3409cd4 --- /dev/null +++ b/docs/tracking.md @@ -0,0 +1,46 @@ +# Compatibility status tracking + +This page tracks the status of packages for which we're aware of active work on +free-threaded support. It contains packages with extension modules, as well +as build tools and packages that needed code specifically to support +free-threading. Note that pure Python code works without changes by design, +hence this page does not aim to track pure Python packages. + +!!! tip + + It's early days for free-threaded support - bugs in CPython itself and in + widely used libraries with extension modules are being fixed every week. + It may be useful to use nightly wheels (when available) of packages + like `cython` or `numpy`, even if a first release is available on PyPI. 
+ + + + + +| Project | Tested in CI | PyPI release | First version with support | Nightly wheels | Nightly link | +|:--------------|:----------------:|:---------------:|:-----------------------------:|:-----------------:|:---------------:| +| cibuildwheel | :material-check-bold: | :material-check-bold: | 2.19 | | | +| CMake | | :material-check-bold: | 3.30.0 [^cmake] | | | +| Cython | :material-check-bold: | | 3.1.0 | :simple-linux: :simple-apple: :material-microsoft-windows: | [:simple-anaconda:](https://anaconda.org/scientific-python-nightly-wheels/cython/) | +| joblib | :material-check-bold: | :material-check-bold: | 1.4.2 | | | +| Meson | | `--pre` | 1.5.0 [^meson] | | | +| meson-python | :material-check-bold: | :material-check-bold: | 0.16.0 | | | +| NumPy | :material-check-bold: | | 2.1.0 | :simple-linux: :simple-apple: | [:simple-anaconda:](https://anaconda.org/scientific-python-nightly-wheels/numpy/) | +| packaging | :material-check-bold: | :material-check-bold: | 24.0 | | | +| pandas | :material-check-bold: | | 3.0.0 | | | +| Pillow | :material-check-bold: | | | | | +| pip | :material-check-bold: | :material-check-bold: | 24.1 | | | +| pybind11 | :material-check-bold: | :material-check-bold: | 2.13 | | | +| PyWavelets | :material-check-bold: | | 1.7.0 | :simple-linux: :simple-apple: | [:simple-anaconda:](https://anaconda.org/scientific-python-nightly-wheels/pywavelets/) | +| scikit-build-core | :material-check-bold: | :material-check-bold: | 0.9.5 | | | +| scikit-learn | :material-check-bold: | | 1.6.0 | :simple-linux: | [:simple-anaconda:](https://anaconda.org/scientific-python-nightly-wheels/scikit-learn/) | +| SciPy | :material-check-bold: | | 1.15.0 | :simple-linux: :simple-apple: | [:simple-anaconda:](https://anaconda.org/scientific-python-nightly-wheels/scipy/) | +| setuptools | :material-check-bold: | :material-check-bold: | 69.5.0 | | | + + + +[^cmake]: + Windows isn't correctly handled yet in CMake 3.30.0, see 
[cmake#26016](https://gitlab.kitware.com/cmake/cmake/-/issues/26016) + +[^meson]: + Meson 1.5.0 is only needed for Windows support, older versions work fine for all other platforms diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..7345065 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,54 @@ +site_name: py-free-threading +repo_url: https://github.com/Quansight-Labs/free-threaded-compatibility +copyright: Copyright © 2024- Quansight Labs & open source contributors + +theme: + name: material + features: + - header.autohide + palette: + # Palette toggle for dark mode + - scheme: slate + primary: blue grey + toggle: + icon: material/brightness-4 + name: Switch to light mode + + # Palette toggle for light mode + - scheme: default + primary: blue grey + toggle: + icon: material/brightness-7 + name: Switch to dark mode + +nav: + - 'index.md' + - 'tracking.md' + - 'installing_cpython.md' + - 'running-gil-disabled.md' + - 'porting.md' + - 'ci.md' + +plugins: + - search + - git-revision-date-localized: + enable_creation_date: true + +markdown_extensions: + - admonition + - footnotes + - attr_list + - pymdownx.details + - pymdownx.superfences + - pymdownx.tabbed: + alternate_style: true + - pymdownx.emoji: + emoji_index: !!python/name:material.extensions.emoji.twemoji + emoji_generator: !!python/name:material.extensions.emoji.to_svg + +extra: + social: + - icon: fontawesome/brands/github + link: https://github.com/Quansight-Labs/free-threaded-compatibility + - icon: material/license + link: https://github.com/Quansight-Labs/free-threaded-compatibility/blob/main/LICENSE diff --git a/pyenv/README.md b/pyenv/README.md deleted file mode 100644 index e69de29..0000000 diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..8fdf8ba --- /dev/null +++ b/requirements.txt @@ -0,0 +1,2 @@ +mkdocs-material +mkdocs-git-revision-date-localized-plugin