Remove pyarrow as hard dependency #581

Closed
6 of 12 tasks
kylebarron opened this issue Jul 24, 2024 · 2 comments

kylebarron commented Jul 24, 2024

Is your feature request related to a problem? Please describe.

pyarrow is a massive, monolithic dependency. It can be hard to install in some environments, and it can't currently be installed in Pyodide. Getting it to work in Pyodide is certainly a monumental effort, but either way I think it would be valuable for Lonboard to wean itself off pyarrow.

The core enabling factor here is the Arrow PyCapsule Interface. It allows Python Arrow libraries to exchange Arrow data at the C level without copying. This means that we can interoperate at no cost with any user who's already using pyarrow, without being required to use pyarrow ourselves. I've been promoting its use throughout the Python Arrow ecosystem (apache/arrow#39195 (comment)), and I hope it grows into something as core to tabular data processing as the buffer protocol is to numpy.
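As a rough illustration of why this removes the need for a hard dependency: a consumer can detect Arrow-compatible input purely by looking for the interface's dunder methods (`__arrow_c_stream__`, `__arrow_c_array__`, `__arrow_c_schema__`), without importing pyarrow at all. The sketch below is only that, a sketch; the helper `arrow_export_kind` and the `FakeTable` class are hypothetical names made up for this example and are not part of Lonboard, arro3, or pyarrow.

```python
from typing import Any, Optional


def arrow_export_kind(obj: Any) -> Optional[str]:
    """Report which Arrow PyCapsule export method `obj` offers, if any.

    Hypothetical helper: it only inspects the object for the dunder
    methods defined by the Arrow PyCapsule Interface and never imports
    an Arrow library itself.
    """
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"  # e.g. a table or chunked array
    if hasattr(obj, "__arrow_c_array__"):
        return "array"  # e.g. a single array or record batch
    if hasattr(obj, "__arrow_c_schema__"):
        return "schema"  # e.g. a schema, field, or data type
    return None


class FakeTable:
    """Stand-in producer used only to exercise the check above."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer would return a PyCapsule named
        # "arrow_array_stream"; this placeholder does not.
        raise NotImplementedError


print(arrow_export_kind(FakeTable()))
print(arrow_export_kind([1, 2, 3]))
```

Any object that passes this check, whether it comes from pyarrow, arro3, or another Arrow library, could then be handed to whichever Arrow implementation the consumer happens to use.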

As part of working to build the ecosystem, I created arro3, a new, very minimal Python Arrow implementation that wraps the Rust Arrow implementation.

I think that it should be possible to swap out pyarrow for arro3, which is about 1% of the normal pyarrow installation size.

It's also symbiotic for the ecosystem if Lonboard shows the benefits of modular Arrow libraries in Python.

Describe the solution you'd like

We'll keep pyarrow as a requirement only for GeoPandas/Pandas interop. pyarrow already implements pyarrow.Table.from_pandas, and that's not something I even want to think about replicating.

But aside from that, pretty much everything is doable in arro3 and geoarrow-rust.
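One common way to keep a package around for a single code path is to import it lazily, only when that path runs. The following is a minimal sketch of that pattern under the assumption that pyarrow would be imported on demand for the Pandas path; `import_optional` is a hypothetical helper name and does not describe how Lonboard actually does it.

```python
import importlib
import importlib.util
from types import ModuleType


def import_optional(name: str, purpose: str) -> ModuleType:
    """Import top-level module `name` on demand, with a readable error.

    Hypothetical helper: `name` is expected to be a top-level package
    such as "pyarrow", and `purpose` is only used in the error message.
    """
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"{name} is required for {purpose}; install it to use this feature."
        )
    return importlib.import_module(name)


# Intended use (not executed here, since pyarrow may not be installed):
#   pa = import_optional("pyarrow", "GeoPandas/Pandas interop")
#   table = pa.Table.from_pandas(df)
```

With this shape, users who never pass a Pandas or GeoPandas object never trigger the pyarrow import.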

CLI only:

Other notes:

  • Add numpy as direct dependency

@kylebarron (Member, Author):

Primarily closed by #582

@kylebarron (Member, Author):

This will be closed with #598 and #601
