Add custom into_arrow and from_arrow implementations for extension types #1167

Open
a10y opened this issue Oct 30, 2024 · 0 comments

Our extension type system allows us to model any type that can be represented by another DType plus some opaque metadata.

We currently have only one Encoding, vortex.extension, which is reused for all extension-typed arrays. This falls short for extension types that we want to decode into specific Arrow types.

For example, Arrow has a GeoArrow extension that is quite popular. We should be able to import GeoArrow datasets into Vortex and then send GeoArrow back out.

This will require us to make a few decisions:

  • How do we want to make a vtable/function registry available at runtime so that we can specialize into_arrow for ExtensionArray? This will require some sort of global context that is writable by extension authors and accessible from compute functions.
  • Arrow extension arrays like GeoArrow require additional metadata, which is attached to a Field in the schema and not just to a DataType or ArrayRef. I don't think there's currently a place in our API to return that information.
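
To make the two decisions above more concrete, here is a rough, std-only sketch of one possible shape: a process-wide registry keyed by extension ID, whose entries return a field-like value carrying metadata. Everything in it is an assumption for discussion and not existing Vortex or arrow-rs API: `FieldSketch` stands in for an Arrow `Field`, `ExtArraySketch` for an extension array, and `ExtArrowVTable`, `register_extension`, `into_arrow_field`, and `from_arrow_field` are hypothetical names. The only real identifiers are the `ARROW:extension:name` and `ARROW:extension:metadata` keys that Arrow uses for extension types, and the `geoarrow.wkb` extension name.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical stand-in for an Arrow `Field`: name, storage type, metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldSketch {
    pub name: String,
    pub storage_type: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for an extension array: ID, opaque metadata, storage type.
#[derive(Debug, Clone, PartialEq)]
pub struct ExtArraySketch {
    pub ext_id: String,
    pub ext_metadata: String,
    pub storage_type: String,
}

/// Hypothetical per-extension conversion hooks supplied by extension authors.
pub trait ExtArrowVTable: Send + Sync {
    fn into_arrow_field(&self, name: &str, array: &ExtArraySketch) -> FieldSketch;
    fn from_arrow_field(&self, field: &FieldSketch) -> Option<ExtArraySketch>;
}

type Registry = RwLock<HashMap<String, Arc<dyn ExtArrowVTable>>>;

/// Global context: lazily initialized, writable by extension authors.
fn registry() -> &'static Registry {
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashMap::new()))
}

pub fn register_extension(ext_id: &str, vtable: Arc<dyn ExtArrowVTable>) {
    registry().write().unwrap().insert(ext_id.to_string(), vtable);
}

/// Dispatch on the extension ID; fall back to plain storage with no metadata.
pub fn into_arrow_field(name: &str, array: &ExtArraySketch) -> FieldSketch {
    let vtable = registry().read().unwrap().get(&array.ext_id).cloned();
    match vtable {
        Some(vt) => vt.into_arrow_field(name, array),
        None => FieldSketch {
            name: name.to_string(),
            storage_type: array.storage_type.clone(),
            metadata: HashMap::new(),
        },
    }
}

/// Dispatch on the extension name found in the field metadata, if any.
pub fn from_arrow_field(field: &FieldSketch) -> Option<ExtArraySketch> {
    let ext_id = field.metadata.get("ARROW:extension:name")?;
    let vtable = registry().read().unwrap().get(ext_id).cloned()?;
    vtable.from_arrow_field(field)
}

/// Example extension: round-trips the ID and metadata through field metadata.
pub struct GeoArrowWkb;

impl ExtArrowVTable for GeoArrowWkb {
    fn into_arrow_field(&self, name: &str, array: &ExtArraySketch) -> FieldSketch {
        let mut metadata = HashMap::new();
        metadata.insert("ARROW:extension:name".to_string(), array.ext_id.clone());
        metadata.insert("ARROW:extension:metadata".to_string(), array.ext_metadata.clone());
        FieldSketch {
            name: name.to_string(),
            storage_type: array.storage_type.clone(),
            metadata,
        }
    }

    fn from_arrow_field(&self, field: &FieldSketch) -> Option<ExtArraySketch> {
        Some(ExtArraySketch {
            ext_id: field.metadata.get("ARROW:extension:name")?.clone(),
            ext_metadata: field
                .metadata
                .get("ARROW:extension:metadata")
                .cloned()
                .unwrap_or_default(),
            storage_type: field.storage_type.clone(),
        })
    }
}
```

Under these assumptions, an extension author would call `register_extension("geoarrow.wkb", Arc::new(GeoArrowWkb))` once at startup, and compute functions would go through `into_arrow_field` / `from_arrow_field` without knowing about any specific extension. Returning a field-like value instead of a bare data type is one way to give the extension metadata a place to live. Whether a global registry is preferable to an explicitly passed context is exactly the open question.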