chunking support to vectorize.table()
#142
💎 $200 bounty • Tembo
💡 @asr2003 submitted a pull request that claims the bounty.
Provide the ability to automatically chunk text in the input columns passed to the vectorize.table function, or provide a utility function (vectorize.chunk_table()?) that takes an input table, splits the data in each row into multiple rows, and writes the output to a new table. I suppose vectorize.table could call vectorize.chunk_table under the hood as a convenience.

The use case is giant documents, where the user might want to retrieve just a subset of a document. A retrieved chunk would hopefully be more relevant and specific than the entire document.
See LangChain's recursive text splitter for an example of this: https://python.langchain.com/docs/how_to/recursive_text_splitter/
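
To make the request concrete, here is a rough Python sketch of the recursive splitting idea and of how one input row could fan out into several chunk rows. It is not pg_vectorize code and not LangChain's implementation. The names `split_text` and `chunk_rows`, the separator list, and the `(row_id, chunk_index, chunk)` output shape are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# Assumed separator order (coarse to fine); not taken from pg_vectorize.
DEFAULT_SEPARATORS: Tuple[str, ...] = ("\n\n", "\n", " ", "")


def split_text(
    text: str,
    max_chars: int,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> list:
    """Split text into pieces of at most max_chars characters.

    Tries the coarsest separator first and only falls back to finer
    separators for pieces that are still too long.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            # Keep merging neighbouring parts while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # This part alone is too long: retry with finer separators.
            chunks.extend(split_text(part, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


def chunk_rows(
    rows: Iterable[Tuple[int, str]],
    max_chars: int,
) -> Iterator[Tuple[int, int, str]]:
    """Turn (row_id, text) rows into (row_id, chunk_index, chunk) rows."""
    for row_id, text in rows:
        for index, chunk in enumerate(split_text(text, max_chars)):
            yield (row_id, index, chunk)
```

A hypothetical vectorize.chunk_table() could write rows of this shape to the new table, keeping the source row id so that a retrieved chunk can be traced back to its document. Chunk overlap, which splitters such as LangChain's also support, is left out to keep the sketch short.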