chunking support to vectorize.table() #142

Open
ChuckHend opened this issue Oct 11, 2024 · 4 comments · May be fixed by #162

@ChuckHend
Member

Provide the ability to automatically chunk text in the input columns passed to the vectorize.table function, or provide a utility function (vectorize.chunk_table()?) that takes an input table, splits the data in each row into multiple rows, and writes the output to a new table. I suppose vectorize.table could call vectorize.chunk_table under the hood as a convenience.

The use case is giant documents, where a user might want to retrieve just a subset of a document. A retrieved chunk would hopefully be more relevant and specific than the entire document.

See LangChain's recursive_text_splitter for an example of this: https://python.langchain.com/docs/how_to/recursive_text_splitter/
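
To make the request concrete, here is a minimal sketch of recursive splitting in plain Python. It is not LangChain's implementation and not a pg_vectorize API: `split_recursive`, `chunk_rows`, the separator order, and the `(row_id, chunk_index, chunk)` output shape are all assumptions made for illustration. The idea is to try coarse separators first (paragraphs, then lines, then words) and fall back to a hard cut only when nothing else fits.

```python
from typing import List, Sequence, Tuple

# Assumed separator order, coarsest first; "" means "hard cut by length".
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def split_recursive(
    text: str,
    max_chars: int,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> List[str]:
    """Split text into pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed offsets.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return split_recursive(text, max_chars, finer)

    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        # Greedily merge neighbouring parts while they still fit.
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # The part alone is too large: retry with finer separators.
            chunks.extend(split_recursive(part, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


def chunk_rows(
    rows: Sequence[Tuple[int, str]], max_chars: int
) -> List[Tuple[int, int, str]]:
    """Expand (row_id, text) rows into (row_id, chunk_index, chunk) rows."""
    out: List[Tuple[int, int, str]] = []
    for row_id, text in rows:
        for idx, chunk in enumerate(split_recursive(text, max_chars)):
            out.append((row_id, idx, chunk))
    return out
```

Keeping the source row id and a chunk index on every output row is one way to let a search result on a chunk be traced back to its parent document.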

@ChuckHend added the enhancement (New feature or request) label Oct 11, 2024
algora-pbc bot commented Oct 17, 2024

💎 $200 bounty • Tembo

Steps to solve:

  1. Start working: Comment /attempt #142 with your implementation plan
  2. Submit work: Create a pull request including /claim #142 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to tembo-io/pg_vectorize!

| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @harshtech123 | Oct 17, 2024, 1:07:27 PM | #161 |
| 🟢 @asr2003 | Oct 17, 2024, 1:14:09 PM | #162 |

@harshtech123

harshtech123 commented Oct 17, 2024

/attempt #142
We can simply add chunking functionality to transform.py and modify the batch_transform function to use chunk_table as a preprocessing step.
I will open a PR in the meantime, and we can test this by sending a large piece of text.
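
A rough sketch of what "chunk as a preprocessing step" could look like is shown here. The real signature of batch_transform in transform.py is not given in this thread, so everything in the block is a hypothetical stand-in: `chunk_text`, `chunk_table`, `transform_with_chunking`, the dict-based rows, and the caller-supplied `embed` callable. It uses a fixed-size window with overlap rather than recursive splitting, purely to keep the example short.

```python
from typing import Callable, Dict, List


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Fixed-size character windows; consecutive windows share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def chunk_table(
    rows: List[Dict], column: str, chunk_size: int, overlap: int = 0
) -> List[Dict]:
    """Hypothetical preprocessing step: one output row per chunk of `column`."""
    out: List[Dict] = []
    for row in rows:
        for idx, chunk in enumerate(chunk_text(row[column], chunk_size, overlap)):
            out.append({"source_id": row["id"], "chunk_index": idx, column: chunk})
    return out


def transform_with_chunking(
    rows: List[Dict],
    column: str,
    embed: Callable[[List[str]], List[List[float]]],
    chunk_size: int,
    overlap: int = 0,
) -> List[Dict]:
    """Chunk first, then embed every chunk in a single batch call.

    `embed` stands in for whatever the real batch transform calls.
    """
    chunked = chunk_table(rows, column, chunk_size, overlap)
    vectors = embed([r[column] for r in chunked])
    return [dict(r, embedding=v) for r, v in zip(chunked, vectors)]
```

Testing with a large text input, as suggested above, would then amount to checking that one long row turns into several embedded chunk rows that all point back to the same `source_id`.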

@asr2003
Contributor

asr2003 commented Oct 17, 2024

/attempt #142

Algora profile: @asr2003 · Completed bounties: 6 bounties from 3 projects · Tech: Go, Scala, Rust & more

algora-pbc bot commented Oct 24, 2024

💡 @asr2003 submitted a pull request that claims the bounty. You can visit your bounty board to reward.
