chunking support to `vectorize.table()` #142

ChuckHend · 2024-10-11T01:29:51Z

Provide the ability to automatically chunk text in the input columns passed to the `vectorize.table` function, or provide a utility function (`vectorize.chunk_table()`?) that takes an input table, splits the data in each row into multiple rows, and writes the output to a new table. I suppose `vectorize.table` could call `vectorize.chunk_table` under the hood as a convenience.
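To make the "one row in, many rows out" idea concrete, here is a minimal Python sketch. It is only an illustration of the proposed behavior, not pg_vectorize code: the names `chunk_text` and `chunk_rows`, the `(source_id, chunk_index, chunk)` output shape, and the fixed-size character windows are all assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Tuple


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 0) -> List[str]:
    """Split text into fixed-size character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text, so the final
        # chunk is never fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def chunk_rows(
    rows: Iterable[Tuple[int, str]],
    chunk_size: int = 1000,
    overlap: int = 0,
) -> Iterator[Tuple[int, int, str]]:
    """Expand (id, text) rows into (source_id, chunk_index, chunk) rows.

    Assumed output shape: each chunk keeps the id of the row it came from,
    so search results can be traced back to the source document.
    """
    for source_id, text in rows:
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            yield source_id, index, chunk


if __name__ == "__main__":
    docs = [(1, "abcdefghij"), (2, "xy")]
    for row in chunk_rows(docs, chunk_size=4):
        print(row)
```

Keeping the source row id and a chunk index on every output row is one possible design; it would let the chunk table be joined back to the original table after a search.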

The use case is giant documents, where a user might want to retrieve just a subset of a document. A retrieved chunk would hopefully be more relevant and specific than the entire document.

See Langchain’s recursive_text_splitter for an example of this: https://python.langchain.com/docs/how_to/recursive_text_splitter/
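For readers unfamiliar with recursive splitting, the idea can be sketched in plain Python. This is a simplified illustration of the technique, not LangChain's implementation and not a proposed pg_vectorize API: the function name `recursive_split`, the separator order, and the choice to drop empty pieces are assumptions of this sketch. It tries the coarsest separator first (paragraphs), packs pieces into chunks up to `chunk_size`, and falls back to finer separators only for pieces that are still too long.

```python
from typing import List, Sequence

# Assumed separator order: paragraphs, lines, words, then a hard character cut.
DEFAULT_SEPARATORS: Sequence[str] = ("\n\n", "\n", " ", "")


def recursive_split(
    text: str,
    chunk_size: int = 200,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Nothing left to split on: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue  # runs of separators produce empty pieces; skip them
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits in the chunk being built
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece alone is too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "aaaa bbbb\n\ncccc dddd eeee"
    print(recursive_split(sample, chunk_size=9))
```

Compared with fixed-size windows, this approach tends to keep paragraphs and sentences together, which is the property that makes retrieved chunks more self-contained.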


algora-pbc · 2024-10-17T12:15:09Z

💎 $200 bounty • Tembo

Steps to solve:

Start working: Comment /attempt #142 with your implementation plan
Submit work: Create a pull request including /claim #142 in the PR body to claim the bounty
Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to tembo-io/pg_vectorize!


Attempt	Started (GMT+0)	Solution
🟢 @harshtech123	Oct 17, 2024, 1:07:27 PM	#161
🟢 @asr2003	Oct 17, 2024, 1:14:09 PM	#162

harshtech123 · 2024-10-17T13:07:23Z

/attempt #142
We can simply add chunking functionality to transform.py and modify the batch_transform function to use chunk_table as a preprocessing step.
I will open a PR shortly, and we can test this by sending in a large text input.


asr2003 · 2024-10-17T13:14:06Z

/attempt #142

Algora profile	Completed bounties	Tech	Active attempts
@asr2003	6 bounties from 3 projects	Go, Scala, Rust & more	

algora-pbc · 2024-10-24T11:48:04Z

💡 @asr2003 submitted a pull request that claims the bounty. You can visit your bounty board to reward.

ChuckHend added the enhancement New feature or request label Oct 11, 2024

FloorD added the hacktoberfest label Oct 14, 2024

algora-pbc bot added the 💎 Bounty label Oct 17, 2024

harshtech123 mentioned this issue Oct 17, 2024

Added chunking support to vectorize.table() #161

Closed

asr2003 linked a pull request Oct 24, 2024 that will close this issue

Add chunking support to vectorize.table() #162

Open
