Add page number to Memory Records. #717

Open
wants to merge 2 commits into
base: main
from


alkampfergit
Contributor

@alkampfergit alkampfergit commented Jul 25, 2024

Motivation and Context (Why the change? What's the scenario?)

I'd like to address discussion #669: the ability to store the page number in a Memory Record, in the Section property.

High level description (Approach, Design)

Introduced a new TextChunker2 that accepts a series of lines, each with a tag. It splits the text based on token count, but instead of emitting plain lines it preserves the tag of the original line in the output. I've chosen a tag object so we can add more information later if needed; in this PR it carries only the page number.
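
A minimal sketch of the idea described above, written in Python rather than the project's own language and not taken from the PR's code. The names `ChunkTag`, `count_tokens`, and `chunk_tagged_lines` are hypothetical, the whitespace tokenizer is a stand-in for a real one, and the rule that a chunk never mixes lines with different tags is an assumption of this sketch:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ChunkTag:
    """Hypothetical tag object; here it carries only a page number."""
    page_number: int


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def chunk_tagged_lines(
    lines: List[Tuple[str, ChunkTag]],
    max_tokens: int,
    token_counter: Callable[[str], int] = count_tokens,
) -> List[Tuple[str, ChunkTag]]:
    """Group consecutive lines into chunks of at most max_tokens tokens.

    Every chunk keeps the tag of the lines it was built from. A chunk is
    closed when the next line would exceed the budget or has another tag.
    A single line longer than max_tokens becomes a chunk on its own.
    """
    chunks: List[Tuple[str, ChunkTag]] = []
    buffer: List[str] = []
    buffer_tokens = 0
    buffer_tag: Optional[ChunkTag] = None

    for text, tag in lines:
        tokens = token_counter(text)
        over_budget = buffer_tokens + tokens > max_tokens
        if buffer and (tag != buffer_tag or over_budget):
            # Close the current chunk before starting a new one.
            chunks.append(("\n".join(buffer), buffer_tag))
            buffer, buffer_tokens = [], 0
        if not buffer:
            buffer_tag = tag
        buffer.append(text)
        buffer_tokens += tokens

    if buffer:
        chunks.append(("\n".join(buffer), buffer_tag))
    return chunks


if __name__ == "__main__":
    pages = [
        ("First line of page one.", ChunkTag(1)),
        ("Second line of page one.", ChunkTag(1)),
        ("Only line of page two.", ChunkTag(2)),
    ]
    for text, tag in chunk_tagged_lines(pages, max_tokens=12):
        print(tag.page_number, repr(text))
```

With this input the two lines of page one fit in one chunk and page two gets its own chunk, so each output chunk can be stored with its page number.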

I need to check whether the overall design of the PR is OK; then I'll finish the implementation and testing of TextChunk2. We could even consider making TextChunk2 the only chunker, since it works exactly like TextChunk, with the only addition that it keeps track of a tag associated with each piece of text.

@alkampfergit alkampfergit force-pushed the feature/pages-in-memoryrecords branch 2 times, most recently from 47ee726 to 045cc1b Compare July 26, 2024 10:05
@alkampfergit alkampfergit marked this pull request as ready for review July 26, 2024 10:17
@alkampfergit alkampfergit requested a review from dluc as a code owner July 26, 2024 10:17
@alkampfergit alkampfergit changed the title Spike - add page number. Ask for OK to proceed. Add page number to Memory Records. Jul 26, 2024
@alkampfergit alkampfergit force-pushed the feature/pages-in-memoryrecords branch 2 times, most recently from 71bd5ac to 2e331c1 Compare July 29, 2024 07:05
This branch introduces a new TextPartitioning handler
that supports page numbering in PDFs.
@alkampfergit alkampfergit force-pushed the feature/pages-in-memoryrecords branch from 2e331c1 to ec6cfdd Compare July 31, 2024 07:42

2 participants