Prepare v0.2.0 #5

Merged
merged 1 commit into from
Sep 24, 2024
38 changes: 19 additions & 19 deletions README.md
@@ -92,37 +92,37 @@ Represents a page in the document:

 This node represent a paragraph, a heading or any text within the document.

-- `category`: The classification of the text within the document.
+- `category`: The [category](#category) of the text within the document, e.g. `heading`, `title`
 - `content`: A string representing the textual content.
 - `marks`: List of [marks](#marks) applied to the text, such as bold, italic, etc.
 - `attributes`: Can contain metadata like the bounding box representing where this portion of text is located in the page.

 ### Category
-Below are the various categories of text that may be found within a document:

-**Category Type**
-- `page-header`: Represents the header of the page.
-- `footer`: Represents the footer of the page.
-- `heading`: Any heading within the document.
-- `figure`: Represents a figure or an image.
-- `other`: Any other unclassified text.
-- `appendix`: Text within an appendix.
-- `keywords`: List of keywords.
+Each block of text is assigned a _category_.
+
+- `abstract`: The abstract of the document.
 - `acknowledgments`: Section acknowledging contributors.
+- `affiliation`: Author's institutional affiliation.
+- `appendix`: Text within an appendix.
+- `authors`: List of authors.
+- `body`: Main body text of the document.
 - `caption`: Caption associated with a figure or table.
-- `toc`: Table of contents.
-- `abstract`: The abstract of the document.
+- `categories`: Categories or topics listed in the document.
+- `figure`: Represents a figure or an image.
+- `footer`: Represents the footer of the page.
 - `footnote`: Text at the bottom of the page providing additional information.
-- `body`: Main body text of the document.
+- `formula`: Mathematical formula or equation.
+- `general-terms`: General terms section.
+- `heading`: Any heading within the document.
+- `keywords`: List of keywords.
 - `itemize-item`: Item in a list or bullet point.
-- `title`: The title of the document.
+- `other`: Any other unclassified text.
+- `page-header`: Represents the header of the page.
 - `reference`: References or citations within the document.
-- `affiliation`: Author's institutional affiliation.
-- `general-terms`: General terms section.
-- `formula`: Mathematical formula or equation.
-- `categories`: Categories or topics listed in the document.
 - `table`: Represents a table.
-- `authors`: List of authors.
+- `title`: The title of the document.
+- `toc`: Table of contents.

 ### Marks

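For readers of the README change above, the text node it documents (`category`, `content`, `marks`, `attributes`) can be illustrated with a small sketch. This is not the package's code: the package ships Pydantic models, while the example below uses a standard-library dataclass so it runs without dependencies. The names `TextNode` and `CATEGORIES` are hypothetical, marks are simplified to plain strings, and rejecting unknown categories is an assumption about how the list might be enforced.

```python
from dataclasses import dataclass, field
from typing import Any

# Category names copied from the updated README list in this diff.
CATEGORIES = frozenset({
    "abstract", "acknowledgments", "affiliation", "appendix", "authors",
    "body", "caption", "categories", "figure", "footer", "footnote",
    "formula", "general-terms", "heading", "keywords", "itemize-item",
    "other", "page-header", "reference", "table", "title", "toc",
})


@dataclass
class TextNode:
    """Hypothetical stand-in for the documented text node, not the real class."""

    category: str
    content: str
    # Assumption: marks are simplified to strings such as "bold" or "italic".
    marks: list[str] = field(default_factory=list)
    # Free-form metadata, e.g. a bounding box locating the text on the page.
    attributes: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: categories outside the README list are rejected.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


# Build a heading node with one mark and no extra attributes.
node = TextNode(category="heading", content="Introduction", marks=["bold"])
```

Keeping the categories in one frozen set means the alphabetical list in the README and the accepted values can be compared directly when either changes.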
2 changes: 1 addition & 1 deletion setup.py
@@ -10,7 +10,7 @@

 setup(
     name='parse-document-model',
-    version='0.1.0',
+    version='0.2.0',
     description='Pydantic models for representing a text document as a hierarchical structure.',
     long_description=long_description,
     long_description_content_type='text/markdown',