Commit

Prepare v0.2.0 (#5)
avvertix authored Sep 24, 2024
1 parent 81649c3 commit 6d14111
Showing 2 changed files with 20 additions and 20 deletions.
38 changes: 19 additions & 19 deletions README.md
@@ -92,37 +92,37 @@ Represents a page in the document:

 This node represent a paragraph, a heading or any text within the document.

-- `category`: The classification of the text within the document.
+- `category`: The [category](#category) of the text within the document, e.g. `heading`, `title`
 - `content`: A string representing the textual content.
 - `marks`: List of [marks](#marks) applied to the text, such as bold, italic, etc.
 - `attributes`: Can contain metadata like the bounding box representing where this portion of text is located in the page.

 ### Category
-Below are the various categories of text that may be found within a document:

-**Category Type**
-- `page-header`: Represents the header of the page.
-- `footer`: Represents the footer of the page.
-- `heading`: Any heading within the document.
-- `figure`: Represents a figure or an image.
-- `other`: Any other unclassified text.
-- `appendix`: Text within an appendix.
-- `keywords`: List of keywords.
+Each block of text is assigned a _category_.

+- `abstract`: The abstract of the document.
+- `acknowledgments`: Section acknowledging contributors.
+- `affiliation`: Author's institutional affiliation.
+- `appendix`: Text within an appendix.
+- `authors`: List of authors.
+- `body`: Main body text of the document.
 - `caption`: Caption associated with a figure or table.
-- `toc`: Table of contents.
-- `abstract`: The abstract of the document.
+- `categories`: Categories or topics listed in the document.
+- `figure`: Represents a figure or an image.
+- `footer`: Represents the footer of the page.
 - `footnote`: Text at the bottom of the page providing additional information.
-- `body`: Main body text of the document.
+- `formula`: Mathematical formula or equation.
+- `general-terms`: General terms section.
+- `heading`: Any heading within the document.
+- `keywords`: List of keywords.
 - `itemize-item`: Item in a list or bullet point.
-- `title`: The title of the document.
+- `other`: Any other unclassified text.
+- `page-header`: Represents the header of the page.
 - `reference`: References or citations within the document.
-- `affiliation`: Author's institutional affiliation.
-- `general-terms`: General terms section.
-- `formula`: Mathematical formula or equation.
-- `categories`: Categories or topics listed in the document.
 - `table`: Represents a table.
-- `authors`: List of authors.
+- `title`: The title of the document.
+- `toc`: Table of contents.

 ### Marks

2 changes: 1 addition & 1 deletion setup.py
@@ -10,7 +10,7 @@

 setup(
     name='parse-document-model',
-    version='0.1.0',
+    version='0.2.0',
     description='Pydantic models for representing a text document as a hierarchical structure.',
     long_description=long_description,
     long_description_content_type='text/markdown',
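
To make the README changes above concrete, here is a minimal sketch of a text node carrying the four documented fields, with `category` restricted to the list this commit puts in the README. It uses standard-library dataclasses rather than the package's Pydantic models. The names `CATEGORIES` and `TextNode`, the representation of marks as plain strings, and the `bounding_box` attribute key are assumptions for illustration, not the library's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Category values copied from the README list as it stands after this commit.
CATEGORIES = frozenset({
    "abstract", "acknowledgments", "affiliation", "appendix", "authors",
    "body", "caption", "categories", "figure", "footer", "footnote",
    "formula", "general-terms", "heading", "keywords", "itemize-item",
    "other", "page-header", "reference", "table", "title", "toc",
})


@dataclass
class TextNode:
    """Hypothetical stand-in for the text node described in the README."""

    category: str                     # one of CATEGORIES
    content: str                      # the textual content
    marks: List[str] = field(default_factory=list)             # assumed shape
    attributes: Dict[str, Any] = field(default_factory=dict)   # e.g. bounding box

    def __post_init__(self) -> None:
        # Reject values that are not in the documented category list.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


# Usage: a heading with a bold mark and a made-up bounding box.
node = TextNode(
    category="heading",
    content="Introduction",
    marks=["bold"],
    attributes={"bounding_box": [72, 90, 540, 110]},
)
print(node.category, node.content)
```

Validating against a fixed set keeps the sketch close to the README, where each block of text is assigned exactly one category from the list.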