diff --git a/README.md b/README.md
index 232ab2c..101d569 100644
--- a/README.md
+++ b/README.md
@@ -92,37 +92,37 @@ Represents a page in the document:
 
 This node represent a paragraph, a heading or any text within the document.
 
-- `category`: The classification of the text within the document.
+- `category`: The [category](#category) of the text within the document, e.g. `heading`, `title`.
 - `content`: A string representing the textual content.
 - `marks`: List of [marks](#marks) applied to the text, such as bold, italic, etc.
 - `attributes`: Can contain metadata like the bounding box representing where this portion of text is located in the page.
 
 ### Category
 
-Below are the various categories of text that may be found within a document:
-**Category Type**
-- `page-header`: Represents the header of the page.
-- `footer`: Represents the footer of the page.
-- `heading`: Any heading within the document.
-- `figure`: Represents a figure or an image.
-- `other`: Any other unclassified text.
-- `appendix`: Text within an appendix.
-- `keywords`: List of keywords.
+Each block of text is assigned a _category_.
+
+- `abstract`: The abstract of the document.
 - `acknowledgments`: Section acknowledging contributors.
+- `affiliation`: Author's institutional affiliation.
+- `appendix`: Text within an appendix.
+- `authors`: List of authors.
+- `body`: Main body text of the document.
 - `caption`: Caption associated with a figure or table.
-- `toc`: Table of contents.
-- `abstract`: The abstract of the document.
+- `categories`: Categories or topics listed in the document.
+- `figure`: Represents a figure or an image.
+- `footer`: Represents the footer of the page.
 - `footnote`: Text at the bottom of the page providing additional information.
-- `body`: Main body text of the document.
+- `formula`: Mathematical formula or equation.
+- `general-terms`: General terms section.
+- `heading`: Any heading within the document.
+- `keywords`: List of keywords.
 - `itemize-item`: Item in a list or bullet point.
-- `title`: The title of the document.
+- `other`: Any other unclassified text.
+- `page-header`: Represents the header of the page.
 - `reference`: References or citations within the document.
-- `affiliation`: Author's institutional affiliation.
-- `general-terms`: General terms section.
-- `formula`: Mathematical formula or equation.
-- `categories`: Categories or topics listed in the document.
 - `table`: Represents a table.
-- `authors`: List of authors.
+- `title`: The title of the document.
+- `toc`: Table of contents.
 
 ### Marks
 
diff --git a/setup.py b/setup.py
index 771b31c..c9f75aa 100644
--- a/setup.py
+++ b/setup.py
@@ -10,7 +10,7 @@ setup(
 
 
     name='parse-document-model',
-    version='0.1.0',
+    version='0.2.0',
     description='Pydantic models for representing a text document as a hierarchical structure.',
     long_description=long_description,
     long_description_content_type='text/markdown',