Setup of documentation, to be generated with sphinx #34

Open
wants to merge 3 commits into
base: master
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
37 changes: 37 additions & 0 deletions docs/conf.py
@@ -0,0 +1,37 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'pdf2doi'
copyright = '2024, Michele Cotrufo'
author = 'Michele Cotrufo'
release = '1.6'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'sphinx_autodoc_typehints',
]

templates_path = ['_templates']
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']



# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

# html_theme = 'alabaster'
html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']

import sys, os
sys.path.insert(0, os.path.abspath('../pdf2doi'))
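
A hedged note on the `sys.path` line above: `sphinx.ext.autodoc` imports every module named in an `automodule` directive, so `pdf2doi.config` can only be imported if the directory that contains the `pdf2doi` package is on `sys.path`, or if the package is installed in the build environment. Assuming `docs/` sits next to the `pdf2doi/` package directory, as the file paths in this diff suggest, that directory is `..` rather than `../pdf2doi`. The helper names below are hypothetical and only illustrate the path arithmetic.

```python
import os
import sys


def package_parent(conf_dir):
    """Return the absolute path of the directory one level above conf_dir.

    Assumption: conf.py lives in <repo>/docs and the importable package
    lives in <repo>/pdf2doi, so <repo> is what autodoc needs on sys.path.
    """
    return os.path.abspath(os.path.join(conf_dir, ".."))


def ensure_on_sys_path(directory):
    """Insert directory at the front of sys.path unless it is already listed."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Example with a made-up location; inside conf.py one would pass
# os.path.dirname(os.path.abspath(__file__)) instead.
repo_root = package_parent(os.path.join(os.sep, "tmp", "repo", "docs"))
ensure_on_sys_path(repo_root)
```

If the package is installed with `pip install -e .` before the docs are built, no `sys.path` manipulation is needed at all.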
22 changes: 22 additions & 0 deletions docs/index.rst
@@ -0,0 +1,22 @@
.. pdf2doi documentation master file, created by
   sphinx-quickstart on Sat Jul 13 15:27:07 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to pdf2doi's documentation!
===================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   source/modules



.. Indices and tables
.. ==================

.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
7 changes: 7 additions & 0 deletions docs/source/modules.rst
@@ -0,0 +1,7 @@
pdf2doi
=======

.. toctree::
   :maxdepth: 4

   pdf2doi
53 changes: 53 additions & 0 deletions docs/source/pdf2doi.rst
@@ -0,0 +1,53 @@
pdf2doi package
===============

Submodules
----------

pdf2doi.config module
---------------------

.. automodule:: pdf2doi.config
   :members:
   :undoc-members:
   :show-inheritance:

pdf2doi.find\_title\_via\_pymupdf module
----------------------------------------

.. automodule:: pdf2doi.find_title_via_pymupdf
   :members:
   :undoc-members:
   :show-inheritance:

pdf2doi.finders module
----------------------

.. automodule:: pdf2doi.finders
   :members:
   :undoc-members:
   :show-inheritance:

pdf2doi.main module
-------------------

.. automodule:: pdf2doi.main
   :members:
   :undoc-members:
   :show-inheritance:

pdf2doi.patterns module
-----------------------

.. automodule:: pdf2doi.patterns
   :members:
   :undoc-members:
   :show-inheritance:

pdf2doi.utils\_registry module
------------------------------

.. automodule:: pdf2doi.utils_registry
   :members:
   :undoc-members:
   :show-inheritance:
2 changes: 2 additions & 0 deletions pdf2doi/find_title_via_pymupdf.py
@@ -34,6 +34,7 @@ def fonts(doc, granularity=False):

def font_tags(font_counts, styles):
"""Returns dictionary with font sizes as keys and tags as value.

:param font_counts: (font_size, count) for all fonts occurring in document
:type font_counts: list
:param styles: all styles found in the document
@@ -67,6 +68,7 @@ def font_tags(font_counts, styles):

def headers_para(doc, size_tag):
"""Scrapes headers & paragraphs from PDF and return texts with element tags.

:param doc: PDF document to iterate through
:type doc: <class 'fitz.fitz.Document'>
:param size_tag: textual element tags for each size
34 changes: 19 additions & 15 deletions pdf2doi/main.py
@@ -10,13 +10,16 @@
# import pyperclip

def pdf2doi(target):
-    ''' This is the main routine of the library. When the library is used as a command-line tool (via the entry-point "pdf2doi") the input arguments
+    r''' This is the main routine of the library. When the library is used as a command-line tool (via the entry-point "pdf2doi") the input arguments
are collected, validated and sent to this function (see the function main() below).
The function tries to extract the DOI (or other identifiers) of the publication in the pdf files whose path is specified in the input variable target.
If target contains the valid path of a folder, the function tries to extract the DOI/identifier of all pdf files in the folder.
It returns a dictionary (or a list of dictionaries) containing info(s) about the file(s) examined, or None if an error occurred.

Example:

.. code-block:: python

import pdf2doi
path = r"Path\to\folder"
result = pdf2doi.pdf2doi(path)
@@ -35,12 +38,12 @@ def pdf2doi(target):
The output is a single dictionary if target is a file, or a list of dictionaries if target is a directory,
each element of the list describing one file. Each dictionary has the following keys

-    result['identifier'] = DOI or other identifier (or None if nothing is found)
-    result['identifier_type'] = string specifying the type of identifier (e.g. 'doi' or 'arxiv')
-    result['validation_info'] = Additional info on the paper. If config.get('webvalidation') = True, then result['validation_info']
-        will typically contain raw bibtex data for this paper. Otherwise it will just contain True
-    result['path'] = path of the pdf file
-    result['method'] = method used to find the identifier
+    - result['identifier'] = DOI or other identifier (or None if nothing is found)
+    - result['identifier_type'] = string specifying the type of identifier (e.g. 'doi' or 'arxiv')
+    - result['validation_info'] = Additional info on the paper. If config.get('webvalidation') = True, then result['validation_info']
+      will typically contain raw bibtex data for this paper. Otherwise it will just contain True
+    - result['path'] = path of the pdf file
+    - result['method'] = method used to find the identifier

'''
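
The return shape described in this docstring (one dictionary for a file, a list of dictionaries for a folder, `None` on error) can be handled uniformly by the caller. The sketch below is only an illustration: the helper `summarize` is hypothetical and not part of pdf2doi, and the sample dictionaries are invented. Only the key names are taken from the docstring.

```python
def summarize(result):
    """Normalize a pdf2doi-style result into a list of (path, type, identifier).

    Accepts None, a single dictionary, or a list of dictionaries.
    Entries that are not dictionaries or carry no identifier are skipped.
    """
    if result is None:
        return []
    entries = [result] if isinstance(result, dict) else list(result)
    rows = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        identifier = entry.get("identifier")
        if identifier is None:
            continue
        rows.append((entry.get("path"), entry.get("identifier_type"), identifier))
    return rows


# Invented sample data that mimics the documented keys.
sample = [
    {"identifier": "10.1000/example", "identifier_type": "doi",
     "validation_info": True, "path": "a.pdf", "method": "example_method"},
    {"identifier": None, "identifier_type": None,
     "validation_info": None, "path": "b.pdf", "method": None},
]
rows = summarize(sample)
```

In real use, `sample` would be replaced by the value returned from `pdf2doi.pdf2doi(path)`.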

@@ -119,12 +122,12 @@ def pdf2doi_singlefile(file):
result, dictionary
The output is a single dictionary with the following keys

-    result['identifier'] = DOI or other identifier (or None if nothing is found)
-    result['identifier_type'] = string specifying the type of identifier (e.g. 'doi' or 'arxiv')
-    result['validation_info'] = Additional info on the paper. If config.get('webvalidation') = True, then result['validation_info']
-        will typically contain raw bibtex data for this paper. Otherwise it will just contain True
-    result['path'] = path of the pdf file
-    result['method'] = method used to find the identifier
+    - result['identifier'] = DOI or other identifier (or None if nothing is found)
+    - result['identifier_type'] = string specifying the type of identifier (e.g. 'doi' or 'arxiv')
+    - result['validation_info'] = Additional info on the paper. If config.get('webvalidation') = True, then result['validation_info']
+      will typically contain raw bibtex data for this paper. Otherwise it will just contain True
+    - result['path'] = path of the pdf file
+    - result['method'] = method used to find the identifier

"""

@@ -193,8 +196,9 @@ def __find_doi(file: io.IOBase) -> dict:


def save_identifiers(filename_identifiers, results, clipboard=False):
-    ''' Write all identifiers contained in the input list 'results' into a text file with a path specified by filename_identifiers (if filename_identifiers is a
-    valid string) and/or into the clipboard (if clipboard = True).
+    '''
+    Write all identifiers contained in the input list 'results' into a text file with a path specified by filename_identifiers (if filename_identifiers is a
+    valid string) and/or into the clipboard (if clipboard = True).

Parameters
----------
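
As a rough sketch of the behaviour the `save_identifiers` docstring describes, the snippet below writes the identifiers from a list of result dictionaries to a text file. This is not the library's implementation: the helper name `write_identifiers`, the one-identifier-per-line layout, and the sample data are assumptions made for illustration, and the clipboard branch is left out.

```python
import os
import tempfile


def write_identifiers(filename, results):
    """Write each non-empty identifier in results to filename, one per line.

    Returns the list of identifiers written. Writes nothing and returns an
    empty list when filename is not a non-empty string.
    """
    if not isinstance(filename, str) or not filename:
        return []
    identifiers = [r["identifier"] for r in results if r and r.get("identifier")]
    with open(filename, "w", encoding="utf-8") as handle:
        for identifier in identifiers:
            handle.write(identifier + "\n")
    return identifiers


# Invented sample results; only the 'identifier' key is used here.
results = [{"identifier": "10.1000/example"}, {"identifier": None}]
target = os.path.join(tempfile.mkdtemp(), "identifiers.txt")
written = write_identifiers(target, results)
```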