
hellpanderrr/pdf2xml


The script converts journal articles in PDF format into an XML file. It determines the most frequently used font size across all pages and treats text in that size as the main text. It then builds the convex hull of all text blocks set in the main-text size, which also captures the headers in between, and puts the enclosed content into a "< body >" tag.
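
The approach described above can be illustrated with a minimal sketch. This is not the repository's actual code: the block format (a dict with hypothetical `text`, `size`, and `bbox` keys), the function names, the choice to weight font sizes by character count, and the rule of testing each block's center against the hull are all assumptions made for illustration. It uses only the standard library.

```python
from collections import Counter


def dominant_font_size(blocks):
    """Return the font size covering the most characters (assumed weighting)."""
    counts = Counter()
    for block in blocks:
        counts[block["size"]] += len(block["text"])
    return counts.most_common(1)[0][0]


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in a consistent orientation."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p lies inside or on the boundary of the hull."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def body_blocks(blocks):
    """Keep every block whose center falls inside the hull of main-text blocks."""
    main = dominant_font_size(blocks)
    corners = []
    for block in blocks:
        if block["size"] == main:
            x0, y0, x1, y1 = block["bbox"]
            corners += [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    hull = convex_hull(corners)
    result = []
    for block in blocks:
        x0, y0, x1, y1 = block["bbox"]
        if inside(hull, ((x0 + x1) / 2, (y0 + y1) / 2)):
            result.append(block)
    return result


# Hypothetical single-page input: bbox is (x0, y0, x1, y1).
sample_blocks = [
    {"text": "Article Title", "size": 18, "bbox": (50, 20, 300, 40)},
    {"text": "First paragraph of the article. " * 5, "size": 10, "bbox": (50, 100, 300, 200)},
    {"text": "Methods", "size": 12, "bbox": (50, 210, 200, 225)},
    {"text": "Second paragraph of the article. " * 5, "size": 10, "bbox": (50, 235, 300, 400)},
    {"text": "Page 1", "size": 8, "bbox": (50, 450, 300, 460)},
]

body = body_blocks(sample_blocks)
```

In this sample, the "Methods" header sits between two main-text paragraphs and so falls inside the hull, while the title above and the page footer below fall outside it. A real implementation would also need to handle multiple pages and columns, which this sketch ignores.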
