The script converts journal articles in PDF format into an XML file. It finds the most frequently used font size across all pages and treats text in that size as the main text. The script then builds the convex hull of all text blocks that contain main text, which also captures any headers in between, and puts the result into a "<body>" tag.
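The approach described above can be sketched in plain Python. This is an illustrative sketch, not the repository's actual code: the `Block` class, its fields, and all function names are hypothetical, and it assumes the text blocks of a single page have already been extracted (for example with PDFMiner's layout analysis) as text, font size, and bounding box. It also assumes "most used" means the font size covering the most characters, and it uses Andrew's monotone chain algorithm for the hull, which the original script may or may not use.

```python
from collections import Counter
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Block:
    """Hypothetical text block: text, font size, and (x0, y0, x1, y1) box."""
    text: str
    size: float
    bbox: tuple


def main_font_size(blocks):
    """Return the font size that covers the most characters (an assumption)."""
    counts = Counter()
    for b in blocks:
        counts[b.size] += len(b.text)
    return counts.most_common(1)[0][0]


def corners(bbox):
    """Return the four corner points of a bounding box."""
    x0, y0, x1, y1 = bbox
    return [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p lies inside or on the edge of a counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def body_xml(blocks):
    """Wrap every block that falls within the main-text hull in <body>."""
    size = main_font_size(blocks)
    main = [b for b in blocks if b.size == size]
    hull = convex_hull([c for b in main for c in corners(b.bbox)])
    # Headers between main-text blocks fall inside the hull and are kept;
    # blocks outside it (page title, page number) are dropped.
    kept = [b for b in blocks if all(inside(hull, c) for c in corners(b.bbox))]
    inner = "".join(f"<p>{escape(b.text)}</p>" for b in kept)
    return f"<body>{inner}</body>"
```

Blocks are emitted in input order, so the caller is assumed to pass them in reading order. Handling several pages would need one hull per page, since coordinates restart on each page.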
hellpanderrr/pdf2xml
pdf2xml converter using PDFMiner