Ra-Na/pdfsandwich-without-unpaper

pdfsandwich-without-unpaper

Pdfsandwich is a wrapper that embeds OCR text in scanned PDFs. As of version 0.1.7, it uses unpaper to deskew and trim the scanned pages. Unfortunately, unpaper appears to be discontinued; its last update was in 2015. It also occasionally cuts off parts of the text. Luckily, there is an alternative: scantailor.

The shell script in this repo bypasses unpaper, using pdfimages, scantailor and imagemagick to prepare a PDF that is then OCRed by pdfsandwich. It takes the filename as its only argument, like so:

./ocrthispdf testpage.pdf
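
For orientation, the pipeline described above might look roughly like the following. This is a minimal sketch, not the actual contents of `ocrthispdf`: the function name `ocr_without_unpaper`, the `run` helper, the `DRY_RUN` and `WORKDIR` variables, the intermediate file names, and the output name `<name>_ocr.pdf` are all assumptions made for illustration. It also assumes that scantailor provides a `scantailor-cli` binary and that pdfsandwich is called with `-nopreproc` so that it skips its own unpaper preprocessing.

```shell
#!/bin/sh
# Hypothetical sketch of an unpaper-free OCR pipeline (not the repo's script).
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

ocr_without_unpaper() {
    input=$1
    base=$(basename "$input" .pdf)
    # Working directory for intermediate images (assumed layout).
    work=${WORKDIR:-$(mktemp -d)}

    run mkdir -p "$work/out"

    # 1. Extract the scanned page images from the PDF as PNG files.
    run pdfimages -png "$input" "$work/page"

    # 2. Deskew and trim the pages with scantailor instead of unpaper.
    run scantailor-cli "$work"/page-*.png "$work/out"

    # 3. Reassemble the cleaned pages into one PDF with ImageMagick.
    run convert "$work"/out/*.tif "$work/$base-cleaned.pdf"

    # 4. OCR the cleaned PDF; -nopreproc tells pdfsandwich to skip unpaper.
    run pdfsandwich -nopreproc -o "${base}_ocr.pdf" "$work/$base-cleaned.pdf"
}

# Run only when a filename is given, e.g.: sh sketch.sh testpage.pdf
if [ "$#" -gt 0 ]; then
    ocr_without_unpaper "$1"
fi
```

Running it with `DRY_RUN=1` prints the four tool invocations without touching any files, which is a convenient way to check the command lines before installing everything. Scan resolution, colour mode and OCR language are left at each tool's defaults here and would likely need tuning for real scans.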
