16091b3e6b
Reviewed by: 0mp, koobs
Differential Revision: https://reviews.freebsd.org/D20927
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be
searched or copy-pasted.

Main features:

* Generates a searchable PDF/A file from a regular PDF
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a "lossless" operation without
  disrupting any other content
* Optimizes PDF images, often producing files smaller than the input file
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Distributes work across all available CPU cores
* Uses the Tesseract OCR engine to recognize more than 100 languages
* Scales properly to handle files with thousands of pages
* Battle-tested on millions of PDFs

WWW: https://github.com/jbarlow83/OCRmyPDF