Update ruby-pdf-reader to 2.4.0.
2.4.0 (21st November 2019)
- Optimise overlapping characters code introduced in 2.3.0. Text extraction
of pages with thousands of characters is still slower than it was in
2.2.1, but it might be tolerable for now.
See https://github.com/yob/pdf-reader/pull/308 for details.
- Implement very basic font substitution for Type1 and TrueType fonts that
aren't embedded
- Remove PDF::Hash class. It's been deprecated since 2010, and it's hard to
believe anyone is still using it.
- Several small bug fixes
2.3.0 (7th November 2019)
- Text extraction now makes an effort to skip duplicate characters that
overlap, a common approach used for a fake "bold" effect. This will make
text extraction a bit slower - if that turns out to be an issue I'll look
into further optimisations or provide a toggle to turn it off
- Several small bug fixes
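
To illustrate the overlap check described for 2.3.0, here is a minimal sketch. It is not the code pdf-reader uses: the Glyph struct, the dedupe_overlapping name and the tolerance parameter are all assumptions made for this example.

```ruby
# Hypothetical glyph record: a character plus the position it was drawn at.
Glyph = Struct.new(:char, :x, :y)

# Drop glyphs that repeat the same character at (almost) the same position.
# Naive O(n^2) scan; tolerance is in the same units as x and y.
def dedupe_overlapping(glyphs, tolerance: 0.5)
  kept = []
  glyphs.each do |g|
    duplicate = kept.any? do |k|
      k.char == g.char &&
        (k.x - g.x).abs <= tolerance &&
        (k.y - g.y).abs <= tolerance
    end
    kept << g unless duplicate
  end
  kept
end

# A fake "bold" H drawn twice with a tiny offset, followed by a normal i.
sample = [
  Glyph.new("H", 10.0, 700.0),
  Glyph.new("H", 10.2, 700.0),
  Glyph.new("i", 18.0, 700.0)
]
puts dedupe_overlapping(sample).map(&:char).join  # prints Hi
```

A scan like this is quadratic in the number of glyphs, which is one plausible reason such a check becomes slow on pages with thousands of characters.
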
v2.2.0 (18th December 2018)
- Support additional XRef Stream variants (thanks Stefan Wienert)
- Add frozen_strings pragma to reduce object allocations on ruby 2.3+
- various bug fixes
pkgsrc change:
* Add the missing ALTERNATIVES file, forgotten since 2015.
v2.1.0 (15th February 2018)
- Support extra encrypted PDF variants (thanks to Gyuchang Jun)
- various bug fixes
v2.0.0 (25th February 2017)
- various bug fixes
v2.0.0.beta1 (15th February 2017)
- BREAKING CHANGE: remove all methods that were deprecated in 1.0.0
- Bug: Support extra encrypted PDF variants (thanks to Gyuchang Jun)
- various bug fixes
v1.4.1 (2nd January 2017)
- improve compatibility with ruby 2.4 (thanks Akira Matsuda)
- various bug fixes
v1.4.0 (22nd February 2016)
- raise minimum ruby version to 1.9.3
- print warnings to stderr when deprecated methods are used. These methods have been
deprecated for 4 years, so hopefully few people are depending on them
- Fix exception when a non-breaking space (character 160) is used with a
built-in font (helvetica, etc)
- various bug fixes
Problems found locating distfiles:
Package acroread7: missing distfile AdobeReader_enu-7.0.9-1.i386.tar.gz
Package acroread8: missing distfile AdobeReader_enu-8.1.7-1.sparc.tar.gz
Package cups-filters: missing distfile cups-filters-1.1.0.tar.xz
Package dvidvi: missing distfile dvidvi-1.0.tar.gz
Package lgrind: missing distfile lgrind.tar.bz2
Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden). All existing
SHA1 digests retained for now as an audit trail.
v1.3.0 (30th December 2012)
- Numerous performance optimisations (thanks Alex Dowad)
- Improved text extraction (thanks Nathaniel Madura)
- Load less of the hashery gem to reduce core monkey patches
- various bug fixes
v1.2.0 (28th August 2012)
- Feature: correctly extract text using surrogate pairs and ligatures
(thanks Nathaniel Madura)
- Speed optimisation: cache tokenised Form XObjects to avoid re-parsing them
- Feature: support opening documents with some junk bytes prepended to file
(thanks Paul Gallagher)
- Acrobat does this, so it seemed reasonable to add support
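
One way to tolerate junk bytes before the header is to search for the "%PDF-" marker near the start of the file rather than requiring it at offset 0. The sketch below only illustrates that idea. The pdf_header_offset name and the 1024-byte search window are assumptions for this example, not pdf-reader's actual implementation.

```ruby
require "stringio"

# Return the byte offset of the "%PDF-" marker within the first `limit`
# bytes of io, or nil if the marker does not appear there.
def pdf_header_offset(io, limit: 1024)
  io.rewind
  head = io.read(limit) || "".b   # read returns nil on an empty stream
  head.b.index("%PDF-".b)
end

# Six junk bytes ("junk\r\n") precede the real header.
io = StringIO.new("junk\r\n%PDF-1.4\n".b)
puts pdf_header_offset(io)  # prints 6
```

A parser using this approach would treat the returned offset as the logical start of the file when resolving byte positions.
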
v1.1.0 (25th March 2012)
- new PageState class for handling common state tracking in page receivers
- see PageTextReceiver for example usage
- various bugfixes to support reading more PDF dialects
v1.0.0 (16th January 2012)
- support a new encryption variation
- bugfix in PageTextReceiver (thanks Paul Gallagher)
v1.0.0.rc1 (19th December 2011)
- performance optimisations (all by Bernerd Schaefer)
- some improvements to text extraction from form xobjects
- assume invalid font encodings are StandardEncoding
- use binary mode when opening PDFs to stop ruby being helpful and transcoding
bytes for us
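
The binary-mode point can be shown with plain Ruby I/O. In this sketch, read_binary is a hypothetical helper name. Opening with mode "rb" gives a string tagged as binary, so bytes that are not valid UTF-8 come back untouched.

```ruby
require "tempfile"

# Read a whole file as raw bytes; "rb" disables newline conversion and
# sets the external encoding to binary (ASCII-8BIT).
def read_binary(path)
  File.open(path, "rb") { |io| io.read }
end

Tempfile.create("sample") do |f|
  f.binmode
  # A PDF-style header followed by four bytes that are not valid UTF-8.
  f.write("%PDF-1.4\n\xE2\xE3\xCF\xD3\n".b)
  f.flush

  data = read_binary(f.path)
  puts data.encoding == Encoding::BINARY  # prints true
  puts data.bytesize                      # prints 14
end
```
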
v1.0.0.beta1 (6th October 2011)
- ensure inline images that contain "EI" are correctly parsed
(thanks Bernerd Schaefer)
- fix parsing of inline image data
v0.12.0.alpha (28th August 2011)
- small breaking changes to the page-based API - it's alpha for a reason
- resource related methods on Page object return raw PDF objects
- if the caller wants the resources wrapped in a more convenient
Ruby object (like PDF::Reader::Font or PDF::Reader::FormXObject), they
will need to do so themselves
- add support for RunLengthDecode filters (thanks Bernerd Schaefer)
- add support for standard PDF encryption (thanks Evan Brunner)
- add support for decoding stream with TIFF prediction
- new PDF::Reader::FormXObject class to simplify working with form XObjects
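
For readers unfamiliar with the RunLengthDecode filter mentioned above, the following is a minimal standalone sketch of the decoding rule from the PDF specification. The run_length_decode name is an assumption for this example, and this is not the filter class shipped in pdf-reader.

```ruby
# Decode RunLengthDecode data. A length byte n of 0..127 means "copy the
# next n + 1 bytes literally", 129..255 means "repeat the next byte
# 257 - n times", and 128 marks the end of the data.
def run_length_decode(data)
  bytes = data.bytes
  out = []
  i = 0
  while i < bytes.length
    n = bytes[i]
    break if n == 128
    if n < 128
      # Literal run; tolerate truncated input by copying what is there.
      out.concat(bytes[i + 1, n + 1] || [])
      i += n + 2
    else
      # Repeated run of a single byte.
      out.concat([bytes[i + 1]] * (257 - n)) if bytes[i + 1]
      i += 2
    end
  end
  out.pack("C*")
end

# Literal "ABC", then "Z" repeated four times, then the end-of-data marker.
encoded = [2, 65, 66, 67, 253, 90, 128].pack("C*")
puts run_length_decode(encoded)  # prints ABCZZZZ
```
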
v0.11.0.alpha (19th July 2011)
- introduce experimental new page-based API
- old API is deprecated but will continue to work with no warnings
- add transparent caching of common objects to ObjectHash
The PDF::Reader library implements a PDF parser conforming as much as
possible to the PDF specification from Adobe.
It provides programmatic access to the contents of a PDF file with
a high degree of flexibility.
The PDF 1.7 specification is a weighty document and not all aspects
are currently supported. I welcome submission of PDF files that
exhibit unsupported aspects of the spec to assist with improving our
support.
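
As a rough illustration of the programmatic access described above, the sketch below extracts the text of each page through the page-based API. The page_texts helper and the example.pdf file name are hypothetical, and the pdf-reader gem must be installed for the helper to run.

```ruby
# Return an array with the extracted text of every page in the PDF at path.
# The gem is required lazily so that defining the helper does not fail
# on systems where pdf-reader is not installed.
def page_texts(path)
  require "pdf/reader"
  reader = PDF::Reader.new(path)
  reader.pages.map { |page| page.text }
end

# Usage, with a hypothetical file name:
#   page_texts("example.pdf").each_with_index do |text, i|
#     puts "--- page #{i + 1} ---"
#     puts text
#   end
```
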