Changes since 0.5.18a:
Fixed endianness issues in mp3 extractor.
Fixed build issues (need to link C++ code explicitly against
libstdc++ on BSD).
Releasing libextractor 0.5.20c.
Fixed concurrency issues in plugin (un-)loading by adding locking
around libltdl functions.
Added an FFmpeg-based thumbnail extractor plugin, initially
supporting only bmp and png files.
Fixed security issues in XPDF-based PDF extractor.
Added track number and ISRC for FLAC/mp3/ogg files.
Added a plugin for AppleSingle/AppleDouble files.
Various minor code cleanups.
Fixed security issues in XPDF-based PDF extractor.
Added a FLAC (.flac) plugin.
Added a Flash Video (.flv) plugin.
Added support for some common iTunes tags to qtextractor.
Disabled libgsf logging (for corrupt files).
Added escape (\n) handling to split extractor.
Fixed problem with newer versions of libgsf.
Fixed problem with automake 1.10 not setting MKDIR_P.
Releasing libextractor 0.5.18a.
This release adds support for NSFE files. Removal of duplicate keywords
is now biased against keywords obtained from splitting. The build process
should now work properly if no C++ compiler is found. The thumbnail-extractors
should now load properly in all cases (resolved a symbol naming problem).
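
One way to read "biased against keywords obtained from splitting" is that,
when the same value is reported twice, the copy that came from the split
extractor is the one dropped. The sketch below illustrates that reading
only; the (source, value) pair layout and the "split" label are
assumptions, not the library's data structures.

```python
def remove_duplicates(keywords, low_priority="split"):
    """Drop duplicate values, preferring any source over `low_priority`.

    keywords: list of (source, value) pairs in extraction order.
    The pair layout and the "split" label are illustrative assumptions.
    """
    position = {}  # value -> index of the kept entry in `result`
    result = []
    for source, value in keywords:
        if value not in position:
            position[value] = len(result)
            result.append((source, value))
        elif result[position[value]][0] == low_priority and source != low_priority:
            # The kept copy came from splitting; replace it with this one.
            result[position[value]] = (source, value)
    return result
```

For example, given ("split", "Bach") followed by ("artist", "Bach"), only
the ("artist", "Bach") entry survives, in the position of the first
occurrence.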
Added dictionaries for Finnish, French, Gaelic and Swedish
(for printable extractors).
Word history extraction works (wordleaker).
exiv2 works.
Added Python binding.
libextractor becomes a GNU package.
Thumbnails work.
* Yet another round of XPDF-related security fixes.
* Mis-detection of man pages as part of TAR archives fixed.
* More MIME types for the OLE2 extractor. Also ignore (harmless)
libc errors in plugins when extracting.
* More TAR improvements: keywords 'date' and 'format' are
extracted. More checksum variants were added. Long filenames
as produced by GNU and Schilling tar (possibly Solaris pax also)
are extracted.
Changes 0.5.9:
* Made TAR extractor parsing more robust.
* Fixed crash in MIME-extractor due to typo in the code.
* Fixed security problems in PDF extractor.
* Fixed bugs in the exiv2, OpenOffice, and OLE2 plug-ins.
* Static relocatable glib no longer required.
* getKeywords2 function is now included in the code.
Bugfixes in exiv2 extractor fixing remaining issues.
Changed plugins to not use filename but always only
rely on mmapped memory. Extended API with function
that allows running getKeywords on data in memory
(instead of filename). Extended API with encode
and decode functions for binary metadata.
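
Encode and decode functions for binary metadata are needed whenever
keyword values are carried as text, since raw bytes (thumbnails, for
instance) may contain zero bytes. The changelog does not describe the
encoding used, so the sketch below swaps in Base64 from the Python
standard library purely as an illustration; encode_binary and
decode_binary are hypothetical names, not the library's API.

```python
import base64


def encode_binary(data: bytes) -> str:
    """Turn arbitrary bytes into a text string free of zero bytes.

    Base64 is used here as a stand-in; it is an assumption, not the
    scheme libextractor implements.
    """
    return base64.b64encode(data).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Inverse of encode_binary; rejects characters outside the alphabet."""
    return base64.b64decode(text.encode("ascii"), validate=True)
```

A round trip such as decode_binary(encode_binary(blob)) returns the
original bytes, which is the property any such pair of functions has to
guarantee.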
Changes 0.5.1:
Preliminary integration of exiv2 support (not enabled
by default due to bugs). Moved Python and Java
bindings into separate packages.
Fixes (second half of) Secunia SA: http://secunia.com/advisories/15651/
(first half was fixed in 0.3.11nb1)
ChangeLog excerpts:
Thu Feb 24 00:32:44 EST 2005
Added extractor that extracts binary (!) thumbnails from
images using ImageMagick. Decoder function for the binary
string is in the thumbnailextractor.c source.
Sun Feb 20 16:36:17 EST 2005
Fixed similar problem in REAL extractor. Added support
for new Helix/Real format to REAL extractor.
Sun Feb 20 12:48:15 EST 2005
Fixed (rare) integer overflow bug in PNG extractor.
Fri Jan 21 15:23:43 PST 2005
Fixed security problem in PDF extractor.
Fri Dec 24 13:28:59 CET 2004
Added support for Unicode to the pdf extractor.
Thu Dec 23 18:14:10 CET 2004
Avoided exporting symbol OPEN (conflicts on OSX
with same symbol from GNUnet). Added conversion
to utf8 to various plugins (see todo) and
added conversion from utf8 to current locale to
print keywords.
Fri Nov 12 19:20:37 EST 2004
Fixed bug in PDF extractor (extremely rare segfault).
Fixed #787.
Fixed bug in man extractor (undocumented return value running on
4 GB file not taken care of properly).
Sat Oct 30 20:18:21 EST 2004
Fixed various problems on Sparc64 (bus errors).
Workaround for re-load glib problem of OLE2 extractor.
libextractor is a simple library for keyword extraction. libextractor
does not support all formats but supports a simple plugging mechanism
such that you can quickly add extractors for additional formats, even
without recompiling libextractor. libextractor typically ships with a
dozen helper-libraries that can be used to obtain keywords from common
file-types.