Changelog:
Version 1.2.1 - March 2013, by Martin von Gagern
Added Esperanto translation.
Updated Czech, German, Spanish, Finnish, Galician, Italian, Dutch, Polish, Slovenian, Serbian, Swedish, Ukrainian and Vietnamese translations.
Updated gnulib.
Recreated build system using recent versions of autotools. This avoids security issues in the "make distcheck" target. (CVE-2012-3386)
from the git log:
* allow module to be imported
* set UTF-8 as default encoding
* better handling of Google Docs HTML
* better handling of more edge-case inputs
* nitpicky bugfixes to whitespace, emphasis, etc.
* new config options
1.2.15.0 Tue Apr 16 23:43:24 UTC 2013
[Changes contributed by Olly Betts]
- Remove superfluous duplicate method wrappers from WritableDatabase
for methods wrapped in Database parent class.
- Improve test coverage.
- Fix minor typo in POD documentation.
1.2.14.0 Thu Mar 14 23:12:38 UTC 2013
[Changes contributed by Olly Betts]
- Perl 5.16.1 adds a '.' after "at foo line 123" so adjust regexp in
testcase t/10query.t to allow an optional '.' there. (ticket#610)
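As a rough illustration of the regexp adjustment described above, the sketch below accepts a Perl-style location suffix with or without the trailing '.'. It is written in Python rather than Perl, and the pattern and the `matches_location` helper are hypothetical, not the expression actually used in t/10query.t.

```python
import re

# Hypothetical pattern: "at FILE line N", optionally followed by a '.',
# anchored at the end of the message.
LOCATION = re.compile(r"at (\S+) line (\d+)\.?$")


def matches_location(message: str) -> bool:
    """Return True if the message ends with a file/line location suffix."""
    return LOCATION.search(message) is not None
```

Making the '.' optional keeps the test valid for Perl versions on both sides of the change.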
1.2.13.0 Wed Jan 9 11:19:20 UTC 2013
[Changes contributed by Adam Sjøgren]
- Wrap 2 and 3 parameter forms of StringValueRangeProcessor
constructor so prefixes and suffixes can be specified.
(ticket#607)
[Changes contributed by Olly Betts]
- Wrap the zero argument Query constructor.
- Add wrappers for Query::MatchAll and Query::MatchNothing.
- Don't pointlessly reinitialise default std::string parameters to an
empty string (performance and code size micro-optimisation).
1.2.12.0 Wed Jun 27 12:17:26 UTC 2012
- No change except for bumping the version to indicate compatibility
with Xapian 1.2.12.
1.2.11.0 Tue Jun 26 12:13:39 UTC 2012
- No change except for bumping the version to indicate compatibility
with Xapian 1.2.11.
1.2.10.0 Wed May 9 10:45:51 UTC 2012
[Changes contributed by Olly Betts]
- Wrap Database::close() (was previously only wrapped for
WritableDatabase).
- Suppress warnings about "not a known MakeMaker parameter name" in a
way which also works for newer versions of Perl.
1.2.9.0 Thu Mar 8 07:19:27 UTC 2012
[Changes contributed by Olly Betts]
- Wrap Document::get_docid() method.
- Fix "Use of qw(...) as parentheses is deprecated" warnings in tests
with Perl 5.14.
- Improve test coverage of TermGenerator (backported from trunk).
Omega 1.2.15 (2013-04-16):
omega:
* Don't pointlessly link utf8convert.o into the omega CGI.
Omega 1.2.14 (2013-03-14):
indexers:
* omindex:
+ Correct "max" -> "min" when reserving space for shared strings in .xlsx
files. This just means we now reserve a more appropriate amount of space
to start with.
+ Ignore .com files by default.
Omega 1.2.13 (2013-01-09):
indexers:
* omindex:
+ Extracting text using external filters now works for filenames containing a
newline character - previously the newline got lost during escaping for the
shell.
+ Fix segfault when -F option without a ':' is passed.
+ Skip a file if we get a read error while calculating the MD5 checksum (used
for duplicate detection) - previously we used a checksum of the file up to
that point.
+ Avoid rereading SVG and Atom files when we calculate their MD5 checksums.
+ Improve --help output and man page, most notably:
- Say explicitly that --sample-size accepts the same formats as --max-size.
- Note default size limit on files to index is unlimited.
+ When generating a sample for a CSV file, limit the size we pre-allocate to
the CSV file size if that's smaller than the requested sample size, in case
the user sets that limit very high.
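The newline fix for external filters noted above concerns quoting a filename for the shell without losing characters. omindex is C++ and has its own escaping code; the Python sketch below only illustrates the general idea using the standard library's `shlex.quote`. The `build_filter_command` helper and the example command name are assumptions made for illustration.

```python
import shlex


def build_filter_command(command: str, filename: str) -> str:
    """Build a shell command line with the filename quoted as one argument."""
    # shlex.quote wraps the name in single quotes when it contains anything
    # the shell would treat specially, including a newline.
    return f"{command} {shlex.quote(filename)}"
```

Splitting the result again with `shlex.split` gives back the original filename, newline included, which is the property the fix restores.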
omega:
* Fix to decode %-encoded character at the end of the query string.
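A minimal Python sketch of %-decoding that also handles an escape in the last three characters of the string. The changelog does not say what caused the Omega bug; an off-by-one in the bounds check is only one plausible way to miss a trailing escape, and the `percent_decode` function here is hypothetical rather than Omega's code.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def percent_decode(s: str) -> bytes:
    """Decode %XX escapes, including one at the very end of the string."""
    out = bytearray()
    i = 0
    n = len(s)
    while i < n:
        # Two hex digits must follow the '%', so the last index needed is
        # i + 2, which must be < n. A stricter bound would skip a final escape.
        if (s[i] == "%" and i + 2 < n
                and s[i + 1] in HEX_DIGITS and s[i + 2] in HEX_DIGITS):
            out.append(int(s[i + 1:i + 3], 16))
            i += 3
        else:
            # Incomplete or invalid escapes are passed through unchanged.
            out.extend(s[i].encode("utf-8"))
            i += 1
    return bytes(out)
```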
Omega 1.2.12 (2012-06-27):
No changes since 1.2.11 except to bump the version - this release was made to
fix an incorrect library version information update in xapian-core 1.2.11.
Omega 1.2.11 (2012-06-26):
indexers:
* Change HTML parser's handling of multiple <body> tags and of text outside of
<body> to match the behaviour of modern web browsers. (ticket#599)
* omindex:
+ Add command line option to control the size of the document sample stored.
Patch from Mihai Bivol.
+ Rework .xlsx parsing to substitute the shared strings into the positions
they are used in, so that the sample actually matches what appears in the
spreadsheet, and to index calculated cell contents.
+ Improve handling of headers and footers in OpenDocument documents.
+ pdftotext outputs a formfeed between each page, which messes up our "empty
body" check, so trim any trailing formfeeds before this check.
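The formfeed trimming described in the last item can be sketched as follows. This is a Python illustration under the assumption that the "empty body" check is a plain emptiness test; the `is_empty_body` name is hypothetical and omindex's real check may differ.

```python
def is_empty_body(text: str) -> bool:
    """Return True if nothing remains once trailing form feeds are trimmed."""
    # pdftotext-style output separates pages with '\f', so a document with
    # no extractable text can still consist of form feeds only.
    return text.rstrip("\f") == ""
```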
Omega 1.2.10 (2012-05-09):
indexers:
* Add support for CDATA to HTML/XML parser.
* omindex:
+ Add --max-size option, based on patch from ndaley in ticket#587.
+ Add support for atom feed files, patch from Mihai Bivol in ticket#595.
+ If the document with the highest existing docid before the run was updated,
we were reporting it as "added", but now we correctly report it as
"updated". (Backported from 1.3.0).
+ Catch and report std::exception explicitly, so failing to allocate memory
is no longer reported as "Unknown exception". (Backported from 1.3.0).
Omega 1.2.9 (2012-03-08):
documentation:
* docs/overview.html:
+ Document that libmagic is used to determine the MIME type if the extension
isn't known. Partly addresses ticket#569.
+ We now limit time as well as CPU and memory for external filters.
indexers:
* Our HTML parser now ignores sections bracketed by <!--UdmComment--> and
<!--/UdmComment-->, like we already do for <!--htdig_noindex-->.
* omindex: Add more extensions to the default ignore list: bin dat db fon jar
lnk pyc pyd pyo sqlite sqlite3 sqlite-journal tmp ttf
Xapian-core 1.2.15 (2013-04-16):
API:
* QueryParser/TermGenerator: Don't include CJK codepoints which are
punctuation in N-grams.
* TermGenerator: Fix bug where we failed to generate the first bigram
from the second sequence of N-grammable CJK characters in a piece of text.
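The two N-gram fixes above can be pictured with a small Python sketch: punctuation is excluded even when it lies in a CJK block, and the bigram state is reset between sequences so that the first bigram of a later sequence is still produced. The code point ranges are a simplified assumption, and this is not Xapian's implementation.

```python
import unicodedata


def is_cjk(ch: str) -> bool:
    """Simplified assumption: Hiragana, Katakana and CJK Unified Ideographs."""
    cp = ord(ch)
    return 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF


def is_ngrammable(ch: str) -> bool:
    """CJK characters qualify unless Unicode classes them as punctuation."""
    return is_cjk(ch) and not unicodedata.category(ch).startswith("P")


def cjk_bigrams(text: str) -> list:
    """Return bigrams from each run of N-grammable CJK characters."""
    grams = []
    prev = None
    for ch in text:
        if is_ngrammable(ch):
            if prev is not None:
                grams.append(prev + ch)
            prev = ch
        else:
            # Reset so the next run starts cleanly with its own first bigram.
            prev = None
    return grams
```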
brass backend:
* Call fdatasync()/fsync() when creating the "iambrass" file.
chert backend:
* Call fdatasync()/fsync() when creating the "iamchert" file.
flint backend:
* Call fdatasync()/fsync() when creating the "iamflint" file.
tools:
* delve: If -v is specified more than once, show even more info in some cases.
Xapian-core 1.2.14 (2013-03-14):
API:
* MSet::get_document(): Don't cache retrieved Document objects unless they
were requested with fetch(). This avoids using a lot of memory when many
MSet entries are retrieved. (Fixes #604)
matcher:
* Check if a candidate document has at least the minimum weight needed
before checking positional information, which speeds up slow phrase
searches (partly addresses #394).
brass backend:
* Fix multipass compaction not to damage document values, and to merge the
database stats correctly. (fixes #615)
chert backend:
* Fix multipass compaction not to damage document values, and to merge the
database stats correctly. (fixes #615)
flint backend:
* Fix multipass compaction bug. (fixes #615)
tools:
* xapian-replicate:
+ Fix handling of delays between replication events - the subtraction of the
target time and the current time was reversed, so we wouldn't sleep when
before the deadline, but would sleep after it for the amount we'd missed it
by.
+ On Microsoft Windows, we no longer sleep for more than 43 years if the
target time for a replication event had already passed. (Fixes #472)
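The reversed subtraction in the first xapian-replicate item is easy to see in a short Python sketch. The `seconds_to_sleep` helper is hypothetical; the real code is C++ and works with its own time representation.

```python
def seconds_to_sleep(target_time: float, now: float) -> float:
    """Return how long to wait until target_time, never a negative value."""
    # Correct order is target minus now. With the operands swapped the
    # result is negative before the deadline (no sleep) and positive after
    # it (sleeping for the amount by which the deadline was missed).
    remaining = target_time - now
    return max(0.0, remaining)
```

Clamping at zero means a deadline that has already passed results in no sleep at all.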
Xapian-core 1.2.13 (2013-01-09):
API:
* TermGenerator: Add new method TermGenerator::set_max_word_length() to allow
this limit to be adjusted by the user.
* QueryParser: Implicitly close any unclosed brackets at the end of the query
string. Patch from Sehaj Singh Kalra.
* DateValueRangeProcessor: Add an extra overloaded constructor form so that in
DateValueRangeProcessor(1, "date:"), the const char * gets interpreted as
std::string rather than bool.
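The QueryParser change above, implicitly closing unclosed brackets, can be approximated by counting bracket depth. This Python sketch is a simplification that ignores quoted phrases and the rest of the query grammar; `close_unclosed_brackets` is a hypothetical helper, not part of the Xapian API.

```python
def close_unclosed_brackets(query: str) -> str:
    """Append one ')' for every '(' still open at the end of the query."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
    return query + ")" * depth
```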
matcher:
* Improved fix for #590 - count all matching LeafPostList objects with a Weight
object rather than trying to prune at the MultiAndPostList level based on
max_wt (if wdf is always zero for a term, BM25 gives max_wt of 0, which led
to us never counting that subquery).
* Fix calculation of 0.0/0.0 in some cases. This then got used as a minimum
weight, but it seems this gives -nan (at least on x86-64 Linux) so it may
have been harmless in practice.
* We no longer use the highest weighted MSet entry to calculate percentages, so
remove code which finds it.
brass backend:
* Close excess file handles before we get the fcntl lock, which avoids the
lock being released again if one is open on the lock file. Notably this
avoids a situation where multiple threads in the same process could succeed
in locking a database concurrently.
chert backend:
* Close excess file handles before we get the fcntl lock, which avoids the
lock being released again if one is open on the lock file. Notably this
avoids a situation where multiple threads in the same process could succeed
in locking a database concurrently.
flint backend:
* Close excess file handles before we get the fcntl lock, which avoids the
lock being released again if one is open on the lock file. Notably this
avoids a situation where multiple threads in the same process could succeed
in locking a database concurrently.
remote backend:
* Improve the UnimplementedError message for a MatchSpy subclass which doesn't
implement name() so it's clearer that it is this particular subclass which
can't be used remotely, rather than all MatchSpy objects.
documentation:
* valueranges.html: Update documentation to reflect change in Xapian 1.1.2 -
DateValueRangeProcessor and StringValueRangeProcessor now support a prefix or
suffix.
* Clarify that the "reverse" parameter of set_sort_by_relevance_then_value()
and set_sort_by_relevance_then_key() only affects the ordering of the
value/key part of the sort.
* docs/quickstart.html: Fix seriously outdated statement that Xapian doesn't
create the database directory - that changed in 0.7.2 (released 2003-07-11).
* HACKING: Try to make it clearer we're looking for a dual-licence on submitted
patches.
tools:
* xapian-replicate:
+ Add a --full-copy option to force a full copy to be sent. (ticket#436)
+ Add --quiet option, and be a little more verbose by default.
+ Allow files > 32G to be copied by replication.
+ Fix "if (fd > 0)" tests in some replication code to be "if (fd >= 0)".
In practice this is unlikely to actually have caused problems since
stdin is typically still open and using fd 0.
+ Simplify how we open the .DB file on the replication slave to just call
open() once with O_CREAT, rather than once without, then stat() if that
fails, and then again with O_CREAT|O_TRUNC if stat() doesn't show an
ordinary file exists.
examples:
* quest:
+ New --flags command line option to allow setting arbitrary QueryParser
flags.
+ Align option descriptions in --help output, and make the initial letter of
such descriptions consistently lowercase.
Bug fixes:
* fix for enumset.h not being installed on Windows
* zOS pkgdata fix
* Test fixes
* Region enumeration fix
* make stable sort faster
* host failures for DateFormatTest
* LayoutEngine security patches (see above)
* ubrk fix for word_POSIX infinite loop
* fix memory leak/crash in LayoutEngine
* fix header guard typo in layout/TibetanReordering.h
to address issues with NetBSD-6 (and earlier)'s fontconfig not being
new enough for pango.
While doing that, also bump freetype2 dependency to current pkgsrc
version.
Suggested by tron in PR 47882
a) refer 'perl' in their Makefile, or
b) have a directory name of p5-*, or
c) have any dependency on any p5-* package
Like last time, where this caused no complaints.
Features:
Support for Python3,
Add xmlXPathSetContextNode and xmlXPathNodeEval
Documentation:
Add documentation for xmllint --xpath
Fix the URL of the SAX documentation from James
Fix spelling of "length"
Portability:
Fix python bindings with versions older than 2.7
rebuild docs:Makefile.am
elfgcchack.h after rebuild in doc
elfgcchack for buf module
Fix an unneeded and wrong extra link parameter
Few cleanup patches for Windows
Fix rpmbuild --nocheck
Fix for win32/configure.js and WITH_THREAD_ALLOC
Fix Broken multi-arch support in xml2-config
Fix a portability issue for GCC < 3.4.0
Windows build fixes
Fix a thread portability problem
Downgrade autoconf requirement to 2.63
Bug Fixes:
Fix a linking error for python bindings
Fix a couple of return without value
Improve the hashing functions
Improve handling of xmlStopParser()
Remove risk of lockup in dictionary initialization
Activate detection of encoding in external subset
Fix an output buffer flushing conversion bug
Fix an old bug in xmlSchemaValidateOneElement
Fix configure cannot remove messages
fix schema validation in combination with xsi:nil
xmlCtxtReadFile doesn't work with literal IPv6 URLs
Fix a few problems with setEntityLoader
Detect excessive entities expansion upon replacement
Fix the flushing out of raw buffers on encoding conversions
Fix some buffer conversion issues
When calling xmlNodeDump make sure we grow the buffer quickly
Fix an error in the progressive DTD parsing code
xmllint should not load DTD by default when using the reader
Try IBM-037 when looking for EBCDIC handlers
Fix potential out of bound access
Fix large parse of file from memory
Fix a bug in the nsclean option of the parser
Fix a regression in 2.9.0 breaking validation while streaming
Remove potential calls to exit()
Improvements:
Regenerated API, and testapi, rebuild documentation
Fix tree iterators broken by 2to3 script
update all tests for Python3 and Python2
A few more fixes for python 3 affecting libxml2.py
Fix compilation on Python3
Converting apibuild.py to python3
First pass at starting porting to python3
updated configure.in for python3
Add support for xpathRegisterVariable in Python
Added a regression test from bug 694228 data
Cache presence of '<' in entities content
Avoid extra processing on entities
Python binding for xmlRegisterInputCallback
Python bindings: DOM casts everything to xmlNode
Define LIBXML_THREAD_ALLOC_ENABLED via xmlversion.h
Adding streaming validation to runtest checks
Add a --pushsmall option to xmllint
Cleanups:
Switched comment in file to UTF-8 encoding
Extend gitignore
Silence the new python test on input
Cleanup of a duplicate test
Cleanup on duplicate test expressions
Fix compiler warning after 153cf15905cf4ec080612ada6703757d10caba1e
Spec cleanups and a fix for multiarch support
Silence a clang warning
Cleanup the Copyright to be pure MIT Licence wording
rand_seed should be static in dict.c
Fix typos in parser comments
Upstream changes:
1.52 2013-05-21
- Add t/style-trailing-space.t.
- Got rid of trailing space.
- Convert to t/cpan-changes.t .
1.51 2013-05-11
- Sort the XML namespaces before outputting.
- became broken in perl-5.18.0-RC1.
CRF++ is a simple, customizable, and open source implementation of Conditional
Random Fields (CRFs) for segmenting/labeling sequential data. CRF++ is designed
for generic purposes and can be applied to a variety of NLP tasks, such as Named
Entity Recognition, Information Extraction and Text Chunking.
Version 2.7
-----------
(codename Translation, released on May 20th 2013)
- Choice and prefix loaders now dispatch source and template lookup
separately in order to work in combination with module loaders as
advertised.
- Fixed filesizeformat.
- Added a non-silent option for babel extraction.
- Added `urlencode` filter that automatically quotes values for
URL safe usage with utf-8 as only supported encoding. If applications
want to change this encoding they can override the filter.
- Added `keep-trailing-newline` configuration to environments and
templates to optionally preserve the final trailing newline.
- Accessing `last` on the loop context no longer causes the iterator
to be consumed into a list.
- Python requirement changed: 2.6, 2.7 or >= 3.3 are required now,
supported by same source code, using the "six" compatibility library.
- Allow `contextfunction` and other decorators to be applied to `__call__`.
- Added support for changing from newline to different signs in the `wordwrap`
filter.
- Added support for ignoring memcache errors silently.
- Added support for keeping the trailing newline in templates.
- Added finer grained support for stripping whitespace on the left side
of blocks.
- Added `map`, `select`, `reject`, `selectattr` and `rejectattr`
filters.
- Added support for `loop.depth` to figure out how deep inside a recursive
loop the code is.
- Disabled py_compile for pypy and python 3.
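The `last` item in the list above (no longer consuming the iterator into a list) can be achieved with a one-item look-ahead. The sketch below shows that general technique in plain Python; the `LookaheadLoop` class is hypothetical and is not the template engine's actual loop context.

```python
class LookaheadLoop:
    """Iterate lazily while exposing `last` through one-item look-ahead."""

    _END = object()  # sentinel marking an exhausted iterator

    def __init__(self, iterable):
        self._it = iter(iterable)
        # Fetch a single item ahead instead of building a full list.
        self._next = next(self._it, self._END)

    def __iter__(self):
        return self

    def __next__(self):
        if self._next is self._END:
            raise StopIteration
        current = self._next
        self._next = next(self._it, self._END)
        return current

    @property
    def last(self) -> bool:
        """True while the most recently returned item is the final one."""
        return self._next is self._END
```

Only one extra element is pulled from the source at any time, so generators and other one-shot iterables stay lazy.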
* Added "center" style hint for block title and desc
* Added style hints for new Getting Started designs
* Fixed incorrect automatic link role for guide links
* Support for Mallard conditional language test tokens
* Added style="continues" for Mallard lists and steps
* Better handling of broken internal Mallard links
* MathML support for Mallard and DocBook
* Audio and video support for DocBook
* Ability to set poster image for video
* TTML subtitles for audio and inline media
* Updated media controls
* Fixed #hash auto-expanding and colorizing
* Translator and publisher credits
* Use quote char on :before for blockquote
* ARIA landmark roles
* Changed marker for expanders
* Switched ui:expanded to non-experimental
* Experimental Mallard UI overlays
* Dropped unused DocBook utility XSLT
* Use itstool "join mode" for translations
* More experimental thumbnail link styles
* Many translation updates
3.2.1 (2013-05-11)
==================
Features added
--------------
* The methods ``apply_templates()`` and ``process_children()`` of XSLT
extension elements have gained two new boolean options ``elements_only``
and ``remove_blank_text`` that discard either all strings or whitespace-only
strings from the result list.
Bugs fixed
----------
* When moving Elements to another tree, the namespace cleanup mechanism
no longer drops namespace prefixes from attributes for which it finds
a default namespace declaration, to prevent them from appearing as
unnamespaced attributes after serialisation.
* Returning non-type objects from a custom class lookup method could lead
to a crash.
* Instantiating and using subtypes of Comments and ProcessingInstructions
crashed.
Other changes
-------------
3.2.0 (2013-04-28)
==================
Features added
--------------
Bugs fixed
----------
* LP#690319: Leading whitespace could change the behaviour of the string
parsing functions in ``lxml.html``.
* LP#599318: The string parsing functions in ``lxml.html`` are more robust
in the face of uncommon HTML content like framesets or missing body tags.
Patch by Stefan Seelmann.
* LP#712941: I/O errors while trying to access files with paths that contain
non-ASCII characters could raise ``UnicodeDecodeError`` instead of properly
reporting the ``IOError``.
* LP#673205: Parsing from in-memory strings disabled network access in the
default parser and made subsequent attempts to parse from a URL fail.
* LP#971754: lxml.html.clean appends 'nofollow' to 'rel' attributes instead
of overwriting the current value.
* LP#715687: lxml.html.clean no longer discards scripts that are explicitly
allowed by the user provided whitelist. Patch by Christine Koppelt.
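For LP#971754 above, the difference between appending and overwriting can be sketched in a few lines of Python. The `add_nofollow` helper is hypothetical and only mirrors the described behaviour; it is not the code used in lxml.html.clean.

```python
def add_nofollow(rel):
    """Return a rel value that keeps existing tokens and includes 'nofollow'."""
    tokens = rel.split() if rel else []
    # Append rather than overwrite, and avoid adding a duplicate token.
    if "nofollow" not in (token.lower() for token in tokens):
        tokens.append("nofollow")
    return " ".join(tokens)
```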
Other changes
-------------
3.1.2 (2013-04-12)
==================
Features added
--------------
Bugs fixed
----------
* LP#1136509: Passing attributes through the namespace-unaware API of
the sax bridge (i.e. the ``handler.startElement()`` method) failed
with a ``TypeError``. Patch by Mike Bayer.
* LP#1123074: Fix serialisation error in XSLT output when converting
the result tree to a Unicode string.
* GH#105: Replace illegal usage of ``xmlBufLength()`` in libxml2 2.9.0
by properly exported API function ``xmlBufUse()``.
Other changes
-------------
3.1.1 (2013-03-29)
==================
Features added
--------------
Bugs fixed
----------
* LP#1160386: Write access to ``lxml.html.FormElement.fields`` raised
an AttributeError in Py3.
* Illegal memory access during cleanup in incremental xmlfile writer.
Other changes
-------------
* The externally useless class ``lxml.etree._BaseParser`` was removed
from the module dict.
Java 5. Changes from 9.4 to 9.5:
The bulk of the open source parts of Saxon (the parts maintained by Saxonica) are now licensed under Mozilla Public License version 2.0, replacing MPL 1.0.
There have been some other changes to third-party open source components, notably the introduction of a new regular expression engine derived from Jakarta (Apache license), and the dropping of the old Base64 conversion code (Netscape license).
Users interested in building the product from source code need to be aware that the build process now includes a preprocessing phase that generates separate Java code for the EE, PE, and HE editions. The raw (before preprocessing) source code is published in the Subversion repository, but for building Saxon-HE, a more convenient place to start is the post-preprocessing Java code issued on SourceForge as a source.zip download.
A consequence of this change is that the JAR files for Saxon-HE, Saxon-PE, and Saxon-EE contain different class files having the same names. Therefore, the JAR files for different editions should not co-exist on the classpath. If you use internal Saxon APIs in an application, you may need to check that the methods you call are available in all three editions. This won't be a problem for interfaces that are clearly user-facing, but it could be an issue for applications that penetrate deeper into the internals.
More changes here:
http://www.saxonica.com/documentation/index.html#!changes
Common Changes
==============
CLDR 23: Collation tailorings put native script first; non-Gregorian calendar formats are more consistent; much improved data for Armenian (hy), Georgian (ka), Mongolian (mn), and Welsh (cy); …
Time zone data: 2013b
Date format/parse now supports CLDR short weekday names ("EEEEEE", "cccccc").
Support DisplayContext for date formatting, locale display names.
DateTimePatternGenerator behavior is now much more consistent between C and J.
Support new timezone pattern characters in LDML spec: X+, x+, O, OOOO, V, VV, VVV.
Updated SpoofChecker for v5 of UTS39.
AlphabeticIndex enhancements:
New thread-safe ImmutableIndex sub-API
Build an index for a custom Collator.
Make data-driven for Chinese collations.
New API for CLDR script metadata.
ICU4C Specific Changes
======================
Support for “dangi” Korean luni-solar calendar (already in ICU4J).
Add CompactDecimalFormat (already in ICU4J).
Add TerritoryContainment APIs (already in ICU4J).
UnicodeString default constructor and destructor now inline.
Layout engine now supports 'morx' tables.
Fixed some ICU 50 regressions:
Affixes set with e.g. DecimalFormat::setPositivePrefix were ignored for parse.
UNUM_PARSE_INT_ONLY no longer handled grouping separator.
Add ucal_getTimeZoneID.
The C++ AlphabeticIndex implementation is now on par with Java, including full support for all Chinese collation tailorings.
U8_NEXT() and similar low-level macros now support NUL-terminated UTF-8 strings.
New macros like U8_NEXT_OR_FFFD() return U+FFFD for an ill-formed sequence.
Conversion: New "good one-way" mapping type, for example for Variation Selector sequences.
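The U8_NEXT_OR_FFFD() item above describes substituting U+FFFD for ill-formed UTF-8. The ICU macros are C; as a loose analogue only, Python's built-in decoder behaves similarly with `errors="replace"`. The `decode_or_fffd` wrapper is a hypothetical name, and the number of replacement characters emitted for longer ill-formed sequences may differ between implementations.

```python
def decode_or_fffd(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD where the input is ill-formed."""
    return data.decode("utf-8", errors="replace")
```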