Update to 1.4.4. From the changelog:
indexers:
* omindex:
+ 1.4.3 added a new --sample option, but contrary to the documentation
the default behaviour was to take the sample from the meta description
(which was the hard-wired behaviour in 1.4.2 and earlier). The default
has now been changed to take the sample from the body.
+ Index .shtm, .xhtml and .xhtm as HTML by default - .shtm is another
extension used for server-parsed HTML (in addition to the more common
.shtml), and .xhtm and .xhtml are XHTML.
+ Fix fallback lookup for extension containing upper case. User mappings
worked, but built-in extension to MIME type mappings were effectively being
ignored (because the result of the function call was not being checked).
Bug introduced in 1.3.4.
+ Fix term-based date ranges, broken by changes in 1.4.2. Found and
diagnosed by Gaurav Arora.
+ Handle date range with start after end better - with term-based ranges,
this used to generate a bogus filter, but now just generates Dlatest.
+ Use Y-term when range starts/ends at year start/end. Previously we used 12
M-terms for these cases.
+ Use full leap-year check when constructing term-based date ranges -
previous code was good until 2100, but even then it would only result
in an extra term being included for a non-existent February 29th in
rare cases.
+ Add support for indexing vCard files if Perl and its Text::vCard module
are available.
+ Recognise application/x-rpm as alternative type since libmagic reports this
rather than application/x-redhat-package-manager.
+ Use official MIME type application/vnd.debian.binary-package for Debian
packages. We used to map .deb and .udeb to application/x-debian-package,
but in 2014 (after we added that support for .deb) an official type was
registered with IANA. We now map extensions .deb and .udeb to the official
type, but the unofficial type is still recognised (older versions of
libmagic probably report it, and users may be mapping to it).
+ Handle PHP as MIME type text/x-php. The main difference this makes is that
PHP files which don't have extension '.php' (e.g. .phtml, .phps, .php5,
.ph4, etc) get identified by libmagic as text/x-php and will now be indexed.
It also means that the user can now more easily configure different filters
for HTML and PHP.
+ Don't use meta description as sample by default. Now we have dynamic
snippets (via $snippet), the body text is a better default. Also generated
HTML sometimes has unhelpful content in the meta description. To get the
previous behaviour, use the new omindex command line option:
--sample=description
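The leap-year item above can be illustrated with a short sketch. It assumes (the changelog does not say so explicitly) that the earlier code used a divisible-by-4 shortcut; the function names are hypothetical and this is not omindex's actual code.

```python
def is_leap_year(year: int) -> bool:
    """Full Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_year_simplified(year: int) -> bool:
    """Assumed older shortcut: agrees with the full rule for 1901 to 2099 only."""
    return year % 4 == 0


def days_in_february(year: int) -> int:
    """Number of D-terms a term-based range would need for February."""
    return 29 if is_leap_year(year) else 28
```

Under this assumption the two checks first disagree for 2100, where the shortcut would add a term for a non-existent February 29th.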
omega:
* New OmegaScript command $cgiparams which returns a list of the parameter
names.
* Handle tab in a CGI parameter name in the same way as space. Mostly this is
a way to avoid having tabs in CGI parameter names - they aren't useful, and
if names could contain tabs we couldn't put CGI parameter names in a list.
templates:
* query: Fix highlighting of matching terms. We were using both $snippet and
$highlight, which results in double highlighting and HTML escaping, most
noticeable by literal <strong> and </strong> appearing around matching terms
in the rendered HTML snippet. Reported by Mark Thomas on xapian-discuss.
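The double-escaping symptom described above can be reproduced outside OmegaScript. This is a minimal sketch using Python's standard html.escape; the highlight() helper is a hypothetical stand-in, not the behaviour of $snippet or $highlight themselves.

```python
import html


def highlight(text: str, term: str) -> str:
    # Hypothetical highlighter: wrap each occurrence of term in <strong> tags.
    return text.replace(term, "<strong>" + term + "</strong>")


# Escape once, then highlight: the markup is interpreted by the browser.
once = highlight(html.escape("fish & chips"), "chips")

# Escaping the already-highlighted text again turns the tags into literal text.
twice = html.escape(once)
```

Here `twice` contains `&lt;strong&gt;`, which a browser renders as the literal text `<strong>` around the matching term.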
build system:
* If gen-mimemap failed after creating mimemap.h, the rule wouldn't get rerun.
2017-07-10 00:27:47 +02:00
@comment $NetBSD: PLIST,v 1.7 2017/07/09 22:27:47 schmonz Exp $
2008-07-27 01:37:29 +02:00
bin/dbi2omega
bin/htdig2omega
bin/mbox2omega
bin/omindex
Update to 1.4.1. From the changelog:
omindex:
+ Also index leafname with _ and & replaced by spaces. Literal spaces are
often avoided in filenames, and "hello_world.txt" ought to be searchable
via "hello" and "world". Partly addresses #618, reported by Julien
Pfefferkorn.
+ Make named entity look-up (e.g. &eacute; -> 233) use the same keyword-lookup
table approach we already use for HTML tags and built-in MIME content-types,
rather than a std::map, which makes it faster while using less memory.
+ Avoid using the shell to run most external commands as it's unnecessary
overhead. For the built-in filters, the only cases which now use a shell
are where we run two unzip commands. For user-specified commands, a simple
and slightly conservative test is used, which should avoid a shell in most
common cases where it isn't needed. Notably, environment variables set
before the command are handled.
+ Track files which couldn't be indexed in the user metadata and skip them by
default on subsequent runs to avoid the costs of repeatedly running a
filter on a file it can't handle. Run omindex with --retry-failed to retry
such files.
+ Overhaul the "per-site" terms:
- 'H' prefix is hostname as before, except that if the term would be > 240
bytes (unlikely but possible) the end is hashed in the same way 'U'
prefix terms are.
- 'P' terms are now added for every directory level, not just the start
URL's path.
- A new 'J' prefix term is added with the start URL (less any trailing
'/'), which means all files indexed from a particular "site" are now
indexed by one term. See #376.
+ Add 'skip' pseudo-mimetype which extensions can be mapped to, and they will
then be reported and skipped (to complement the existing 'ignore'
pseudo-mimetype which causes files with the specified extension to be
quietly ignored).
+ Treat a command of 'true' specially as meaning make the text extraction a
no-op (as actually running /bin/true effectively would). This provides a
way to index some file types by only meta-data. Fixes #519, reported by
Brian Burton.
+ Add support for wildcard mimetypes */* and *. Combined with filter command
``true`` for indexing by meta-data only, you can specify a fall back case
of indexing by meta-data only using ``--filter '*:true'``. From a
suggestion by Brian Burton on xapian-discuss.
+ Index message/rfc822 and message/news. These are individually saved email
messages and news articles.
+ Index archived web page formats MAFF and MHTML.
+ Handle .xla, yet another XL extension.
+ Handle metadata in LibreOffice HTML export (dcterms.subject,
dcterms.description, dcterms.creator and dcterms.contributor).
+ Use zlib's gzopen() instead of invoking "gzip -dc" for compressed Abiword
documents.
+ Add support for %f in command passed to --filter to allow specifying
commands where the input file is not the final argument. Fixed #570,
reported by Charles Atkinson.
+ Allow --filter to handle commands which produce output in a temporary file
rather than on stdout.
+ Allow --filter to specify the character set of the output the filter
produces.
+ Handle application/vnd.ms-excel, text/x-perl and application/x-dvi via
default --filter settings instead of hardcoded cases (now possible thanks
to the new abilities that --filter has).
+ Add support for specifying a MIME subtype of '*' in --filter arguments.
+ Add --track-ctime option to allow omindex to pick up changes to file
ownership and permissions.
+ Index terms from the leafname with an 'F' prefix, rather than treating them
as more body text. (Fixes #633, reported by Emmanuel Garette)
+ The starting URL wasn't previously URL encoded. In 1.2.18, a minimally
intrusive fix was implemented. In 1.3.2, we now encode the starting URL
as we do for the rest of the filename.
+ Don't assume .doc is application/msword but let libmagic decide, since .doc
files may actually be RTF, and sometimes people use .doc for plain-text
documentation.
+ Add support for indexing 'topic' and 'created date' meta-data for
OpenDocument format and HTML.
+ Index "topic" for PDF documents.
+ Commit changes and exit, rather than skipping the current file on most
unexpected errors reading directories or initialising libmagic - otherwise
we can end up deleting a lot of database entries on errors like EHOSTDOWN
when indexing network mounts.
+ Add --opendir-sleep=SECS option to allow working around problems with
indexing files on Microsoft DFS shares.
+ If we get ENOTDIR trying to index a file, skip it quietly (unless in
verbose mode) as we already do if we get ENOENT, since ENOTDIR is what we
get if the file and the directory it was in got removed between us getting
the filename and trying to open it.
+ Handle ENOENT, ENOTDIR and EACCES from readdir().
+ If we've already opened the file (as we often will have if using a modern
libmagic with magic_descriptor() available), then use fstat() on that fd
rather than stat()/lstat() on the pathname.
+ Pass error message string and errno value in ReadError exceptions.
+ Report strerror(errno) if we can't read a file.
+ Filtering via text/html now handles HTML documents which specify a charset.
+ Add support for indexing Microsoft Publisher files using pub2xhtml.
+ Restrict the length of what we consider to be an extension, currently to 7
characters or whatever the longest extension in the mime_map is if it is
longer.
+ Avoid '//' in temporary filenames (cosmetic only).
+ Extend --filter to handle commands which produce HTML on stdout.
+ Don't report an error if a file is deleted (or renamed) between us reading
the directory entry for it and trying to read the file itself by default.
In --verbose mode, the situation is still reported, but now with a
specific message.
+ If omindex receives any of the signals SIGHUP, SIGINT, SIGQUIT or SIGTERM,
then kill any active external filter child process, then handle the signal
as we did before. If setpgid() is available, put each external filter in
its own process group and kill the whole process group when we get a
signal.
+ Use magic_descriptor() if the version of libmagic we're building against
is new enough to have it. This eliminates an extra opening of a file
being indexed in certain cases.
+ Use rst2html to handle .rst and .rest files.
+ Index title with an 'S' prefix rather than no prefix.
+ If the document with the highest existing docid before the run was updated,
we were reporting it as "added", but now we correctly report it as
"updated".
+ Catch and report std::exception explicitly, so failing to allocate memory
is no longer reported as "Unknown exception".
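As a rough illustration of the leafname item at the top of this list, the sketch below produces the extra text variant with '_' and '&' replaced by spaces. The function name is hypothetical, and term generation and the 'F' prefix are deliberately left out.

```python
from typing import List


def leafname_variants(leafname: str) -> List[str]:
    """Return the leafname plus, if different, a copy with '_' and '&' as spaces."""
    spaced = leafname.translate(str.maketrans({"_": " ", "&": " "}))
    if spaced == leafname:
        return [leafname]
    return [leafname, spaced]
```

Feeding both variants to an indexer is what would let "hello_world.txt" match a search for "hello" or "world".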
omindex-list: New tool to list URLs of all the documents in a database
(or list of databases) indexed by omindex.
* The HTML parser now explicitly handles <APPLET>, <OBJECT> and <TR>.
* Use a generated compact and efficient table to convert HTML tag names
to enum codes - this is both faster and smaller than the approach we were
using, with the benefit that the table is auto-generated.
* Always use our built-in conversion code for the character sets it can handle
(previously we'd use iconv if available; now we only use iconv for other
character sets). This gives us more consistent results, and in particular
means we now handle BOMs better (at least when using GNU iconv).
* A lot of data labelled as "iso-8859-1" is actually "windows-1252". The two
only differ in characters which are control characters in iso-8859-1, so
assume the latter when we see the former.
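The character set item above can be checked with Python's standard codecs. This is a sketch only: the helper name is hypothetical, and the whole-input fallback to latin-1 is an assumption made here because Python's cp1252 codec leaves a few bytes (such as 0x81) unassigned.

```python
def decode_labelled_latin1(data: bytes) -> str:
    """Decode data labelled iso-8859-1 as windows-1252 instead."""
    try:
        return data.decode("cp1252")
    except UnicodeDecodeError:
        # Assumption: fall back to strict latin-1 if a byte is unassigned in cp1252.
        return data.decode("latin-1")


# Bytes 0x93 and 0x94 are C1 control characters in iso-8859-1,
# but curly double quotes in windows-1252.
quoted = decode_labelled_latin1(b"\x93hi\x94")
```

Outside the 0x80 to 0x9f range the two encodings decode identically, so text that really is iso-8859-1 is unaffected.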
scriptindex:
+ Remove special error handling case noting that index=nopos was replaced
with indexnopos - this was removed in 1.1.0 so there's been enough time to
upgrade.
omega:
* Add support for sorting by more than one value - e.g. SORT=+1,-2
* Add $msizelower and $msizeupper which provide access to the lower and upper
bounds on the number of matches.
* Add support for $set{weighting,coord}.
* Add weightingpurefilter option. Normally a query consisting only of filter
terms won't have relevance weights calculated. This new option allows you to
specify a weighting scheme to use for such queries, with the same values
supported as for the existing weighting option. For example,
$set{weightingpurefilter,coord} will weight such queries by how many filter
terms match each document.
* $filters now includes DATEVALUE, which means we'll force the first page when
reloading or changing page starting from existing URLs upon upgrade to 1.4.1,
but the exact same existing URL could be for a search without the date filter
where we want to force the first page, so there's an inherent ambiguity
there. Forcing first page in this case seems the least problematic
side-effect.
* Implement $match command for omegascript. Patch from Richhiey Thomas.
* Add optional prefix argument to $terms.
* $snippet now uses MSet::snippet() instead of the Snipper class.
* Add $contains{STRING1,STRING2}. Contributed by Ayush Gupta.
* Add support for negated boolean filter terms, specified by CGI parameter "N".
* Support a direction prefix on SORT: '+' for ascending, '-' for descending.
SORTREVERSE set to non-0 now flips the direction. Fixes #697, reported by
Andy Chilton.
* Add options argument to $transform.
* Cache compiled regexps used in $transform.
* Add $ord OmegaScript command which returns the Unicode codepoint for the
first character of a UTF-8 string.
* Add $chr OmegaScript command which returns the UTF-8 string for given Unicode
codepoint.
* Add $csv OmegaScript command which escapes a string for use as a field in a
CSV file ("always quote" mode inspired by patch from Gaurav Arora.)
* New $filters encoding which avoids collisions. We also compare CGI parameter
xFILTERS to what $filters would have returned in previous releases, so that
on upgrades old format serialised filters are handled correctly.
* Fix $jsonarray not to prepend ']' to the first array element.
* Skip weighting scheme setup for a pure date range query - it won't be
weighted anyway, so we can avoid having to parse weighting scheme parameters,
etc.
* Use value ranges when date range filtering by value. Should be more
efficient than a MatchDecider, and will automatically take advantage of any
future value range optimisations in xapian-core.
* Add default_db and default_template config options. These allow the default
template and default database name to be set via the config file, rather than
being stuck with the respective defaults of "default" and "query". Fixes
#310, reported by Marco Hennigs.
* Add support for non-exclusive filters. Fixes #234, reported by Thomas
Viehmann.
* Fix handling of multiple P.<prefix> fields - previously only the first seen
was used. These fields are also now taken into account when deciding if the
query has changed. $query now returns an OmegaScript list with one entry for
each CGI parameter passed.
* Allow setting query expansion scheme to "bo1".
* Make the $json and $jsonarray force the text to be valid UTF-8, since
otherwise the output isn't valid JSON.
* Check parameters to $set{weighting,bm25 ...} and $set{weighting,trad ...}
converted OK. Based on patch from Aarsh Shah.
* Add support to $set{weighting,...} for bb2, dlh, dph, ifb2, ineb2, inl2, lm,
pl2 when we're built against a xapian-core which is new enough to have these
schemes.
* Add $snippet to generate a snippet of text tailored to the search.
* Add new $json and $jsonarray OmegaScript commands to support producing JSON
output.
* Add $truncate command which truncates a string after a word.
* Add support for $set{weighting,tfidf} to allow the new TfIdfWeight weighting
scheme to be used.
* DEFAULTOP now defaults to AND rather than OR, since that matches what pretty
much every search engine does these days. Closes ticket#512.
* Allow mapping a query string prefix to more than one term prefix (which
xapian-core has supported since 1.0.4).
* Add support for search inputs for multiple probabilistic prefixes, with
support for per-prefix stemmers.
* Drop legacy support for handling '.' separated terms in xP - that changed in
Omega 0.9.7, more than 5 years ago now.
* Remove support for OLDP CGI parameter which was superseded by xP
approximately a decade ago, and isn't even documented!
* Drop special handling for R-prefixed terms in $prettyterm - we stopped
generating these in Xapian 1.0.
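The multi-value SORT syntax and direction prefixes mentioned in this list can be sketched as a small parser. This is not Omega's implementation: the function name is hypothetical, and both the direction used when no prefix is given and the choice to let SORTREVERSE flip every key are assumptions.

```python
from typing import List, Tuple


def parse_sort(spec: str, sortreverse: bool = False,
               default_ascending: bool = True) -> List[Tuple[int, bool]]:
    """Parse e.g. '+1,-2' into (value slot, ascending?) pairs."""
    keys = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        ascending = default_ascending  # assumed default when no prefix is given
        if item[0] in "+-":
            ascending = item[0] == "+"
            item = item[1:]
        slot = int(item)
        if sortreverse:
            # Assumption: a non-0 SORTREVERSE flips the direction of every key.
            ascending = not ascending
        keys.append((slot, ascending))
    return keys
```

For example, `parse_sort("+1,-2")` yields slot 1 ascending followed by slot 2 descending.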
templates:
* Lower case all HTML tags, attributes and values; explicitly close <option>
tags. Patches from Vivek Pal and Nirmal Singhania.
* Migrate Omega Templates to HTML5. Patch from Nirmal Singhania.
* templates/query: Remove stray double quote from generated URL for spelling
suggestion when THRESHOLD is set. Patch from Nirmal Singhania.
* templates/opensearch: Change response feeds to support OpenSearch 1.1.
Patch from Nirmal Singhania.
* templates/query: Fix setting of prefix map for P - in 1.3.2, this
would fail to also search in the subject. Now it also searches in the
subject and topic.
* templates/query:
+ We now map unprefixed queries to include S-prefixed terms to match the
change in omindex to prefixing terms from the title with S. You may want
to make the same update to your own templates.
+ Set up prefixes for 'author:' and 'title:'.
2016-11-07 14:02:45 +01:00
bin/omindex-list
2008-07-27 01:37:29 +02:00
bin/scriptindex
2016-11-07 14:02:45 +01:00
libexec/cgi-bin/mhtml2html
2012-01-10 02:03:59 +01:00
libexec/cgi-bin/outlookmsg2html
Update to 1.4.1. From the changelog:
omindex:
+ Also index leafname with _ and & replaced by spaces. Literal spaces are
often avoided in filenames, and "hello_world.txt" ought to be searchable
via "hello" and "world". Partly addresses #618, reported by Julien
Pfefferkorn.
+ Make named entity look-up (e.g. &eacute; -> 233) use the same keyword-lookup
table approach we already use for HTML tags and built-in MIME content-types,
rather than a std::map, which makes it faster while using less memory.
+ Avoid using the shell to run most external commands as it's unnecessary
overhead. For the built-in filters, the only cases which now use a shell
are where we run two unzip commands. For user-specified commands, a simple
and slightly conservative test is used, which should avoid a shell in most
common cases where it isn't needed. Notably, environment variables set
before the command are handled.
+ Track files which couldn't be indexed in the user metadata and skip them by
default on subsequent runs to avoid the costs of repeatedly running a
filter on a file it can't handle. Run omindex with --retry-failed to retry
such files.
+ Overhaul the "per-site" terms:
- 'H' prefix is hostname as before, except that if the term would be > 240
bytes (unlikely but possible) the end is hashed in the same way 'U'
prefix terms are.
- 'P' terms are now added for every directory level, not just the start
URL's path.
- A new 'J' prefix term is added with the start URL (less any trailing
'/'), which means all files indexed from a particular "site" are now
indexed by one term. See #376.
+ Add 'skip' pseudo-mimetype which extensions can be mapped to, and they will
then be reported and skipped (to complement the existing 'ignore'
pseudo-mimetype which causes files with the specified extension to be
quietly ignored).
+ Treat a command of 'true' specially as meaning make the text extraction a
no-op (as actually running /bin/true effectively would). This provides a
way to index some file types by only meta-data. Fixes #519, reported by
Brian Burton.
+ Add support for wildcard mimetypes */* and *. Combined with filter command
``true`` for indexing by meta-data only, you can specify a fallback case
of indexing by meta-data only using ``--filter '*:true'``. From a
suggestion by Brian Burton on xapian-discuss.
+ Index message/rfc822 and message/news. These are individually saved email
messages and news articles.
+ Index archived web page formats MAFF and MHTML.
+ Handle .xla, yet another XL extension.
+ Handle metadata in LibreOffice HTML export (dcterms.subject,
dcterms.description, dcterms.creator and dcterms.contributor).
+ Use zlib's gzopen() instead of invoking "gzip -dc" for compressed Abiword
documents.
+ Add support for %f in command passed to --filter to allow specifying
commands where the input file is not the final argument. Fixes #570,
reported by Charles Atkinson.
+ Allow --filter to handle commands which produce output in a temporary file
rather than on stdout.
+ Allow --filter to specify the character set of the output the filter
produces.
+ Handle application/vnd.ms-excel, text/x-perl and application/x-dvi via
default --filter settings instead of hardcoded cases (now possible thanks
to the new abilities that --filter has).
+ Add support for specifying a MIME subtype of '*' in --filter arguments.
+ Add --track-ctime option to allow omindex to pick up changes to file
ownership and permissions.
+ Index terms from the leafname with an 'F' prefix, rather than treating them
as more body text. (Fixes #633, reported by Emmanuel Garette)
+ The starting URL wasn't previously URL encoded. In 1.2.18, a minimally
intrusive fix was implemented. In 1.3.2, we now encode the starting URL
as we do for the rest of the filename.
+ Don't assume .doc is application/msword but let libmagic decide, since .doc
files may actually be RTF, and sometimes people use .doc for plain-text
documentation.
+ Add support for indexing 'topic' and 'created date' meta-data for
OpenDocument format and HTML.
+ Index "topic" for PDF documents.
+ Commit changes and exit, rather than skipping the current file on most
unexpected errors reading directories or initialising libmagic - otherwise
we can end up deleting a lot of database entries on errors like EHOSTDOWN
when indexing network mounts.
+ Add --opendir-sleep=SECS option to allow working around problems with
indexing files on Microsoft DFS shares.
+ If we get ENOTDIR trying to index a file, skip it quietly (unless in
verbose mode) as we already do if we get ENOENT, since ENOTDIR is what we
get if the file and the directory it was in got removed between us getting
the filename and trying to open it.
+ Handle ENOENT, ENOTDIR and EACCES from readdir().
+ If we've already opened the file (as we often will have if using a modern
libmagic with magic_descriptor() available), then use fstat() on that fd
rather than stat()/lstat() on the pathname.
+ Pass error message string and errno value in ReadError exceptions.
+ Report strerror(errno) if we can't read a file.
+ Filtering via text/html now handles HTML documents which specify a charset.
+ Add support for indexing Microsoft Publisher files using pub2xhtml.
+ Restrict the length of what we consider to be an extension, currently to 7
characters or whatever the longest extension in the mime_map is if it is
longer.
+ Avoid '//' in temporary filenames (cosmetic only).
+ Extend --filter to handle commands which produce HTML on stdout.
+ By default, don't report an error if a file is deleted (or renamed) between
us reading the directory entry for it and trying to read the file itself.
In --verbose mode, the situation is still reported, but now with a
specific message.
+ If omindex receives any of the signals SIGHUP, SIGINT, SIGQUIT or SIGTERM,
then kill any active external filter child process, then handle the signal
as we did before. If setpgid() is available, put each external filter in
its own process group and kill the whole process group when we get a
signal.
+ Use magic_descriptor() if the version of libmagic we're building against
is new enough to have it. This eliminates an extra opening of a file
being indexed in certain cases.
+ Use rst2html to handle .rst and .rest files.
+ Index title with an 'S' prefix rather than no prefix.
+ If the document with the highest existing docid before the run was updated,
we were reporting it as "added", but now we correctly report it as
"updated".
+ Catch and report std::exception explicitly, so failing to allocate memory
is no longer reported as "Unknown exception".
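
The wildcard filter entries mentioned earlier in this list (a MIME subtype of '*', plus */* and *) amount to a fallback lookup. The Python sketch below shows one plausible lookup order, from most to least specific. The order, the function name and the dictionary of filters are assumptions for illustration, not a description of omindex's actual implementation.

```python
from typing import Dict, Optional


def find_filter(mimetype: str, filters: Dict[str, str]) -> Optional[str]:
    """Return the command registered for mimetype, trying wildcards last.

    Assumed order: exact type, then "type/*", then "*/*", then "*".
    """
    major = mimetype.split("/", 1)[0]
    for key in (mimetype, major + "/*", "*/*", "*"):
        if key in filters:
            return filters[key]
    return None


# Hypothetical filter table: '*' mapped to 'true' means "index by
# meta-data only" for any type without a more specific filter.
filters = {"application/pdf": "pdftotext", "text/*": "cat", "*": "true"}
```

With this table, an unknown type such as image/png falls through to the '*' entry, so it would be indexed by meta-data only rather than skipped.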
omindex-list: New tool to list URLs of all the documents in a database
(or list of databases) indexed by omindex.
* The HTML parser now explicitly handles <APPLET>, <OBJECT> and <TR>.
* Use a generated compact and efficient table to convert HTML tag names
to enum codes - this is both faster and smaller than the approach we were
using, with the benefit that the table is auto-generated.
* Always use our built-in conversion code for the character sets it can handle
(previously we'd use iconv if available; now we only use iconv for other
character sets). This gives us more consistent results, and in particular
means we now handle BOMs better (at least when using GNU iconv).
* A lot of data labelled as "iso-8859-1" is actually "windows-1252". The two
only differ in characters which are control characters in iso-8859-1, so
assume the latter when we see the former.
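
The last item can be illustrated with a short Python sketch: when data is labelled "iso-8859-1", decode it as windows-1252 instead. This uses Python's standard codecs rather than Omega's built-in conversion code, and the function name and the choice to replace undecodable bytes are assumptions for the example.

```python
def decode_labelled(data: bytes, charset: str) -> str:
    """Decode data, treating an "iso-8859-1" label as windows-1252."""
    name = charset.strip().lower()
    if name == "iso-8859-1":
        # Bytes 0x80-0x9f are control characters in iso-8859-1 but
        # printable characters (curly quotes, dashes, ...) in windows-1252.
        name = "windows-1252"
    # Undefined bytes become U+FFFD here; this is a choice made for the
    # sketch, not something the changelog specifies.
    return data.decode(name, errors="replace")
```

For example, the bytes 0x93 and 0x94 decode to left and right double quotation marks rather than to invisible control characters.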
scriptindex:
+ Remove special error handling case noting that index=nopos was replaced
with indexnopos - this was removed in 1.1.0 so there's been enough time to
upgrade.
omega:
* Add support for sorting by more than one value - e.g. SORT=+1,-2
* Add $msizelower and $msizeupper which provide access to the lower and upper
bounds on the number of matches.
* Add support for $set{weighting,coord}.
* Add weightingpurefilter option. Normally a query consisting only of filter
terms won't have relevance weights calculated. This new option allows you to
specify a weighting scheme to use for such queries, with the same values
supported as for the existing weighting option. For example,
$set{weightingpurefilter,coord} will weight such queries by how many filter
terms match each document.
* $filters now includes DATEVALUE, which means we'll force the first page when
reloading or changing page starting from existing URLs upon upgrade to 1.4.1,
but the exact same existing URL could be for a search without the date filter
where we want to force the first page, so there's an inherent ambiguity
there. Forcing first page in this case seems the least problematic
side-effect.
* Implement $match command for omegascript. Patch from Richhiey Thomas.
* Add optional prefix argument to $terms.
* $snippet now uses MSet::snippet() instead of the Snipper class.
* Add $contains{STRING1,STRING2}. Contributed by Ayush Gupta.
* Add support for negated boolean filter terms, specified by CGI parameter "N".
* Support a direction prefix on SORT: '+' for ascending, '-' for descending.
SORTREVERSE set to non-0 now flips the direction. Fixes #697, reported by
Andy Chilton.
* Add options argument to $transform.
* Cache compiled regexps used in $transform.
* Add $ord OmegaScript command which returns the Unicode codepoint for the
first character of a UTF-8 string.
* Add $chr OmegaScript command which returns the UTF-8 string for given Unicode
codepoint.
* Add $csv OmegaScript command which escapes a string for use as a field in a
CSV file ("always quote" mode inspired by patch from Gaurav Arora).
* New $filters encoding which avoids collisions. We also compare CGI parameter
xFILTERS to what $filters would have returned in previous releases, so that
on upgrades old format serialised filters are handled correctly.
* Fix $jsonarray not to prepend ']' to the first array element.
* Skip weighting scheme setup for a pure date range query - it won't be
weighted anyway, so we can avoid having to parse weighting scheme parameters,
etc.
* Use value ranges when date range filtering by value. Should be more
efficient than a MatchDecider, and will automatically take advantage of any
future value range optimisations in xapian-core.
* Add default_db and default_template config options. These allow the default
template and default database name to be set via the config file, rather than
being stuck with the respective defaults of "default" and "query". Fixes
#310, reported by Marco Hennigs.
* Add support for non-exclusive filters. Fixes #234, reported by Thomas
Viehmann.
* Fix handling of multiple P.<prefix> fields - previously only the first seen
was used. These fields are also now taken into account when deciding if the
query has changed. $query now returns an OmegaScript list with one entry for
each CGI parameter passed.
* Allow setting query expansion scheme to "bo1".
* Make $json and $jsonarray force the text to be valid UTF-8, since
otherwise the output isn't valid JSON.
* Check parameters to $set{weighting,bm25 ...} and $set{weighting,trad ...}
converted OK. Based on patch from Aarsh Shah.
* Add support to $set{weighting,...} for bb2, dlh, dph, ifb2, ineb2, inl2, lm,
pl2 when we're built against a xapian-core which is new enough to have these
schemes.
* Add $snippet to generate a snippet of text tailored to the search.
* Add new $json and $jsonarray OmegaScript commands to support producing JSON
output.
* Add $truncate command which truncates a string after a word.
* Add support for $set{weighting,tfidf} to allow the new TfIdfWeight weighting
scheme to be used.
* DEFAULTOP now defaults to AND rather than OR, since that matches what pretty
much every search engine does these days. Closes ticket#512.
* Allow mapping a query string prefix to more than one term prefix (which
xapian-core has supported since 1.0.4).
* Add support for search inputs for multiple probabilistic prefixes, with
support for per-prefix stemmers.
* Drop legacy support for handling '.' separated terms in xP - that changed in
Omega 0.9.7, more than 5 years ago now.
* Remove support for OLDP CGI parameter which was superseded by xP
approximately a decade ago, and isn't even documented!
* Drop special handling for R-prefixed terms in $prettyterm - we stopped
generating these in Xapian 1.0.
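
The multi-value SORT syntax described above (e.g. SORT=+1,-2, with SORTREVERSE flipping the direction) could be parsed along these lines. This Python sketch is only an illustration: the function name is made up, and the direction used for a key with no '+' or '-' prefix is exposed as a parameter because the changelog doesn't state the default.

```python
from typing import List, Tuple


def parse_sort(sort: str, sortreverse: str = "0",
               default_ascending: bool = True) -> List[Tuple[int, bool]]:
    """Parse a SORT value into (value slot, ascending) pairs.

    default_ascending is an assumption of this sketch, used only for
    keys which carry no direction prefix.
    """
    flip = sortreverse not in ("", "0")  # SORTREVERSE set to non-0 flips
    keys = []
    for item in sort.split(","):
        item = item.strip()
        if not item:
            continue
        ascending = default_ascending
        if item[0] in "+-":
            ascending = item[0] == "+"  # '+' ascending, '-' descending
            item = item[1:]
        keys.append((int(item), ascending != flip))
    return keys
```

So "+1,-2" gives slot 1 ascending then slot 2 descending, and a non-zero SORTREVERSE inverts both directions.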
templates:
* Lower case all HTML tags, attributes and values; explicitly close <option>
tags. Patches from Vivek Pal and Nirmal Singhania.
* Migrate Omega Templates to HTML5. Patch from Nirmal Singhania.
* templates/query: Remove stray double quote from generated URL for spelling
suggestion when THRESHOLD is set. Patch from Nirmal Singhania.
* templates/opensearch: Change response feeds to support OpenSearch 1.1.
Patch from Nirmal Singhania.
* templates/query: Fix setting of prefix map for P - in 1.3.2, this
would fail to also search in the subject. Now it also searches in the
subject and topic.
* templates/query:
+ We now map unprefixed queries to include S-prefixed terms to match the
change in omindex to prefixing terms from the title with S. You may want
to make the same update to your own templates.
+ Set up prefixes for 'author:' and 'title:'.
2016-11-07 14:02:45 +01:00
|
|
|
libexec/cgi-bin/rfc822tohtml
|
Update to 1.4.4.
2017-07-10 00:27:47 +02:00
|
|
|
libexec/cgi-bin/vcard2text
|
2008-07-27 01:37:29 +02:00
|
|
|
libexec/cgi-bin/xapian-omega
|
Update to 1.4.1.
2016-11-07 14:02:45 +01:00
|
|
|
man/man1/omindex-list.1
|
2008-07-27 01:37:29 +02:00
|
|
|
man/man1/omindex.1
|
|
|
|
man/man1/scriptindex.1
|
|
|
|
share/doc/xapian-omega/cgiparams.html
|
Update to 1.2.21. From the changelog:
documentation:
* docs/overview.rst: Document 'E' prefixed boolean terms for filtering by
extension (see #668, reported by bramvdh).
* docs/encodings.rst: Add a document about character encoding, as suggested by
James Aylett in #550.
* docs/cgiparams.rst: Improve wording of docs for SORT parameter.
* docs/omegascript.rst: Update documentation references to DATE1, DATE2, and
DAYSMINUS which were renamed in 0.6.x and the compatibility aliases removed
in 1.0.0.
indexers:
* omindex:
+ outlookmsg2html: Fix handling of message/rfc822 subparts.
+ Ignore extensions .msi and .msp, which are Microsoft installer files, but
which libmagic sometimes incorrectly identifies as application/msword.
+ Interpret a command of "false" in "--filter" as meaning to ignore files
with that MIME type.
omega:
* $prettyurl now decodes valid UTF-8 sequences, and some additional ASCII
characters in the path part: []@!$&'()*+.;= (Fixes #550 and #644, reported by
catkin and terencz.)
* $prettyurl now leaves the query and fragment parts of the URL alone and won't
decode an escaped "/" (omindex doesn't create URLs with any of these, so we
only risk breaking other URLs which have them).
* Drop compilation date and time from output when run from the command line -
they prevent reproducible builds and the version number is sufficient
information.
* Handle CGI parameter [=0 as [=1.
templates:
* templates/query: When listing matching terms, don't make the commas italic.
* templates/query: Eliminate blank line before <html>.
* templates/xml: Add XML declaration.
* templates/godmode: Specify charset utf-8 in the content-type.
* templates/xml: Update handling of DATE1, DATE2 and DAYSMINUS which were
renamed in 0.6.x and the compatibility aliases removed in 1.0.0.
build system:
* Link test programs with libtool's '-no-install' or '-no-fast-install', like
we already do in xapian-core, which means that libtool doesn't need to
generate shell script wrappers for them on most platforms.
* configure: Prefer pkg-config for determining the flags needed to
compile and link with PCRE, as this will just work when cross-compiling
(at least under MXE).
* configure: Define MINGW_HAS_SECURE_API under mingw to get _putenv_s()
declared in stdlib.h.
* Enable automake option 'subdir-objects' to avoid warning from newer automake.
portability:
* Add spaces between literal strings and macros which expand to literal strings
for C++11 compatibility.
* Remove 'register' as it's deprecated and clang spits out warnings because of
that. Any modern compiler likely just ignores it as an optimisation hint
anyway.
* Avoid doing link tests with libmagic in configure as they fail on mingw due
to not automatically picking up libraries which libmagic itself depends on.
2015-05-23 20:21:16 +02:00
|
|
|
share/doc/xapian-omega/encodings.html
|
2012-01-10 02:03:59 +01:00
|
|
|
share/doc/xapian-omega/index.html
|
2008-07-27 01:37:29 +02:00
|
|
|
share/doc/xapian-omega/omegascript.html
|
|
|
|
share/doc/xapian-omega/overview.html
|
|
|
|
share/doc/xapian-omega/quickstart.html
|
|
|
|
share/doc/xapian-omega/scriptindex.html
|
|
|
|
share/doc/xapian-omega/termprefixes.html
|
|
|
|
share/examples/xapian-omega/omega.conf
|
|
|
|
share/xapian-omega/htdig2omega.script
|
|
|
|
share/xapian-omega/mbox2omega.script
|