PR: ports/127580
Submitted by: Pedro Giffuni
Amberfish is general-purpose text retrieval software, developed at Etymon
by Nassib Nassar and distributed as open source software under the terms
of version 2 of the GNU General Public License (GPL). Its distinguishing
features are indexing/search of semi-structured text (i.e. both free text
and multiply nested fields), built-in support for XML documents using the
Xerces library, structured queries allowing generalized field/tag paths,
hierarchical result sets (XML only), automatic searching across multiple
databases (allowing modular indexing), TREC format results, efficient
indexing, and relatively low memory requirements during indexing (and the
ability to index documents larger than available memory). Z39.50 support
is available. Other features include Boolean queries, right truncation,
phrase searching, relevance ranking, support for multiple documents per
file, incremental indexing, and easy integration with other UNIX tools.
The architecture is also designed to permit proximity queries; however,
they are not fully implemented at present.

WWW: http://www.etymon.com/tr.html

This port also includes the Porter stemming algorithm for suffix
stripping, available at:
http://www.tartarus.org/~martin/PorterStemmer