Russian and German Languages. Version 2.
Finds the lemmas (all forms) of a word.
Written in C++.
WWW: http://www.aot.ru/
- Andrei V. Shetuhin
slonik-v-domene@mail.ru
reki@reki.ru
PR: ports/110137
Submitted by: Andrei V. Shetuhin
is a bit different on these points:
(1) The project is end-user oriented, that is, it tries to hide the LaTeX
compilation details as much as possible by providing a single clean
script that produces DVI, PostScript and PDF output directly.
(2) The actual output rendering is done not only by the XSL stylesheets
transformation, but also by a dedicated LaTeX package. The purpose is
to allow a deep LaTeX customisation without changing the XSL
stylesheets.
(3) Post-processing is done in Python, to speed up publication,
convert images where needed, and drive the whole compilation.
WWW: http://dblatex.sourceforge.net/
PR: ports/109520
Submitted by: Peter Johnson <johnson.peter at gmail.com>
and at the same time be as close as possible to the original Java API.
This has the combined advantage of providing perl programmers with a
well-documented API and giving them access to a C++ search engine
library that is supposedly faster than the original.
WWW: http://search.cpan.org/dist/Lucene/
WWW: http://sourceforge.net/projects/clucene/
2006-12-30 textproc/ruby-htmlcompact: distfile and homepage disappeared
2006-12-30 textproc/ruby-rwv2: distfile disappeared and has no homepage
Approved by: erwin (mentor, implicit)
It can be used to access HTML pages programmatically.
I hope to extend it to become a web-publishing framework in the future.
Author: Johannes Brodwall <johannes@brodwall.com>
WWW: http://rubyforge.org/projects/ruby-htmltools/
to another. It can read markdown and (subsets of) reStructuredText,
HTML, and LaTeX, and it can write markdown, reStructuredText, HTML,
LaTeX, DocBook, RTF, and S5 HTML slide shows.
Pandoc extends standard markdown syntax with footnotes, embedded LaTeX,
and other features. A compatibility mode is provided for those who
need a drop-in replacement for Markdown.pl. Included wrapper scripts
make it easy to convert markdown documents to PDFs and to convert web
pages to markdown documents.
In contrast to existing tools for converting markdown to HTML, which
use regex substitutions, pandoc has a modular design: it consists of a
set of readers, which parse text in a given format and produce a native
representation of the document, and a set of writers, which convert
this native representation into a target format. Thus, adding an input
or output format requires only adding a reader or writer.
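The reader/writer split described above can be illustrated with a toy sketch. Everything in it is hypothetical: the two-block "native representation" (Header, Para), the read_markdown reader and the write_html/write_latex writers are illustrative stand-ins, not pandoc's actual Haskell types or functions, and the reader only understands '#' headers and plain paragraphs.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict, List, Union


# Hypothetical, much-simplified native representation: a document is a list
# of blocks. Pandoc's real document model is far richer than this.
@dataclass
class Header:
    level: int
    text: str


@dataclass
class Para:
    text: str


Block = Union[Header, Para]


def read_markdown(src: str) -> List[Block]:
    """Toy reader: '#'-prefixed lines become headers, other text runs become paragraphs."""
    blocks: List[Block] = []
    para: List[str] = []

    def flush() -> None:
        # Close the paragraph collected so far, if any.
        if para:
            blocks.append(Para(" ".join(para)))
            para.clear()

    for line in src.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            blocks.append(Header(level, stripped[level:].strip()))
        elif not stripped:
            flush()  # a blank line ends the current paragraph
        else:
            para.append(stripped)
    flush()
    return blocks


def write_html(doc: List[Block]) -> str:
    out = []
    for b in doc:
        if isinstance(b, Header):
            out.append(f"<h{b.level}>{escape(b.text)}</h{b.level}>")
        else:
            out.append(f"<p>{escape(b.text)}</p>")
    return "\n".join(out)


def write_latex(doc: List[Block]) -> str:
    out = []
    for b in doc:
        if isinstance(b, Header):
            cmd = {1: "section", 2: "subsection"}.get(b.level, "subsubsection")
            out.append(f"\\{cmd}{{{b.text}}}")
        else:
            out.append(b.text)
    return "\n\n".join(out)


# Converting = pick a reader, pick a writer; they only share the native form.
READERS: Dict[str, Callable[[str], List[Block]]] = {"markdown": read_markdown}
WRITERS: Dict[str, Callable[[List[Block]], str]] = {"html": write_html, "latex": write_latex}


def convert(src: str, source: str, target: str) -> str:
    return WRITERS[target](READERS[source](src))
```

For example, convert("# Title\n\nHello\nworld", "markdown", "html") gives "<h1>Title</h1>\n<p>Hello world</p>". Supporting another output format would mean registering one more function in WRITERS while leaving the reader untouched.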
WWW: http://sophos.berkeley.edu/macfarlane/pandoc/
PR: ports/109028
Submitted by: John MacFarlane <jgm at berkeley.edu>
Approved by: miwi (mentor)
for parsing, generating, and processing HTML, XML or other textual content
for output generation on the web. The major feature is a template language,
which is heavily inspired by Kid.
WWW: http://genshi.wedgewall.org/
Approved by: alexbl (mentor, implicit)
algorithms can either be applied directly to a dataset or called from your own
Java code. Weka contains tools for data pre-processing, classification,
regression, clustering, association rules, and visualization. It is also
well-suited for developing new machine learning schemes.
WWW: http://www.cs.waikato.ac.nz/ml/weka/
PR: ports/108143
Submitted by: Simon Olofsson <simon at olofsson.de>
Just select the text, click on the service item menu, choose
"Return the LaTeX rendering" and voila! Your text is replaced by
its LaTeX rendering.
WWW: http://www.roard.com/latexservice/
streams. It supports the whole XML 1.0 specification, and can parse
any file that follows this standard (including the contents of the
DTD).
It also provides support for a number of other standards associated
with XML, like SAX and DOM.
In addition, it includes a module to manipulate Unicode streams, since
this is required by the XML standard.
This version of XML/Ada is designed to be used with lang/gnat-gcc41.
WWW: https://libre2.adacore.com/xmlada/
WWW: http://gnuada.sourceforge.net/
Author: Petr Holub <hopet@ics.muni.cz>
PR: ports/107180
Submitted by: hopet at ics.muni.cz
LuceneKit is a class-by-class port of Lucene to GNUstep. It is a technology
suitable for nearly any application that requires full-text search.
WWW: http://www.etoile-project.org/
It uses OniGuruma as its regular expression engine.
This is a GNUstep fork of OgreKit 2.1.2
<http://www8.ocn.ne.jp/~sonoisa/OgreKit/>.
Since it is a fork, the API may differ in the future.
The original OgreKit is distributed under the BSD license.
This fork also uses the BSD license (see the COPYING document).
WWW: http://www.etoile-project.org/
a classic GNU-style ChangeLog from a Subversion repository log. It is built
from several changelog-like scripts using common XSLT constructs found in
different places.
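As a rough illustration of this transformation (done here in Python with xml.etree rather than with XSLT, as svn2cl itself does), the sketch below turns the XML produced by `svn log --xml --verbose` into GNU-style ChangeLog entries. The function name svnlog_to_changelog, its strip_prefix parameter and the exact entry layout are assumptions for illustration, not svn2cl's actual output format.

```python
import textwrap
import xml.etree.ElementTree as ET


def svnlog_to_changelog(xml_text: str, strip_prefix: str = "/trunk/") -> str:
    """Hypothetical converter: one ChangeLog-style entry per <logentry>."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("logentry"):
        author = entry.findtext("author", default="(no author)")
        date = entry.findtext("date", default="")[:10]  # keep only YYYY-MM-DD
        # Collapse the commit message's whitespace so it can be re-wrapped.
        msg = " ".join((entry.findtext("msg") or "").split())
        # <paths> is only present when the log was produced with --verbose.
        paths = [p.text or "" for p in entry.findall("paths/path")]
        files = ", ".join(
            p[len(strip_prefix):] if p.startswith(strip_prefix) else p for p in paths
        )
        body = f"* {files}: {msg}" if files else f"* {msg}"
        wrapped = textwrap.fill(body, width=70, initial_indent="\t",
                                subsequent_indent="\t  ")
        entries.append(f"{date}  {author}\n\n{wrapped}\n")
    return "\n".join(entries)
```

Since `svn log` lists the newest revisions first by default, the entries come out in the usual newest-first ChangeLog order without any sorting.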
WWW: http://ch.tudelft.nl/~arthur/svn2cl/
PR: ports/107007
Submitted by: Alexander Logvinov <ports at logvinov.com>
a stack of flashcards, but handles one-to-many and many-to-one word
relationships better, and includes an integrated scheduler for efficient use
of your 'cards'. Popup was written by Bjorn Ghola and Rob Burns.
Features:
* An editor for cardstack files with support for copying and pasting groups
of words, as well as drag and drop.
* Three quiz styles: multiple choice, spelling, and flashcard.
* Supports quizzes and practice.
* Graduated time interval scheduler.
* Localized for Thai and German.
WWW: http://popup.sourceforge.net/
software tool that converts plain-text formatting to (X)HTML. The
formatting syntax is designed to be easy and intuitive for web authors
and resembles typical email formatting conventions. The resultant
(X)HTML is structurally valid.
WWW: http://www.freewisdom.org/projects/python-markdown
PR: ports/105992
Submitted by: Graham Todd <gtodd at bellanet.org>
technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization".
It was primarily developed for language guessing, a task on which it is known to
perform with near-perfect accuracy.
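The Cavnar & Trenkle technique builds a rank-ordered profile of the most frequent character n-grams for each language, then picks the language whose profile is closest to the document's under an "out-of-place" rank distance. The following is a minimal Python sketch of that idea under simplifying assumptions (words padded with '_', n = 1 to 5, top 300 n-grams kept, a missing n-gram penalised by the profile length); it is not libtextcat's C API, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def profile(text: str, max_n: int = 5, top: int = 300) -> List[str]:
    """Return character n-grams (n = 1..max_n) ranked by frequency, most frequent first."""
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):  # letters only
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Ties are broken alphabetically so that profiles are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc: List[str], cat: List[str]) -> int:
    """Sum of rank differences; n-grams absent from the category get a maximum penalty."""
    cat_rank = {g: r for r, g in enumerate(cat)}
    max_penalty = len(cat)
    return sum(abs(r - cat_rank[g]) if g in cat_rank else max_penalty
               for r, g in enumerate(doc))


def classify(text: str, categories: Dict[str, List[str]]) -> str:
    """Pick the category whose profile has the smallest out-of-place distance."""
    doc = profile(text)
    return min(categories, key=lambda name: out_of_place(doc, categories[name]))
```

In practice the category profiles would be built from much larger samples of each language than a sentence or two; with tiny training texts the distance is dominated by the missing-n-gram penalty.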
WWW: http://software.wise-guys.nl/libtextcat/
.strings files must be distributed in ASCII encoding, which generally
isn't a convenient encoding to translate in. For example, it's rather
difficult to enter Chinese characters into an ASCII-encoded text file.
Localize will, with any luck, help out with this. Currently it's just a
shell of an application, but sometime in the future I hope to complete it.
WWW: http://www.eskimo.com/~pburns/Localize/
It provides a shared library to parse, generate, manipulate and
validate XML documents from within your own application.
(Linux version)
WWW: http://xml.apache.org/xerces-c/
PR: ports/105275
Submitted by: Alexander Logvinov <ports at logvinov.com>
2. A commercial license is also available for embedded use.
Generally, it's a standalone search engine, meant to provide fast,
size-efficient and relevant full-text search functions to other
applications. Sphinx was specially designed to integrate well with SQL
databases and scripting languages. Currently, the built-in data sources
can fetch data either via a direct connection to MySQL or from
an XML pipe.
As for the name, Sphinx is an acronym which is officially decoded as
SQL Phrase Index.
WWW: http://www.sphinxsearch.com/
PR: ports/105649
Submitted by: Matthew Seaman <m.seaman at infracaninophile.co.uk>
Unicode::Unihan - The Unihan Data Base 3.2.0
use Unicode::Unihan;
my $db = Unicode::Unihan->new;
print join("," => $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}")), "\n";
This module provides a user-friendly interface to the Unicode Unihan
Database 3.2. With this module, querying the Unihan database is as easy
as shown above.
WWW: http://search.cpan.org/dist/Unicode-Unihan/