html2text is a Python script that converts a page of HTML into clean,
easy-to-read plain ASCII text. Better yet, that ASCII also happens to
be valid Markdown (a text-to-HTML format).
WWW: http://www.aaronsw.com/2002/html2text/
Author: Aaron Swartz <me@aaronsw.com>
Inspired by: pkgsrc package
2007-04-10 textproc/ocaml-yaxi: Does not build
2007-04-10 ukrainian/pine.language: Leaves behind config file on deinstall
2007-04-10 www/mod_zap: Incomplete pkg-plist
2007-04-10 www/sahana2: Conflicting dependencies: php4 vs php5
2007-04-10 www/urchin5: Does not install
2007-04-07 databases/cyrus-smlacapd: this software is obsolete
Simple Blog Code is a simple markup language. You can use it for guest
books, blogs, wikis, boards and various other web applications. It
produces valid and semantic (X)HTML from input and is patterned on those
tiny Usenet markups like *bold* and _underline_.
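
As a rough illustration of the Usenet-style markup mentioned above, the
following Python sketch turns *bold* and _underline_ into HTML tags. It is
not Simple Blog Code's implementation or syntax; the function name
usenet_to_html and the choice of <strong> and <u> as output tags are
assumptions made only for this example.

```python
import html
import re

# Hypothetical helper, not part of Simple Blog Code.
# The lookarounds keep markers inside words (snake_case, 2*3*4) untouched.
BOLD = re.compile(r"(?<!\w)\*([^*\s][^*\n]*)\*(?!\w)")
UNDERLINE = re.compile(r"(?<!\w)_([^_\s][^_\n]*)_(?!\w)")


def usenet_to_html(text):
    """Convert *bold* and _underline_ markers to HTML tags."""
    # Escape first so that literal <, > and & in the input stay harmless.
    escaped = html.escape(text, quote=False)
    escaped = BOLD.sub(r"<strong>\1</strong>", escaped)
    escaped = UNDERLINE.sub(r"<u>\1</u>", escaped)
    return escaped
```

Escaping before substitution means the only tags in the result are the ones
the sketch itself adds.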
pdfoutline adds outlines (aka bookmarks) to PDF files. It reads the input
file given as the first argument, adds outlines from the text file given
as the second argument, and saves the result to the file named by the
third argument.
WWW: http://sourceforge.net/projects/fntsample/
Author: Eugeniy Meshcheryakov <eugeniy@users.sourceforge.net>
It is a generic syntax highlighter for general use in all kinds of software
such as forum systems, wikis or other applications that need to prettify
source code. Highlights are:
* a wide range of common languages and markup formats is supported
* special attention is paid to details, increasing quality by a fair amount
* support for new languages and formats is added easily
* a number of output formats, presently HTML, LaTeX, RTF and ANSI sequences
* it is usable as a command-line tool and as a library
WWW: http://pygments.org/
Data::SpreadPagination can be used to create an easy-to-use spread pagination
navigator. It inherits from Data::Page, and in addition provides methods to
create a pagination spread, keeping the page numbers displayed within a
sensible limit.
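
The idea of keeping the displayed page numbers within a limit can be
sketched in Python as a sliding window around the current page. This is a
generic illustration only, not the algorithm or API of
Data::SpreadPagination; the function page_window and its parameters are
hypothetical names chosen for the example.

```python
def page_window(current_page, last_page, max_pages):
    """Return at most max_pages page numbers surrounding current_page."""
    if last_page < 1 or max_pages < 1:
        return []
    # Clamp the current page into the valid range 1..last_page.
    current_page = min(max(current_page, 1), last_page)
    size = min(max_pages, last_page)
    # Place the current page near the middle of the window...
    start = current_page - (size - 1) // 2
    # ...then shift the window so it stays inside 1..last_page.
    start = max(1, min(start, last_page - size + 1))
    return list(range(start, start + size))


print(page_window(10, 20, 5))  # [8, 9, 10, 11, 12]
print(page_window(20, 20, 5))  # [16, 17, 18, 19, 20]
```

Near either end of the range the window stops sliding, so the navigator
always shows the same number of links when enough pages exist.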
WWW: http://search.cpan.org/dist/Data-SpreadPagination/
PR: ports/110677
Submitted by: Sergei Vyshenski <svysh@pn.sinp.msu.ru>
Russian and German Languages. Version 2.
Finds the lemmas (all forms) of a word.
Written in C++.
WWW: http://www.aot.ru/
- Andrei V. Shetuhin
slonik-v-domene@mail.ru
reki@reki.ru
PR: ports/110137
Submitted by: Andrei V. Shetuhin
is a bit different on these points:
(1) The project is end-user oriented: it hides the LaTeX compilation
details as far as possible behind a single clean script that
produces DVI, PostScript and PDF output directly.
(2) The actual output rendering is done not only by the XSL stylesheets
transformation, but also by a dedicated LaTeX package. The purpose is
to allow a deep LaTeX customisation without changing the XSL
stylesheets.
(3) Post-processing is done by Python, to make publication faster,
convert the images if needed, and do the whole compilation.
WWW: http://dblatex.sourceforge.net/
PR: ports/109520
Submitted by: Peter Johnson <johnson.peter at gmail.com>
and at the same time be as close as possible to the original Java API.
This has the combined advantage of providing Perl programmers with a
well-documented API and giving them access to a C++ search engine
library that is supposedly faster than the original.
WWW: http://search.cpan.org/dist/Lucene/
WWW: http://sourceforge.net/projects/clucene/
2006-12-30 textproc/ruby-htmlcompact: distfile and homepage disappeared
2006-12-30 textproc/ruby-rwv2: distfile disappeared and has no homepage
Approved by: erwin (mentor, implicit)
It can be used to programmatically access external HTML pages.
I hope to extend it to become a web-publishing framework in the future.
Author: Johannes Brodwall <johannes@brodwall.com>
WWW: http://rubyforge.org/projects/ruby-htmltools/
to another. It can read markdown and (subsets of) reStructuredText,
HTML, and LaTeX, and it can write markdown, reStructuredText, HTML,
LaTeX, DocBook, RTF, and S5 HTML slide shows.
Pandoc extends standard markdown syntax with footnotes, embedded LaTeX,
and other features. A compatibility mode is provided for those who
need a drop-in replacement for Markdown.pl. Included wrapper scripts
make it easy to convert markdown documents to PDFs and to convert web
pages to markdown documents.
In contrast to existing tools for converting markdown to HTML, which
use regex substitutions, pandoc has a modular design: it consists of a
set of readers, which parse text in a given format and produce a native
representation of the document, and a set of writers, which convert
this native representation into a target format. Thus, adding an input
or output format requires only adding a reader or writer.
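
The reader/writer split described above can be sketched in a few lines of
Python. This is a toy illustration of the design, not pandoc's code (pandoc
itself is written in Haskell); the Block type, the "toy-markdown" format and
the functions below are all hypothetical and handle only headings and
paragraphs.

```python
import html
from dataclasses import dataclass


@dataclass
class Block:
    """One element of the toy native representation."""
    kind: str  # "heading" or "para"
    text: str


def read_toy_markdown(source):
    """Reader: parse text into a list of Block objects."""
    blocks = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            blocks.append(Block("heading", line[2:]))
        else:
            blocks.append(Block("para", line))
    return blocks


def write_html(blocks):
    """Writer: render the native representation as HTML."""
    tags = {"heading": "h1", "para": "p"}
    return "\n".join(
        f"<{tags[b.kind]}>{html.escape(b.text)}</{tags[b.kind]}>" for b in blocks
    )


def write_latex(blocks):
    """Writer: render as LaTeX (no escaping of special characters here)."""
    parts = [
        f"\\section{{{b.text}}}" if b.kind == "heading" else b.text
        for b in blocks
    ]
    return "\n\n".join(parts)


# Supporting a new format means registering one more function here.
READERS = {"toy-markdown": read_toy_markdown}
WRITERS = {"html": write_html, "latex": write_latex}


def convert(source, from_format, to_format):
    """Parse with the chosen reader, then render with the chosen writer."""
    return WRITERS[to_format](READERS[from_format](source))
```

Because every writer consumes the same list of blocks, neither writer needs
to know which reader produced its input.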
WWW: http://sophos.berkeley.edu/macfarlane/pandoc/
PR: ports/109028
Submitted by: John MacFarlane <jgm at berkeley.edu>
Approved by: miwi (mentor)
for parsing, generating, and processing HTML, XML or other textual content
for output generation on the web. The major feature is a template language,
which is heavily inspired by Kid.
WWW: http://genshi.wedgewall.org/
Approved by: alexbl (mentor, implicit)
algorithms can either be applied directly to a dataset or called from your own
Java code. Weka contains tools for data pre-processing, classification,
regression, clustering, association rules, and visualization. It is also
well-suited for developing new machine learning schemes.
WWW: http://www.cs.waikato.ac.nz/ml/weka/
PR: ports/108143
Submitted by: Simon Olofsson <simon at olofsson.de>