"The Saxon 8.0 package is a collection of tools for processing XML documents.
The main components are:
- An XSLT 2.0 processor that can be used from the command line or invoked
from a Java application through the standard JAXP API. Because Saxon is
integrated through JAXP, a Java application can switch between different
XSLT processors without changing the application code. As well as
conforming closely with the XSLT 2.0 specification, Saxon offers a number
of powerful extensions.
- An XPath 2.0 processor accessible via an API to Java applications.
- An XQuery 1.0 processor that can be used from the command line, or invoked
from a Java application by use of an API.
- An XML Schema 1.0 processor. This can be used on its own to validate a schema
for correctness, or to validate a source document against the definitions in
a schema. It is also used to support the schema-aware functionality of the
XSLT and XQuery processors.
So you can use Saxon to process XML by writing XSLT stylesheets, by writing
XQuery queries, by writing Java applications, or by combinations of these
approaches."
PR: 68637
Submitted by: Herve Quiroz <herve.quiroz@esil.univ-mrs.fr>
randomizes the lines and outputs a specified number of lines. It does
this with only a single pass over the input while trying to use as little
memory as possible.
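One well-known way to pick a fixed number of random lines in a single pass is reservoir sampling. The sketch below illustrates that general technique in Python; that this port works the same way is an assumption, and the name `sample_lines` is hypothetical.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Keep the new line with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    # Shuffle so the output order is random as well.
    rng.shuffle(reservoir)
    return reservoir
```

Only the k kept lines are held in memory, regardless of how long the input is.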
PR: ports/68182
Submitted by: David Sze <dsze@alumni.uwaterloo.ca>
read XML and XML-like data files in your application without
requiring large non-standard libraries. Mini-XML only
requires an ANSI C compatible compiler (GCC works, as do
most vendors' ANSI C compilers) and a "make" program.
PR: ports/67304
Submitted by: Osintsev Vladimir <oc@nm.ru>
Text::TabularDisplay simplifies displaying textual data in a table.
The output is identical to the columnar display of query results
in the mysql text monitor.
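The bordered layout described above can be sketched in a few lines of Python. This is an illustration only, not the module's code: the function `render_table` is hypothetical, every cell is left-aligned, and each row is assumed to have as many values as there are columns.

```python
from typing import List, Sequence


def render_table(columns: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render rows as a bordered text table, left-aligning every cell."""
    cells: List[List[str]] = [[str(c) for c in columns]]
    cells += [[str(v) for v in row] for row in rows]
    # Each column is as wide as its longest cell (header included).
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(row: List[str]) -> str:
        return "| " + " | ".join(v.ljust(w) for v, w in zip(row, widths)) + " |"

    # Rule, header, rule, data rows, closing rule.
    out = [rule, fmt(cells[0]), rule]
    out += [fmt(r) for r in cells[1:]]
    out.append(rule)
    return "\n".join(out)
```

Calling `print(render_table(["id", "name"], [[1, "Alice"], [2, "Bob"]]))` prints a header row and two data rows framed by `+----+-------+` rules.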
WWW: http://search.cpan.org/dist/Text-TabularDisplay
PR: ports/66804
Submitted by: Lars Thegler <lars@thegler.dk>
Linux binary instead. Currently, it uses the aspell core Linux binary and
regular language tarballs, similar to what aspell does.
This new port will be needed for the next version of www/linux-opera.
--
Linux version of Aspell.
Aspell is a spelling checker designed to eventually replace ispell, although
it currently lacks many of ispell's basic functions. Aspell's main feature is
that it does a much better job of coming up with possible suggestions than
ispell. Aspell also includes a powerful C++ library with C and Perl interfaces
in the works.
WWW: http://aspell.sourceforge.net/
Approved by: adamw (mentor)
particular XPP2 but completely revised and rewritten to take best advantage of
latest JIT JVMs such as Hotspot in JDK 1.4.
MXP1 was designed to make the best use of the latest and most advanced JIT
engines, such as Hotspot in JDK 1.4.
MXP1 has the following features:
- fast - let me say it again: it is fast :-)
- small - a lot of performance packed in a JAR file that is less than 20KB!
- easy to use - the parser implements common XML pull parsing API (XMLPULL)
described at http://www.xmlpull.org
Performance tests that compare MXP1 to other leading XML parsers are available
at http://www.extreme.indiana.edu/~aslom/xpp_sax2bench/
WWW: http://www.extreme.indiana.edu/soap/xpp/mxp1/
PR: ports/65066
Submitted by: Herve Quiroz <herve.quiroz@esil.univ-mrs.fr>
This module extends the functionality of Lingua::EN::Inflect
with three new functions available for export.
PR: ports/65148
Submitted by: Lars Thegler <lars@thegler.dk>
a simple and elegant pull parsing API that will provide a standardized way to
do pull XML parsing from J2ME to J2EE.
WWW: http://www.xmlpull.org
PR: ports/64948
Submitted by: Herve Quiroz <herve.quiroz@esil.univ-mrs.fr>
html-pretty (or htmlpty on file systems with unpleasant filename
length restrictions) is a prettyprinter for HTML and SGML. It can
also assist in the conversion of ordinary text files in ASCII or
ISO8859-1 character sets to HTML.
WWW: http://www.math.utah.edu/~beebe/software/html-sgml-tools.html#html-pretty
are not used to verb conjugation and number agreement. We especially
focus on people who're writing academic papers or business documents
where thorough checking is required. We aim to reduce this laborious
work with automated checking.
PR: ports/63472
Submitted by: Kimura Fuyuki <fuyuki@nigredo.org>
- Update to 2.4.2
- Respect GNOME hierarchy
- Give maintainership to submitter
NOTE that dictionary files have been split into separate ports
by language, which will be committed shortly:
chinese/stardict2-dict-zh_CN
chinese/stardict2-dict-zh_TW
japanese/stardict2-dict-ja
PR: ports/62905
Submitted by: LI Dong <ld@FreeBSD.org.cn>
primarily for use in speech recognition, statistical tagging and segmentation.
It has been under development in the SRI Speech Technology and
Research Laboratory since 1995. The toolkit has also greatly benefitted from
its use and enhancements during the Johns Hopkins University/CLSP summer
workshops in 1995, 1996, and 1997.
SRILM consists of the following components:
* A set of C++ class libraries implementing language models,
supporting data structures and miscellaneous utility functions.
* A set of executable programs built on top of these libraries to
perform standard tasks such as training LMs and testing them on
data, tagging or segmenting text, etc.
* A collection of miscellaneous scripts facilitating minor related tasks.
WWW: http://www.speech.sri.com/projects/srilm/
Author: stolcke@speech.sri.com
PR: 60810
Submitted by: Cheng-Lung Sung <clsung@dragon2.net>
XML::Generator is a simple perl module to help in the generation of XML.
Basically, you create an XML::Generator object and then call a method
for each tag, supplying the contents of that tag as parameters.
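The "one method per tag" idea can be illustrated with a small Python analogue. This is a sketch of the pattern only, not the Perl module's API: the class `TagGenerator` is hypothetical, and it does no escaping or attribute handling.

```python
class TagGenerator:
    """Hypothetical generator: any attribute access becomes a tag-building method."""

    def __getattr__(self, tag: str):
        # Leave Python's own special-method lookups alone.
        if tag.startswith("_"):
            raise AttributeError(tag)

        def build(*contents: str) -> str:
            # The arguments are the tag's contents, concatenated in order.
            return "<{0}>{1}</{0}>".format(tag, "".join(contents))

        return build


if __name__ == "__main__":
    gen = TagGenerator()
    print(gen.foo(gen.bar("42")))
```

Because nested calls return plain strings, the contents of one tag can be the output of another.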
PR: ports/61720
Submitted by: Andreas Heil <ah@linux-hq.de>
Note that this port will be required to make textproc/p5-Lingua-Stem work
with all locales.
PR: 60859
Submitted by: Thorsten Greiner <thorsten.greiner@web.de>
from unrtf CHANGES :
Overall Program Change Log for GNU UnRTF
----------------------------------------
0.1: original version, known as "rtf2htm"
0.17.4: changed attr.c to use AttrStack (stack of stacks) paradigm
----program renamed UnRTF----
0.17.5: began implementation of output personalities; wrote HTML personality.
This port contains the SGML version of DocBook 4.2. Note
that DocBook 4.2 includes the XML DocBook DTD as part of
the SGML DTD distribution. This port is a superset of
textproc/docbook-xml, which includes only the XML DTD.
PR: ports/58695
Submitted by: Michael Edenfield <kutulu@kutulu.org>
30000 entries and, through additional word construction with hundreds
of prefixes and suffixes, may generate more, leading to many hundreds
of thousands of 'words' that can be formed by declension and
conjugation. This is also our first port written in Ada.
PR: ports/60822
Submitted by: Leland Wang <llwang@infor.org>
Skribe is a text processor. Although it is a general-purpose
tool, it is best suited to writing technical documents such
as web pages, technical reports, and API documentation.
At first glance, Skribe looks like a mark-up language à la
HTML, so no computer programming skills are needed in
order to use Skribe.
A second look reveals that Skribe is actually a true
programming language, provided with high level features
(such as objects, higher order functions, regular and
syntactic parsing, etc.). Skribe is based on the Scheme
programming language.
WWW: http://www-sop.inria.fr/mimosa/fp/Skribe/
PR: ports/60485
Submitted by: Kimura Fuyuki <fuyuki@nigredo.org>
IIIMF stands for Internet/Intranet Input Method Framework.
IIIMF is designed to be the next generation of input method framework
which provides the following capabilities:
* Multiplatform, platform independent.
* Multilingual and full Unicode support, but satisfactory for native speakers.
* Windowing system independent.
* Multiple language engines can run concurrently.
* Multiuser.
* Distributed, lightweight clients and scalable server.
* Extensible in multiple ways.
* Input method protocol efficient enough to be used over low-speed modem
connection.
* Easy input method engine development with plugin API.
* Easy input method enabling with libiiimcf, even on console apps.
* Small core part to start from.
WWW: http://www.openi18n.org/subgroups/im/IIIMF/
- Kuang-che Wu
kcwu@csie.org
PR: ports/60087
Submitted by: Kuang-che Wu <kcwu@csie.org>
FoilTeX is a collection of LaTeX files for making foils. A number
of features are built in, including a large sans serif font as the normal font,
options for setting normalsize at 20pt (default), 17pt, 25pt or 30pt,
new macros for starting new foils and for special environments like Theorem
and Proof, and simple macros to control the headline and footline.
WWW: http://www.ctan.org/tex-archive/nonfree/macros/latex/contrib/foiltex/
PR: 54372
Submitted by: Stefan Walter <sw@gegenunendlich.de>
PPower4 is used to post-process presentations in PDF format which were
prepared using (La)TeX to add dynamic effects. The PDF files can be
created with pdf(la)tex, v(la)tex or with standard LaTeX and then
converted to PDF with dvipdfm.
WWW: http://www-sp.iti.informatik.tu-darmstadt.de/software/ppower4/
PR: 54335
Submitted by: Stefan Walter <sw@gegenunendlich.de>
dbacl is a digramic Bayesian text classifier. Given some text,
it calculates the posterior probabilities that the input resembles
one of any number of previously learned document collections.
It can be used to sort incoming email into arbitrary categories
such as spam, work, and play, or simply to distinguish an English text
from a French text. It fully supports international character sets,
and uses sophisticated statistical models based on the
Maximum Entropy Principle.
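The idea of scoring a text against several learned collections can be shown with a much simpler model than the one described above. The Python sketch below is a plain word-count (unigram) naive Bayes classifier with uniform priors and add-one smoothing; it is not dbacl's digram or maximum entropy model, and the names `train` and `posteriors` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def train(docs: List[str]) -> Counter:
    """Count word occurrences across one category's training documents."""
    return Counter(w for d in docs for w in d.lower().split())


def posteriors(text: str, models: Dict[str, Counter]) -> Dict[str, float]:
    """Posterior probability of each category for the given text."""
    vocab = set().union(*models.values())
    words = text.lower().split()
    log_like = {}
    for name, counts in models.items():
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        total = sum(counts.values()) + len(vocab) + 1
        log_like[name] = sum(math.log((counts[w] + 1) / total) for w in words)
    # Normalise in log space to avoid underflow on long texts.
    top = max(log_like.values())
    weights = {n: math.exp(v - top) for n, v in log_like.items()}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}
```

With one model trained on English sentences and one on French sentences, an English input receives the higher posterior for the English model, and the posteriors sum to 1.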
Author: Laird A. Breyer <laird@lbreyer.com>
WWW: http://dbacl.sourceforge.net/
PR: 58733
Submitted by: Cheng-Lung Sung <clsung@dragon2.net>
(FAQ) list from a specially formatted text data file. For
output, it can generate either an HTML page, a text file,
or a DocBook XML file.
PR: 57694
Submitted by: king@v2project.com
CL-PPCRE is a fast, Perl-compatible implementation of regular expressions
written in portable, ANSI-compliant Common Lisp.
PR: 52372
Submitted by: Henrik Motakef <henrik.motakef@web.de>
META is a builder for recursive descent parsers implemented as a domain
specific language on top of Common Lisp.
PR: 52364
Submitted by: Henrik Motakef <henrik.motakef@web.de>
cl-ppcre is a pure ANSI Common Lisp implementation of
Perl-compatible regular expressions.
This port depends on the previously submitted ASDF port.
It installs the sources and the .asd file. There are other
ports for binaries for the supported Lisp implementations.
PR: ports/52369
Submitted by: Henrik Motakef <henrik.motakef@web.de>
Meta is a parser building toolkit implemented as a
domain-specific language to be integrated in Common Lisp
applications.
This port depends on the previously submitted ASDF port.
It installs the sources and the .asd file. There are other
ports for the binaries for supported Lisp implementations.
PR: ports/52361
Submitted by: Henrik Motakef <henrik.motakef@web.de>
Note that, despite the misleading name, this is actually different from (and
independent of) textproc/wv, thus no repocopy.
Approved by: arved (Mentor)
libcroco is a standalone CSS2 parsing library.
It provides a low-level, event-driven, SAC-like API
and a CSS object model-like API.
This library is being written to bring CSS support
to the mlview XML editor project, but it can be used
for other applications as well.
XML::Filter::GenericChunk is a base class for SAX filters that are able to
parse well-balanced chunks from SAX events and transform each chunk
into a sequence of SAX events.
PR: 54729
Submitted by: Hansjoerg Pehofer <hansjoerg.pehofer@uibk.ac.at>
Xalan-C++ is an implementation of XSL Transformations (XSLT) and
XML Path Language (XPath).
It works hand in hand with the XML parser Xerces-C++ version 2.
For more information please visit the homepage:
WWW: http://xml.apache.org/xalan-c/index.html
PR: ports/44430
Submitted by: Christopher Kelly <christopher.kelly@uk.yahoo-inc.com>, Bjoern A. Zeeb <bzeeb+freebsdports@zabbadoz.net>
XML::Generator::PerlData provides a simple way to generate SAX2 events from
nested Perl data structures, while providing finer-grained control over the
resulting document streams.
PR: 55130
Submitted by: Hansjoerg Pehofer <hansjoerg.pehofer@uibk.ac.at>
This package consists of Perl modules along with supporting Perl programs that
implement the semantic relatedness measures described by Leacock and Chodorow
(1998), Jiang and Conrath (1997), Resnik (1995), Lin (1998), Hirst and St-Onge (1998)
and the adapted gloss overlap measure by Banerjee and Pedersen (2002). The Perl
modules are designed as object classes with methods that take as input two word
senses. The semantic relatedness of these word senses is returned by these
methods. A quantitative measure of the degree to which two word senses are
related has wide ranging applications in numerous areas, such as word sense
disambiguation, information retrieval, etc. For example, in order to determine
which sense of a given word is being used in a particular context, the sense
having the highest relatedness with its context word senses is most likely to
be the sense being used. Similarly, in information retrieval, retrieving
documents containing highly related concepts is more likely to yield higher
precision and recall values.
A command line interface to these modules is also present in the package. The
simple, user-friendly interface returns the relatedness measure of two given
words. A number of switches and options have been provided to modify the output
and enhance it with trace information and other useful output. Details of the
usage are provided in other sections of this README. Supporting utilities for
generating information content files from various corpora are also available in
the package. The information content files are required by three of the
measures for computing the relatedness of concepts.
WordNet::QueryData provides a direct interface to the WordNet database files.
It requires the WordNet package. It allows the user direct access to the full
WordNet semantic lexicon. All parts of speech are supported and access is
generally very efficient because the index and morphological exclusion tables are
loaded at initialization. This initialization step is slow (approx. 10-15
seconds), but queries are very fast thereafter---thousands of queries can be
completed every second.
This is a Ruby library that implements Perl's Algorithm::Diff. It
is needed for aswiki, which will be committed later.
Reviewed by: kuriyama, knu
This is a thin wrapper around the shellwords.pl package,
which comes preinstalled with Perl. This module imports a
single subroutine, shellwords(). The shellwords() routine
parses lines of text and returns a set of tokens using the
same rules that the Unix shell does for its command-line
arguments. Tokens are separated by whitespace, and can be
delimited by single or double quotes. The module also
respects backslash escapes.
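The tokenizing rules described above (whitespace separation, single and double quotes, backslash escapes) can be tried out with Python's standard `shlex` module, which follows POSIX shell quoting. This is an analogue for illustration, not the Perl module itself; the wrapper name `shell_tokens` is hypothetical.

```python
import shlex
from typing import List


def shell_tokens(line: str) -> List[str]:
    """Split a line into tokens using POSIX shell quoting rules."""
    return shlex.split(line)


if __name__ == "__main__":
    # Double quotes, single quotes and a backslash-escaped space in one line.
    print(shell_tokens("cp \"My File.txt\" 'other dir'/ back\\ slash"))
```

Quoted text joins with adjacent unquoted characters, so `'other dir'/` becomes the single token `other dir/`.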
PR: ports/50081
Submitted by: George Hartzell <hartzell@fruitfly.org>
This module is a wrapper around the diff algorithm from the
module Algorithm::Diff. Its job is to simplify visualizing the
differences between strings.
Compared to the many other Diff modules, the output is
not in diff style, and the recognised differences are not
on line or word boundaries; they are at the character level.
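A character-level comparison of two strings can be sketched with Python's standard `difflib`. This only illustrates the idea of diffing at character granularity; `difflib` uses its own matching algorithm rather than Algorithm::Diff, and the function name `char_diff` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def char_diff(old: str, new: str) -> List[Tuple[str, str, str]]:
    """Return (operation, old_fragment, new_fragment) triples at character level."""
    # Strings are sequences of characters, so the matcher compares char by char.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    for op, before, after in char_diff("kitten", "sitting"):
        print(op, repr(before), repr(after))
```

Each operation is one of `equal`, `replace`, `delete` or `insert`; concatenating the old fragments gives back the first string, and concatenating the new fragments gives back the second.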
PR: ports/51434
Submitted by: Mathieu Arnold <m@absolight.net>
Text::Quoted examines the structure of some text which may
contain multiple different levels of quoting, and turns the
text into a nested data structure.
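As an illustration of turning quoted text into a nested structure, the Python sketch below treats each leading `>` as one quoting level and builds nested lists. It is a simplified stand-in, not the module's actual data format or quote detection, and the names `quote_depth`, `strip_quotes` and `nest` are hypothetical.

```python
import re
from typing import List, Union

Node = Union[str, List["Node"]]


def quote_depth(line: str) -> int:
    """Count leading '>' markers, allowing spaces between them."""
    prefix = re.match(r"^(?:\s*>)*", line).group(0)
    return prefix.count(">")


def strip_quotes(line: str) -> str:
    """Remove the leading '>' markers and one following space."""
    return re.sub(r"^(?:\s*>)*\s?", "", line)


def nest(lines: List[str]) -> List[Node]:
    """Turn quoted lines into nested lists: one extra list level per '>' level."""
    root: List[Node] = []
    stack = [root]  # stack[d] is the list currently open at depth d
    for line in lines:
        depth = quote_depth(line)
        while len(stack) - 1 > depth:
            stack.pop()  # close levels deeper than this line
        while len(stack) - 1 < depth:
            child: List[Node] = []
            stack[-1].append(child)  # open a new, deeper level
            stack.append(child)
        stack[-1].append(strip_quotes(line))
    return root
```

For example, `nest(["> > deep", "> reply", "top"])` places `deep` two levels down, `reply` one level down, and `top` at the outermost level.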
PR: ports/50936
Submitted by: Erwin Lansing <erwin@lansing.dk>
L2A is a simple filter to remove most LaTeX commands from
marked-up documents, leaving only the body of text.
PR: ports/47974
Submitted by: James E. Flemer <jflemer@alum.rpi.edu>
files with less memory. GNU Diff often cannot work with files
larger than 33% of datasize (from limit) due to memory exhaustion.
PR: ports/50097
Submitted by: risner@stdio.com
Spellutils is a suite of programs which are used to isolate
parts or texts from various types of files and hand
them over to another program, typically a spell checker,
which may change the texts. Afterwards the possibly
changed text parts are copied back into place in the original
file.
PR: ports/41211
Submitted by: Thierry Thomas <thierry@pompo.net>