This package consists of Perl modules, along with supporting Perl programs, that
implement the semantic relatedness measures described by Leacock and Chodorow
(1998), Jiang and Conrath (1997), Resnik (1995), Lin (1998), and Hirst and
St-Onge (1998), as well as the adapted gloss overlap measure by Banerjee and
Pedersen (2002). The Perl modules are designed as object classes whose methods
take two word senses as input and return their semantic relatedness. A
quantitative measure of the degree to which two word senses are related has
wide-ranging applications in areas such as word sense disambiguation and
information retrieval. For example, to determine which sense of a given word is
being used in a particular context, the sense having the highest relatedness
with the senses of its context words is the one most likely being used.
Similarly, in information retrieval, retrieving documents that contain highly
related concepts is more likely to yield higher precision and recall values.
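
The disambiguation idea described above can be sketched in a few lines. This is
a minimal Python illustration, not the package's Perl API: the function names,
the sense labels, and the scores in TOY_SCORES are all hypothetical and exist
only to show the "pick the sense with the highest relatedness to the context"
step.

```python
def pick_sense(candidate_senses, context_senses, relatedness):
    """Return the candidate sense with the highest total relatedness
    to the senses found in the surrounding context."""
    return max(
        candidate_senses,
        key=lambda sense: sum(relatedness(sense, ctx) for ctx in context_senses),
    )


# Hypothetical relatedness scores, invented for illustration only.
TOY_SCORES = {
    frozenset(("bank#a", "money#a")): 0.8,
    frozenset(("bank#a", "river#a")): 0.1,
    frozenset(("bank#b", "money#a")): 0.1,
    frozenset(("bank#b", "river#a")): 0.9,
}


def toy_relatedness(sense_a, sense_b):
    """Symmetric lookup into the toy table; unknown pairs score 0.0."""
    return TOY_SCORES.get(frozenset((sense_a, sense_b)), 0.0)


if __name__ == "__main__":
    print(pick_sense(["bank#a", "bank#b"], ["river#a"], toy_relatedness))
```

In a real setting the relatedness callback would be one of the measures listed
above rather than a lookup table.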
A command line interface to these modules is also present in the package. The
simple, user-friendly interface returns the relatedness measure of two given
words. A number of switches and options have been provided to modify the output
and enhance it with trace information and other useful output. Details of the
usage are provided in other sections of this README. Supporting utilities for
generating information content files from various corpora are also available in
the package. The information content files are required by three of the
measures for computing the relatedness of concepts.
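
To make the role of information content concrete, here is a small Python
sketch of how such values are commonly derived from corpus counts and then
used by the Resnik, Lin, and Jiang-Conrath measures. The taxonomy, the counts,
and every function name below are assumptions made up for this example; the
package's own file format and Perl interfaces are not shown here.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and raw corpus counts.
PARENT = {"animal": "entity", "vehicle": "entity",
          "dog": "animal", "cat": "animal", "car": "vehicle"}
RAW_COUNTS = {"entity": 0, "animal": 10, "vehicle": 10,
              "dog": 30, "cat": 10, "car": 40}


def ancestors(concept):
    """Return the chain from the concept itself up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def cumulative_counts():
    """Each occurrence of a concept also counts for all of its ancestors."""
    totals = dict.fromkeys(RAW_COUNTS, 0)
    for concept, count in RAW_COUNTS.items():
        for node in ancestors(concept):
            totals[node] += count
    return totals


TOTALS = cumulative_counts()


def ic(concept):
    """Information content: negative log of the concept's probability."""
    return -math.log(TOTALS[concept] / TOTALS["entity"])


def lcs(a, b):
    """Lowest common subsumer: first ancestor of a that also subsumes b."""
    b_chain = set(ancestors(b))
    return next(node for node in ancestors(a) if node in b_chain)


def resnik(a, b):
    return ic(lcs(a, b))


def lin(a, b):
    denom = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / denom if denom else 0.0


def jcn_distance(a, b):
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))
```

With these toy counts, "dog" and "cat" share the subsumer "animal", while "dog"
and "car" meet only at the root, whose information content is zero.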
WordNet::QueryData provides a direct interface to the WordNet database files.
It requires the WordNet package. It allows the user direct access to the full
WordNet semantic lexicon. All parts of speech are supported and access is
generally very efficient because the index and morphological exclusion tables are
loaded at initialization. This initialization step is slow (approx. 10-15
seconds), but queries are very fast thereafter---thousands of queries can be
completed every second.
This is a Ruby library that implements Perl's Algorithm::Diff. This library
is needed by aswiki, which will be committed later.
Reviewed by: kuriyama, knu
This is a thin wrapper around the shellwords.pl package,
which comes preinstalled with Perl. This module imports a
single subroutine, shellwords(). The shellwords() routine
parses lines of text and returns a set of tokens using the
same rules that the Unix shell does for its command-line
arguments. Tokens are separated by whitespace, and can be
delimited by single or double quotes. The module also
respects backslash escapes.
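
For readers unfamiliar with shell-style tokenizing, the following Python sketch
shows the same kind of behavior using the standard library's shlex module. It
is an analogy only: shlex is not the Perl module described here, the helper
name split_like_shell is made up, and the two may differ in edge cases.

```python
import shlex


def split_like_shell(line):
    """Split a line into tokens using POSIX shell quoting rules:
    whitespace separates tokens, quotes group words, and a backslash
    escapes the following character."""
    return shlex.split(line)


if __name__ == "__main__":
    print(split_like_shell('cp "my file.txt" \'other dir\' back\\ slash'))
```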
PR: ports/50081
Submitted by: George Hartzell <hartzell@fruitfly.org>
This module is a wrapper around the diff algorithm from the
module Algorithm::Diff. Its job is to simplify visualizing
the differences between strings.
Compared to the many other Diff modules, the output is
not in diff style, and the recognised differences are not
on line or word boundaries; they are at character level.
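
A character-level diff of this kind can be illustrated with Python's difflib.
This is only a sketch of the idea, not the module's implementation or output
format: the function name char_diff and the "[-...-]" and "{+...+}" markers are
choices made for this example.

```python
from difflib import SequenceMatcher


def char_diff(old, new):
    """Return one string in which removed characters are wrapped in
    [-...-] and added characters in {+...+}; unchanged text is copied."""
    pieces = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(old[i1:i2])
            continue
        if i1 < i2:  # characters present only in the old string
            pieces.append("[-" + old[i1:i2] + "-]")
        if j1 < j2:  # characters present only in the new string
            pieces.append("{+" + new[j1:j2] + "+}")
    return "".join(pieces)


if __name__ == "__main__":
    print(char_diff("abc", "abd"))
```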
PR: ports/51434
Submitted by: Mathieu Arnold <m@absolight.net>
Text::Quoted examines the structure of some text which may
contain multiple different levels of quoting, and turns the
text into a nested data structure.
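
The idea of turning quoting levels into nesting can be sketched as follows.
This Python example assumes ">" is the only quote marker and returns plain
nested lists; it does not reproduce Text::Quoted's actual data structure or
its handling of other quote characters, and the function names are
hypothetical.

```python
import re

# Zero or more ">" markers, each optionally followed by one space.
QUOTE_PREFIX = re.compile(r"(?:> ?)*")


def split_quote_prefix(line):
    """Return (depth, body): how many '>' markers lead the line, and the rest."""
    prefix = QUOTE_PREFIX.match(line).group(0)
    return prefix.count(">"), line[len(prefix):]


def parse_quoted(text):
    """Build nested lists: each deeper quoting level becomes a sublist."""
    root = []
    stack = [root]  # stack[d] collects the lines at quoting depth d
    for line in text.splitlines():
        depth, body = split_quote_prefix(line)
        while len(stack) - 1 > depth:  # quoting level dropped: close sublists
            stack.pop()
        while len(stack) - 1 < depth:  # quoting level rose: open sublists
            child = []
            stack[-1].append(child)
            stack.append(child)
        stack[-1].append(body)
    return root


if __name__ == "__main__":
    print(parse_quoted("hi\n> a\n> > b\n> c\nbye"))
```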
PR: ports/50936
Submitted by: Erwin Lansing <erwin@lansing.dk>
L2A is a simple filter to remove most LaTeX commands from
marked-up documents, leaving only the body of text.
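
As a rough illustration of this kind of filtering, the Python sketch below
strips comments, command names, and braces with regular expressions. It is a
deliberately naive stand-in, not L2A's algorithm: the function name strip_latex
is made up, and environments, math, and macros with arguments are not handled
properly.

```python
import re

# A backslash command name, an optional star, and an optional [..] argument.
COMMAND = re.compile(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?")
# An unescaped percent sign starts a comment that runs to end of line.
COMMENT = re.compile(r"(?<!\\)%.*")


def strip_latex(text):
    """Remove comments, command names, and grouping braces, keeping the text."""
    text = COMMENT.sub("", text)
    text = COMMAND.sub("", text)
    return text.replace("{", "").replace("}", "")


if __name__ == "__main__":
    print(strip_latex(r"\textbf{Hello} \emph{world}"))
```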
PR: ports/47974
Submitted by: James E. Flemer <jflemer@alum.rpi.edu>
files with less memory. GNU Diff often cannot work with files
larger than 33% of datasize (from limit) due to memory exhaustion.
PR: ports/50097
Submitted by: risner@stdio.com