Cocoa libraries. The GNUstep port that can be found here was done by me. It
was very easy to do, primarily requiring only new interface files and build
files.
PR: 104964
Submitted by: Gürkan Sengün
is simple: Using "Text::ExtractWords" and "Lingua::StopWords" from CPAN,
it determines how many of the known stopwords the document contains for
each language supported by "Lingua::StopWords".
Each word in the document recognized as a stopword of a particular
language scores one point for that language.
The "language_guess()" function takes a document as a parameter and
returns the abbreviation of the language that it is most likely written
in.
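
The scoring scheme described above can be sketched in a few lines. This
is only an illustration in Python, not the module's Perl code: the tiny
stopword lists and the language_scores() helper are hypothetical, and
the real module takes its lists from "Lingua::StopWords".

```python
import re

# Hypothetical, heavily abbreviated stopword lists (assumption); the real
# module obtains its lists from Lingua::StopWords.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "un", "une", "dans"},
}


def language_scores(text):
    """Return {language: number of words in text that are stopwords}."""
    words = re.findall(r"\w+", text.lower())
    return {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }


def language_guess(text):
    """Return the abbreviation of the highest-scoring language."""
    scores = language_scores(text)
    return max(scores, key=scores.get)
```

Every occurrence of a stopword counts, so a longer document gives the
scores more room to separate.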
Author: Mike Schilli <cpan@perlmeister.com>
WWW: http://search.cpan.org/~mschilli/Text-Language-Guess-0.02/
PR: ports/103571
Submitted by: Masahiro Teramoto <markun@onohara.to>
ffe is a program for extracting fields from flat file records and
displaying them in different formats. ffe relies on the configuration file
to control input file structure and the output format.
WWW: http://sourceforge.net/projects/ff-extractor/
Author: Timo Savinen <tjsa@iki.fi>
arbitrary text and also allows you to mark up a text as HTML
with the keywords.
A Hatena keyword is an element in a suite of web sites
*.hatena.ne.jp having blogs and social bookmarks among others.
Please refer to http://d.hatena.ne.jp/keyword/ (in Japanese) for details.
In Hatena Diary, a blog hosting service, a Hatena keyword found in
a posting is linked to the keywords page automatically.
You can implement the same kind of feature outside Hatena using this module.
It queries Hatena Keyword Link API internally for retrieving terms
Author: Naoya Ito <naoya@bloghackers.net>
WWW: http://search.cpan.org/~naoya/Hatena-Keyword-0.04/
PR: ports/102794
Submitted by: Masahiro Teramoto <markun(at)onohara.to>
This is a smaller, cheaper, faster SED implementation. Minix uses it. GNU
used to use it, until they built their own sed around an extended (some
would say over-extended) regexp package.
For embedded use we searched for a tiny sed implementation, especially for
use with the dietlibc, and found Eric S. Raymond's sed implementation quite
handy, though it suffered from several bugs and was no longer under active
maintenance. After sending a bunch of fixes we agreed to continue
maintaining this lovely, historic sed implementation.
Along with a lot of fixes and cleanups, further speedups, some missing
features, and POSIX conformance, we also added a test suite to the package,
so regressions are quickly and easily uncovered.
WWW: http://www.exactcode.de/oss/minised/
Author: ExactCode <info@exactcode.de>
Basically, this package contains:
- Functions to automatically adjust and cycle the section underline
decorations;
- A mode that displays the table of contents and allows you to jump anywhere
from it;
- Functions to insert and automatically update a TOC in your source
document;
- A mode which supports font-lock highlighting of reStructuredText
structures;
- Some other convenience functions.
This package is the result of merging:
- restructuredtext.el
- rst-mode.el
- rst-html.el
Those files are now OBSOLETE and have been replaced by this single
package file (2005-10-30).
WWW: http://docutils.sourceforge.net/docs/user/emacs.html
PR: ports/102384
Submitted by: Denis Shaposhnikov <dsh at vlink.ru>
Perl. Everything is implemented as a small plugin and you can mash
them up together using the Plagger core API and plugin hooks. You can
think of Plagger as a blosxom or qpsmtpd for RSS aggregation.
WWW: http://plagger.org/
WARNING: This port depends on thousands of ports, especially with
full options.
xxdiff is a computer program that allows a user (usually a software
developer of some sort) to easily visualize the differences between
files. The manner and goal for which this process is applied over
multiple files is highly dependent on the application, and most of
the time is driven by custom user scripts.
For example, a configuration management engineer in a company might
provide some kind of merge policing environment, that allows software
developers to review changes in files for the purpose of accepting or
rejecting a submitted changeset to a codebase. Another example is
that of a developer wishing to review the changes he made to a
checkout of files from a source-code management system such as CVS,
Subversion, ClearCase, Perforce, etc.
WWW: http://furius.ca/xxdiff/doc/xxdiff-scripts.html
Flex is a tool for generating scanners. A scanner, sometimes called a
tokenizer, is a program which recognizes lexical patterns in text. The
flex program reads user-specified input files, or its standard input
if no file names are given, for a description of a scanner to generate.
The description is in the form of pairs of regular expressions and C
code, called rules. Flex generates a C source file named "lex.yy.c",
which defines the function yylex(). The file "lex.yy.c" can be compiled
and linked to produce an executable. When the executable is run, it
analyzes its input for occurrences of text matching the regular
expressions for each rule. Whenever it finds a match, it executes the
corresponding C code.
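
The rule model described above can be illustrated with a small sketch.
It is written in Python rather than as flex input, and the RULES table,
the token names, and the scan() function are made up for this example.
It prefers the longest match and lets earlier rules win ties, as flex
does. Unlike a flex scanner, which by default echoes unmatched text, this
sketch raises an error when no rule matches.

```python
import re

# Each rule pairs a regular expression with an action, mirroring the
# "regular expression + C code" pairs of a flex input file.
# The token names are hypothetical.
RULES = [
    (re.compile(r"[0-9]+"), lambda text: ("NUMBER", int(text))),
    (re.compile(r"[A-Za-z_][A-Za-z0-9_]*"), lambda text: ("IDENT", text)),
    (re.compile(r"[-+*/=]"), lambda text: ("OP", text)),
    (re.compile(r"\s+"), lambda text: None),  # whitespace yields no token
]


def scan(source):
    """Return the list of tokens produced by the rule actions."""
    tokens = []
    pos = 0
    while pos < len(source):
        # Prefer the longest match; on a tie the earlier rule wins.
        best_match, best_action = None, None
        for pattern, action in RULES:
            match = pattern.match(source, pos)
            if match and (best_match is None
                          or match.end() > best_match.end()):
                best_match, best_action = match, action
        if best_match is None:
            raise ValueError(f"no rule matches at offset {pos}")
        result = best_action(best_match.group())
        if result is not None:
            tokens.append(result)
        pos = best_match.end()
    return tokens
```

Calling scan("x = 42 + y1") runs the IDENT, OP and NUMBER actions in
turn and skips the whitespace between them.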
WWW: http://flex.sourceforge.net/
Note that there's flex 2.5.4 in the base system. This port provides
a newer version for programs that require it, textproc/xxdiff for one.
This module provides functions that deal with formatting data with
Content-Type 'text/plain; format=flowed' as described in RFC 2646
(http://www.rfc-editor.org/rfc/rfc2646.txt). In a nutshell,
format=flowed text solves the problem in plain text files where it
is not known which lines can be considered a logical paragraph,
enabling lines to be automatically flowed (wrapped and/or joined)
as appropriate when displaying.
In format=flowed, a soft newline is expressed as " \n", while hard
newlines are expressed as "\n". Soft newlines can be automatically
deleted or inserted as appropriate when the text is reformatted.
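
The soft/hard newline convention can be sketched as follows. This is a
simplified illustration in Python, not the module's API: the unflow()
and flow() names are hypothetical, and quoting, space-stuffing and
signature separators from the RFC are deliberately ignored.

```python
def unflow(text):
    """Join soft-broken lines into logical paragraphs.

    A line ending in a space is followed by a soft newline and continues
    on the next line; any other line ends with a hard newline.
    """
    paragraphs = []
    current = ""
    for line in text.splitlines():
        if line.endswith(" "):
            current += line  # soft newline: keep collecting
        else:
            paragraphs.append(current + line)  # hard newline: finish
            current = ""
    if current:
        # The text ended on a soft newline; keep what was collected.
        paragraphs.append(current.rstrip(" "))
    return paragraphs


def flow(paragraph, width=72):
    """Wrap one paragraph, marking each inserted break as soft (" \\n")."""
    lines = []
    current = ""
    for word in paragraph.split():
        if current and len(current) + 1 + len(word) >= width:
            lines.append(current + " ")  # trailing space marks a soft break
            current = word
        else:
            current = f"{current} {word}" if current else word
    lines.append(current)
    return "\n".join(lines)
```

Because only the trailing space distinguishes the two kinds of newline,
unflow(flow(p)) gives back a single paragraph for text with single
spaces between words.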
WWW: http://search.cpan.org/dist/Text-Flowed/
Justification: socialtext dependency
This provides a simple interface to Plucene. Plucene is large and
multi-featured, and it is expected that users will subclass it, and tie all
the pieces together to suit their own needs. Plucene::Simple is, therefore,
just one way to use Plucene. It's not expected that it will do exactly
what *you* want, but you can always use it as an example of how to
build your own interface.
WWW: http://search.cpan.org/dist/PluceneSimple/
Justification: socialtext dependency
Quirks: 1/6 test fails
Bastardize provides a magical object into which text can be charged
and then returned in various, slightly modified ways.
Among others, bastardize has the following methods:
rdct     converts English to hyperreductionist English
         (ex. "english" becomes "")
pig      pig latin
         (ex. "hi there" becomes "ihay erethay")
k3wlt0k  a k3wlt0kizer developed originally by Fmh
rot13    implements rot13 "encryption" in Perl
         (ex. "foo bar" becomes "sbb one")
rev      reverses the arrangement of characters
censor   attempts to censor text which might be inappropriate
n20e     performs numerical abbreviations
         (ex. "numerical_abbreviation" becomes "n20e")
WWW: http://search.cpan.org/dist/Text-Bastardize/
This is an XS wrapper around some Unicode Consortium code to check if
a string is valid UTF-8, revised to conform to what expat/Mozilla
think is valid UTF-8, especially with regard to low-ASCII characters.
Note that this module has NOTHING to do with Perl's internal UTF8 flag
on scalars.
This module is for use when you're getting input from users and want
to make sure it's valid UTF-8 before continuing.
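
The intended use, checking untrusted input before processing it, looks
roughly like this. The sketch uses Python's strict decoder rather than
this module, and the is_valid_utf8() name is hypothetical. It does not
reproduce the expat/Mozilla rules for low-ASCII characters mentioned
above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

The strict decoder rejects stray bytes, overlong encodings and encoded
surrogates, so callers can refuse such input up front.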
WWW: http://search.cpan.org/dist/Unicode-CheckUTF8/
The goals of this project are simple:
Create a highly configurable, easily modifiable source code beautifier.
What it does:
* Indent code, aligning on parens, assignments, etc
* Align on '=' and variable definitions
* Align structure initializers
* Align #define stuff
* Align backslash-newline stuff
* Reformat comments (a little bit)
* Fix inter-character spacing
* Add or remove parens on return statements
* Add or remove braces on single-statement if/do/while/for statements
* Highly configurable - 118 configurable options as of version 0.0.15
WWW: http://uncrustify.sourceforge.net
PR: ports/100604
Submitted by: Dmitry Marakasov <amdmi3 at mail.ru>