Provided in PR 12884 by Jesse Off (joff@newmonics.com)
"External converter script for ht://Dig (version 3.1.4 and later), that
converts Microsoft Word, Excel and Powerpoint files, and PDF,
PostScript, RTF, and WordPerfect files to text (in HTML form) so they
can be indexed. Uses a variety of conversion programs:
wp2html - to convert WordPerfect and Word7 & 97 documents to HTML
catdoc - to extract text from Word documents
rtf2html - to convert RTF documents to HTML
pdftotext - to extract text from Adobe PDFs
ps2ascii - to extract text from PostScript
pptHtml - to convert Powerpoint files to HTML
xlHtml - to convert Excel spreadsheets to HTML
or
xls2csv - to obtain data from Excel spreadsheets.
Written by David Adams (University of Southampton), and based on the
conv_doc.pl script by Gilles Detillieux."
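The dispatch such a converter script performs can be sketched as a lookup from file type to conversion program. This Python sketch is illustrative only: the extension table, the choice of catdoc for .doc files, and the name pick_converter are assumptions, not taken from the script itself.

```python
import os

# Hypothetical mapping from file extension to one of the conversion
# programs named in the description. The real script may detect file
# types differently (for example by MIME type or magic number).
CONVERTERS = {
    ".doc": "catdoc",
    ".rtf": "rtf2html",
    ".pdf": "pdftotext",
    ".ps": "ps2ascii",
    ".ppt": "pptHtml",
    ".xls": "xlHtml",
    ".wpd": "wp2html",
}


def pick_converter(path):
    """Return the converter name for path, or None if unsupported."""
    ext = os.path.splitext(path)[1].lower()
    return CONVERTERS.get(ext)
```

A caller would fall back to skipping the file when pick_converter returns None.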
Provided in PR 12882 by Jesse Off (joff@newmonics.com)
xlHtml is an Excel 95 and later file converter. Its html output
can be used as a Netscape Plugin to let you view xls e-mail
attachments. It can also extract regions of a spreadsheet and
convert the spreadsheet to pure text rather than html.
* Bugfixes
* The iconv program's -f and -t options are now optional.
* Many more transliterations.
* Added CP862 converter.
* Changed the GB18030 converter.
* Portability to DOS with DJGPP.
Speed up pure perl base64 encoder/decoder by using join/map instead
of while loop.
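The join/map idea can be illustrated outside Perl as well. The sketch below is in Python and is not the module's code; the names ALPHABET, encode_chunk and b64encode are made up for this example. It encodes each 3-byte chunk independently and joins the results, instead of appending to a string inside a loop.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_chunk(chunk):
    """Encode 1 to 3 bytes as 4 base64 characters, padding with '='."""
    pad = 3 - len(chunk)
    # Pack the chunk (zero-filled to 3 bytes) into one 24-bit integer.
    n = int.from_bytes(chunk + b"\x00" * pad, "big")
    chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
    if pad:
        # Replace the characters that only encode filler with padding.
        chars[-pad:] = "=" * pad
    return "".join(chars)


def b64encode(data):
    """Encode bytes by mapping over 3-byte chunks and joining the results."""
    chunks = (data[i:i + 3] for i in range(0, len(data), 3))
    return "".join(map(encode_chunk, chunks))
```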
Doc update contributed by Jerrad Pierce
Downgrade UTF8 strings before starting to encode.
rpm2cpio reads an RPM archive and writes its contents to standard output
in cpio format. It is written in Perl.
I moved it from its original category, misc, to converters.
out of date - it was based on a.out OBJECT_FMT, and added entries in the
generated PLISTs to reflect the symlinks that ELF packages use. It also
tried to be clever, and removed and recreated any symbolic links that were
created, which has resulted in some fun, especially with packages which
use dlopen(3) to load modules. Some recent changes to our ld.so to bring
it more into line with other Operating Systems also exposed some cracks.
+ Modify bsd.pkg.mk and its shared object handling, so that PLISTs now contain
the ELF symlinks.
+ Don't mess about with file system entries when handling shared objects in
bsd.pkg.mk, since it's likely that libtool and the BSD *.mk processing will
have got it right, and have a much better idea than we do.
+ Modify PLISTs to contain "ELF symlinks"
+ On a.out platforms, delete any "ELF symlinks" from the generated PLISTs
+ On ELF platforms, no extra processing needs to be done in bsd.pkg.mk
+ Modify print-PLIST target in bsd.pkg.mk to add dummy symlink entries on
a.out platforms
+ Update the documentation in Packages.txt
With many thanks to Thomas Klausner for keeping me honest with this.
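The a.out step above, deleting "ELF symlinks" from a generated PLIST, might look roughly like the following Python sketch. It is not the bsd.pkg.mk implementation. It assumes that for a shared object lib/libfoo.so.1.2 the ELF symlinks are lib/libfoo.so and lib/libfoo.so.1; the function name strip_elf_symlinks is hypothetical.

```python
import re

# Matches a fully versioned shared object such as lib/libfoo.so.1.2
FULL_VERSION = re.compile(r"^(?P<base>.*\.so)\.(?P<major>\d+)\.\d+$")


def strip_elf_symlinks(plist):
    """Drop the assumed ELF symlink entries for each versioned shared object."""
    symlinks = set()
    for entry in plist:
        m = FULL_VERSION.match(entry)
        if m:
            symlinks.add(m.group("base"))                           # lib/libfoo.so
            symlinks.add(m.group("base") + "." + m.group("major"))  # lib/libfoo.so.1
    return [entry for entry in plist if entry not in symlinks]
```

On ELF platforms the list would be used unchanged, matching the "no extra processing" item above.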
unix2dos and dos2unix are utilities that convert ASCII files between
the DOS CR/LF format and the UNIX LF format.
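The two conversions amount to a byte-level replacement of line endings. A minimal Python sketch follows; it reuses the utility names for the functions but is not their implementation, and it ignores any options the real programs accept.

```python
def dos2unix(data):
    """Convert CR/LF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


def unix2dos(data):
    """Convert LF line endings to CR/LF without doubling existing CRs."""
    # Normalise first so that input already in DOS format is left intact.
    return dos2unix(data).replace(b"\n", b"\r\n")
```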
Submitted by Tomasz Luchowski <zuntum@eik.pl>
* Changed executable name from wvHtml to wvWare
* Added Mime display script (wvMime) (Martin Vermeer, Dom)
* Added Conversion Helper Scripts (wvHtml, wvLatex, wvCleanLatex,
wvPS, wvDVI, wvPDF) (Dom)
* Added CleanLaTeX output mode, more closely resembling hand-crafted
LaTeX (Martin)
* Use GLib (http://www.gtk.org) (Dom)
* Use Gnome Libole2 (http://www.gnome.org) (Dom, Jamie)
* New wvStream architecture (Jamie)
* Word 2 support! (Martin Vermeer)
* Code speedups and XP improvements (the Abiword team)
* Massive work started on an exporter (Dom)
* Move to wvware.sourceforge.net (Dom)
* New Maintainer: Dom Lachowicz (cinamod@hotmail.com)
|-> also getting lots of help from Martin Vermeer
ones to do, and each compiled and installed/de-installed apparently
correctly.
As a side effect of the dynamic PLIST, we no longer need to have separate
-static and -shared PLISTs. It's now easier than ever to make a perl5
package for NetBSD :)
which takes entries of the format <make-definition-name>=<pkgname>. This
has not been added to MAKEFLAGS because (a) premature optimisation is the
root of all evil, and (b) because the .for loop used to implement this
shows the wrong results when multiple prefixes are evaluated.
Modify all the package Makefiles to use EVAL_PREFIX, thereby simplifying
them considerably.
Also simplify the logic to calculate the prefix.
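The <make-definition-name>=<pkgname> entry format can be shown with a small parser. This is a Python sketch of the format only, not of how bsd.pkg.mk evaluates it; the function name parse_eval_prefix and the sample entries GTKDIR=gtk+ and QTDIR=qt2 are hypothetical.

```python
def parse_eval_prefix(entries):
    """Split whitespace-separated '<name>=<pkgname>' entries into a dict."""
    result = {}
    for entry in entries.split():
        name, sep, pkgname = entry.partition("=")
        if not sep or not name or not pkgname:
            raise ValueError("malformed entry: %r" % entry)
        result[name] = pkgname
    return result


# Hypothetical example: two definitions, each naming the package whose
# installation prefix should be looked up.
print(parse_eval_prefix("GTKDIR=gtk+ QTDIR=qt2"))
```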
package's prefix would not work as part of the environment specification
via MAKE_ENV (as it would not be executed in the correct directory).
Fix this by invoking pkg_info(1) directly, not via an intermediate make(1)
step - this is not as clean, but more effective (i.e. it works).
a bit more user-friendly.
Introduce a show-{gtk+,imlib,kdebase,qt1,qt2,xpm}-prefix target in
bsd.pkg.mk, and use "${MAKE} show-*-prefix" in package Makefiles.
Add a new USE_LIBTOOL definition that uses the libtool package instead of
pkglibtool which is now considered outdated.
USE_PKGLIBTOOL is available for backwards compatibility with old packages
but is deprecated for new packages.
User-visible changes are:
.* Incompatible changes
. + A double dot `..' should now be used instead of a colon `:'.
. + Option --force (-f) is needed to pursue recoding despite errors.
. + There is no more quoting for special characters within charsets names.
. + Auto check (`-a') and popen (`-o') options have been withdrawn.
. + Some charsets and aliases were deleted, see `Charsets & aliases' below.
.* Extended features
. + Program messages are available in localised form for many languages.
. + Long character names are available in French, if LANGUAGE is set to `fr'.
. + A new request syntax allows for recode chaining, and for surfaces.
. + Option --header-file (-h) accepts a language parameter, and Perl is new.
. + Full charset listings now show the UCS-2 value for characters.
. + Option --known=PAIRS (-k) also accepts octal and hexadecimal numbers.
. + Option --list (-l) better sorts charsets and aliases, also fully written.
. + Charset `RFC1345' implements mnemonic+ascii+38, and is now reversible.
. + HTML is not limited anymore to Latin-1, HTML 4.0 entities are supported.
.* New features
. + Euro support.
. + Updated RFC 1345 set of tables, from Keld Simonsen.
. + Some African charsets and transliterated forms.
. + Conversions for ISO 10646 and Unicode.
. + Combining or explosion of UCS-2 diacriticized characters and ligatures.
. + Implementation of surfaces, see `Surfaces & aliases' below.
. + Mixed mode for recoding only comments and strings in C sources or PO files.
. + A stand-alone recoding library gets installed, often as a shared library.
. + Option --find-subsets (-T) lists charsets which are subsets of another.
. + The library may generate testing data, and study character frequencies.
.* Charsets & aliases
. + New ISO 10646 and Unicode charsets
. - combined-UCS-2: pseudo-charset.
. - count-characters: pseudo-charset.
. - dump-with-names: pseudo-charset.
. - ISO-10646-UCS-2: aliases are UNICODE-1-1, BMP, rune and u2.
. - ISO-10646-UCS-4: aliases are 10646, ISO-10646, UCS-4 and u4.
. - UNICODE-1-1-UTF-7: aliases are TF-7 and u7.
. - UTF-8: aliases are UTF-2, UTF-FSS, FSS_UTF, TF-8 and u8.
. - UTF-16: aliases are Unicode, TF-16 and u6.
. + RFC 1345.bis matters
. - Deleted charsets
dk-us, us-dk (because of &duplicate which `recode' does not handle yet).
. - New charsets
baltic (alias is iso-ir-179); CP1250 (1250, ms-ee, windows-1250);
CP1251 (1251, ms-cyrl, windows-1251); CP1252 (1252, ms-ansi, windows-1252);
CP1253 (1253, ms-greek, windows-1253);
CP1254 (1254, ms-turk, windows-1254); CP1255 (1255, ms-hebr, windows-1255);
CP1256 (1256, ms-arab, windows-1256);
CP1257 (1257, WinBaltRim, windows-1257);
CWI (CWI-2, cp-hu); EBCDIC-IS-FRISS (friss);
GOST_19768-87 with aliases of previous GOST_19768-74;
IBM256 (256, CP256, EBCDIC-INT1); IBM875 (875, CP875, EBCDIC-Greek);
IBM1004 (1004, CP1004, os2latin1); IBM1047 (1047, CP1047);
ISO-8859-13 (ISO_8859-13:1998, iso-baltic, iso-ir-179a, l7, latin7);
ISO-8859-14 (ISO_8859-14:1998, iso-celtic, iso-ir-199, l8, latin8);
ISO-8859-15 (ISO_8859-15:1998, iso-ir-203, l9, latin9);
KOI-7; KOI-8 (GOST_19768-74); KOI8-R; KOI8-RU; KOI8-U;
macintosh_ce (macce); mac-is;
NeXTSTEP (next) yet previous `recode' had it outside RFC 1345.
. - Alias promoted to charset (with previous charset becoming alias)
ISO-646.basic (with ISO-646.basic:1983); ISO-646.irv (ISO-646.irv:1983);
ISO_5427-ext (ISO_5427:1981); ISO_5428 (ISO_5428:1980);
ISO-8859-1 (ISO_8859-1:1987); ISO-8859-2 (ISO_8859-2:1987);
ISO-8859-3 (ISO_8859-3:1988); ISO-8859-4 (ISO_8859-4:1988);
ISO-8859-5 (ISO_8859-5:1988); ISO-8859-6 (ISO_8859-6:1987);
ISO-8859-7 (ISO_8859-7:1987); ISO-8859-8 (ISO_8859-8:1988);
ISO-8859-9 (ISO_8859-9:1989); ISO-8859-10 (latin6);
NC_NC00-10 (NC_NC00-10:81); sami (latin-lap).
. - New aliases
037 (for charset IBM037); 038 (IBM038); 273 (IBM273); 274 (IBM274);
275 (IBM275); 278 (IBM278); 280 (IBM280); 281 (IBM281); 284 (IBM284);
285 (IBM285); 290 (IBM290); 297 (IBM297); 367 (ANSI_X3.4-1968);
420 (IBM420); 423 (IBM423); 424 (IBM424); 500, 500V1 (IBM500);
819 (ISO-8859-1); 864 (IBM864); 868 (IBM868); 870 (IBM870);
871 (IBM871); 880 (IBM880); 891 (IBM891); 903 (IBM903); 905 (IBM905);
912, CP912, IBM912 (ISO-8859-2); 918 (IBM918); 1026 (IBM1026);
ECMA-113, ECMA-113:1986 (ECMA-Cyrillic); GOST_19768-74 (KOI8);
ISO_8859-N (ISO-8859-N) for N = 1 through 10 and 13 through 15;
ISO_8859-10:1993 (ISO-8859-10); iso-ir-170 (INVARIANT);
KOI8_L2 (CSN_369103); pclatin2, pcl2 (IBM852); SS636127 (SEN_850200_B).
. + New African charsets
. - AFRL1-101-BPI_OCIL: aliases are t-francais and t-fra.
. - AFRFUL-102-BPI_OCIL: aliases are bambara, bra, ewondo and fulfulde.
. - AFRFUL-103-BPI_OCIL: aliases are t-bambara, t-bra, t-ewondo and t-fulfulde.
. - AFRLIN-104-BPI_OCIL: aliases are lingala, lin, sango and wolof.
. - AFRLIN-105-BPI_OCIL: aliases are t-lingala, t-lin, t-sango and t-wolof.
. + Extra miscellaneous charsets
. - KEYBCS2, Kamenicky.
. - CORK, T1.
. - KOI-8_CS2.
. + New HTML pseudo-charsets
. - HTML_1.1: alias is h1.
. - HTML_2.0: aliases are RFC 1866, 1866 and h2.
. - HTML-i18n: alias is RFC 2070.
. - HTML_3.2: reimplemented; alias is h3.
. - HTML_4.0: aliases are h4, HTML and h.
. - Deleted aliases: HTF, 8859, ISO 8859, Entities, SGML, WWW, w3.
.* Surfaces & aliases
. + New MIME encoding surfaces
. - Base64: aliases are 64 and b64.
. - Quoted-Printable: aliases are qp and Quote-Printable.
. + New permutation surfaces
. - 21-Permutation: alias is swabytes.
. - 4321-Permutation.
. + New end of line surfaces
. - CR.
. - CR-LF: alias is cl.
. + New (fully reversible) dump surfaces
. - Decimal-1: aliases are d and d1.
. - Decimal-2: alias is d2.
. - Decimal-4: alias is d4.
. - Hexadecimal-1: aliases are x and x1.
. - Hexadecimal-2: alias is x2.
. - Hexadecimal-4: alias is x4.
. - Octal-1: aliases are o and o1.
. - Octal-2: alias is o2.
. - Octal-4: alias is o4.
. + New miscellaneous surfaces.
. - data, test7, test8, test15, test16.
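The permutation surfaces listed above reorder bytes without changing the charset. The Python sketch below shows the idea behind 21-Permutation and 4321-Permutation; it is not recode's code. Leaving a trailing incomplete group untouched is an assumption, and recode may handle that case differently.

```python
def permute_21(data):
    """Swap the two bytes of every 16-bit pair."""
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)


def permute_4321(data):
    """Reverse the byte order of every 32-bit group."""
    out = bytearray(data)
    for i in range(0, len(out) - 3, 4):
        out[i:i + 4] = out[i:i + 4][::-1]
    return bytes(out)
```

Both functions are their own inverse, so applying one twice returns the original bytes.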
build a binary package with this definition would fail as the PLIST is
not correct.
If a package's documentation is overwhelming, it should arguably be handled
in a separate pre-requisite documentation package.
(not needed by any existing package). Now this package fits into the
category "converters" again.
- Because this package isn't X11 related anymore it is installed into
"${LOCALBASE}".
- New, optional Makefile variable HOMEPAGE, specifies a URL for
the home page of the software if it has one.
- The value of HOMEPAGE is used to add a link from the
README.html files.
- pkglint updated to know about it. The "correct" location for
HOMEPAGE in the Makefile is after MAINTAINER, in that same
section.
Implement a new DEPENDS definition, which looks for an installed
package, building it if not present, and use it in preference to
LIB_DEPENDS. This should make the package collection more useful on
NetBSD ELF ports.