Better compatibility with Mozilla/MSIE behaviour.
==== Changes since 3.27 ====
2003-08-19 Gisle Aas <gisle@ActiveState.com>
Release 3.31
The -DDEBUGGING fix in 3.30 was not really there :-(
2003-08-17 Gisle Aas <gisle@ActiveState.com>
Release 3.30
The previous release failed to compile on a -DDEBUGGING perl
like the one provided by Red Hat 9.
Got rid of references to perl-5.7.
Further fixes to avoid warnings from Visual C.
Patch by Steve Hay <steve.hay@uk.radan.com>.
2003-08-14 Gisle Aas <gisle@ActiveState.com>
Release 3.29
Setting xml_mode now implies strict_names also for end tags.
Avoid warning from Visual C. Patch by <gsar@activestate.com>.
64-bit fix from Doug Larrick <doug@ties.org>
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=195500
Try to parse similar to Mozilla/MSIE in certain edge cases.
All these are outside of the official definition of HTML but
HTML spam often tries to take advantage of these.
- New configuration attribute 'strict_end'. Unless enabled
we will allow end tags to contain extra words or stuff
that look like attributes before the '>'. This means that
tags like these:
</foo foo="<ignored>">
</foo ignored>
</foo ">" ignored>
are now all parsed as a 'foo' end tag instead of text.
Even if the extra content looks like attributes, it is not
reported when the 'attr' or 'tokens' argspecs are requested
for the 'end' handler.
- Parse '</:comment>' and '</ comment>' as comments unless
strict_comment is enabled. Previous versions of the parser
would report these as text. If these comments contain
quoted words prefixed by space or '=' these words can
contain '>' without terminating the comment.
- Parse '<! "<>" foo>' as comment containing ' "<>" foo'.
Previous versions of the parser would terminate the comment
at the first '>' and report the rest as text.
- Legacy comment mode: Parse with comments terminated with a
lone '>' if no '-->' is found before eof.
- Incomplete tag at eof is reported as a 'comment' instead
of 'text' unless strict_comment is enabled.
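The lenient end-tag rule in the first item above can be illustrated with a
small sketch. This is Python rather than the module's own C/Perl code, and the
pattern and the function name lenient_end_tag are hypothetical, written only to
show the idea: after the tag name, quoted strings (which may contain '>') and
other characters are skipped until the closing '>'.

```python
import re

# Hypothetical pattern, NOT HTML::Parser's actual tokenizer.
# An end tag is "</name", optionally followed by whitespace and any mix of
# quoted strings (which may contain '>') or other non-'>' characters,
# and finally the closing '>'.
LENIENT_END_TAG = re.compile(
    r"""</([A-Za-z][A-Za-z0-9:._-]*)         # tag name
        (?:\s(?:"[^"]*"|'[^']*'|[^>"'])*)?   # trailing content, ignored
        >""",
    re.VERBOSE,
)


def lenient_end_tag(text):
    """Return the lowercased tag name if text is one lenient end tag, else None."""
    m = LENIENT_END_TAG.fullmatch(text)
    return m.group(1).lower() if m else None


if __name__ == "__main__":
    for sample in ('</foo foo="<ignored>">', "</foo ignored>", '</foo ">" ignored>'):
        print(sample, "->", lenient_end_tag(sample))
```

The trailing content is matched but never captured, which mirrors the note
that it is not reported as attributes.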
2003-04-16 Gisle Aas <gisle@ActiveState.com>
Release 3.28
When 'strict_comment' is off (which it is by default)
treat anything that matches <!...> as a comment.
Should now be more efficient on threaded perls.
Avoid core dump in some cases where the callback croaks.
The perl_call_method and perl_call_sv calls need the G_EVAL flag
to be safe.
New parser attributes: 'attr_encoded' and 'case_sensitive'.
Contributed by Guy Albertelli II <guy@albertelli.com>.
HTML::Entities
- don't encode \r by default as suggested by Sean M. Burke.
HTML::HeadParser
- ignore empty http-equiv
- allow multiple <link> elements. Patch by
Timur I. Bakeyev <timur@gnu.org>
Avoid warnings from bleadperl on the uentities test.
The automatic truncation in gensolpkg doesn't work for packages whose
names share the same first 5-6 chars. For example, amanda-server and
amanda-client would both be named amanda.
Now we add a SVR4_PKGNAME and use amacl for amanda-client and amase for
amanda-server.
All svr4 packages also have a vendor tag, so we have to reserve some chars
for this tag, which is normally 3 or 4 chars. That's why we can only use 6
or 5 chars for SVR4_PKGNAME. I used 5 for all the packages, to give the
vendor tag enough room.
All p5-* packages and a few other packages now have a SVR4_PKGNAME.
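The collision described above can be reproduced with a short Python sketch.
The helper names truncate and collisions are hypothetical, and the assumption
that truncation simply keeps the first characters of the name is made for
illustration only; the real gensolpkg logic may differ.

```python
from collections import defaultdict


def truncate(name, width=6):
    """Naive truncation (an assumption): keep the first `width` characters."""
    return name[:width]


def collisions(names, width=6):
    """Group package names whose truncated forms are identical."""
    groups = defaultdict(list)
    for name in names:
        groups[truncate(name, width)].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


# Explicit short names, as in the SVR4_PKGNAME approach (5 chars each).
EXPLICIT = {"amanda-client": "amacl", "amanda-server": "amase"}

if __name__ == "__main__":
    print(collisions(["amanda-server", "amanda-client", "perl"]))
    print(EXPLICIT)
```

Assigning the short names explicitly avoids the clash and keeps each within
the 5 characters reserved for SVR4_PKGNAME.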
2001-05-11 Gisle Aas <gisle@ActiveState.com>
Release 3.25
Minor tweaks for build failures on perl5.004_04, perl-5.6.0,
and for macro clash under Windows.
Improved parsing of <plaintext>... :-)
Changes:
If a handler triggered by flushing text at eof called the
eof method then infinite recursion occurred. Fixed.
Bug discovered by Jonathan Stowe <gellyfish@gellyfish.com>.
Allow <!doctype ...> to be parsed as declaration.
HTML::TokeParser's get_tag() method now takes multiple
tags to match. Hopefully the documentation is also a bit clearer.
#define PERL_NO_GET_CONTEXT: Should speed up things for thread
enabled versions of perl.
Quote some more entities that also happen to be perl keywords.
This avoids warnings on perl-5.004.
Unicode entities only triggered for perl-5.7.0 or higher.
The unbroken_text option now works across ignored tags.
Fix casting of pointers on some 64 bit platforms.
Fix decoding of Unicode entities. Only optionally available for
perl-5.7.0 or better.
Expose internal decode_entities() function at the Perl level.
Reindented some code.
The 3.16 release broke MULTIPLICITY builds. Fixed.
There was a C++ style comment left in util.c. Strict C
compilers do not like that kind of stuff.
Avoid the entity2char global. That should make the module
more thread safe. Patch by Gurusamy Sarathy <gsar@ActiveState.com>.
ones to do, and each compiled and installed/de-installed apparently
correctly.
As a side effect of the dynamic PLIST, we no longer need to have separate
-static and -shared PLISTs. It's now easier than ever to make a perl5
package for NetBSD :)
Changes include:
* Allow ":" in attribute names as a workaround for Microsoft Excel
2000 which generates such files.
* Issue a deprecation warning if the netscape_buggy_comment() method is
used. The method to use is strict_comment().
* Avoid duplication of parse_file() method in HTML::HeadParser.
* $p->parse_file() will not close a handle passed to it any more.
If passed a filename that can't be opened it will return undef
instead of raising an exception, and strings like "*STDIN" are not
treated as globs any more.
* HTML::LinkExtor knows about the background attribute of <table>.
Patch by Clinton Wong <clintdw@netcom.com>
* HTML::TokeParser will parse large inline strings much faster now.
The string holding the document must not be changed during parsing.
* Documentation updates.
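
The parse_file() contract listed above (a handle passed in is left open, and
an unopenable filename yields an undefined value rather than an exception) can
be sketched as follows. This is a Python illustration under stated
assumptions, not the module's Perl code: the function signature, the feed
callback, the chunk size and the True return value are all hypothetical.

```python
import io


def parse_file(source, feed, chunk_size=512):
    """Feed chunks of `source` to `feed`.

    `source` is either a filename (str) or an already open text handle.
    Returns True on success, or None if a filename cannot be opened.
    """
    opened_here = isinstance(source, str)
    if opened_here:
        try:
            handle = open(source, "r", encoding="utf-8", errors="replace")
        except OSError:
            return None  # report failure without raising
    else:
        handle = source
    try:
        for chunk in iter(lambda: handle.read(chunk_size), ""):
            feed(chunk)
    finally:
        if opened_here:
            handle.close()  # only close what this function opened
    return True


if __name__ == "__main__":
    h = io.StringIO("<p>hi</p>")
    parts = []
    parse_file(h, parts.append)
    print("".join(parts), h.closed)
```

Strings are always treated as filenames here, which corresponds to the note
that names like "*STDIN" are no longer interpreted as globs.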