Changelog:
version 3.37
date: 2010-10-08
# minor maintenance release
fixed: more tests fixed for HTML::TreeBuilder, hopefully
will pass now
version 3.36
date: 2010-10-07
# minor maintenance release
added: the use_tidy option to XML::Twig->new, which uses
HTML::Tidy to convert HTML to well-formed XHTML, as an
alternative to the default conversion which uses
HTML::TreeBuilder
added: XML::Twig::Elt method siblings which returns the
siblings of the element
added: methods att_accessors, elt_accessors and field_accessor
as well as the similarly named options when creating an
XML::Twig
added: set_outer_xml XML::Twig::Elt method
added: print_to_file on an XML::Twig::Elt
added: can use the tag[nested] form in twig handlers that
triggers on elements 'tag' that include a child 'nested'
added: aliased the add_to_class XML::Twig::Elt method to add_class,
which seems more natural
added: the remove_class method
added: made att and class lvalues (in perl 5.6 and up)
fixed: copy did not copy the empty status of an element
RT#31664 spotted by Roland Minner
https://rt.cpan.org/Ticket/Display.html?id=31664
fixed: cut_children would always set the empty status of an element,
even if it had children left
fixed: tests did not pass with HTML::TreeBuilder 3.23_1 due to a
change in an error message
to trigger/signal a rebuild for the transition 5.10.1 -> 5.12.1.
The list of packages is computed by finding all packages which end
up having either of PERL5_USE_PACKLIST, BUILDLINK_API_DEPENDS.perl,
or PERL5_PACKLIST defined in their make setup (tested via
"make show-vars VARNAMES=..."), minus the packages updated after
the perl package update.
sno@ was right after all, obache@ kindly asked and he@ led the
way. Thanks!
pkgsrc changes:
- remove dead master site
Upstream changes:
version: 3.35
date:
# minor maintenance release
added: the by_file option to xml_grep that limits
the number of hits per file
added: allowed the text of ignored elements to be buffered
in a string
fixed: comments need to be escaped (you can't have 2 '-' in a
row), RT#57389 spotted by Konstantin Tchernov
https://rt.cpan.org/Ticket/Display.html?id=57389
fixed: after $elt->cut_children, $elt->empty is false RT#54570
spotted and patched by Andrew Pimlott
https://rt.cpan.org/Ticket/Display.html?id=54570
fixed: documented the fact that latin1 is ISO-8859-15, see RT#37431
https://rt.cpan.org/Ticket/Display.html?id=37431
pkgsrc changes:
- Add license definition
- Change patches against PERL_MM_USE_DEFAULT environment variable for EU::MM
Upstream changes:
CHANGES
version: 3.34
date: 2010-01-18
# minor maintenance release, test suite fixes
fixed: tests failed when XML::XPath was used as the XPath engine
to trigger/signal a rebuild for the transition 5.8.8 -> 5.10.0.
The list of packages is computed by finding all packages which end
up having either of PERL5_USE_PACKLIST, BUILDLINK_API_DEPENDS.perl,
or PERL5_PACKLIST defined in their make setup (tested via
"make show-vars VARNAMES=...").
- took maintainership
- updated DEPENDS for testing purposes (not required)
ChangeLog:
version: 3.32
date: $Date: 2007-11-13T18:10:03.393214Z $
# minor maintenance release with a bug fix
fix: change to the regexp that parses XPath-like conditions so
it can accept leading non-ascii letters ([^\W\d] does not
work), not used in perl 5.005
fix: set use utf8 (except in 5.005), which gets rid of the dreaded
"SWASHNEW" error in 5.6. fixed things that then broke in 5.6.
version: 3.31
# minor maintenance release, fixing some tests
tests: fixes to stop tests from failing in various configurations
Changes in 3.30
fixed a couple of bugs in namespace handling, spotted by
Shlomo Yonas (see https://rt.cpan.org/Ticket/Display.html?id=27617
and http://www.perlmonks.org/?node_id=624830)
added the XML::Twig::Elt fields method which returns a list of
fields
added the normalize method in XML::Twig and XML::Twig::Elt,
which merges together consecutive pcdata elements. As much as
possible (so far after a cut, delete or erase), the twig is
kept normalized, i.e. there are no consecutive #PCDATA elements
in it. Suggestion of someone whose name (and emails) I can't
find at the moment.
added the indented_a / cvs format for pretty_print, that makes the
output friendly to line-oriented version control tools, as described
in http://tinyurl.com/2kwscq (RT #24954). Thanks to Sjur Moshagen
for a patch that I adapted to the current version.
fixed bug RT #25113: system entities were not properly resolved
if the XML file was not in the current directory. Thanks to
Dave Charness for the patch.
Added the XML::Twig method finish_now that terminates parsing
immediately, without checking the rest of the XML. This feature was
half suggested by Nick Clayton
added the -s option to xml_split, which splits when the given
size is reached for a file, suggested by Radek Saturka.
added the -g option to xml_split, which groups elements to be
split, suggested and tested by Dhirendra Singh Kholia.
added the safe_parsefile_html and safe_parseurl_html methods,
and a --html option to xml_grep. Suggested by Bill Ricker.
by default xml_grep now skips non well-formed files, the
--strict option makes it die when it finds one
fixed a bunch of bugs in xml_grep
fixed a warning when using optional modules with a version
number that includes an _, spotted and fix suggested by
Bill Ricker.
Fixed test failure on cygwin, thanks to Erik Rantapaa for the
patch.
Fixed a bunch of typos in docs, RT #25836, spotted and fixed by David
Steinbrunner
Improved re-use of XML::Twig objects for repetitive parsing. It
looks like it should be OK now, but I am sure I haven't tested
all cases yet (especially when DTDs and entities are involved).
HTML parsing improved: XML::Twig now tries to find the proper
encoding for the document (that's not done by HTML::TreeBuilder
at the moment).
XML::Twig::Elt purge and flush methods now only purge/flush up to
the element, not up to the current element in the twig (duh!)
Fixed bug in handlers of the form elt[string(subelt)="foo"] and
elt[string(subelt)=1] which did not work at all
fixed bug in parameter entity output, spotted by BenHopkins on
perlmonks (see http://www.perlmonks.org/?node_id=618360)
fixed bug in xml_string: options were not used
improved error reporting for missing SYSTEM entities, including
the option to set twig_expand_external_ents to -1, which makes
missing SYSTEM entities not fatal, but reports them in
$t->{twig_missing_system_entities}. Thanks to Frank Wegmann for
his suggestions and for testing the various versions of the feature
fixed internals so new versions of Pod::Coverage won't barf
Changes in 3.29
fixed a bug in the handling of handlers after an ignore (RT #24392,
reported by Robert Eden).
Changes in 3.28
now builds on Windows and OS2
refactored the code that triggers handlers,
more complex expressions can now be handled,
such as '/doc/section[@def="1"]/title'
COMPATIBILITY WARNING
Up to version 3.26, you could change the attribute
of a parent of a node on which you had a handler,
and be able to trigger a handler on that parent node
based on the new attribute value:
  XML::Twig->new( twig_handlers =>
    { 'sect/title'           => sub { $_->parent->set_att( has_title => 1) },
      'sect[@has_title="1"]' => sub { ... },  # called for any sect that has
    }                                         # a title
  );
This won't work now. The trigger expression ('sect[@has_title="1"]')
is evaluated strictly against the input XML. This is more logical and
consistent (if you changed the element name, the new name was never
used in the evaluation of the trigger).
The only exception to that rule is if you use "private attributes":
attributes whose name starts with a '#'. By definition this is an invalid
XML name, so it can't be in the input, and has to have been created. In
that case the code that evaluates the trigger looks at the attribute in
the element in the tree in memory (if it exists).
So in the example above, if you replace 'has_title' by '#has_title',
everything will work fine. Note that private attributes are not output
when using the print/sprint/xml_string... methods.
fixed xml_pp so it does not leave a tempfile
and a broken original file when the original
file is not well-formed.
added the nparse_pp method that does an nparse
with pretty_print set to 'indented', nparse_e
that sets error_context, and nparse_ppe that
does both
added XML::Twig::Elt tag_to_span and tag_to_div
methods (turn an element into a span/div and
set its class to the old tag name)
added the quote option for XML::Twig new, which
sets the output quote character for attributes
('single' or 'double')
added the text_only and xml_text_only methods
that return the text of the element, but not of
the sub-elements.
added outer_xml method (synonym for sprint)
fixed bug where entity names were not matched
properly (RT #22854, spotted by Bob Faist)
fixed bug on some DOCTYPE config with
twig_print_outside_roots
fixed bug in set_keep_encoding (the method,
not the option).
fixed bug in simplify: the code attempted to
replace variables in attribute values even if no
option required it, spotted by Klaus Rush
clean-up and fixed bugs in ignore: the method
can now be called from a regular handler (it
always could but the docs did not say so,
thanks to kudra for noticing this). It can
also be called to ignore a parent of the current
element. There were bugs there, and the tree
was not built properly
added error message when an XPath query with
a leading / is used on a node that does not
belong to a whole twig (because it's been cut
or because the twig itself went out of scope)
when parsing HTML with error_context set, the
HTML is indented, in order to give better error
report
Based on patch provided by Martin Wilke via PR 34412.
Also modify the dependencies:
- Remove p5-WeakRef>=0.01: this is optional (either Scalar::Util or WeakRef
is used), and Scalar::Util already exists in the perl base package.
- Remove p5-File-Temp>=0.12: a newer version exists in the perl base package.
Changes:
Changes in 3.26
added argument to -i in the Makefile to prevent
problem in win32
added XML::Twig::Elt former_next_sibling,
former_prev_sibling and former_parent methods
squashed a memory leak when parsing html
(forgot to call delete on the HTML::Tree object)
fixed bug that caused XML::Twig to hang if
there was a syntax error in a predicate
(RT#19499, reported by Dan Dascalescu)
made start_tag and end_tag more consistent: they
now both return the empty string for comments,
PIs... (reported by Dan Dascalescu)
added parsefile_inplace and parsefile_html_inplace
methods (thanks to GrandFather on perlmonks)
added support to add css stylesheet in the
add_stylesheet method (thanks to Georgi Sotirov)
patched tests to work on Win32
added set_inner_xml inner_xml and set_inner_html
methods
Changes in 3.25
patched to work with perl 5.005!
fixed a bug in xml_pp when pretty printing a
file in place in a different file system
Changes in 3.24
added loading the text of entities stored in
separate files (using SYSTEM) when the (awfully
named!) expand_external_ents option is used.
Thanks to jhx for spotting this.
changed set_cdata, set_pi and set_comment so that
if you call them on an element of the wrong kind,
everything works as expected, instead of swallowing
silently the data. Bug spotted by cmccutcheon
fixed a whole bunch of things to make the module
run and the tests pass on VMS, thanks to Peter
(Stig) Edwards who reported bug RT #18655 and
provided a patch.
fixed bug on get_xpath( '/root[1]') expressions,
RT #18789 spotted by memfrob.
added the add_stylesheet method, that... adds a
stylesheet (xsl type is supported, let me know if
other types are needed) to a document.
allowed pasting PI/Comment elements before or after
the root of a document (see discussion at
http://perlmonks.org/index.pl?node_id=538550).
Thanks to rogue90 for noticing the problem, and to
Tanktalus for finding the best way to solve it.
aliased unwrap to erase (i.e. added the unwrap method
to XML::Twig::Elt, identical to the existing erase)
suggested by Chris Burbridge.
fixed bug RT #17522: flushing twice at the end of
the parse would output the last fragment twice.
Spotted by Harco de Hilster.
dealt with bug RT #17500: parsing a pipe when using
the UTF8 perlIO layer (through PERL_UNICODE or -C)
now raises an error, found by Nikolaus Rath.
made the tests pass when the UTF8 perlIO layer is
used. At this point potential problems when parsing
non-UTF8 XML in this configuration are not trapped.
Changes in 3.23
added autoflush: there is no more need for the
last $twig->flush after the parsing, it is done
automatically at the end of the parsing, with the
same arguments as the first flush on the twig.
This can be turned off by setting $twig->{twig_autoflush}
to 0.
WARNING: if you finished the output with a direct
print instead of a flush, then this change will
cause a bug. Hopefully this should not be the case
and is easily fixable.
fixed bug RT #17145 where get_xpath('//root/elt[1]/child')
would produce a fatal error if there were no elt
element under root. Spotted by Dan Dascalescu.
fixed bug RT #17064 (comments and PIs after the
root element were not properly processed), spotted
by Dan Dascalescu.
fixed bug RT #17044: the SYSTEM value was not
output in UpdateDTD mode, thanks to Michal
Lewandowski for pointing this out.
changed the way empty tags are expanded with the
'html' style: only tags that are allowed to be
empty in XHTML are output as '<tag />', thanks
to Tom Rathborne for prodding me to look into this.
added a 'wrapped' pretty_print option, that is
a bit dodgy I think but that might please some.
fixed bug RT #16540 (tags with specific names,
like 'level', tripped XML::Twig), spotted by
Graham
added comparison with XML::LibXML in the SEE ALSO
section (and in the FAQ), following a question
from surf on c.l.p.m
XML::Twig now rejects string/regexp condition
in twig_roots
added better error checking in xml_grep
fix for string/regexp condition in xml_grep
added support for ! @att (or not @att) in get_xpath
added support for several predicates in get_xpath
(not nested predicates though).
fixed bug RT #15671 (wrong condition interpretation
for attribute value 0)
added XML::Twig print_to_file method
added XML::Twig::Elt methods: following_elt,
following_elts, preceding_elt, preceding_elts
(needed to support the corresponding axis in
get_xpath)
Changes in 3.22
added the XML::Twig xparse method, which parses
whatever is thrown at it (filehandle, string,
HTML file, HTML URL, URL or file).
added the XML::Twig nparse method, which creates
a twig and then calls xparse on the last parameter.
added the parse_html and parsefile_html methods,
which parse HTML strings (or fh) and files
respectively, with the help of HTML::TreeBuilder.
The implementation may still change. Note that
at the moment there seem to be encoding problems
with it (if the input is not UTF8).
added info to t/zz_dump_config.t
fixed a bug that caused subs_text to leave empty
#PCDATA elements if the regexp matched at the beginning
or at the end of the text of an element.
fixed RT #15014: in a few methods objects were
created as XML::Twig::Elt, instead of in the class
of the calling object.
fixed RT #14959: problem with wrap_children when
an attribute of one of the child elements includes
a '>'
improved the docs for wrap_children
added a better error message when re-using an
existing twig during the parse
partially fixed a bug with windows line-endings in
CDATA sections with keep_encoding set (RT #14815)
added Test::Pod::Coverage test to please the kwalitee
police ;--)
Changes in 3.21
fixed a test that failed if Tie::IxHash was not
available
added link to Atom feed for the CPAN testers
results at http://xmltwig.com/rss/twig_testers.rss
Changes in 3.20
fixed the pod (which caused the tests to fail)
Changes in 3.19
redid the fix to RT # 14008, this one should be ok
restructured tests
added the _dump method (probably not finished)
Changes in 3.18
added a fix to deal with a bug in XML::Parser in the
original_string method when used in CDATA sections
longer than 1024 chars (RT # 14008) thanks to Dan
Dascalescu for spotting the bug and providing a test
case.
added better error diagnostics when the wrong arguments
are used in paste
fixed a bug in subs_text when the text of an element
included \n (RT #13665) spotted by Dan Dascalescu
cleaned up the behaviour of erase when the element
being erased has extra_data (comments or pis) attached
fixed a bug in subs_text that sometimes messed up text
after the matching text
fixed the erase/group_tags option of simplify to make
it exactly similar to XML::Simple's
fixed a bug that caused XML::Twig to crash when ignore
was used with twig_roots (RT #13382) spotted by Larry
Siden
fixed bug in xml_split with default entities (they
ended up being doubly escaped)
fixed various bugs when dealing with ids (changing
existing ids, setting the attribute directly...)
mark and split now accepts several tags/ as arguments,
so you can write for example:
$elt->mark( qr/^(\w+): (.*)$/, 'dt', 'dd');
added XML::Twig::Elt children_trimmed_text method,
patch sent by ambrus (RT #12510)
changed children_text and children_trimmed_text to
have them return the entire text in scalar context
fixed bug that caused XML::Twig not to play nice with
XML::Xerces (due to improper import of UNIVERSAL::isa)
spotted and patched by Colin Robertson.
removed most references to 'gi' in the docs, replaced
them by tag. I guess Robin Berjon's relentless teasing
is to be credited with this one.
added tag_regexp condition on handlers (a regexp instead
of a regular condition will trigger the handler if the
tag matches), suggested by Franck Porcher, implementation
helped by a few Perl Monks
(http://perlmonks.org/index.pl?node_id=445677).
fixed typos in xml_split (RT #11911 and #11911),
reported by Alexey Tourbin
added tests for xml_split and xml_merge and fixed
a few bugs in the process
added the -i option to xml_split and xml_merge,
that use XInclude instead of PIs (preliminary
support, the XInclude namespace is not declared
for example).
Added the XML::Twig and XML::Twig::Elt trim method
that trims an element in-place
Added the XML::Twig last_elt method and the XML::Twig::Elt
last_descendant method
Added more tests
developer is officially maintaining the package.
The rationale for changing this from "tech-pkg" to "pkgsrc-users" is
that it implies that any user can try to maintain the package (by
submitting patches to the mailing list). Since the folks most likely
to care about the package are the folks that want to use it or are
already using it, this would leverage the energy of users who aren't
developers.
While here, list, as comments, all dependencies needed to run the
self tests
Changes since last packaged version (3.16):
Changes in 3.17
documentation changes, mostly to point better to
the resources at http://www.xmltwig.com
fix a few tests that would fail under perl 5.6.*
and Solaris (t/test_safe_encode.t and t/test_bug_3.15.t),
see RT bug # 11844, thanks to Sven Neuhaus
made the licensing terms in the README match the
ones in the main module (same as Perl), see RT
bug #11725
added a test on XML::SAX::Writer version number to
avoid failing tests with old versions (<0.39)
improved xml_split
Changes in 3.16
added the xml_split/xml_merge tools
fixed PI handler behaviour when used in twig_roots
mode
fixed a bug that prevented the DTD from being output
when update_DTD mode is on, no DTD is present but
entities have been created
added level(<n>) trigger for handlers
fixed bug that prevented the output_filter from being
called when printing an element. Spotted thanks to
Louis Strous.
fixed bug in the nsgmls pretty printer that output
invalid XML (an extra \n was added in the end tag)
found by Lee Goddard
fixed test 284 in test_additional to make it pass
in RedHat's version of perl 5.8.0, thanks to
rdhayes for debugging and fixing that test.
first shot at getting PIs and comments back in the
proper place, even in 'keep' mode. At the moment
using set_pcdata (or set_cdata) removes all
embedded comments/pis
fixed a bug with pi's in keep mode (pi's would not
be copied if they were within an element) found by
Pascal Sternis
added a fix to get rid of spurious warnings, sent
by Anthony Persaud
added the remove_cdata option to the XML::Twig new
method, that will output CDATA sections as regular
(escaped) PCDATA
added the index option to the XML::Twig new method,
and the associated XML::Twig index method, which
generates a list of elements matching a condition
during parsing
added the XML::Twig::Elt first_descendant method
fixed a bug with the keep_encoding option where
attributes were not parsed when the element name was
followed by more than one space (spotted by Gerald
Sedrati-Dinet),
see https://rt.cpan.org/Ticket/Display.html?id=8137
fixed a bug where whitespace at the beginning of an
element could be dropped (if followed by an element
before any other character). Now whitespace is
dropped only if it includes a \n
added feature: when load_DTD is used, default
attributes are now filled
fixed bug on xmlns in path expression trigger
(would not replace prefixes in path expressions),
spotted by amonroy on perlmonks, see
http://perlmonks.org/index.pl?node_id=386764
optimized XML::Twig text, thanks to Nick Lassonde
for the patch
fixed bug that generated an empty line before some
comments (pointed out by Tanya Huang)
fixed tests to check XML::Filter::BufferText version
(1.00 has a bug in the CDATA handling that makes XML::Twig
tests fail).
Added new options --nowrap and --exclude (-v) to xml_grep
fixed warning in tests under 5.8.0 (spotted by Ed Avis)
skipped HTML::Entities tests in 5.8.0 (make test for this
module seems to fail on my system, it might be the same
elsewhere)
Fixed bug RT #6067 (problems with non-standard versions of
Scalar::Util which do not include weaken)
Fixed bug RT #6092 (error when using safe output filter)
Fixed bug when using map_xmlns, tags in default namespace
were not output
Changes in 3.15
Fixes that allow the tests to pass on more systems (thanks to Ed
Avis for his testing)
Added normalize_space option for simplify (suggestion of Lambert Lum)
Removed usage of $&
Expanded the doc for paste, as it was a bit short (suggestion of
Richard Jolly)
Changes in 3.14
Namespace processing has been enhanced and should work fine now,
as long as twig_roots is not used.
Potentially incompatible change: the behaviour of simplify has
been changed to mimic as exactly as possible XML::Simple's XMLin
Completed the pod to cover the entire API
Tests now pass with perl 5.005_04-RC1 (fail with 5.005 reported
by David Claughton), added more tests and a config summary at the
end of the tests
Added methods on the class attribute, convenient for dealing with
XHTML or preparing display with CSS:
class set_class add_to_class att_to_class add_att_to_class
move_att_to_class tag_to_class add_tag_to_class set_tag_class in_class
navigation functions can use '.<class>' expressions
fixed (yet another!) bug in the way DTDs were output
fixed bug for pi => 'drop' option
changed the names of lots of internal (undocumented) methods, prefixed
them with _
module directory has changed (eg. "darwin-2level" vs.
"darwin-thread-multi-2level").
binary packages of perl modules need to be distinguishable between
being built against threaded perl and unthreaded perl, so bump the
PKGREVISION of all perl module packages and introduce
BUILDLINK_RECOMMENDED for perl as perl>=5.8.5nb5 so the correct
dependencies are registered and the binary packages are distinct.
addresses PR pkg/28619 from H. Todd Fujinaka.
Changes in 3.13:
Maintenance release to get the tests to pass on various platforms
updated the README
fixed a problem with encoding conversions (using safe_encode and
safe_encode_hex) under perl 5.8.0, see RT ticket #5111
fixed tests to pass when trying to use an unsupported iconv filter
Changes in 3.12:
New features and greatly increased test coverage
added lots of tests (>900), thanks to David Rigaudiere, Forrest
Cahoon, Sebastien Aperghis-Tramoni, Henrik Tougaard and Sam Tregar
for testing this release on various OSs, Perl, XML::Parser and
expat versions.
added XML::Twig::XPath that uses XML::XPath as the XPath engine
for findnodes, findnodes_as_string, findvalue, exists, find and
matches. Just use XML::Twig::XPath instead of use XML::Twig;
(see the tests in t/xmlxpath_*).
Added special case to output some HTML tags ('script' to start with)
as not empty.
XML::Twig::Elt->new now properly flags empty elements (spotted by
Dave Roe)
added XML::Twig::Elt contains_a_single method
added #ENT twig_handlers (not necessarily complete, so not yet
documented, needs more tests)
added doc for XML::Twig and XML::Twig::Elt subs_text methods
tags starting with # are now "invisible" (they are not output),
useful for example for pretty_printing
added new options --wrap '' and --date to xml_grep
improved XPath support (added [nb] support)
added xpath method, which generates a unique XPath for an element
added has_child and has_children as synonyms of first_child
added XML::Twig::set_id_seed to control how generated id's are
created
when using ignore on an element, end_tag_handlers are now tested
at the end of the element (so you can for example get the byte
offset in the document), suggestion of Philippe Verdret
added XML::Twig::Elt change_att_name
XML::Twig::Elt new now properly works when called as an object
(and not a class) method
fixed namespace processing somewhat
fixed SAX output methods
fixed bug when keep_atts_order on and using set_att on an element
with no existing attribute (spotted by scharloi)
WARNING - potentially incompatible changes -
when using finish_print, the document used to be flushed. This is no
longer the case, you will have to do it before calling finish_print.
This way you have the choice of doing it or not.
Removed XML::Twig::Elt::unescape function (was no longer used)
Changes in 3.11
added --text_only option to xml_grep (outputs the text of the
result, without tags)
fixed bug where "Comments [was] always dropped after a twig
object set 'comments' to 'drop'" (RT#3711), bug report and first
patch by Simon Flack
by popular demand, added option "keep_atts_order" that keeps the
original attribute order in the output. This option needs the
Tie::IxHash module to work.