pkgsrc/www/py-beautifulsoup/DESCR
darcy 6ac689b419 Add BeautifulSoup package.
Beautiful Soup is a Python HTML/XML parser designed for quick-turnaround
projects like screen-scraping. Three features make it powerful:

1. Beautiful Soup won't choke if you give it bad markup. It yields a parse
tree that makes approximately as much sense as your original document. This
is usually good enough to collect the data you need and run away.

2. Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree: a toolkit for dissecting
a document and extracting what you need. You don't have to create a custom
parser for each application.

3. Beautiful Soup automatically converts incoming documents to Unicode and
outgoing documents to UTF-8. You don't have to think about encodings, unless
the document doesn't specify an encoding and Beautiful Soup can't autodetect
one. Then you just have to specify the original encoding.
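
The three features above can be sketched roughly as follows. This is a
minimal illustration, assuming the current bs4 distribution of Beautiful
Soup and Python's built-in html.parser backend (both postdate this package
description); the sample markup, the /menu/7 path, and the variable names
are invented for the example.

```python
from bs4 import BeautifulSoup

# Invented sample: Latin-1 bytes with unclosed <p> and <b> tags and no
# <html> or <body> wrapper.
raw = b"<p>Caf\xe9 menu: <b>espresso<p>See <a href='/menu/7'>the list</a>"

# Feature 3: nothing in the document declares an encoding, so the
# original encoding is specified explicitly.
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")

# Features 1 and 2: the broken markup still yields a tree that can be
# searched with ordinary method calls and dictionary-style attribute access.
link = soup.find("a")
print(link["href"])
print(link.get_text())

text = soup.get_text()        # incoming bytes are now a Unicode str
utf8 = soup.encode("utf-8")   # outgoing document as UTF-8 bytes
```

The exact nesting of the unclosed tags depends on the parser backend, so
the sketch only relies on lookups that do not depend on it.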
2008-09-05 15:46:51 +00:00


Beautiful Soup parses arbitrarily invalid XML- or HTML-like substance
into a tree representation. It provides methods and Pythonic idioms
that make it easy to search and modify the tree.

A well-formed XML/HTML document will yield a well-formed data
structure. An ill-formed XML/HTML document will yield a
correspondingly ill-formed data structure. If your document is only
locally well-formed, you can use this library to find and process the
well-formed part of it. The BeautifulSoup class has heuristics for
obtaining a sensible parse tree in the face of common HTML errors.
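
As a hedged sketch of processing only the well-formed part of a document,
the example below again assumes the current bs4 distribution and the
html.parser backend rather than the release packaged here. The markup, the
"prices" id, and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Invented sample: a clean <ul> surrounded by unclosed tags and stray
# closing tags.
messy = """
<div><span>unclosed junk <i>more junk
<ul id="prices">
  <li>apple: 3</li>
  <li>pear: 5</li>
</ul>
</b></div></td> stray closers
"""

soup = BeautifulSoup(messy, "html.parser")

# Locate the well-formed list by its id and ignore everything around it.
items = soup.find("ul", id="prices").find_all("li")

prices = {}
for li in items:
    name, value = li.get_text().split(":")
    prices[name.strip()] = int(value)

print(prices)
```
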
WWW: http://www.crummy.com/software/BeautifulSoup/