Beautiful Soup is a Python library designed for quick turnaround projects like
screen-scraping. Three features make it powerful:

* Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree: a toolkit for dissecting a
document and extracting what you need. It doesn't take much code to write an
application.
* Beautiful Soup automatically converts incoming documents to Unicode and
outgoing documents to UTF-8. You don't have to think about encodings, unless
the document doesn't specify an encoding and Beautiful Soup can't autodetect
one. Then you just have to specify the original encoding.
* Beautiful Soup sits on top of popular Python parsers like lxml and html5lib,
allowing you to try out different parsing strategies or trade speed for
flexibility.

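As a rough illustration of the encoding and parser points above, here is a
minimal sketch. The sample document is made up for this example, and it uses
the "html.parser" backend that ships with Python; "lxml" or "html5lib" could
be named instead, assuming those packages are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical input: Latin-1 bytes that declare their encoding in <meta>.
raw = (
    '<html><head><meta charset="iso-8859-1"></head>'
    "<body><p>caf\xe9</p></body></html>"
).encode("iso-8859-1")

# The second argument picks the underlying parser.
soup = BeautifulSoup(raw, "html.parser")

text = soup.p.string                # incoming bytes are now a Unicode str
detected = soup.original_encoding   # the encoding Beautiful Soup settled on
utf8_bytes = soup.encode()          # outgoing document, UTF-8 by default

# If detection is not possible, the original encoding can be stated outright.
explicit = BeautifulSoup(raw, "html.parser", from_encoding="iso-8859-1")
```

Swapping the parser name is the only change needed to try a different parsing
strategy; the rest of the code stays the same.
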
Beautiful Soup parses anything you give it, and does the tree traversal stuff
for you. You can tell it "Find all the links", or "Find all the links of class
externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the
table heading that's got bold text, then give me that text."

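Those four requests might look something like this in code. This is only a
sketch: the HTML snippet is invented for the example, and there are other
ways to phrase each search.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page, invented for this sketch.
html = """
<table><tr><th><b>Price</b></th><th>Notes</th></tr></table>
<a href="http://foo.com/a" class="externalLink">A</a>
<a href="/local">B</a>
<a href="http://bar.org/c" class="externalLink">C</a>
"""
soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink"
external = soup.find_all("a", class_="externalLink")

# "Find all the links whose URLs match 'foo.com'"
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text"
heading = soup.find(lambda tag: tag.name == "th" and tag.find("b"))
heading_text = heading.get_text()
```

Each search returns ordinary Python objects (a list of tags, or a single tag),
so the results can be looped over or indexed like any other list.
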
Valuable data that was once locked up in poorly designed websites is now within
your reach. Projects that would have taken hours take only minutes with
Beautiful Soup.