The crawl utility starts a depth-first traversal of the web at the specified
URLs. It stores all JPEG images that match the configured constraints.
Crawl is fairly fast and allows for graceful termination. After terminating
crawl, it is possible to restart it at exactly the same spot where it was
terminated. Crawl keeps a persistent database that allows multiple crawls
without revisiting sites.

The main features of crawl are:

* Saves encountered images or other media types
* Media selection based on regular expressions and size constraints
* Resume previous crawl after graceful termination
* Persistent database of visited URLs
* Very small and efficient code
* Asynchronous DNS lookups
* Supports robots.txt
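
As an illustration of media selection based on regular expressions and size
constraints, here is a minimal Python sketch. It is not crawl's own code or
configuration syntax: the `MediaFilter` class, its field names, and the
example thresholds are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaFilter:
    """Hypothetical filter: a URL pattern plus optional byte-size bounds."""
    url_pattern: str                 # regular expression applied to the URL
    min_bytes: int = 0               # reject anything smaller than this
    max_bytes: Optional[int] = None  # None means no upper bound

    def accepts(self, url: str, size: int) -> bool:
        # The URL must match the regular expression (case-insensitively).
        if re.search(self.url_pattern, url, re.IGNORECASE) is None:
            return False
        # The reported size must fall inside the configured bounds.
        if size < self.min_bytes:
            return False
        if self.max_bytes is not None and size > self.max_bytes:
            return False
        return True


# Example values only: JPEG files between 10 kB and 5 MB.
jpeg_filter = MediaFilter(r"\.jpe?g$", min_bytes=10_000, max_bytes=5_000_000)
```

With a filter like this, a crawler could call
`jpeg_filter.accepts(url, size)` before saving a file, so thumbnails and
oversized downloads are skipped without changing the traversal itself.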
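
The resumable depth-first traversal with a persistent database of visited
URLs can be sketched as follows. This is a rough Python illustration under
stated assumptions, not a description of how crawl is implemented: SQLite is
used here only for convenience, and `open_state`, `crawl`, `fetch_links`, and
`max_pages` are hypothetical names. The `fetch_links` callable stands in for
downloading a page and extracting its links.

```python
import sqlite3
from typing import Callable, Iterable, List, Optional


def open_state(path: str) -> sqlite3.Connection:
    """Open (or create) the persistent state: visited URLs plus a pending stack."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE IF NOT EXISTS pending (pos INTEGER PRIMARY KEY, url TEXT)")
    return db


def crawl(db: sqlite3.Connection,
          seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: Optional[int] = None) -> List[str]:
    """Depth-first traversal that returns the URLs visited during this run."""
    # Seed only when no work is left over from an earlier, interrupted run.
    if db.execute("SELECT COUNT(*) FROM pending").fetchone()[0] == 0:
        db.executemany("INSERT INTO pending (url) VALUES (?)",
                       [(u,) for u in reversed(list(seeds))])
    order: List[str] = []
    while max_pages is None or len(order) < max_pages:
        # Pop the most recently pushed URL (stack order gives depth-first).
        row = db.execute(
            "SELECT pos, url FROM pending ORDER BY pos DESC LIMIT 1").fetchone()
        if row is None:
            break
        pos, url = row
        db.execute("DELETE FROM pending WHERE pos = ?", (pos,))
        seen = db.execute("SELECT 1 FROM visited WHERE url = ?", (url,)).fetchone()
        if seen is None:
            db.execute("INSERT INTO visited (url) VALUES (?)", (url,))
            order.append(url)
            # Push links in reverse so the first link is explored first.
            links = list(fetch_links(url))
            db.executemany("INSERT INTO pending (url) VALUES (?)",
                           [(link,) for link in reversed(links)])
        db.commit()  # state is durable after every page, so stopping here is safe
    return order


# Toy link graph instead of real HTTP requests (assumption for the example).
graph = {"a": ["b", "c"], "b": ["d"]}
db = open_state(":memory:")
first = crawl(db, ["a"], lambda u: graph.get(u, []), max_pages=2)  # stopped early
second = crawl(db, ["a"], lambda u: graph.get(u, []))              # picks up again
```

Because both the visited set and the pending stack are committed after each
page, the second call continues from the point where the first one stopped,
and a later run with the same seeds finds nothing new to visit.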