HTML::ExtractContent is a module for extracting content from HTML with
scoring heuristics.
It guesses which block of the HTML is the content by scoring each block
according to the number of punctuation marks and the length of its
non-tag text.
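
The scoring idea can be illustrated with a minimal Python sketch. This
is not HTML::ExtractContent's code or API: the block-splitting rule,
the punctuation set, and the weight of 10 per punctuation mark are
hypothetical choices made only to show the principle that long,
well-punctuated text outscores short navigation links.

```python
import re

# Hypothetical rule: treat <div>, <td> and <p> tags as block boundaries.
BLOCK_SPLIT = re.compile(r"</?(?:div|td|p)\b[^>]*>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")
# ASCII punctuation plus the Japanese comma and full stop (U+3001, U+3002).
PUNCTUATION = re.compile("[.,!?;:\u3001\u3002]")

def score_block(block_html, punct_weight=10):
    """Length of the block's non-tag text plus a bonus per punctuation mark."""
    text = TAG.sub("", block_html).strip()
    return len(text) + punct_weight * len(PUNCTUATION.findall(text))

def best_block(html):
    """Return the non-tag text of the highest-scoring block."""
    blocks = [b for b in BLOCK_SPLIT.split(html) if b.strip()]
    best = max(blocks, key=score_block)
    return TAG.sub("", best).strip()
```

For a page with a navigation bar (`Home | About`, score 12) followed by
an article paragraph, `best_block` returns the paragraph text, because
its length and its four punctuation marks give it a far higher score.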
It also guesses whether the content ends in that block or continues into
the next block.
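
One way to picture the continuation guess is the following Python
sketch, which works on a list of per-block scores. It is a hypothetical
simplification, not the module's algorithm: starting from the
top-scoring block, each following block is absorbed while its score,
discounted by a decay factor for every step away from the start, still
reaches a threshold. The `threshold` and `decay` values are
illustrative assumptions.

```python
def extend_content(scores, threshold=50.0, decay=0.8):
    """Return (start, end) block indices of the guessed content, end exclusive.

    Hypothetical heuristic: begin at the highest-scoring block and keep
    joining the next block while score * weight >= threshold, where the
    weight shrinks by `decay` for each block joined.
    """
    if not scores:
        return (0, 0)
    start = max(range(len(scores)), key=scores.__getitem__)
    end = start + 1
    weight = decay
    while end < len(scores) and scores[end] * weight >= threshold:
        end += 1
        weight *= decay
    return (start, end)
```

With scores `[10, 200, 120, 90, 5, 80]` the content starts at block 1,
absorbs blocks 2 and 3, and stops at the low-scoring block 4, so the
later block with score 80 is not joined.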
WWW: https://metacpan.org/release/HTML-ExtractContent