Sunpoet Po-Chuan Hsieh ad4f267b7a Update WWW
search.cpan.org is shutting down.
It will redirect to metacpan.org after June 25, 2018.

With hat:	perl
2018-05-27 20:15:16 +00:00

HTML::ExtractContent is a module for extracting content from HTML with
scoring heuristics.
It guesses which block of HTML looks like content from a score based on
the number of punctuation marks and the length of the non-tag text in
each block.
It also guesses whether the content ends in that block or continues into
the next block.
WWW: https://metacpan.org/release/HTML-ExtractContent
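
As a rough illustration of the heuristic described above, the Python sketch below scores each block by the length of its tag-stripped text multiplied by one plus its punctuation count. It then treats a run of consecutive blocks that score above a threshold as one piece of content that continues from block to block. The block-splitting tags, the scoring formula, the threshold of 50 and the names clean_text, score_block and extract_content are hypothetical assumptions made for this sketch. They are not HTML::ExtractContent's actual algorithm, weights or Perl API.

```python
import re

# Hypothetical tags treated as block boundaries (an assumption for this sketch).
BLOCK_SPLIT = re.compile(r"</?(?:div|p|td|li|h[1-6]|table|ul|ol)\b[^>]*>", re.IGNORECASE)
TAG = re.compile(r"<[^>]*>")
PUNCTUATION = re.compile(r"[.,!?;:]")


def clean_text(block: str) -> str:
    """Strip tags and collapse whitespace, leaving only the non-tag text."""
    return " ".join(TAG.sub("", block).split())


def score_block(block: str) -> int:
    """Hypothetical score: non-tag text length weighted by punctuation count."""
    text = clean_text(block)
    return len(text) * (1 + len(PUNCTUATION.findall(text)))


def extract_content(html: str, threshold: int = 50) -> str:
    """Return the highest-scoring run of consecutive high-scoring blocks."""
    blocks = [b for b in BLOCK_SPLIT.split(html) if clean_text(b)]
    runs, current = [], []
    for block in blocks:
        if score_block(block) >= threshold:
            current.append(block)  # guess: content continues into this block
        elif current:
            runs.append(current)  # guess: content ended in the previous block
            current = []
    if current:
        runs.append(current)
    if not runs:
        return ""
    best = max(runs, key=lambda run: sum(score_block(b) for b in run))
    return "\n".join(clean_text(b) for b in best)
```

For a page made of a short navigation div, two punctuated article paragraphs and a one-word copyright footer, this sketch keeps the two paragraphs as one continuous run and drops the navigation and footer, because those score below the threshold.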