This is a Perl implementation of simplified Chinese word segmentation.
The segmenter searches for the longest dictionary word at each position, scanning the text both from the left and from the right. It then chooses whichever of the two segmentations has the higher product of word frequencies.
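The following is a minimal Python sketch of bidirectional maximum matching with frequency-product selection, as described above. It is not the module's Perl code. The sketch makes these assumptions:

- The dictionary maps each word to a relative frequency (a probability).
- Characters not covered by the dictionary become single-character words with a small placeholder frequency.
- Ties go to the forward result.

The function names `forward_max_match`, `backward_max_match`, `frequency_product` and `segment` are hypothetical.

```python
from math import prod

# Hypothetical placeholder frequency for words missing from the dictionary.
UNKNOWN_FREQ = 1e-6


def forward_max_match(text, freq, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in freq:
                words.append(candidate)
                i += length
                break
    return words


def backward_max_match(text, freq, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for length in range(min(max_len, j), 0, -1):
            candidate = text[j - length:j]
            if length == 1 or candidate in freq:
                words.append(candidate)
                j -= length
                break
    words.reverse()  # restore left-to-right order
    return words


def frequency_product(words, freq):
    """Multiply the (assumed relative) frequencies of all words."""
    return prod(freq.get(w, UNKNOWN_FREQ) for w in words)


def segment(text, freq):
    """Return the forward or backward segmentation with the higher frequency product."""
    max_len = max(map(len, freq), default=1)
    fwd = forward_max_match(text, freq, max_len)
    bwd = backward_max_match(text, freq, max_len)
    return fwd if frequency_product(fwd, freq) >= frequency_product(bwd, freq) else bwd
```

Here is an example with a toy dictionary. For "研究生命起源", the forward scan yields 研究生 / 命 / 起源. The backward scan yields 研究 / 生命 / 起源, which wins because its frequency product is higher. Using probabilities rather than raw counts matters: with counts greater than 1, the product would tend to favor segmentations with more words.

```python
freq = {"研究": 0.3, "研究生": 0.1, "生命": 0.3, "起源": 0.2, "命": 0.05}
print(segment("研究生命起源", freq))
```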
The original program comes from the CPAN module Lingua::ZH::WordSegment (http://search.cpan.org/~chenyr/). I made the following changes: 1) made the interface object-oriented; 2) converted the internal strings to UTF-8; 3) switched the default dictionary to Sogou's dictionary (http://www.sogou.com/labs/dl/w.html).
WWW: http://search.cpan.org/dist/Lingua-ZH-WordSegmenter/