The module is a probability-based, corpus-trained tagger that assigns
part-of-speech (POS) tags to English text using a lookup dictionary and a
set of probability values. Tags are chosen by conditional probability: the
tagger examines the preceding tag to determine the most likely tag for the
current word. Unknown words are classified according to their morphology,
or the tagger can be configured to treat them as nouns or another part of
speech.

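As a rough illustration of tagging by conditional probability, here is a
minimal Python sketch. It is not this module's code: the names LEXICON,
TRANSITIONS, SMOOTHING, guess_unknown and tag, the tag labels, the suffix
rules and every probability value are invented for the example. It assumes
a greedy left-to-right pass that scores each candidate tag as
P(tag | word) * P(tag | previous tag).

```python
# Hypothetical toy data: every word, tag and probability here is invented.
# LEXICON[word][tag] approximates P(tag | word).
LEXICON = {
    "the": {"det": 1.0},
    "dog": {"nn": 0.9, "vb": 0.1},
    "can": {"md": 0.6, "nn": 0.3, "vb": 0.1},
    "run": {"vb": 0.7, "nn": 0.3},
}

# TRANSITIONS[prev_tag][tag] approximates P(tag | prev_tag).
# "<s>" is an assumed marker for the start of a sentence.
TRANSITIONS = {
    "<s>": {"det": 0.6, "nn": 0.3, "vb": 0.05, "md": 0.05},
    "det": {"nn": 0.9, "jj": 0.1},
    "nn": {"md": 0.4, "vb": 0.4, "nn": 0.2},
    "md": {"vb": 0.9, "rb": 0.1},
    "vb": {"det": 0.5, "nn": 0.3, "rb": 0.2},
}

# Small probability used for tag pairs missing from TRANSITIONS.
SMOOTHING = 1e-4


def guess_unknown(word, default_tag=None):
    """Return candidate tags for a word that is not in LEXICON."""
    if default_tag is not None:
        # Caller asked for all unknown words to get one fixed tag.
        return {default_tag: 1.0}
    lower = word.lower()
    # Illustrative suffix rules standing in for "word morphology".
    if lower.endswith("ly"):
        return {"rb": 1.0}
    if lower.endswith("ing"):
        return {"vbg": 1.0}
    if lower.endswith("ed"):
        return {"vbd": 1.0}
    return {"nn": 1.0}


def tag(words, default_tag=None):
    """Greedily tag words, conditioning each choice on the previous tag."""
    prev = "<s>"
    result = []
    for word in words:
        candidates = LEXICON.get(word.lower())
        if candidates is None:
            candidates = guess_unknown(word, default_tag)

        def score(t):
            # P(tag | word) * P(tag | previous tag), with smoothing.
            return candidates[t] * TRANSITIONS.get(prev, {}).get(t, SMOOTHING)

        best = max(candidates, key=score)
        result.append((word, best))
        prev = best
    return result
```

With this toy data, "can" is tagged as a modal after a noun
(tag(["The", "dog", "can", "run"])) but as a noun after a determiner
(tag(["the", "can"])), which is the effect of looking at the preceding tag.
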
The tagger also extracts as many nouns and noun phrases as it can, using a
set of regular expressions.
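
One way regular expressions can pull noun phrases out of tagged text is
sketched below in Python. This is an assumption about the general
technique, not the module's actual patterns: the "word/tag" encoding, the
tag names (jj, nn, nns), and the names NOUN_PHRASE and noun_phrases are
all hypothetical.

```python
import re

# Assumed pattern: zero or more adjectives (jj) followed by one or more
# nouns (nn or nns), matched over text encoded as "word/tag" tokens.
# The lookarounds keep the match aligned to whole tokens.
NOUN_PHRASE = re.compile(
    r"(?<!\S)(?:\S+/jj\s+)*\S+/nns?(?:\s+\S+/nns?)*(?!\S)"
)


def noun_phrases(tagged):
    """Return noun phrases found in a list of (word, tag) pairs."""
    text = " ".join(f"{word}/{tag}" for word, tag in tagged)
    phrases = []
    for match in NOUN_PHRASE.finditer(text):
        # Strip the "/tag" suffix from each matched token.
        words = [token.rsplit("/", 1)[0] for token in match.group().split()]
        phrases.append(" ".join(words))
    return phrases
```

For example, noun_phrases([("the", "det"), ("big", "jj"), ("dog", "nn"),
("chased", "vbd"), ("cats", "nns")]) picks out "big dog" and "cats".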