Commit graph

8 commits

Author SHA1 Message Date
wiz
67e1f1a6bf python: egg.mk: add USE_PKG_RESOURCES flag
This flag should be set for packages that import pkg_resources
and thus need setuptools after the build step.

Set this flag for packages that need it and bump PKGREVISION.
2022-01-05 15:40:56 +00:00
wiz
bb579283d0 *: bump PKGREVISION for egg.mk users
They now have a tool dependency on py-setuptools instead of a DEPENDS
2022-01-04 20:53:26 +00:00
wiz
14351a3850 py-nltk: update to 3.6.5.
Version 3.6.5 2021-10-11

* modernised nltk.org website
* addressed LGTM.com issues
* support ZWJ sequences emoji and skin tone modifer emoji in TweetTokenizer
* METEOR evaluation now requires pre-tokenized input
* Code linting and type hinting
* implement get_refs function for DrtLambdaExpression
* Enable automated CoreNLP, Senna, Prover9/Mace4, Megam, MaltParser CI tests
* specify minimum regex version that supports regex.Pattern
* avoid re.Pattern and regex.Pattern which fail for Python 3.6, 3.7

Version 3.6.4 2021-10-01

* deprecate `nltk.usage(obj)` in favor of `help(obj)`
* resolve ReDoS vulnerability in Corpus Reader
* solidify performance tests
* improve phone number recognition in tweet tokenizer
* refactored CISTEM stemmer for German
* identify NLTK Team as the author
* replace travis badge with github actions badge
* add SECURITY.md

Version 3.6.3 2021-09-19
* Dropped support for Python 3.5
* Run CI tests on Windows, too
* Moved from Travis CI to GitHub Actions
* Code and comment cleanups
* Visualize WordNet relation graphs using Graphviz
* Fixed large error in METEOR score
* Apply isort, pyupgrade, black, added as pre-commit hooks
* Prevent debug_decisions in Punkt from throwing IndexError
* Resolved ZeroDivisionError in RIBES with dissimilar sentences
* Initialize WordNet IC total counts with smoothing value
* Fixed AttributeError for Arabic ARLSTem2 stemmer
* Many fixes and improvements to lm language model package
* Fix bug in nltk.metrics.aline, C_skip = -10
* Improvements to TweetTokenizer
* Optional show arg for FreqDist.plot, ConditionalFreqDist.plot
* edit_distance now computes Damerau-Levenshtein edit-distance

Version 3.6.2 2021-04-20
* move test code to nltk/test
* clean up some doctests
* fix bug in NgramAssocMeasures (order preserving fix)
* fixes for compatibility with Pypy 7.3.4

Version 3.6 2021-04-07
* add support for Python 3.9
* add Tree.fromlist
* compute Minimum Spanning Tree of unweighted graph using BFS
* fix bug with infinite loop in Wordnet closure and tree
* fix bug in calculating BLEU using smoothing method 4
* Wordnet synset similarities work for all pos
* new Arabic light stemmer (ARLSTem2)
* new syllable tokenizer (LegalitySyllableTokenizer)
* remove nose in favor of pytest
* misc bug fixes, code cleanups, test cleanups, efficiency improvements
2021-11-24 16:00:18 +00:00
nia
a643c936b3 textproc: Replace RMD160 checksums with BLAKE2s checksums
All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Unfetchable distfiles (fetched conditionally?):
./textproc/convertlit/distinfo clit18src.zip
2021-10-26 11:21:28 +00:00
nia
e05b375eba textproc: Remove SHA1 hashes for distfiles 2021-10-07 15:00:34 +00:00
adam
627c6075a0 py-nltk: updated to 3.5
Version 3.5
* add support for Python 3.8
* drop support for Python 2
* create NLTK's own Tokenizer class distinct from the Treebank reference tokeniser
* update Vader sentiment analyser
* fix JSON serialization of some PoS taggers
* minor improvements in grammar.CFG, Vader, pl196x corpus reader, StringTokenizer
* change implementation <= and >= for FreqDist so they are partial orders
* make FreqDist iterable
* correctly handle Penn Treebank trees with a unlabeled branching top node.
2020-08-10 14:43:10 +00:00
rillig
9637f7852e all: migrate homepages from http to https
pkglint -r --network --only "migrate"

As a side-effect of migrating the homepages, pkglint also fixed a few
indentations in unrelated lines. These and the new homepages have been
checked manually.
2020-01-26 17:30:40 +00:00
wiz
a7cc7ba0ad textproc/py-nltk: import py-nltk-3.4.1
Based on wip version packaged by leot, Hiramatsu Yoshifumi,
Kamel Ibn Aziz Derouiche, and myself.

NLTK - the Natural Language Toolkit - is a suite of open source
Python modules, data and documentation for research and development
in natural language processing. NLTK contains code supporting dozens
of NLP tasks, along with 30 popular Corpora and extensive documentation
including a 360-page online book.
2019-05-28 14:10:04 +00:00