The Observer pattern (also known as publish/subscribe) provides a simple
mechanism for one object to inform a set of interested third-party objects when
its state changes.
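The mechanism can be sketched in a few lines of plain Python; the class and method names below are illustrative, not taken from any particular library:

```python
class Subject:
    """Maintains a list of observers and notifies them when its state changes."""

    def __init__(self):
        self._observers = []
        self._state = None

    def attach(self, observer):
        # Observers are plain callables here; they could equally be
        # objects exposing an update() method.
        self._observers.append(observer)

    def set_state(self, state):
        self._state = state
        for observer in self._observers:
            observer(state)  # inform each interested third party

events = []
subject = Subject()
subject.attach(events.append)   # subscribe
subject.set_state("ready")      # publish: events is now ["ready"]
```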
Lrama is an LALR(1) parser generator written in Ruby. The first goal of this
project is to provide an error-tolerant parser for CRuby with minimal changes
to CRuby's parse.y file.
Features:
- Bison-style grammar files are supported, with some assumptions:
  - b4_locations_if is always true
  - b4_pure_if is always true
  - b4_pull_if is always false
  - b4_lac_if is always false
- Error-tolerant parsing: a subset of the algorithm from "Repairing Syntax
  Errors in LR Parsers" (Corchuelo et al.) is supported.
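A minimal Bison-style grammar file of the kind such a generator consumes might look like the following; the token and rule names are illustrative, not taken from CRuby's parse.y:

```yacc
%{
/* C prologue: declarations shared with the generated parser. */
%}

%token NUMBER

%%

expr: expr '+' term { $$ = $1 + $3; }
    | term
    ;

term: NUMBER
    ;

%%
```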
This is the collection of nam files (codepoint subsets) used to subset fonts
before they are served via the Google Fonts CSS API.
The Python module gfsubsets provides an interface to these subset definitions.
It exports the following functions:
- CodepointsInFont(filename): Lists the Unicode codepoints supported by the
  font.
- ListSubsets(): Returns the names of all defined subsets.
- SubsetsForCodepoint(cp): Returns the names of all subsets including the
  codepoint.
- SubsetForCodepoint(cp): Returns the name of the "most relevant" subset
  including the codepoint.
- CodepointsInSubset(subset): Returns a set of codepoints included in the
  subset.
- SubsetsInFont(filename, min_pct, ext_min_pct): Returns the names of the
  subsets "well" supported by a font.
htmldate finds the original and updated publication dates of any web page.
Whether used from the command line or within Python, it covers all the steps
needed, from web page download to HTML parsing, scraping, and text analysis.
Starting with spaCy v3.2, alternate loggers were moved into a separate package
so that they can be added and updated independently of the core spaCy library.
spacy-loggers also provides additional utility loggers to facilitate
interoperation between individual loggers.
Structured NLP with LLMs
spacy-llm integrates Large Language Models (LLMs) into spaCy, featuring a
modular system for fast prototyping and prompting, and turning unstructured
responses into robust outputs for various NLP tasks, no training data required.
spacy-legacy includes outdated registered functions for spaCy v3.x, such as
model architectures, pipeline components and utilities. It's installed
automatically as a dependency of spaCy and allows us to provide backwards
compatibility while keeping the core library tidy and up to date. All of this
happens under the hood, so you typically shouldn't have to care about this
package.
spaCy is a library for advanced Natural Language Processing in Python and
Cython. It's built on the very latest research, and was designed from day one to
be used in real products.
spaCy comes with pretrained pipelines and currently supports tokenization and
training for 70+ languages. It features state-of-the-art speed and neural
network models for tagging, parsing, named entity recognition, text
classification and more, multi-task learning with pretrained transformers like
BERT, as well as a production-ready training system and easy model packaging,
deployment and workflow management.
sense2vec (Trask et al., 2015) is a nice twist on word2vec that lets you learn
more interesting and detailed word vectors. This library is a simple Python
implementation for loading, querying and training sense2vec models.
language_data is a supplement to the langcodes module, for working with
standardized codes for human languages. It stores the more bulky and
hard-to-index data about languages, particularly what they are named in various
languages.
The functions and test cases for working with this data are in langcodes,
because working with the data correctly requires parsing language codes.
langcodes knows what languages are. It knows the standardized codes that refer
to them, such as en for English, es for Spanish and hi for Hindi.
These are IETF language tags. You may know them by their old name, ISO 639
language codes. IETF has done some important things for backward compatibility
and supporting language variations that you won't find in the ISO standard.
The Slack Events Adapter is a Python-based solution to receive and parse events
from Slack's Events API. This library uses an event emitter framework to allow
you to easily process Slack events by simply attaching functions to event
listeners.
This adapter enhances and simplifies Slack's Events API by incorporating useful
best practices, patterns, and opportunities to abstract out common tasks.
mattermostdriver is the Python Mattermost Driver for API v4.
You interact with this module mainly by using the Driver class. If you want to
access information about the logged-in user, like the user id, you can access
it via Driver.client.userid.
The err-backend-slackv3 backend lets you connect to the Slack messaging service
using the Real-time Messaging protocol, the Events Request URL, or Events
Socket mode.
pyprobables is a pure-python library for probabilistic data structures. The goal
is to provide the developer with a pure-python implementation of common
probabilistic data-structures to use in their work.
To achieve better raw performance, it is recommended to supply an alternative
hashing algorithm that has been compiled in C. This could include using the
provided MD5 and SHA512 algorithms or installing a third-party package and
writing your own hashing strategy. Some options include the murmur hash mmh3 or
those from the pyhash library. Each data object in pyprobables makes it easy to
pass in a custom hashing function.
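As an illustration of such a hashing strategy, the sketch below derives several hash values per key from hashlib's MD5. The (key, depth) signature and list-of-ints return value are assumptions for the sake of the example; check pyprobables' documentation for the exact interface its data objects expect:

```python
import hashlib

def md5_hashes(key, depth=5):
    """Derive `depth` 64-bit hash values for `key` using MD5.

    Each round re-hashes the previous digest, yielding independent-looking
    values from a single underlying algorithm.
    """
    results = []
    current = key.encode("utf-8")
    for _ in range(depth):
        digest = hashlib.md5(current).digest()
        results.append(int.from_bytes(digest[:8], "big"))
        current = digest
    return results

hashes = md5_hashes("example")  # five 64-bit integers, deterministic per key
```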
Weasel lets you manage and share end-to-end workflows for different use cases
and domains, and orchestrate training, packaging and serving your custom
pipelines. You can start off by cloning a pre-defined project template, adjust
it to fit your needs, load in your data, train a pipeline, export it as a Python
package, upload your outputs to remote storage and share your results with
your team. Weasel can be used via the weasel command, and we provide templates
in our projects repo.
Thinc is a lightweight deep learning library that offers an elegant,
type-checked, functional-programming API for composing models, with support for
layers defined in other frameworks such as PyTorch, TensorFlow and MXNet. You
can use Thinc as an interface layer, a standalone toolkit or a flexible way to
develop new models. Previous versions of Thinc have been running quietly in
production in thousands of companies, via both spaCy and Prodigy.
CodSpeed is a continuous benchmarking platform that allows you to track and
compare the performance of your codebase during development.
It uses a smart runtime engine to measure the performance of your code in an
accurate and reproducible way without creating huge runtime overhead, unlike
traditional benchmarks. CodSpeed produces detailed performance reports, helping
you improve your codebase's performance directly within your repository
provider (pull request comments, merge checks, ...).
mmh3 is a Python extension for MurmurHash (MurmurHash3), a set of fast and
robust non-cryptographic hash functions invented by Austin Appleby.
Combined with probabilistic techniques like a Bloom filter, MinHash, and feature
hashing, mmh3 allows you to develop high-performance systems in fields such as
data mining, machine learning, and natural language processing.
Another common use of mmh3 is to calculate favicon hashes used by Shodan, the
world's first IoT search engine.
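To illustrate the Bloom-filter use case mentioned above, here is a toy filter. It uses hashlib's SHA-256 with varying seeds in place of mmh3 so the sketch stays dependency-free; with mmh3 installed you would call mmh3.hash(item, seed) instead:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        for seed in range(self.k):
            # Stand-in for mmh3.hash(item, seed); any fast hash family works.
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("shodan.io")
```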
MARISA Trie provides static memory-efficient trie-like structures for Python,
based on the marisa-trie C++ library.
String data in a MARISA trie may take up to 50x-100x less memory than in a
standard Python dict; the raw lookup speed is comparable, and the trie also
provides fast advanced methods like prefix search.
Note: there are official SWIG-based Python bindings included in the C++ library
distribution; this package provides alternative Cython-based, pip-installable
Python bindings.
dirty-equals is a Python library that (mis)uses the __eq__ method to make
Python code (generally unit tests) more declarative and therefore easier to
read and write.
dirty-equals can be used in whatever context you like, but it comes into its own
when writing unit tests for applications where you're commonly checking the
response to API calls and the contents of a database.
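The trick the library relies on can be sketched in plain Python. The class below is a simplified stand-in for the kind of check dirty-equals ships, not the library's own implementation:

```python
class IsPositiveInt:
    """An object that compares equal to any positive integer.

    Putting the check inside __eq__ lets a plain `==` (and therefore a
    plain `assert` in a test) express the condition declaratively.
    """

    def __eq__(self, other):
        return isinstance(other, int) and other > 0

# A response with an unpredictable id can still be checked in one assert:
response = {"id": 42, "name": "alice"}
assert response == {"id": IsPositiveInt(), "name": "alice"}
```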
confection is a lightweight library that offers a configuration system letting
you conveniently describe arbitrary trees of objects.
Configuration is a huge challenge for machine-learning code because you may want
to expose almost any detail of any function as a hyperparameter. The setting you
want to expose might be arbitrarily far down in your call stack, so it might
need to pass all the way through the CLI or REST API, through any number of
intermediate functions, affecting the interface of everything along the way. And
then once those settings are added, they become hard to remove later. Default
values also become hard to change without breaking backwards compatibility.
To solve this problem, confection offers a config system that lets you easily
describe arbitrary trees of objects. The objects can be created via function
calls you register using a simple decorator syntax. You can even version the
functions you create, allowing you to make improvements without breaking
backwards compatibility. The most similar config system we're aware of is Gin,
which uses a similar syntax, and also allows you to link the configuration
system to functions in your code using a decorator. confection's config system
is simpler and emphasizes a different workflow via a subset of Gin's
functionality.
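The registration idea can be sketched without the library. The tiny registry below is illustrative only; confection itself additionally handles config parsing, nesting and validation:

```python
registry = {}

def register(name, version="v1"):
    """Decorator that records a constructor under a versioned name."""
    def wrap(fn):
        registry[f"{name}.{version}"] = fn
        return fn
    return wrap

@register("relu")
def make_relu():
    return lambda x: max(x, 0.0)

@register("relu", version="v2")
def make_relu_v2():
    # An improved variant can coexist with v1 without breaking old configs.
    return lambda x: max(x, 0.0)

# A config file would name "relu.v1"; the registry resolves it to a callable.
layer = registry["relu.v1"]()
```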
cloudpathlib is a Python library with classes that mimic pathlib.Path's
interface for URIs from different cloud storage services.
Why use cloudpathlib?
- Familiar: If you know how to interact with Path, you know how to interact with
CloudPath. All of the cloud-relevant Path methods are implemented.
- Supported clouds: AWS S3, Google Cloud Storage, and Azure Blob Storage are
implemented. FTP is on the way.
- Extensible: The base classes do most of the work generically, so implementing
two small classes MyPath and MyClient is all you need to add support for a new
cloud storage service.
- Read/write support: Reading just works. Using the write_text, write_bytes or
.open('w') methods will upload your changes to cloud storage without any
additional file management on your part.
- Seamless caching: Files are downloaded locally only when necessary. You can
also easily pass a persistent cache folder so that across processes and
sessions you only re-download what is necessary.
- Tested: Comprehensive test suite and code coverage.
- Testability: Local filesystem implementations that can be used to easily mock
cloud storage in your unit tests.
meta provides an API for metaprogramming; that is, allowing code to inspect or
manipulate parts of its own program structure. Parts of the Perl interpreter
itself can be accessed by means of "meta"-objects provided by this package.
Methods on these objects allow inspection of details, as well as creating new
items or removing existing ones.
The intention of this API is to provide a nicer replacement for existing tricks
such as no strict 'refs' and using globrefs, and also to be a more consistent
place to add new abilities, such as more APIs for inspection and alteration of
internal structures, metaprogramming around the new 'class' feature, and other
such uses.