Scrapy is a fast high-level web crawling and web scraping framework, used to
crawl websites and extract structured data from their pages. It can be used for
a wide range of purposes, from data mining to monitoring and automated testing.
Parsel is a library to extract data from HTML and XML using XPath and CSS
selectors.
Features:
* Extract text using CSS or XPath selectors
* Regular expression helper methods
* remove comments or tags from HTML snippets
* extract base url from HTML snippets
* translate entities on HTML strings
* convert raw HTTP headers to dicts and vice-versa
* construct HTTP auth header
* convert HTML pages to unicode
* sanitize urls (like browsers do)
* extract arguments from urls
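Two of the operations listed above (constructing an HTTP auth header and
extracting arguments from URLs) can be illustrated with the Python standard
library alone. This is only a sketch of the ideas: the function names
basic_auth_header and url_arguments are hypothetical and are not the API of
the package described here.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def url_arguments(url):
    """Return the query-string arguments of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    # Hypothetical credentials and URL, used only for illustration.
    print(basic_auth_header("user", "pass"))
    print(url_arguments("http://example.com/?id=7&tag=a&tag=b"))
```

Repeated arguments such as tag above are kept as a list, so no value is
silently dropped.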
Dispatcher mechanism for creating event models
PyDispatcher is an enhanced version of Patrick K. O'Brien's original
dispatcher.py module. It provides the Python programmer with a robust mechanism
for event routing within various application contexts.
Included in the package are the robustapply and saferef modules, which provide
the ability to selectively apply arguments to callable objects and to reference
instance methods using weak-references.
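The two ideas mentioned above, applying only the arguments a callable accepts
and holding receivers through weak references, can be sketched with the
standard library. The code below is an illustration under those assumptions;
connect, send and robust_apply are hypothetical names and this is not
PyDispatcher's implementation.

```python
import inspect
import weakref
from collections import defaultdict

# signal name -> list of weak references to receivers
_receivers = defaultdict(list)


def connect(receiver, signal):
    """Register a receiver for a signal without keeping it alive."""
    if inspect.ismethod(receiver):
        ref = weakref.WeakMethod(receiver)  # bound instance method
    else:
        ref = weakref.ref(receiver)         # plain function
    _receivers[signal].append(ref)


def robust_apply(receiver, **named):
    """Call receiver with only the keyword arguments it can accept."""
    params = inspect.signature(receiver).parameters
    if any(p.kind is p.VAR_KEYWORD for p in params.values()):
        return receiver(**named)
    accepted = {k: v for k, v in named.items() if k in params}
    return receiver(**accepted)


def send(signal, **named):
    """Call every live receiver and return (receiver, result) pairs."""
    responses = []
    live = []
    for ref in _receivers[signal]:
        receiver = ref()
        if receiver is None:  # the receiver has been garbage-collected
            continue
        live.append(ref)
        responses.append((receiver, robust_apply(receiver, **named)))
    _receivers[signal] = live  # forget dead references
    return responses
```

Because only weak references are stored, an object that goes out of use stops
receiving events without having to disconnect explicitly.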
Queuelib is a collection of persistent (disk-based) queues for Python.
Queuelib's goals are speed and simplicity. It was originally part of the Scrapy
framework and was split out into its own library.
M2R converts a markdown file including reST markups to a valid reST format.
Features:
* Basic markdown and some extensions
  o inline/block-level raw html
  o fenced-code block
  o tables
  o footnotes ([^1])
* Inline- and Block-level reST markups
  o single- and multi-line directives (.. directive::)
  o inline-roles (:code:`print(1)` ...)
  o ref-link (see `ref`_)
  o footnotes ([#fn]_)
  o math extension inspired by recommonmark
* Sphinx extension
  o add markdown support for sphinx
  o mdinclude directive to include markdown from md or reST files
The fastest markdown parser in pure Python with renderer features, inspired by
marked.
Features:
* Pure Python. Tested in Python 2.6+, Python 3.3+ and PyPy.
* Very Fast. It is the fastest of all pure Python markdown parsers.
* More Features. Table, footnotes, autolink, fenced code etc.
Changelog (abridged):
- rsyslog now builds on AIX
- mmdblookup: new maxminddb lookup message modify plugin
- mmrm1stspace: new module; removes first space in MSG if present
- KSI signature provider: file permissions can now be specified
- omzmq: new features
- change: when the hostname is empty, we now use "localhost-empty-hostname"
- omelasticsearch: remove "asyncrepl" config parameter
- omfwd: Add support for bind-to-device (see below on same for imudp)
- imudp: Add support for bind-to-device
- imudp: limit rcvbufsize parameter to max 1GiB
- rainerscript: implement new "call_indirect" statement
- bugfix imjournal: make state file handling more robust
- bugfix core: lookup table reload was not properly integrated
- bugfix core: potential deadlock on shutdown
- bugfix ommongodb: did not work in v8 due to invalid indirection
- bugfix ommongodb: fix tryResume handling
- bugfix omfwd: retry processing was not done correctly, could stall
- bugfix imuxsock: segfault on shutdown when $OmitLocalLogging is on
================================
Features
--------
- Added a new interface,
twisted.internet.interfaces.IHostnameResolver, which is an
improvement to twisted.internet.interfaces.IResolverSimple that
supports resolving multiple addresses as well as resolving IPv6
addresses. This is a native, asynchronous, Twisted analogue to
getaddrinfo. (bug-4362)
- twisted.web.client.Agent now uses HostnameEndpoint internally; as a
consequence, it now supports IPv6, as well as making connections
faster and more reliably to hosts that have more than one DNS name.
(bug-6712)
- twisted.internet.ssl.CertificateOptions now has the new constructor
argument 'raiseMinimumTo', allowing you to increase the minimum TLS
version to this version or Twisted's default, whichever is higher.
The additional new constructor arguments 'lowerMaximumSecurityTo'
and 'insecurelyLowerMinimumTo' allow finer grained control over
negotiated versions that don't honour Twisted's defaults, for
working around broken peers, at the cost of reducing the security
of the TLS it will negotiate. (bug-6800)
- twisted.internet.ssl.CertificateOptions now sets the OpenSSL
context's mode to MODE_RELEASE_BUFFERS, which will free the
read/write buffers on idle TLS connections to save memory. (bug-8247)
- trial --help-reactors will only list reactors which can be
imported. (bug-8745)
- twisted.internet.endpoints.HostnameEndpoint now uses the passed
reactor's implementation of
twisted.internet.interfaces.IReactorPluggableResolver to resolve
hostnames rather than its own deferToThread/getaddrinfo wrapper;
this makes its hostname resolution pluggable via a public API.
(bug-8922)
- twisted.internet.reactor.spawnProcess now does not emit a
deprecation warning on Unicode arguments. It will encode Unicode
arguments down to bytes using the filesystem encoding on UNIX and
Python 2 on Windows, and pass Unicode through unchanged on Python 3
on Windows. (bug-8941)
- twisted.trial._dist.test.test_distreporter now works on Python 3.
(bug-8943)
Bugfixes
--------
- trial --help-reactors will now display iocp and win32er reactors
with Python 3. (bug-8745)
- twisted.logger._flatten.flattenEvent now handles log_format being
None instead of assuming the value is always a string. (bug-8860)
- twisted.protocols.ftp is now Python 3 compatible. (bug-8865)
- twisted.names.client.Resolver can now resolve names with IPv6 DNS
servers. (bug-8877)
- twisted.application.internet.ClientService now waits for existing
connections to disconnect before trying to connect again when
restarting. (bug-8899)
- twisted.internet.unix.Server.doRead and
twisted.internet.unix.Client.doRead no longer fail if recvmsg's
ancillary data contains more than one file descriptor. (bug-8911)
- twist on Python 3 now correctly prints the help text when given no
plugin to run. (bug-8918)
- twisted.python.sendmsg.sendmsg no longer segfaults on Linux +
Python 2. (bug-8969)
- IHandshakeListener providers connected via SSL4ClientEndpoint will
now have their handshakeCompleted methods called. (bug-8973)
- The twist script now respects the --reactor option. (bug-8983)
- Fix crash when using SynchronousTestCase with Warning object which
does not store a string as its first argument (like
libmysqlclient). (bug-9005)
- twisted.python.compat.execfile() does not open files with the
deprecated 'U' flag on Python 3. (bug-9012)
Deprecations and Removals
-------------------------
- twisted.internet.ssl.CertificateOptions' 'method' constructor
argument is now deprecated, in favour of the new 'raiseMinimumTo',
'lowerMaximumSecurityTo', and 'insecurelyLowerMinimumTo' arguments.
(bug-6800)
- twisted.protocols.telnet (not to be confused with the supported
twisted.conch.telnet), deprecated since Twisted 2.5, has been
removed. (bug-8925)
- twisted.application.strports.parse, as well as the deprecated
default arguments in strports.service/listen, deprecated since
Twisted 10.2, has been removed. (bug-8926)
- twisted.web.client.getPage and twisted.web.client.downloadPage have
been deprecated in favour of https://pypi.org/project/treq and
twisted.web.client.Agent. (bug-8960)
- twisted.internet.defer.timeout is deprecated in favor of
twisted.internet.defer.Deferred.addTimeout (bug-8971)
* Attributes can now have user-defined metadata, which greatly improves attrs's extensibility.
* Allow for a __attrs_post_init__ method that – if defined – will get called at the end of the attrs-generated __init__ method.
* Add @attr.s(str=True) that will optionally create a __str__ method that is identical to __repr__. This is mainly useful with Exceptions and other classes that rely on a useful __str__ implementation but override the default one with a poor one of their own. Default Python class behavior is to use __repr__ as __str__ anyway.
If you tried using attrs with Exceptions and were puzzled by the tracebacks: this option is for you.
* Don’t overwrite __name__ with __qualname__ for attr.s(slots=True) classes.
librelp is a core protocol library for RELP, the "reliable event
logging protocol". It was created to provide ultra-reliable
delivery of syslog messages and is quite good at that.
* Approximately 25% better performance from the R-Tree extension.
* Uses compiler built-ins (ex: __builtin_bswap32() or _byteswap_ulong()) for byteswapping when available.
* Uses the sqlite3_blob key/value access object instead of SQL for pulling content out of R-Tree nodes
* Other miscellaneous enhancements such as loop unrolling.
* Add the SQLITE_DEFAULT_LOOKASIDE compile-time option.
* Increase the default lookaside size from 512,125 to 1200,100 as this provides better performance while only adding 56KB of extra memory per connection. Memory-sensitive applications can restore the old default at compile-time, start-time, or run-time.
* Use compiler built-ins __builtin_sub_overflow(), __builtin_add_overflow(), and __builtin_mul_overflow() when available. (All compiler built-ins can be omitted with the SQLITE_DISABLE_INTRINSIC compile-time option.)
* Added the SQLITE_ENABLE_NULL_TRIM compile-time option, which can result in significantly smaller database files for some applications, at the risk of being incompatible with older versions of SQLite.
* Change SQLITE_DEFAULT_PCACHE_INITSZ from 100 to 20, for improved performance.
* Added the SQLITE_UINT64_TYPE compile-time option as an analog to SQLITE_INT64_TYPE.
* Perform some UPDATE operations in a single pass instead of in two passes.
* Enhance the session extension to support WITHOUT ROWID tables.
* Fixed performance problems and potential stack overflows when creating views from multi-row VALUES clauses with hundreds of thousands of rows.
* Added the sha1.c extension.
* In the command-line shell, enhance the ".mode" command so that it restores the default column and row separators for modes "line", "list", "column", and "tcl".
* Enhance the SQLITE_DIRECT_OVERFLOW_READ option so that it works in WAL mode as long as the pages being read are not in the WAL file.
* Enhance the LEMON parser generator so that it can store the parser object as a stack variable rather than allocating space from the heap and make use of that enhancement in the amalgamation.
* Other performance improvements. Uses about 6.5% fewer CPU cycles.
Bug Fixes:
* Throw an error if the ON clause of a LEFT JOIN references tables to the right of the ON clause. This is the same behavior as PostgreSQL. Formerly, SQLite silently converted the LEFT JOIN into an INNER JOIN.
* Use the correct affinity for columns of automatic indexes.
* Ensure that the sqlite3_blob_reopen() interface can correctly handle short rows.
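The LEFT JOIN change above can be probed from Python's bundled sqlite3
module. This is a minimal sketch: the table names t1, t2 and t3 are
hypothetical, and the result depends on the SQLite version the interpreter is
linked against. A library that includes this fix is expected to report an
error, while an older one returns rows, so the function returns whichever
occurs instead of assuming one outcome.

```python
import sqlite3


def run_left_join_with_forward_reference():
    """Return the rows, or the error text, for a LEFT JOIN whose ON
    clause names a table that appears further right in the FROM clause."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE t1(a); CREATE TABLE t2(b); CREATE TABLE t3(c);"
    )
    sql = (
        "SELECT t1.a FROM t1 "
        "LEFT JOIN t2 ON t2.b = t3.c "  # t3 is only joined afterwards
        "JOIN t3"
    )
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


if __name__ == "__main__":
    # Show the linked SQLite version next to the observed behavior.
    print(sqlite3.sqlite_version, run_left_join_with_forward_reference())
```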
Use portend to monitor TCP ports for bound or unbound states.
For example, to wait for a port to be occupied, timing out after 3 seconds::
    portend.occupied('www.pkgsrc.org', 80, timeout=3)
Or to wait for a port to be free, timing out after 5 seconds::
    portend.free('::1', 80, timeout=5)
Portend may also be executed directly. If the function succeeds, it
returns nothing and exits with a status of 0. If it fails, it prints a
message and exits with a status of 1. For example::
    python -m portend localhost:31923 free
    (exits immediately)
    python -m portend -t 1 localhost:31923 occupied
    (one second passes)
    Port 31923 not bound on localhost.
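The waiting behavior described above can be approximated with the standard
library. The following is a sketch under assumptions, not portend's
implementation: port_is_bound and wait_for_state are hypothetical helpers, a
port counts as bound when a TCP connection to it succeeds, and the polling
interval is arbitrary.

```python
import socket
import time


def port_is_bound(host, port):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=0.5):
            return True
    except OSError:
        return False


def wait_for_state(host, port, want_bound, timeout):
    """Poll until the port is bound (or free); raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_bound(host, port) == want_bound:
            return
        if time.monotonic() >= deadline:
            state = "bound" if want_bound else "free"
            raise TimeoutError(f"Port {port} not {state} on {host}.")
        time.sleep(0.05)  # arbitrary polling interval
```

Probing by connecting has a side effect: the server being checked sees a
short-lived connection on each poll.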
Objects and routines pertaining to date and time (tempora).
Modules include:
- tempora (top level package module) contains miscellaneous
utilities and constants.
- timing contains routines for measuring and profiling.
- schedule contains an event scheduler.
0.9.2
=====
released 2017-02-13
* FIX if weekstart != 0 ikhal would show wrong weekday names
* FIX allday events added with `khal new DATE TIMEDELTA` (e.g., 2017-01-18 3d)
were lasting one day too long
* FIX no more crashes when using timezones that have a constant UTC offset (like
UTC itself)
* FIX updated outdated zsh completion file
* FIX display search results for events with neither DTEND nor DURATION
* FIX display search results that are all-day events
* in ikhal, update the date-titles on date change
* FIX printing a new event's path if [default] print_new = path
* FIX width of calendar in `khal calendar` was off by two if locale.weeknumbers
was set to "right"
* CHANGED default `agenda_day_format` to include the actual date of the day
* NEW configuration option: [view]dynamic_days = True; if set to False, ikhal's
right column behaves much as it did in 0.8.x
FEATURES:
- Okta Authentication: A new Okta authentication backend allows you to use
Okta usernames and passwords to authenticate to Vault. If provided with an
appropriate Okta API token, group membership can be queried to assign
policies; users and groups can be defined locally as well.
- RADIUS Authentication: A new RADIUS authentication backend allows using
a RADIUS server to authenticate to Vault. Policies can be configured for
specific users or for any authenticated user.
- Exportable Transit Keys: Keys in `transit` can now be marked as
`exportable` at creation time. This allows a properly ACL'd user to retrieve
the associated signing key, encryption key, or HMAC key. The `exportable`
value is returned on a key policy read and cannot be changed, so if a key is
marked `exportable` it will always be exportable, and if it is not it will
never be exportable.
- Batch Transit Operations: `encrypt`, `decrypt` and `rewrap` operations
in the transit backend now support processing multiple input items in one
call, returning the output of each item in the response.
- Configurable Audited HTTP Headers: You can now specify headers that you
want to have included in each audit entry, along with whether each header
should be HMAC'd or kept plaintext. This can be useful for adding additional
client or network metadata to the audit logs.
- Transit Backend UI (Enterprise): Vault Enterprise UI now supports the transit
backend, allowing creation, viewing and editing of named keys as well as using
those keys to perform supported transit operations directly in the UI.
- Socket Audit Backend: A new socket audit backend allows audit logs to be sent
through TCP, UDP, or UNIX Sockets.
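As an illustration of the batch transit operations listed above, the sketch
below builds a request body carrying several items for one encrypt call. It
assumes the `batch_input` request field with base64-encoded `plaintext` items
used by the transit HTTP API; the helper name batch_encrypt_payload is
hypothetical and nothing is sent to a server.

```python
import base64
import json


def batch_encrypt_payload(plaintexts):
    """Build a JSON body carrying several items for one transit encrypt call."""
    items = [
        {"plaintext": base64.b64encode(text.encode("utf-8")).decode("ascii")}
        for text in plaintexts
    ]
    return json.dumps({"batch_input": items})


if __name__ == "__main__":
    # Hypothetical input values, used only for illustration.
    print(batch_encrypt_payload(["first secret", "second secret"]))
```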
IMPROVEMENTS:
- auth/aws-ec2: Add support for cross-account auth using STS
- auth/aws-ec2: Support issuing periodic tokens
- auth/github: Support listing teams and users
- auth/ldap: Support adding policies to local users directly, in addition to
local groups
- command/server: Add ability to select and prefer server cipher suites
- core: Add a nonce to unseal operations as a check (useful mostly for
support, not as a security principle)
- duo: Added ability to supply extra context to Duo pushes
- physical/consul: Add option for setting consistency mode on Consul gets
- physical/etcd: Full v3 API support; code will autodetect which API version
to use. The v3 code path is significantly less complicated and may be much
more stable.
- secret/pki: Allow specifying OU entries in generated certificate subjects
- secret mount ui (Enterprise): the secret mount list now shows all mounted
backends even if the UI cannot browse them. Additional backends can now be
mounted from the UI as well.
BUG FIXES:
- auth/token: Fix regression in 0.6.4 where using token store roles as a
blacklist (with only `disallowed_policies` set) would not work in most
circumstances
- physical/s3: Page responses in client so list doesn't truncate
- secret/cassandra: Stop a connection leak that could occur on active node
failover
- secret/pki: When using `sign-verbatim`, don't require a role and use the
CSR's common name
Updated devel/p5-MooseX-Singleton to 0.30
Updated devel/p5-Log-Dispatch-FileRotate to 1.23
Updated devel/p5-Hash-FieldHash to 0.15
Updated devel/p5-Glib-Object-Introspection to 0.042
Updated devel/p5-File-Save-Home to 0.10
Updated devel/p5-MooX-Types-MooseLike-Numeric to 1.03
Updated devel/p5-File-Find-Object to 0.3.2
Updated devel/p5-File-ChangeNotify to 0.27
------------------------------------------
0.27 2017-01-30
- Inflating File::ChangeNotify::Default::Watcher into a Moose object with
Moose 2.2000 would cause an error or warning because of a bug in how it
defined an attribute. This broke Catalyst::Restarter.
(pkgsrc changes)
- Drop following line, see above
DEPENDS+= p5-Moose>=2:../../devel/p5-Moose
- Add following lines for make test
BUILD_DEPENDS+= p5-Moo-[0-9]*:../../devel/p5-Moo
BUILD_DEPENDS+= p5-Test-Requires-[0-9]*:../../devel/p5-Test-Requires
BUILD_DEPENDS+= p5-Test-Exception-[0-9]*:../../devel/p5-Test-Exception
BUILD_DEPENDS+= p5-Module-Pluggable-[0-9]*:../../devel/p5-Module-Pluggable
Notable changes
- crypto:
- ability to select cert store at runtime
- Use system CAs instead of using bundled ones
- deps:
- upgrade npm to 4.1.2
- upgrade openssl sources to 1.0.2k
- doc: add basic documentation for WHATWG URL API
- process: add NODE_NO_WARNINGS environment variable
- url: allow use of URL with http.request and https.request
Upstream changes:
1.940 2017-01-29 10:33:45-05:00 America/New_York
- no code changes since 1.939 trial release
1.939 2017-01-14 14:58:44-05:00 America/New_York (TRIAL RELEASE)
- do not decode MIME headers known to be never encoded (Pali Rohár)
- ...and that includes the Downgraded-* headers (Pali Rohár)
1.938 2017-01-01 20:03:38-05:00 America/New_York (TRIAL RELEASE)
- numerous small fixes to header encoding (thanks, Pali Rohár)
for more details see https://github.com/rjbs/Email-MIME/pull/32
- When a single-part content type has been provided with multiple
parts, the user is now warned that the type has been changed to
multipart/mixed. This helps catch typos like
"mutlipart/alternative".
------------------------------------------
0.3.2 2017-01-13
- Made the version number consistent across the .pm files.
- https://bitbucket.org/shlomif/perl-file-find-object/issues/1/wrong-version-number
- Thanks to aer0 for the report.
0.3.1 2017-01-09
- Fixed an issue with tracking the depth of the inodes when detecting
a symlink loop.
- Detected by several cygwin reports.
-----------------------------------------------------
1.03 - 2017-01-20
- Add Moo to Build requirements (wpmoore/mendoza)
- Improve POD (meAmidos)
- Simplify type constraint tests by making use of subtype (meAmidos)