Scrapy 1.7.3:
Enforce lxml 4.3.5 or lower for Python 3.4 (issue 3912, issue 3918).

Scrapy 1.7.2:
Fix Python 2 support (issue 3889, issue 3893, issue 3896).

Scrapy 1.7.1:
Re-packaging of Scrapy 1.7.0, which was missing some changes in PyPI.

Scrapy 1.7.0:

Highlights:
- Improvements for crawls targeting multiple domains
- A cleaner way to pass arguments to callbacks
- A new class for JSON requests
- Improvements for rule-based spiders
- New features for feed exports

Backward-incompatible changes:
- 429 is now part of the RETRY_HTTP_CODES setting by default. This change
  is backward incompatible. If you don’t want to retry 429, you must
  override RETRY_HTTP_CODES accordingly.
- Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler no longer
  accept a Spider subclass instance; they only accept a Spider subclass
  now. Spider subclass instances were never meant to work, and they were
  not working as one would expect: instead of using the passed Spider
  subclass instance, their from_crawler method was called to generate a
  new instance.
- Non-default values for the SCHEDULER_PRIORITY_QUEUE setting may stop
  working. Scheduler priority queue classes now need to handle Request
  objects instead of arbitrary Python data structures.

New features:
- A new scheduler priority queue,
  scrapy.pqueues.DownloaderAwarePriorityQueue, may be enabled for a
  significant scheduling improvement on crawls targeting multiple web
  domains, at the cost of no CONCURRENT_REQUESTS_PER_IP support
  (issue 3520); a settings sketch follows these notes
- A new Request.cb_kwargs attribute provides a cleaner way to pass
  keyword arguments to callback methods (issue 1138, issue 3563); see the
  spider sketch below
- A new JSONRequest class offers a more convenient way to build JSON
  requests (issue 3504, issue 3505); see the sketch below
- A process_request callback passed to the Rule constructor now receives
  the Response object that originated the request as its second argument
  (issue 3682); see the rule-based spider sketch below
- A new restrict_text parameter for the LinkExtractor constructor allows
  filtering links by linking text (issue 3622, issue 3635)
- A new FEED_STORAGE_S3_ACL setting allows defining a custom ACL for
  feeds exported to Amazon S3 (issue 3607)
- A new FEED_STORAGE_FTP_ACTIVE setting allows using FTP’s active
  connection mode for feeds exported to FTP servers (issue 3829)
- A new METAREFRESH_IGNORE_TAGS setting allows overriding which HTML tags
  are ignored when searching a response for HTML meta tags that trigger a
  redirect (issue 1422, issue 3768)
- A new redirect_reasons request meta key exposes the reason (status
  code, meta refresh) behind every followed redirect (issue 3581,
  issue 3687)
- The SCRAPY_CHECK variable is now set to the true string during runs of
  the check command, which allows detecting contract check runs from code
  (issue 3704, issue 3739)
- A new Item.deepcopy() method makes it easier to deep-copy items
  (issue 1493, issue 3671)
- CoreStats also logs elapsed_time_seconds now (issue 3638)
- Exceptions from ItemLoader input and output processors are now more
  verbose (issue 3836, issue 3840)
- Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler now fail
  gracefully if they receive a Spider subclass instance instead of the
  subclass itself (issue 2283, issue 3610, issue 3872)

Bug fixes:
- process_spider_exception() is now also invoked for generators
  (issue 220, issue 2061)
- System exceptions like KeyboardInterrupt are no longer caught
  (issue 3726)
- ItemLoader.load_item() no longer makes later calls to
  ItemLoader.get_output_value() or ItemLoader.load_item() return empty
  data (issue 3804, issue 3819)
- The images pipeline (ImagesPipeline) no longer ignores these Amazon S3
  settings: AWS_ENDPOINT_URL, AWS_REGION_NAME, AWS_USE_SSL, AWS_VERIFY
  (issue 3625)
- Fixed a memory leak in MediaPipeline affecting, for example, non-200
  responses and exceptions from custom middlewares (issue 3813)
- Requests with private callbacks are now correctly deserialized from
  disk (issue 3790)
- FormRequest.from_response() now handles invalid methods like major web
  browsers
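A minimal settings sketch for the new scheduler priority queue; only the
setting name and class path come from the release notes, the rest is an
assumed standard Scrapy project settings.py:

    # settings.py (sketch): enable the downloader-aware priority queue
    # introduced in 1.7 for multi-domain crawls.
    SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
    # Trade-off noted above: CONCURRENT_REQUESTS_PER_IP is not supported
    # together with this queue.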
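A sketch of passing callback arguments through the new Request.cb_kwargs
attribute; the spider name, URLs and the page_number argument are
placeholders for illustration:

    import scrapy

    class PageSpider(scrapy.Spider):
        # Spider name and URLs are hypothetical.
        name = 'pages'
        start_urls = ['http://example.com/page/1']

        def parse(self, response):
            # Entries in cb_kwargs are passed to the callback as keyword
            # arguments, instead of being smuggled through request.meta.
            yield scrapy.Request(
                'http://example.com/page/2',
                callback=self.parse_page,
                cb_kwargs={'page_number': 2},
            )

        def parse_page(self, response, page_number):
            self.logger.info('parsed page %d', page_number)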
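A sketch of building a JSON request with the new JSONRequest class,
assuming the scrapy.http import path used by the 1.7 documentation; the
endpoint and payload are placeholders:

    import scrapy
    from scrapy.http import JSONRequest

    class ApiSpider(scrapy.Spider):
        # Spider name, endpoint and payload are hypothetical.
        name = 'api'

        def start_requests(self):
            # JSONRequest serializes `data` into the request body and
            # sets the JSON Content-Type header; with a body present the
            # method defaults to POST.
            yield JSONRequest(
                url='http://example.com/api/search',
                data={'query': 'books'},
            )

        def parse(self, response):
            pass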
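A rule-based spider sketch combining the new restrict_text parameter
with the updated process_request signature; the spider name, URLs and
the found_on meta key are placeholders:

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    def tag_origin(request, response):
        # As of 1.7, process_request also receives the response that the
        # request was extracted from, as its second argument.
        request.meta['found_on'] = response.url
        return request

    class NextPageSpider(CrawlSpider):
        # Spider name and URLs are hypothetical.
        name = 'nextpages'
        start_urls = ['http://example.com/']
        rules = (
            Rule(
                # restrict_text keeps only links whose text matches the
                # given regular expression.
                LinkExtractor(restrict_text=r'[Nn]ext'),
                callback='parse_page',
                process_request=tag_origin,
            ),
        )

        def parse_page(self, response):
            pass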
Makefile
# $NetBSD: Makefile,v 1.9 2019/08/22 08:21:11 adam Exp $

DISTNAME=	Scrapy-1.7.3
PKGNAME=	${PYPKGPREFIX}-${DISTNAME:tl}
CATEGORIES=	www python
MASTER_SITES=	${MASTER_SITE_PYPI:=S/Scrapy/}

MAINTAINER=	pkgsrc-users@NetBSD.org
HOMEPAGE=	https://scrapy.org/
COMMENT=	High-level Web Crawling and Web Scraping framework
LICENSE=	modified-bsd

DEPENDS+=	${PYPKGPREFIX}-OpenSSL>=0.13.1:../../security/py-OpenSSL
DEPENDS+=	${PYPKGPREFIX}-cssselect>=0.9:../../textproc/py-cssselect
DEPENDS+=	${PYPKGPREFIX}-lxml>=3.2.4:../../textproc/py-lxml
DEPENDS+=	${PYPKGPREFIX}-parsel>=1.5:../../www/py-parsel
DEPENDS+=	${PYPKGPREFIX}-pydispatcher>=2.0.5:../../devel/py-pydispatcher
DEPENDS+=	${PYPKGPREFIX}-queuelib>=1.1.1:../../devel/py-queuelib
DEPENDS+=	${PYPKGPREFIX}-service_identity-[0-9]*:../../security/py-service_identity
DEPENDS+=	${PYPKGPREFIX}-six>=1.5.2:../../lang/py-six
DEPENDS+=	${PYPKGPREFIX}-twisted>=17.9.0:../../net/py-twisted
DEPENDS+=	${PYPKGPREFIX}-w3lib>=1.17.0:../../www/py-w3lib

USE_LANGUAGES=	# none

# Rename the installed scrapy script so that packages built for
# different Python versions can coexist.
post-install:
	cd ${DESTDIR}${PREFIX}/bin && \
	${MV} scrapy scrapy-${PYVERSSUFFIX} || ${TRUE}

.include "../../lang/python/egg.mk"
.include "../../mk/bsd.pkg.mk"