Back up GSoC'2020 blogs
parent
00065e4c0d
commit
c2bcb5c691
|
@ -1,12 +1,12 @@
|
|||
name: build
|
||||
|
||||
|
||||
on:
|
||||
push:
|
||||
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v2
|
||||
|
||||
|
|
|
@ -12,7 +12,8 @@ author = 'Nguyễn Gia Phong'
|
|||
# Add any Sphinx extension module names here, as strings.
|
||||
# They can be extensions coming with Sphinx (named 'sphinx.ext.*')
|
||||
# or your custom ones.
|
||||
extensions = ['sphinx.ext.githubpages']
|
||||
extensions = ['sphinx.ext.extlinks', 'sphinx.ext.githubpages']
|
||||
extlinks = {'pip': ('https://github.com/pypa/pip/pull/%s', 'GH-')}
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 495 KiB |
|
@ -0,0 +1,109 @@
|
|||
Unexpected Things When You're Expecting
|
||||
=======================================
|
||||
|
||||
Hi everyone, I hope that you are all doing well and wish you all good health!
|
||||
The last week has not been really kind to me with a decent amount of
|
||||
academic pressure (my school year is lasting until early July).
|
||||
It would be bold to say that I have spent 10 hours working on my GSoC project
|
||||
since the last check-in, let alone the 30 hours per week requirement.
|
||||
That being said, there were still some discoveries that I wish to share.
|
||||
|
||||
The ``multiprocessing[.dummy]`` wrapper
|
||||
---------------------------------------
|
||||
|
||||
Most of my time was spent finalizing the multi{processing,threading}
|
||||
wrapper for the ``map`` function that submits tasks to the worker pool.
|
||||
To my surprise, it is rather difficult to write something that is
|
||||
not only portable but also easy to read and test.
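A wrapper of this kind can be sketched roughly as follows. This is an illustration under assumptions rather than ``pip``'s actual code: the names ``LACK_SEM_OPEN`` and ``map_multithread`` and the pool size of 10 are made up for this example, and the fallback simply degrades to the built-in ``map``.

```python
from multiprocessing.dummy import Pool as ThreadPool  # threads behind the Pool API

try:
    # Pool creation needs this module, which fails to import
    # on platforms lacking sem_open.
    import multiprocessing.synchronize  # noqa: F401
except ImportError:
    LACK_SEM_OPEN = True
else:
    LACK_SEM_OPEN = False


def map_multithread(func, iterable, chunksize=1):
    """Lazily yield func(item) for each item, in arbitrary order.

    Hypothetical helper: falls back to the sequential built-in map
    when worker pools cannot be created on the platform.
    """
    if LACK_SEM_OPEN:
        yield from map(func, iterable)
        return
    with ThreadPool(10) as pool:  # assumed pool size
        # The input is cut into pieces of chunksize items per task.
        yield from pool.imap_unordered(func, iterable, chunksize)
```

Results come back lazily, so a consumer can start working on the first finished task while the others are still running.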
|
||||
|
||||
By :pip:`the latest commit <8320>`, I realized the following:
|
||||
|
||||
#. The ``multiprocessing`` module was not designed for the implementation
|
||||
details to be abstracted away entirely. For example, the lazy ``map`` variants
|
||||
can be really slow without a suitable chunk size
|
||||
(used to cut the input iterable into pieces and distribute them to workers in the pool).
|
||||
By *suitable*, I mean only an order of magnitude smaller than the input. This defeats
|
||||
half of the purpose of making it lazy: allowing the input to be
|
||||
evaluated lazily. Luckily, in the use case I'm aiming for, the length of
|
||||
the iterable argument is small and the laziness is only needed for the output
|
||||
(to pipeline download and installation).
|
||||
#. Mocking ``import`` for testing purposes can never be pretty. One reason
|
||||
is that we (Python users) have very little control over the calls of
|
||||
``import`` statements and their lower-level implementation ``__import__``.
|
||||
In order to properly patch this built-in function, unlike for others
|
||||
of the same group, we have to ``monkeypatch`` the name from ``builtins``
|
||||
(or ``__builtin__`` under Python 2) instead of the module that imports stuff.
|
||||
Furthermore, because of the special namespacing, to avoid infinite recursion
|
||||
we need to alias the function to a different name for fallback.
|
||||
#. To add to the problem, ``multiprocessing`` lazily imports the fragile module
|
||||
during pool creation. Since the failure is platform-specific
|
||||
(the lack of ``sem_open``), it was decided to check upon the import
|
||||
of ``pip``'s module. Although the behavior is easier to reason about
|
||||
in human language, testing it requires invalidating cached imports and
|
||||
re-importing the wrapper module.
|
||||
#. Last but not least, I now understand the pain of keeping Python 2
|
||||
compatibility that many package maintainers still need to deal with
|
||||
every day (although Python 2 has reached its end-of-life, ``pip``, for
|
||||
example, :pip:`will still support it for another year <6148>`).
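The import-mocking points above can be illustrated with a minimal sketch. It patches ``builtins.__import__`` by hand instead of through a test framework's ``monkeypatch`` helper, and the names ``block_import`` and ``can_import`` are hypothetical, chosen only for this example. Note how the real function is aliased first so that the replacement can fall back to it without recursing into itself.

```python
import builtins
from contextlib import contextmanager

# Alias kept for fallback: calling builtins.__import__ from inside
# the replacement would recurse forever.
_real_import = builtins.__import__


@contextmanager
def block_import(blocked):
    """Temporarily make importing the module named blocked fail."""
    def fake_import(name, *args, **kwargs):
        if name == blocked:
            raise ImportError(f'simulated failure importing {name}')
        return _real_import(name, *args, **kwargs)

    builtins.__import__ = fake_import
    try:
        yield
    finally:
        builtins.__import__ = _real_import


def can_import(name):
    """Report whether importing the named module succeeds."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True
```

Testing a check that runs at import time additionally requires dropping the module under test from ``sys.modules`` and importing it again while the patch is active.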
|
||||
|
||||
The change in direction
|
||||
-----------------------
|
||||
|
||||
Since last week, my mentor Pradyun Gedam and I set up weekly real-time
|
||||
meetings (a fancy term for video/audio chat in the worldwide quarantine
|
||||
era) for the entire GSoC period. During the last session, we decided to
|
||||
put parallelization of download during resolution on hold, in favor of a
|
||||
more beneficial goal: :pip:`partially download the wheels during
|
||||
dependency resolution <7819>`.
|
||||
|
||||
.. image:: swirl.png
|
||||
|
||||
As discussed by Danny McClanahan and the maintainers of ``pip``, it is feasible
|
||||
to only download a few kB of a wheel to obtain enough metadata for
|
||||
dependency resolution. While this is only applicable to wheels
|
||||
(i.e. prebuilt packages), other packaging formats make up less than 20%
|
||||
of the downloads (at least on PyPI), and the figure is much less for
|
||||
the most popular packages. Therefore, this optimization alone could make
|
||||
`the upcoming backtracking resolver`_'s performance on par with the legacy one.
|
||||
|
||||
During the last few years, there has been a lot of effort being poured into
|
||||
replacing ``pip``'s current resolver that is unable to resolve conflicts.
|
||||
While its correctness will be ensured by some of the most talented and
|
||||
hard-working developers in the Python packaging community, from the users'
|
||||
point of view, it would be better to have its performance not lagging
|
||||
behind the old one. Aside from the increase in CPU cycles for more
|
||||
rigorous resolution, more I/O, especially networking operations, is expected
|
||||
to be performed. This is due to :pip:`the lack of a standard and efficient way
|
||||
to acquire the metadata <7406#issuecomment-583891169>`. Therefore, unlike
|
||||
most package managers we are familiar with, ``pip`` has to fetch
|
||||
(and possibly build) the packages solely for dependency information.
|
||||
|
||||
Fortunately, :pep:`427#recommended-archiver-features` recommends
|
||||
package builders to place the metadata at the end of the archive.
|
||||
This allows the resolver to only fetch the last few kB using
|
||||
`HTTP range requests`_ for the relevant information.
|
||||
Simply appending ``Range: bytes=-8000`` to the request header
|
||||
in ``pip._internal.network.download`` makes the resolution process
|
||||
*lightning* fast. Of course this breaks the installation but I am confident
|
||||
that it is not difficult to implement this optimization cleanly.
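As a rough sketch of the idea (using the standard library's ``urllib`` rather than ``pip``'s own networking code, with the hypothetical helper names ``tail_request`` and ``fetch_tail``), a suffix range request for the last bytes of a file can be built like this:

```python
from urllib.request import Request, urlopen


def tail_request(url, nbytes=8000):
    """Build a request for only the last nbytes bytes of the resource."""
    return Request(url, headers={'Range': f'bytes=-{nbytes}'})


def fetch_tail(url, nbytes=8000):
    """Return the tail of the resource, assuming the server honors ranges."""
    with urlopen(tail_request(url, nbytes)) as response:
        # 206 Partial Content signals that the Range header was honored;
        # a server ignoring it would answer 200 with the whole body.
        if response.status != 206:
            raise ValueError('server did not honor the range request')
        return response.read()
```

The status check matters because a server without range support may silently send the entire wheel instead.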
|
||||
|
||||
One drawback of this optimization is the compatibility. Not every Python
|
||||
package index supports range requests, and it is not possible to verify
|
||||
the partial wheel. While the first case is unavoidable, for the other,
|
||||
hash checking is usually used for pinned/locked-version requirements,
|
||||
thus no backtracking is done during dependency resolution.
|
||||
|
||||
Either way, before installation, the packages selected by the resolver
|
||||
can be downloaded in parallel. This guarantees a larger batch of packages,
|
||||
compared to parallelization during resolution, where the number of downloads
|
||||
can be as low as one while trying different versions of the same package.
|
||||
|
||||
Unfortunately, I have not been able to do much other than
|
||||
:pip:`a minor clean up <8411>`. I am looking forward to accomplishing more
|
||||
this week and seeing what this path will lead us to! At the moment,
|
||||
I am happy that I'm able to meet the blog deadline, at least in UTC!
|
||||
|
||||
.. _the upcoming backtracking resolver:
|
||||
http://www.ei8fdb.org/thoughts/2020/05/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts/
|
||||
.. _HTTP range requests:
|
||||
https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests
|
|
@ -0,0 +1,110 @@
|
|||
The Wonderful Wizard of O'zip
|
||||
=============================
|
||||
|
||||
*Never give up… No one knows what's going to happen next.*
|
||||
|
||||
Preface
|
||||
-------
|
||||
|
||||
Greetings and best wishes! I had a lot of fun during the last week,
|
||||
although admittedly nothing was really finished. In summary,
|
||||
these are the works I carried out in the last seven days:
|
||||
|
||||
* Finalizing :pip:`utilities for parallelization <8320>`
|
||||
* :pip:`Continuing experimenting <8467>`
|
||||
on :pip:`using lazy wheels for dependency resolution <8442>`
|
||||
* Polishing up :pip:`the patch <8411>` refactoring
|
||||
``operations.prepare.prepare_linked_requirement``
|
||||
* Adding ``flake8-logging-format``
|
||||
:pip:`to the linter <8423#issuecomment-645418725>`
|
||||
* Splitting :pip:`the linting patch <8456>` from :pip:`the PR adding
|
||||
the license requirement to vendor README <8332>`
|
||||
|
||||
The ``multiprocessing[.dummy]`` wrapper
|
||||
---------------------------------------
|
||||
|
||||
Yes, you read it right, this is the same section as last fortnight's blog.
|
||||
My mentor Pradyun Gedam gave me a green light to have :pip:`8411` merged
|
||||
without support for Python 2 and the non-lazy map variant, which turns out
|
||||
to be troublesome for multithreading.
|
||||
|
||||
The tests still need to pass, of course, and the flaky tests (see failing tests
|
||||
over Azure Pipeline in the past) really gave me a panic attack earlier today.
|
||||
We probably need to mark them as xfail or investigate why they are
|
||||
nondeterministic specifically on Azure, but the real reason I was *all caught up
|
||||
and confused* was that the unit tests I added mess with the cached imports
|
||||
and as ``pip``'s tests are run in parallel, who knows what it might affect.
|
||||
I was so relieved to not discover any new set of tests made flaky by ones
|
||||
I'm trying to add!
|
||||
|
||||
The file-like object mapping ZIP over HTTP
|
||||
------------------------------------------
|
||||
|
||||
This is where the fun starts. Before we dive in, let's recall some
|
||||
background information on this. As discovered by Danny McClanahan
|
||||
in :pip:`7819`, it is possible to only download a portion of a wheel
|
||||
and it's still valid for ``pip`` to get the distribution's metadata.
|
||||
In the same thread, Daniel Holth suggested that one may use
|
||||
HTTP range requests to specifically ask for the tail of the wheel,
|
||||
where the ZIP's central directory record and, usually,
|
||||
``dist-info`` (the directory containing ``METADATA``) can be found.
|
||||
|
||||
Well, *usually*. While :pep:`427` does indeed recommend
|
||||
|
||||
Archivers are encouraged to place the ``.dist-info`` files physically
|
||||
at the end of the archive. This enables some potentially interesting
|
||||
ZIP tricks including the ability to amend the metadata without
|
||||
rewriting the entire archive.
|
||||
|
||||
one of the mentioned *tricks* is adding shared libraries to wheels
|
||||
of extension modules (using e.g. ``auditwheel`` or ``delocate``).
|
||||
Thus for non-pure Python wheels, it is unlikely that the metadata
|
||||
lie in the last few megabytes. Ignoring source distributions is bad enough,
|
||||
we can't afford making an optimization that doesn't work for extension modules,
|
||||
which are still an integral part of the Python ecosystem )-:
|
||||
|
||||
But hey, the ZIP's directory record is guaranteed to be at the end of the file!
|
||||
Couldn't we do something about that? The short answer is yes. The long answer
|
||||
is, well, yessssssss! That, plus magic provided by most operating systems,
|
||||
this is what we figured out:
|
||||
|
||||
#. We can download a relatively small chunk at the end of the wheel
|
||||
until it is recognizable as a valid ZIP file.
|
||||
#. In order for the end of the archive to actually appear as the end to
|
||||
``zipfile``, we feed to it an object with ``seek`` and ``read`` defined.
|
||||
As navigating to the rear of the file is performed by calling ``seek``
|
||||
with relative offset and ``whence=SEEK_END`` (see ``man 3 fseek``
|
||||
for more details), we are completely able to make the wheels in the cloud
|
||||
behave as if they were available locally.
|
||||
|
||||
.. image:: cloud.gif
|
||||
|
||||
#. For large wheels, it is better to store them in hard disks instead of memory.
|
||||
For smaller ones, it is also preferable to store it as a file to avoid
|
||||
(error-prone and often not really efficient) manual tracking and joining
|
||||
of downloaded segments. We only use a small portion of the wheel, however
|
||||
just in case one is wondering, we have very little control over
|
||||
when ``tempfile.SpooledTemporaryFile`` rolls over, so the memory-disk hybrid
|
||||
is not exactly working as expected.
|
||||
#. With all these in mind, all we have to do is to define an intermediate object
|
||||
that checks for local availability and downloads if needed on calls to ``read``,
|
||||
to lazily provide the data over HTTP and reduce execution time.
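The steps above can be sketched as a small file-like class. This is a simplified illustration, not the code from the pull request: the name ``LazyRemoteFile`` and the ``fetch(start, end)`` callable (standing in for an HTTP range request for bytes ``start`` to ``end - 1``) are assumptions, the cache lives in memory instead of a temporary file, and availability is tracked per fixed-size chunk rather than per arbitrary interval.

```python
import io


class LazyRemoteFile:
    """Read-only seekable object that fetches byte ranges on demand."""

    def __init__(self, length, fetch, chunk_size=1024):
        self._length = length
        self._fetch = fetch          # callable(start, end) -> bytes in [start, end)
        self._chunk = chunk_size
        self._buffer = bytearray(length)
        self._have = set()           # indices of chunks already downloaded
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        # SEEK_END is what lets zipfile look at the rear of the archive.
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: self._length}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def read(self, size=-1):
        start = min(self._pos, self._length)
        if size is None or size < 0:
            end = self._length
        else:
            end = min(start + size, self._length)
        self._ensure(start, end)
        self._pos = end
        return bytes(self._buffer[start:end])

    def _ensure(self, start, end):
        """Download every chunk overlapping [start, end) not yet cached."""
        if start >= end:
            return
        for index in range(start // self._chunk, (end - 1) // self._chunk + 1):
            if index in self._have:
                continue
            low = index * self._chunk
            high = min(low + self._chunk, self._length)
            self._buffer[low:high] = self._fetch(low, high)
            self._have.add(index)
```

Passing such an object to ``zipfile.ZipFile`` lets it read ``METADATA`` while only the chunks it actually touches, mostly those near the end of the archive, are fetched.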
|
||||
|
||||
The only theoretical challenge left is to keep track of downloaded intervals,
|
||||
which I finally figured out after some trial and error. The code
|
||||
was submitted as a pull request to ``pip`` at :pip:`8467`. A more modern
|
||||
(read: Python 3-only) variant was packaged and uploaded to PyPI under
|
||||
the name of lazip_. I am unaware of any use case for it outside of ``pip``,
|
||||
but it's certainly fun to play with d-:
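For illustration only, here is one way such interval bookkeeping could look. This is not the code from the pull request or from ``lazip``; the function name ``request_range`` and the representation (a sorted list of disjoint half-open ``(start, end)`` pairs) are assumptions made for this sketch.

```python
def request_range(intervals, start, end):
    """Work out what to download for the byte range [start, end).

    intervals is a sorted list of disjoint (lo, hi) pairs already
    downloaded.  Return (gaps, updated), where gaps are the pieces
    still missing and updated is the bookkeeping after fetching them.
    """
    gaps, updated = [], []
    cursor = start
    new_lo, new_hi = start, end
    for lo, hi in intervals:
        if hi < start or lo > end:
            updated.append((lo, hi))  # neither overlapping nor adjacent
            continue
        if lo > cursor:
            gaps.append((cursor, lo))  # hole before this interval
        cursor = max(cursor, hi)
        new_lo, new_hi = min(new_lo, lo), max(new_hi, hi)
    if cursor < end:
        gaps.append((cursor, end))
    updated.append((new_lo, new_hi))  # everything touched merges into one
    updated.sort()
    return gaps, updated
```

Each gap would then map to one range request, and the merged list keeps later lookups cheap.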
|
||||
|
||||
What's next?
|
||||
------------
|
||||
|
||||
I have been falling short of getting the PRs mentioned above merged for
|
||||
quite a while. With ``pip``'s next beta coming really soon, I have to somehow
|
||||
make the patches reach a certain standard and enough attention to be part of
|
||||
the pre-release—beta-testing would greatly help the success of the GSoC project.
|
||||
To other GSoC students and mentors reading this, I also hope your projects
|
||||
turn out successful!
|
||||
|
||||
.. _lazip: https://pypi.org/project/lazip/
|
|
@ -0,0 +1,75 @@
|
|||
I'm Not Drowning On My Own
|
||||
==========================
|
||||
|
||||
Cold Water
|
||||
----------
|
||||
|
||||
Hello there! My school year is coming to an end, with some final assignments
|
||||
and group projects left to be done. I for sure underestimated the workload
|
||||
of these and in the last (and probably next) few days I'm drowning in work
|
||||
trying to meet my deadlines.
|
||||
|
||||
One project that might be remotely relevant is cheese-shop_, which tries to
|
||||
manage the metadata of packages from the real `Cheese Shop`_. Other than that,
|
||||
schoolwork is draining a lot of my time and I can't remember the last time
|
||||
I came up with something new for my GSoC project )-;
|
||||
|
||||
Warm Water
|
||||
----------
|
||||
|
||||
On the bright side, I received a lot of help and encouragement
|
||||
from contributors and stakeholders of ``pip``. In the last week alone,
|
||||
I had five pull requests merged:
|
||||
|
||||
* :pip:`8332`: Add license requirement to ``_vendor/README.rst``
|
||||
* :pip:`8320`: Add utilities for parallelization
|
||||
* :pip:`8504`: Parallelize ``pip list --outdated`` and ``--uptodate``
|
||||
* :pip:`8411`: Refactor ``operations.prepare.prepare_linked_requirement``
|
||||
* :pip:`8467`: Add utility to lazily acquire wheel metadata over HTTP
|
||||
|
||||
In addition to helping me get my PRs merged, my mentor Pradyun Gedam
|
||||
also gave me my first official feedback, including what I'm doing right
|
||||
(and wrong too!) and what I should keep doing to increase the chance of
|
||||
the project being successful.
|
||||
|
||||
:pip:`7819`'s roadmap (Danny McClanahan's discoveries and works on lazy wheels)
|
||||
is being closely tracked by ``hatch``'s maintainer Ofek Lev, which really
|
||||
makes me proud and warms my heart, that what I'm helping build is actually
|
||||
needed by the community!
|
||||
|
||||
Learning How To Swim
|
||||
--------------------
|
||||
|
||||
With :pip:`8467` and :pip:`8530` merged, I'm now working on :pip:`8532`
|
||||
which aims to roll out the lazy wheel as the way to obtain
|
||||
dependency information via the CLI flag ``--use-feature=lazy-wheel``.
|
||||
|
||||
:pip:`8532` was failing initially, despite being relatively trivial and that
|
||||
the commit it was based on was passing. Surprisingly, after rebasing it
|
||||
on top of :pip:`8530`, it mysteriously became green. After the first
|
||||
(early) review, I was able to iterate on my earlier code, which used
|
||||
the ambiguous exception ``RuntimeError``.
|
||||
|
||||
The rest to be done is *just* adding some functional tests (I'm pretty sure
|
||||
this will be either overwhelming or underwhelming) to make sure that
|
||||
the command-line flag is working correctly. Hopefully this can make it into
|
||||
the beta of the upcoming release :pip:`this month <8511>`.
|
||||
|
||||
.. image:: lazy-wheel.jpg
|
||||
|
||||
In other news, I've also submitted :pip:`a patch improving the tests for
|
||||
the parallelization utilities <8538>`, which were really messy as I wrote them.
|
||||
Better late than never!
|
||||
|
||||
Metaphors aside, I actually can't swim d-:
|
||||
|
||||
Dive Plan
|
||||
---------
|
||||
|
||||
After :pip:`8532`, I think I'll try to parallelize downloads of wheels
|
||||
that are lazily fetched only for metadata. By the current implementation
|
||||
of the new resolver, for ``pip install``, this can be injected directly
|
||||
between the resolution and build/installation process.
|
||||
|
||||
.. _cheese-shop: https://github.com/McSinyx/cheese-shop
|
||||
.. _Cheese Shop: https://pypi.org
|
|
@ -0,0 +1,84 @@
|
|||
I've Walked 500 Miles…
|
||||
======================
|
||||
|
||||
.. epigraph::
|
||||
|
||||
| … and I would walk 500 more
|
||||
| Just to be the man who walks a thousand miles
|
||||
| To fall down at your door
|
||||
|
||||
.. image:: 500-miles.gif
|
||||
|
||||
The Main Road
|
||||
-------------
|
||||
|
||||
Hi, have you met ``fast-deps``? It's (going to be) the name of ``pip``'s
|
||||
experimental feature that may improve the speed of dependency resolution
|
||||
of the new resolver. By avoiding downloading whole wheels just to
|
||||
obtain metadata, it is especially helpful when ``pip`` has to do
|
||||
heavy backtracking to resolve conflicts.
|
||||
|
||||
Thanks to :pip:`Chris Hunt's review on GH-8532 <8532#discussion_r453990728>`,
|
||||
my mentor Pradyun Gedam and I worked out a less hacky approach to interject
|
||||
the call to lazy wheel during the resolution process. A new PR :pip:`8588`
|
||||
was filed to implement it—I could have *just* worked on top of the old PR
|
||||
and rebased, but my ``git`` skill is far from gud enough to confidently do it.
|
||||
|
||||
Testing this one has been a lot of fun though. At first, integration tests
|
||||
were added as a rerun of the tests for the new resolver, with an additional flag
|
||||
to use feature ``fast-deps``. It indeed made me feel guilty towards Travis_,
|
||||
who has to work around 30 minutes more every run. Per Chris Hunt's suggestion,
|
||||
in the new PR, I instead write a few functional tests for the area relating
|
||||
the most to the feature, namely ``pip``'s subcommands ``wheel``,
|
||||
``download`` and ``install``.
|
||||
|
||||
It was also suggested that a mock server with HTTP range requests support
|
||||
might be better for testing (in terms of performance and reliability).
|
||||
However, :pip:`I have yet to be able to make Werkzeug do it
|
||||
<8584#issuecomment-659227702>`.
|
||||
|
||||
Why did I say I'm half way there? With the parallel utilities merged and a way
|
||||
to quickly get the list of distributions to be downloaded being really close,
|
||||
what is left is *only* to figure out a way to properly download them in parallel.
|
||||
With no distribution to be added during the download progress, the model of this
|
||||
will fit very well with the architecture in `my original proposal`_.
|
||||
A batch downloader can be implemented to track the progress of each download
|
||||
and thus report them cleanly as e.g. progress bar or percentage. This is
|
||||
the part I am second-most excited about of my GSoC project this summer
|
||||
(after the synchronization of downloads written in my proposal, which was then
|
||||
superseded by ``fast-deps``) and I can't wait to do it!
|
||||
|
||||
The Side Quests
|
||||
---------------
|
||||
|
||||
As usual, I make sure that I complete every side quest I see during the journey:
|
||||
|
||||
* :pip:`8568`: Declare constants in ``configuration.py`` as such
|
||||
* :pip:`8571`: Clean up ``Configuration.unset_value``
|
||||
and nit the class' ``__init__``
|
||||
* :pip:`8578`: Allow verbose/quiet level
|
||||
to be specified via config file and env var
|
||||
* :pip:`8599`: Replace tabs by spaces for consistency
|
||||
|
||||
Snap Back to Reality
|
||||
--------------------
|
||||
|
||||
A bit about me, I actually walked 500 meters earlier today to a bank
|
||||
and walked 500 more to another to prepare my Visa card for purchasing
|
||||
the upcoming Pinephone prototype. It's one of the first smartphones
|
||||
to fully support a GNU/Linux distribution, where one can run desktop apps
|
||||
(including proper terminals) as well as traditional services like SSH,
|
||||
HTTP server and IPFS node because why not? Just a few hours ago,
|
||||
I pre-ordered the `postmarketOS community edition`_ with additional hardware
|
||||
for convergence.
|
||||
|
||||
If you did not come here for a Pinephone ad, please accept my apologies though d-;
|
||||
and to ones reading this, I hope you all can become the person who walks
|
||||
a thousand miles to fall down at the door opening to all
|
||||
that you ever wished for!
|
||||
|
||||
.. _Travis: https://travis-ci.com/
|
||||
.. _my original proposal:
|
||||
https://blogs.python-gsoc.org/media/proposals/pip-parallel-dl.pdf
|
||||
.. _postmarketOS community edition:
|
||||
https://postmarketos.org/blog/2020/07/15/pinephone-ce-preorder/
|
|
@ -0,0 +1,44 @@
|
|||
Sorting Things Out
|
||||
==================
|
||||
|
||||
Hi! I really hope that everyone reading this is still doing okay,
|
||||
and if that isn't the case, I wish you a good day!
|
||||
|
||||
``pip`` 20.2 Released!
|
||||
----------------------
|
||||
|
||||
Last Wednesday, ``pip`` 20.2 was released, delivering the ``2020-resolver``
|
||||
as well as many other improvements! I was lucky to be able
|
||||
to get the ``fast-deps`` feature to be included as part of the release.
|
||||
A brief description of this *experimental* feature as well as testing
|
||||
instruction can be found on `Python Discuss`_.
|
||||
|
||||
The public exposure of the feature also reminded me of some further
|
||||
:pip:`optimization <8681>` to make on :pip:`the lazy wheel <8670>`.
|
||||
Hopefully, even without download parallelization, it will not be so slow
|
||||
as to put off testing by concerned users of ``pip``.
|
||||
|
||||
Preparation for Download Parallelization
|
||||
----------------------------------------
|
||||
|
||||
As of this moment, we already have:
|
||||
|
||||
* :pip:`Multithreading pool fallback working <8162#issuecomment-667504162>`
|
||||
* An opt-in to use lazy wheel to obtain dependency information,
|
||||
and thus getting a list of wheels at the end of resolution
|
||||
ready to be downloaded together
|
||||
|
||||
What's left is *only* to interject a parallel download somewhere after
|
||||
the dependency resolution step. Still, I struggled with this way more than
|
||||
I've ever imagined. I got so stuck that I had to give myself a day off
|
||||
in the middle of the week (and study some Rust), then I came up with
|
||||
:pip:`something that was agreed upon as difficult to maintain <8638>`.
|
||||
|
||||
Indeed, a large part of this is my fault, for not communicating the design
|
||||
thoroughly with ``pip``'s maintainers and not carefully noting stuff down
|
||||
during (verbal) discussions with my mentor. Thankfully :pip:`Chris Hunt
|
||||
came to the rescue <8685>` and did a refactoring that will make my future work
|
||||
much easier and cleaner.
|
||||
|
||||
.. _Python Discuss:
|
||||
https://discuss.python.org/t/announcement-pip-20-2-release/4863/2
|
|
@ -0,0 +1,47 @@
|
|||
Parallelizing Wheel Downloads
|
||||
=============================
|
||||
|
||||
.. epigraph::
|
||||
|
||||
| And now it's clear as this promise
|
||||
| That we're making
|
||||
| Two progress bars into one
|
||||
|
||||
Hello there! It has been raining a lot lately and some mosquito has given me
|
||||
dengue fever today. To whoever is reading this, I hope it never happens
|
||||
to you.
|
||||
|
||||
Download Parallelization
|
||||
------------------------
|
||||
|
||||
I've been working on ``pip``'s download parallelization for quite a while now.
|
||||
As distribution download in ``pip`` was modeled as a lazily evaluated iterable
|
||||
of chunks, parallelizing such procedure is as simple as submitting routines
|
||||
that write files to disk to a worker pool.
|
||||
|
||||
Or at least that is what I thought.
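The naive version of that idea can be sketched as below. It is an assumption-laden illustration, not ``pip``'s downloader: ``save`` and ``download_all`` are hypothetical names, each job is a ``(path, chunks)`` pair where ``chunks`` is any lazy iterable of byte strings, and no progress reporting is attempted.

```python
from multiprocessing.dummy import Pool  # thread-based pool


def save(job):
    """Write one lazily evaluated download to disk; return (path, size)."""
    path, chunks = job
    size = 0
    with open(path, 'wb') as f:
        for chunk in chunks:  # each chunk is produced only when needed
            f.write(chunk)
            size += len(chunk)
    return path, size


def download_all(jobs, workers=4):
    """Save all jobs concurrently; return a mapping from path to size."""
    with Pool(workers) as pool:
        # Results arrive in completion order; dict() consumes them all
        # before the pool is torn down.
        return dict(pool.imap_unordered(save, jobs))
```

What this leaves out, reporting progress for several downloads at once, is exactly where the difficulty turned out to be.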
|
||||
|
||||
Progress Reporting UI
|
||||
---------------------
|
||||
|
||||
``pip`` is currently using custom progress reporting classes,
|
||||
which were not designed to work with multithreaded code. At first, I wanted to
|
||||
try using these instead of defining a separate UI for multithreaded progress.
|
||||
As they use system signals for termination, the progress bars have to be
|
||||
running in the main thread. Or sort of.
|
||||
|
||||
Since the progress bars are designed as iterators, I realized that we
|
||||
can call ``next`` on them. So I quickly threw in some queues and locks,
|
||||
and prototyped the first *working* :pip:`implementation of
|
||||
progress synchronization <8771>`.
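The general shape of such synchronization can be sketched as follows. This is a simplified stand-in rather than the prototype in the pull request: it uses a single ``queue.Queue`` and no locks, ``download`` and ``run_all`` are hypothetical names, and ``report`` stands for whatever advances the progress bar in the main thread.

```python
import queue
import threading


def download(chunks, updates):
    """Worker: consume one download and report the size of each chunk."""
    for chunk in chunks:
        updates.put(len(chunk))
    updates.put(None)  # sentinel: this download is finished


def run_all(downloads, report):
    """Run downloads in threads; call report(total) in the calling thread."""
    updates = queue.Queue()
    threads = [threading.Thread(target=download, args=(chunks, updates))
               for chunks in downloads]
    for thread in threads:
        thread.start()
    total, pending = 0, len(threads)
    while pending:
        size = updates.get()  # blocks until some worker makes progress
        if size is None:
            pending -= 1
        else:
            total += size
            report(total)
    for thread in threads:
        thread.join()
    return total
```

Because only the calling thread ever touches the reporting callback, signal-based progress bars can stay in the main thread while the workers do the I/O.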
|
||||
|
||||
Performance Issues
|
||||
------------------
|
||||
|
||||
Welp, I only said that it works, but I didn't mention the performance,
|
||||
which is terrible. I am pretty sure that the slowdown is with
|
||||
the synchronization, since the ``map_multithread`` call doesn't seem
|
||||
to trigger anything that may introduce any sort of blocking.
|
||||
|
||||
This seems like a lot of fun, and I hope I'll get better tomorrow
|
||||
to continue playing with it!
|
|
@ -0,0 +1,44 @@
|
|||
First Check-In
|
||||
==============
|
||||
|
||||
Hi everyone, I am McSinyx, a Vietnamese undergraduate student
|
||||
who loves `free software`_. This summer I am working with
|
||||
the maintainers and the contributors of ``pip`` to make
|
||||
the package manager :pip:`download in parallel <825>`.
|
||||
|
||||
What did I do during the community bonding period?
|
||||
--------------------------------------------------
|
||||
|
||||
Aside from bonding with ``pip``'s maintainers and contributors as well as
|
||||
with my mentors, I was also experimenting on the theoretical and technical
|
||||
obstacles blocking this GSoC project. Pradyun Gedam (a mentor of mine)
|
||||
suggested making `a proof of concept`_ to determine if parallel downloading
|
||||
can play nicely with ResolveLib_'s abstraction and we are reviewing it
|
||||
together. On the technical side, we ``pip``'s committers are exploring
|
||||
:pip:`available options for parallelization <8169>` and I made an attempt to
|
||||
:pip:`make use of Python's standard worker pool in a portable way <8320>`.
|
||||
|
||||
Did I get stuck anywhere?
|
||||
-------------------------
|
||||
|
||||
Yes, of course! Neither of the experiments above is finished as of
|
||||
this moment. Though, I am optimistic that the issues will not be
|
||||
real blockers and we will figure that out in the next few days.
|
||||
|
||||
What is coming up next?
|
||||
-----------------------
|
||||
|
||||
As planned, this week I am going to refactor the package downloading code
|
||||
in ``pip``. The main purpose is to decouple the networking code from
|
||||
the package preparation operation and make sure that it is thread-safe.
|
||||
|
||||
In addition, I am also continuing the mentioned experiments to gain more
|
||||
confidence in the future of this GSoC project.
|
||||
|
||||
To other GSoC students, mentors and admins reading this, I am wishing
|
||||
you all good health and successful projects this summer!
|
||||
|
||||
.. _free software: https://www.gnu.org/philosophy/free-sw.html
|
||||
.. _a proof of concept:
|
||||
https://gist.github.com/McSinyx/513dbff71174fcc79f1cb600e09881af
|
||||
.. _ResolveLib: https://pypi.org/project/resolvelib/
|
|
@ -0,0 +1,44 @@
|
|||
Second Check-In
|
||||
===============
|
||||
|
||||
Hi everyone and may the odds be ever in your favor, especially during this
|
||||
tough time!
|
||||
|
||||
What did I do last week?
|
||||
------------------------
|
||||
|
||||
Not as much as I wished, apparently (-:
|
||||
|
||||
* Finalizing :pip:`the refactoring patch <8411>`
|
||||
of ``operations.prepare.prepare_linked_requirement``
|
||||
* :pip:`Nitpicking some logging calls <8423>`. This (as well as the next one)
|
||||
was to fill up the time when my brain was not as productive as I wanted it to be XD
|
||||
* :pip:`Beginning to migrate <8423>` from ``%``- to ``{}``-style logging.
|
||||
The amount of tests failing due to this was way beyond my imagination,
|
||||
but I got functional tests for ``pip install`` and unit tests passing now!
|
||||
* :pip:`Mocking up a working partial wheel download during
|
||||
dependency resolution <8442>` for `the new resolver`_.
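The post does not say how the ``{}``-style migration was carried out. One approach described in the standard library's logging cookbook, shown here purely as an illustration and not as what the patch did, wraps the message in an object that is formatted with ``str.format`` only when the record is actually emitted:

```python
import logging


class BraceMessage:
    """Defer str.format-style formatting until the message is rendered."""

    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)


logger = logging.getLogger('demo')  # hypothetical logger name
# %-style call:  logger.info('Collecting %s from %s', 'pip', 'PyPI')
logger.info(BraceMessage('Collecting {} from {}', 'pip', 'PyPI'))
```

Tests that assert on the arguments passed to logging calls would need updating after such a change, which may be one reason a migration like this touches so many of them.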
|
||||
|
||||
Did I get stuck anywhere?
|
||||
-------------------------
|
||||
|
||||
Yes, of course! :pip:`Parallel maps <8320>` are still stalling
|
||||
as well as other small PRs listed above. The failures related to
|
||||
``logging`` are still making me pull my hair out and the proof of
|
||||
concept for partial wheel downloading is too ugly even for a PoC.
|
||||
I imagine that I will have a lot of clean up to do this week (yay!).
|
||||
|
||||
What is coming up next?
|
||||
-----------------------
|
||||
|
||||
I'm trying to get the multi-{threading,processing} facilities merged ASAP
|
||||
to start rolling it out in practice. The first thing popping out of my
|
||||
head is to get back :pip:`the multi-threaded <7962>` ``pip list -o``.
|
||||
|
||||
The other experimental improvement (this phrase does not sound right!)
|
||||
I would like to get done is the partial wheel download. It would be
|
||||
really nice if I can get both included as ``unstable-feature``'s
|
||||
in :pip:`the upcoming beta release of pip 20.2 <7628#issuecomment-636319539>`.
|
||||
|
||||
.. _the new resolver:
|
||||
http://www.ei8fdb.org/thoughts/2020/05/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts/
|
|
@ -0,0 +1,43 @@
|
|||
Third Check-In
|
||||
==============
|
||||
|
||||
Holla, holla, holla! The last seven days have not been a really productive week
|
||||
for me, though I think there are still some nice things to share with
|
||||
you all here! The good news is that I've finished my last leçon as a sophomore,
|
||||
the bad news is that I have a bunch of upcoming tests, mainly in the form
|
||||
of group projects and/or presentation (phew!). Enough about me,
|
||||
let's get back to ``pip``:
|
||||
|
||||
What did I do last week?
|
||||
------------------------
|
||||
|
||||
Not much, actually )-:
|
||||
|
||||
* Write some tests for :pip:`the HTTP range mapping for wheel <8467>`.
|
||||
* :pip:`Try to bring back <8504>` multithreaded ``pip list --outdated``
|
||||
and ``--uptodate``, as :pip:`the parallel <8320>` ``map`` was merged
|
||||
earlier today.
|
||||
* Nitpick :pip:`8332`
|
||||
(yep it's a new low for me to include this in the list (-:).

Did I get stuck anywhere?
-------------------------

Not exactly, since I didn't do much d-; `Many of my PRs`_ are stalling though.
On the one hand, the maintainers of ``pip`` are all volunteers working in
their free time; on the other hand, I don't think I have tried hard enough
to get their attention on my PRs.

What is coming up next?
-----------------------

I'll try my best to get the following merged upstream before
:pip:`the upcoming beta release <8206>`:

* Parallel networking for ``pip list``: :pip:`8504`
* Lazy wheel for dependency information: :pip:`8467`, :pip:`8411`
  (to determine if hashing is required) and :pip:`a new patch introducing this
  as an unstable feature <8467#issuecomment-648717032>`

.. _Many of my PRs:
   https://github.com/pulls?q=is:open+is:pr+author:McSinyx+repo:pypa/pip+sort:updated-desc
@ -0,0 +1,33 @@
Fourth Check-In
===============

Hello there! I'm having my second year's last exam tomorrow,
but it `feels like summer`_ already! I've been finalizing quite a few things
to get them ready for pip 20.2b2.

What did I do last week?
------------------------

I've spent most of the time getting :pip:`the opt-in <8532>` for obtaining
dependency information via lazy wheels ready. It will be available as
``--use-feature=fast-deps`` and only takes effect when
``--use-feature=2020-resolver`` is also present.
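
As a rough illustration of the lazy-wheel idea (and not pip's actual implementation), the sketch below reads a wheel's ``METADATA`` through a file object that only fetches the byte ranges ``zipfile`` asks for. Everything here is an assumption made for the demo: ``LazyFile`` and ``fetch_range`` are hypothetical names, and the "remote" wheel is a fake archive held in memory so the example runs without a network.

```python
import io
import zipfile


class LazyFile(io.RawIOBase):
    """Seekable read-only file that fetches byte ranges on demand."""

    def __init__(self, fetch_range, size):
        super().__init__()
        self._fetch_range = fetch_range  # callable(start, end), end inclusive
        self._size = size
        self._pos = 0
        self.bytes_fetched = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: self._size}
        self._pos = max(0, min(base[whence] + offset, self._size))
        return self._pos

    def readinto(self, buffer):
        end = min(self._pos + len(buffer), self._size)
        if end <= self._pos:
            return 0  # end of file
        data = self._fetch_range(self._pos, end - 1)
        buffer[:len(data)] = data
        self._pos += len(data)
        self.bytes_fetched += len(data)
        return len(data)


# Build a fake wheel in memory to play the role of the remote file.
remote = io.BytesIO()
with zipfile.ZipFile(remote, "w") as wheel:
    wheel.writestr("pkg/data.bin", b"\0" * 200_000)
    wheel.writestr("pkg-1.0.dist-info/METADATA",
                   "Name: pkg\nVersion: 1.0\nRequires-Dist: requests\n")
blob = remote.getvalue()


def fetch_range(start, end):
    """Stand-in for a GET request carrying 'Range: bytes=start-end'."""
    return blob[start:end + 1]


lazy = LazyFile(fetch_range, len(blob))
with zipfile.ZipFile(lazy) as wheel:
    metadata = wheel.read("pkg-1.0.dist-info/METADATA").decode()
print(metadata)
print(f"fetched {lazy.bytes_fetched} of {len(blob)} bytes")
```

Because a zip archive keeps its directory at the end, only the tail of the file and the one requested member need to be fetched, which is why dependency information can be obtained without downloading the whole wheel.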

While waiting for reviews and suggestions, I made some patches for
internal cleanup, namely :pip:`8568`, :pip:`8571` and :pip:`8578`.
Some of the similar patches I made earlier were also merged last week:
:pip:`8456` and :pip:`8538`.

Did I get stuck anywhere?
-------------------------

Not really, everything was going as expected for me.

What is coming up next?
-----------------------

After :pip:`8532`, I'll work on the parallel download of the postponed wheels.
My main current concern is how the download progress will be reported
to the users, but I think I'll figure it out soon.

.. _feels like summer: https://www.youtube.com/watch?v=F1B9Fk_SgI0
@ -0,0 +1,35 @@
Fifth Check-In
==============

Hello and I hope y'all are still doing well!

What did I do last week?
------------------------

I was not really productive last week; most of the following tickets are fillers
to make use of the spare cycles I had while I was still trying to figure out
how to implement the main work.

* Finalize the ``--use-feature=fast-deps`` flag (:pip:`8588`)
* Improve mocking of environment variables in the test suite (:pip:`8614`)
* Finalize the fix for verbose/quiet options specified via
  configuration files and environment variables (:pip:`8578`)
* Clean up a tiny bit in the resolver internal API (:pip:`8629`)
* Start working on separating the download of wheels
  from dependency resolution (:pip:`8638`)

Did I get stuck anywhere?
-------------------------

I'm struggling with refactoring the code to support separate download.
``pip``'s codebase was not designed for this and thus there are
many execution paths and other details entangled around the relevant area.

What is coming up next?
-----------------------

``pip`` 20.2 is going to be released within the next few days with
``--use-feature=fast-deps`` included and I'm mentally prepared to fix
any undiscovered problems. At the same time, I will continue working
on :pip:`8638` and hopefully get it done soon enough to begin drafting
download parallelization strategies, mostly regarding the UI.
@ -0,0 +1,31 @@
Sixth Check-In
==============

Hello there!

What did I do last week?
------------------------

It has been quite a fun week for me, given the current state of
development and the newly discovered bugs thanks to the pip 20.2 release:

* Initiate discussion with the maintainers of pip on isolating
  networking code for late download in parallel (:pip:`8697`)
* Discuss the UI of parallel download (:pip:`8698`)
* Log debug information relating to the lazy wheel decision (:pip:`8710`)
* Disable caching for range requests (:pip:`8716`)
* Dedent late download logs (:pip:`8722`)
* Add a hook for batch downloading (third attempt I think) (:pip:`8737`)
* Test hash checking for fast-deps (:pip:`8743`)
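
One plausible reading of "disable caching for range requests" is to send ``Cache-Control: no-cache`` together with the ``Range`` header, so that a partial response is never answered from a cache. The helper below is only a sketch under that assumption; ``range_headers`` is a hypothetical name and the code is not taken from the patch itself.

```python
def range_headers(start, end):
    """Build request headers for bytes start..end (inclusive), uncached."""
    if start < 0 or end < start:
        raise ValueError(f"invalid byte range: {start}-{end}")
    return {
        "Range": f"bytes={start}-{end}",
        "Cache-Control": "no-cache",
    }


print(range_headers(0, 1023))
```

Such a dictionary could be passed as the ``headers`` argument of an HTTP client call; which client and session object are used is left open here.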

Did I get stuck anywhere?
-------------------------

Not exactly, everything is going smoothly and I'm feeling awesome!

What is coming up next?
-----------------------

I'll try to solve :pip:`8697` and :pip:`8698` within the next few days.
I am optimistic that the parallel download prototype will be done
within this week.
@ -0,0 +1,24 @@
Final Check-In
==============

Hello there!

What did I do last week?
------------------------

Not much, but implementation-wise I seem to have finished my GSoC project:

* Finish the implementation of wheels' parallel download (:pip:`8771`)
* Help make ``pip``'s CI green again (:pip:`8790`)
* Reformat a few spots in the user guide (:pip:`8795`)

Did I get stuck anywhere?
-------------------------

I got sick, but I am recovering now!

What is coming up next?
-----------------------

I will try to spend the time I have left within the scope of GSoC
to :pip:`improve cache usage of the fast-deps feature <8720>`.
Binary file not shown.
After Width: | Height: | Size: 40 MiB |
|
@ -0,0 +1,21 @@
:orphan:

Google Summer of Code 2020
==========================

.. toctree::
   :titlesonly:

   checkin20200601
   blog20200609
   checkin20200615
   blog20200622
   checkin20200629
   blog20200706
   checkin20200713
   blog20200720
   checkin20200727
   blog20200803
   checkin20200810
   blog20200817
   checkin20200824
Binary file not shown.
After Width: | Height: | Size: 43 KiB |
Binary file not shown.
After Width: | Height: | Size: 136 KiB |
|
@ -5,7 +5,7 @@ I occasionally blog about elementary mathematics
under the view of a functional programmer who loves anonymous functions:

.. toctree::
   :maxdepth: 1
   :titlesonly:

   conseq
   system
