This makes get_csv_rows_for_installed simpler, because it is not
modifying its arguments. We can also more easily refactor RECORD file
reading since it is now decoupled from getting the installed RECORD file
rows.
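
A minimal sketch of the "do not modify the arguments" shape described above. The name `rows_for_installed`, its parameters, and the choice to blank stale hashes are assumptions made for illustration; this is not the actual signature or behavior of get_csv_rows_for_installed.

```python
from typing import Dict, List, Set, Tuple

# One RECORD line: (path, hash, size). Illustrative alias, not from the source.
Row = Tuple[str, str, str]


def rows_for_installed(
    old_rows: List[Row],
    installed: Dict[str, str],
    changed: Set[str],
) -> List[Row]:
    """Build new RECORD rows without mutating any of the arguments."""
    new_rows: List[Row] = []
    for path, digest, size in old_rows:
        # Map the path inside the wheel to its installed location, if it moved.
        new_path = installed.get(path, path)
        if new_path in changed:
            # Simplification: blank the stale hash and size instead of rehashing.
            digest, size = "", ""
        new_rows.append((new_path, digest, size))
    return new_rows
```

Because the function only reads its inputs, the caller keeps full control over `installed` and `changed`, and the code that reads the RECORD file can be changed separately.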
Reducing the scope of variables reduces possible dependencies between
parts of this function, and will make it easier to extract this section
into its own function.
This reduces our dependence on the files being extracted to the
filesystem.
Compare the name extraction to the similar code in
`utils.wheel.wheel_dist_info_dir`.
We don't need to give `.data` directories the same strict
treatment (yet) because having several of them in a single
wheel file causes no problems.
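
As a rough illustration of extracting top-level directory names from a zip rather than from the filesystem, something like the following could work. `top_level_dirs` and `example_wheel` are hypothetical helpers, not the code in `utils.wheel.wheel_dist_info_dir`.

```python
import io
import zipfile
from typing import List, Set


def top_level_dirs(wheel_zip: zipfile.ZipFile, suffix: str) -> List[str]:
    """Return the sorted top-level directory names that end with `suffix`."""
    names: Set[str] = set()
    for entry in wheel_zip.namelist():
        head, sep, _rest = entry.partition("/")
        # Entries without a separator are top-level files, not directories.
        if sep and head.endswith(suffix):
            names.add(head)
    return sorted(names)


def example_wheel() -> zipfile.ZipFile:
    """Build a tiny in-memory archive laid out like a wheel (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pkg/__init__.py", "")
        zf.writestr("pkg-1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n")
        zf.writestr("pkg-1.0.data/scripts/run", "")
        # Nested .dist-info directories are ignored: only the first component counts.
        zf.writestr("pkg/vendored/other-2.0.dist-info/METADATA", "")
    buf.seek(0)
    return zipfile.ZipFile(buf)
```

Calling `top_level_dirs(example_wheel(), ".dist-info")` yields a single name, while the same call with `".data"` may legitimately return several in a real wheel.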
Currently `get_entrypoints` normalizes incoming text to make it more compatible
with `pkg_resources`. It turns out that `pkg_resources` already performs the same normalization,
so we can omit it.
This simplifies `get_entrypoints`, opening the way for us to pass it a plain string instead
of a file path.
These comments are relevant to this function, since it is long
overdue for refactoring. This code isn't special in that regard, and we
should feel free to consider any piece of code eligible to be broken up
or put into a class. So we remove these comments in fairness to the rest
of the code, and to remove a distraction during upcoming code reviews.
This mainly deals with correctly recording the wheel content in the
RECORD metadata. This metadata file must be written in UTF-8, but the
actual files need to be installed to the filesystem, the encoding of
which is not (always) UTF-8. So we need to carefully handle file name
encoding/decoding when comparing RECORD entries to the actual file.
The fix here makes sure we always use the correct encoding by adding
strict type hints. The entries in RECORD are decoded/encoded with UTF-8
on the read/write boundaries to make sure we always deal with text
types. A type-hint-only type RecordPath is introduced to make sure this
is enforced (because Python 2 "helpfully" coerces str to unicode with
the wrong encoding).
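
The idea of a type-hint-only path type with UTF-8 handled at the read/write boundaries might look roughly like this. It is a Python 3 sketch: `read_record` and `write_record` are hypothetical names, rows are assumed to have exactly three columns, and the Python 2 coercion problem mentioned above does not arise here.

```python
import csv
import io
from typing import List, NewType, Tuple

# Exists only for the type checker; at runtime a RecordPath is a plain str.
RecordPath = NewType("RecordPath", str)
Row = Tuple[RecordPath, str, str]


def read_record(raw: bytes) -> List[Row]:
    """Decode RECORD bytes as UTF-8 once, at the read boundary."""
    text = raw.decode("utf-8")
    reader = csv.reader(io.StringIO(text, newline=""))
    return [(RecordPath(path), digest, size) for path, digest, size in reader]


def write_record(rows: List[Row]) -> bytes:
    """Encode RECORD rows as UTF-8 once, at the write boundary."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue().encode("utf-8")
```

Everything between the two boundaries works with text, so a type checker can flag any place where a filesystem path is compared to a `RecordPath` without an explicit conversion.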
In order to parse metadata from wheel files directly we want to reuse
parse_wheel. Moving it out helps avoid creating an unnecessary
dependence on operations.install.wheel.
This functions as a guard for the rest of our wheel-handling code,
ensuring that we will only get past this point if we have a wheel that
we should be able to handle version-wise.
We return a tuple instead of bundling up the result in a dedicated type
because it's the simplest option. The interface will be easy to update
later if the need arises.
First example of transitioning a directory-aware function to using a
zipfile directly. Since we will not need to maintain the unpacked dir
going forward, we don't need to worry about making wheel_dist_info_dir
"generic", just that the same tests pass for both cases at each commit.
To do this neatly we use pytest.fixture(params=[...]), which
generates a test for each param. Once we've transitioned the
necessary functions we only need to replace the fixture name and remove
the dead code.
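
A sketch of how `pytest.fixture(params=[...])` can run the same test against both an unpacked directory and a zip file. The helper names (`make_wheel_zip`, `list_top_level`, `wheel_source`) are invented for this example and are not the fixtures used in the actual test suite.

```python
import os
import zipfile

import pytest


def make_wheel_zip(path):
    """Write a minimal wheel-like archive to `path` (hypothetical helper)."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("simple-0.1.dist-info/WHEEL", "Wheel-Version: 1.0\n")
    return path


def list_top_level(source):
    """Return top-level names from an unpacked directory or a zip file path."""
    if os.path.isdir(source):
        return sorted(os.listdir(source))
    with zipfile.ZipFile(source) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


@pytest.fixture(params=["unpacked", "zip"])
def wheel_source(request, tmp_path):
    """Yield the same wheel content once as a directory and once as a zip."""
    zip_path = make_wheel_zip(str(tmp_path / "simple-0.1-py3-none-any.whl"))
    if request.param == "zip":
        return zip_path
    unpacked = str(tmp_path / "unpacked")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(unpacked)
    return unpacked


def test_finds_dist_info(wheel_source):
    # pytest generates one test per entry in params.
    assert list_top_level(wheel_source) == ["simple-0.1.dist-info"]
```

Once the directory variant is no longer needed, dropping `"unpacked"` from `params` removes that half of the generated tests without touching the test bodies.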
Since retrieval of the .dist-info dir already ensures that a
distribution is found, this reduces the responsibility of wheel_metadata
and lets us remove a few tests already covered by the
test_wheel_dist_info_dir_* tests.
This will make it easier to transition to the already-determined
dist-info directory and reduces some of our dependence on pkg_resources.
Despite the name, the `egg_info` member is also populated for
.dist-info dirs.
ensure_str uses encoding='utf-8' and errors='strict' for Python 3
by default, which matches the behavior in
`pkg_resources.NullProvider.get_metadata`.
This will let us re-use the wheel_metadata for other parts of
processing, and by parameterizing checks in terms of metadata we will be
able to substitute in metadata derived directly from the zip.
* Raise exception on any error while finding the wheel dist
We plan to replace this code with direct extraction from a zip, so there
is no point in catching anything more precise.
* Raise exception if no dist is found in wheel_version
* Catch file read errors when reading WHEEL
get_metadata delegates to the underlying implementation, which tries
to locate and read the file, raising an IOError (Python 2) or an OSError
subclass (Python 3) on any error.
Since the new explicit test checks the same case as brokenwheel in
test_wheel_version we remove the redundant test.
* Check for WHEEL decoding errors explicitly
This was the last error that could be thrown by get_metadata, so we can
also remove the catch-all except block.
* Move WHEEL parsing outside try...except
This API does not raise an exception, but returns any errors on the
message object itself. We are preserving the original behavior, and can
decide later whether to start warning or raising our own exception.
* Raise explicit error if Wheel-Version is missing
`email.message.Message.__getitem__` returns None on missing values, so
we have to check for ourselves explicitly.
* Raise explicit exception on failure to parse Wheel-Version
This is also the last exception that can be raised, so we remove
`except Exception`.
* Remove dead code
Since wheel_version never returns None, this exception will never be
raised.
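
Taken together, the explicit checks listed above could be sketched as follows. This is a simplified stand-in that reads WHEEL content from bytes; the `UnsupportedWheel` class here is a local placeholder and the function is not the real wheel_version.

```python
from email.parser import Parser
from typing import Tuple


class UnsupportedWheel(Exception):
    """Local placeholder for the installer's own exception type."""


def wheel_version(raw_wheel: bytes) -> Tuple[int, ...]:
    """Parse Wheel-Version from WHEEL content, raising an explicit error per failure."""
    try:
        wheel_text = raw_wheel.decode("utf-8")
    except UnicodeDecodeError as e:
        raise UnsupportedWheel("error decoding WHEEL: {!r}".format(e))

    # Parser.parsestr does not raise on malformed input; any problems are
    # recorded on the returned message, so it sits outside try...except.
    wheel_data = Parser().parsestr(wheel_text)

    # Message.__getitem__ returns None for a missing header.
    version_text = wheel_data["Wheel-Version"]
    if version_text is None:
        raise UnsupportedWheel("WHEEL is missing Wheel-Version")

    version = version_text.strip()
    try:
        return tuple(map(int, version.split(".")))
    except ValueError:
        raise UnsupportedWheel("invalid Wheel-Version: {!r}".format(version))
```

With every failure mapped to a specific error, the function either returns a tuple or raises, so callers never need to handle a `None` result.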
* Edit subdirs of top-level instead of checking in each directory
Previously, we were checking whether the top of the relative path ended
with .data. Now, we do not recurse into those directories, so there's no
need to check every time.
* Store info_dir in separate variable
Instead of working with a list everywhere, we use the single info_dir.
* Separate variables for info_dir and the destination path
* Use destination .dist-info dir only when needed
By initially storing just the name of the folder we ensure our code is
agnostic to the destination, so it'll be easier to install from a zip
later.
* Use os.listdir instead of os.walk for wheel dir population
Since we only execute any code when basedir == '', we only need the
top-level directories.
* Inline data_dirs calculation
* Inline info_dirs calculation
Previously we were restricting to a single .dist-info directory anywhere
in the unpacked wheel directory. That was incorrect since only a
top-level .dist-info directory indicates a contained "package". Now we
limit our restriction to top-level .dist-info directories.
This aligns with the previous behavior that would have enforced the
found .dist-info directory starting with the name of the package.
We raise UnsupportedWheel because it looks better in output than an
AssertionError (which includes a traceback).
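
The top-level restriction and the name check described above might be sketched like this. `find_info_dir` and `normalize` are illustrative names, `UnsupportedWheel` is a local placeholder, and the normalization shown is the PEP 503 rule, which is an assumption about how names are compared.

```python
import os
import re


class UnsupportedWheel(Exception):
    """Local placeholder for the installer's own exception type."""


def normalize(name: str) -> str:
    """PEP 503 style normalization: collapse runs of -, _ and . and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_info_dir(wheel_dir: str, package_name: str) -> str:
    """Return the single top-level .dist-info directory name in `wheel_dir`."""
    # os.listdir only looks at the top level, so nested .dist-info
    # directories (for example in vendored packages) are never considered.
    info_dirs = [
        s for s in os.listdir(wheel_dir)
        if s.endswith(".dist-info") and os.path.isdir(os.path.join(wheel_dir, s))
    ]
    if not info_dirs:
        raise UnsupportedWheel(".dist-info directory not found")
    if len(info_dirs) > 1:
        raise UnsupportedWheel(
            "multiple .dist-info directories found: {}".format(", ".join(sorted(info_dirs)))
        )
    info_dir = info_dirs[0]
    if not normalize(info_dir).startswith(normalize(package_name)):
        raise UnsupportedWheel(
            ".dist-info directory {!r} does not start with {!r}".format(info_dir, package_name)
        )
    return info_dir
```

Only the directory name is returned, not a full path, which keeps the result independent of where the wheel content ends up being installed.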