Internal youtube support and the youtube-dl extension both cause the
youtube feed URL to be fetched three times per update. Caching the feed
data from feedcore allows internal support to only load the feed once.
The lru_cache removes one of the youtube-dl fetches; not perfect, but
two is better than three. I saw a 40% decrease in update times when
using the internal youtube code.
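A minimal sketch of the caching idea. `fetch_feed()`, `update_channel()` and the counter are hypothetical stand-ins, not feedcore's actual API. Memoizing the fetch by URL means that later consumers in the same update reuse the first response, and clearing the cache between updates keeps the next update from serving stale data.

```python
import functools

# Hypothetical counter standing in for real network traffic.
FETCH_COUNT = 0


@functools.lru_cache(maxsize=32)
def fetch_feed(url):
    """Pretend to download a feed; the body replaces a real HTTP GET."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    return f"<feed url={url!r}/>"


def update_channel(url):
    # Three consumers ask for the same feed during one update,
    # mirroring the three fetches described above.
    title_data = fetch_feed(url)
    episode_data = fetch_feed(url)
    cover_data = fetch_feed(url)
    return title_data, episode_data, cover_data


URL = "https://www.youtube.com/feeds/videos.xml?channel_id=UC_hypothetical"

update_channel(URL)
print(FETCH_COUNT)  # 1

# Clear the cache before the next update so fresh data is fetched.
fetch_feed.cache_clear()
update_channel(URL)
print(FETCH_COUNT)  # 2
```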
Throwing an exception from get_channel_id_url() prevents get_cover() and
get_channel_desc() from attempting to fetch a None URL, and provides
more accurate errors.
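The effect of raising instead of returning None can be sketched as below. `YouTubeError`, the URL matching and the `fetch()` recorder are hypothetical simplifications; the point is only that the exception fires before any caller tries to request a None URL.

```python
class YouTubeError(Exception):
    """Hypothetical error type; the real code may use a different exception."""


FETCHED = []  # records URLs that would have been requested


def fetch(url):
    # Stand-in for an HTTP request.
    FETCHED.append(url)
    return f"<html for {url}>"


def get_channel_id_url(url):
    # Hypothetical lookup: only channel feed URLs are recognised here.
    prefix = "https://www.youtube.com/feeds/videos.xml?channel_id="
    if url.startswith(prefix):
        return "https://www.youtube.com/channel/" + url[len(prefix):]
    # Raising instead of returning None stops callers from fetching None.
    raise YouTubeError(f"Could not find channel ID for {url}")


def get_cover(url):
    return fetch(get_channel_id_url(url))


def get_channel_desc(url):
    return fetch(get_channel_id_url(url))


try:
    get_cover("https://example.com/not-a-channel")
except YouTubeError as e:
    print(e)
print(FETCHED)  # []
```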
The lru_cache on get_youtube_id() saves 1ms per youtube channel when
updating, which adds up with a lot of channels and might be more on
slower devices.
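A sketch of what the cache buys, assuming a hypothetical regex-based `get_youtube_id()` (the real function may accept more URL shapes): repeated calls with the same URL return the stored result instead of re-running the regex, which `cache_info()` makes visible.

```python
import functools
import re

# Hypothetical pattern for 11-character video IDs in watch URLs.
VIDEO_ID_RE = re.compile(r"[?&]v=([0-9A-Za-z_-]{11})")


@functools.lru_cache(maxsize=None)
def get_youtube_id(url):
    m = VIDEO_ID_RE.search(url)
    return m.group(1) if m else None


urls = ["https://www.youtube.com/watch?v=abcDEF12345"] * 3
ids = [get_youtube_id(u) for u in urls]
print(ids[0])  # abcDEF12345
print(get_youtube_id.cache_info())  # CacheInfo(hits=2, misses=1, maxsize=None, currsize=1)
```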
The page text may contain "};" inside the initial player response (IPR),
so the non-greedy regex would stop at the first "};" instead of matching
the entire IPR, causing JSON decoding to fail.
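The failure mode can be reproduced on a hypothetical page snippet. The sketch also shows one way to avoid regex boundaries altogether, `json.JSONDecoder.raw_decode()`, which parses from the opening brace and reports where the object ends; this is a swapped-in technique for illustration, not necessarily what the commit does.

```python
import json
import re

# Hypothetical page snippet: a string value inside the JSON contains "};".
page = 'var ytInitialPlayerResponse = {"title": "a};b", "lengthSeconds": "42"};var x = 1;'

# The non-greedy match stops at the first "};", which here is inside a string.
m = re.search(r"ytInitialPlayerResponse\s*=\s*(\{.+?\});", page)
try:
    json.loads(m.group(1))
    non_greedy_ok = True
except json.JSONDecodeError:
    non_greedy_ok = False

# Alternative: find the opening brace and let the JSON decoder find the end.
start = page.index("{", page.index("ytInitialPlayerResponse"))
ipr, end = json.JSONDecoder().raw_decode(page, start)

print(non_greedy_ok)  # False
print(ipr["title"])  # a};b
```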
The duration is not available in the youtube feed and requires a request
for each episode. It was set when downloading, but the user had no way of
knowing how long an episode was before downloading it. Live streams do
not have a duration until they end, so it remains blank until the episode
is downloaded or the next update.
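A sketch of the per-episode lookup, assuming the duration comes from a player-response dict. The `videoDetails`, `lengthSeconds` and `isLive` keys and the use of 0 for "unknown" are assumptions for illustration, not taken from the original code.

```python
def episode_duration(player_response):
    """Return the duration in seconds, or 0 when it is not known yet.

    Assumes a player-response dict shaped like
    {"videoDetails": {"lengthSeconds": "754", "isLive": False}}.
    """
    details = player_response.get("videoDetails", {})
    if details.get("isLive"):
        # Live streams have no final duration until they end; leave it blank.
        return 0
    try:
        return int(details["lengthSeconds"])
    except (KeyError, ValueError):
        return 0


print(episode_duration({"videoDetails": {"lengthSeconds": "754"}}))  # 754
print(episode_duration({"videoDetails": {"isLive": True}}))  # 0
```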
This will increase the time it takes to update feeds, with new
subscriptions possibly taking 16x longer to update due to 16 requests
instead of the previous single request. Updating existing feeds only
adds one request for each new episode.
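The request arithmetic can be written out as below. The 15-entry feed size is an assumption inferred from the "16 requests" figure (one feed request plus one per listed episode), and `requests_for_update()` is a hypothetical helper.

```python
# Assumption: a YouTube channel feed lists 15 entries, which is what the
# "16 requests" figure implies (1 feed request + 15 episode requests).
FEED_ENTRIES = 15


def requests_for_update(new_episodes):
    """Return the number of HTTP requests for one feed update.

    One request fetches the feed itself; each new episode then needs one
    extra request to look up its duration.
    """
    return 1 + new_episodes


print(requests_for_update(FEED_ENTRIES))  # 16: new subscription, every entry is new
print(requests_for_update(2))  # 3: existing feed with two new episodes
```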
This fixes a bug where the feed URL was parsed to the channel URL,
which could cause issues when subscribing to feed URLs and did cause gpo
failures when enabling or disabling youtube channels.
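The invariant behind the fix can be sketched with a hypothetical `canonical_feed_url()` helper: a feed URL passes through unchanged, and only a channel URL is mapped to its feed URL. The `/channel/<id>` and `/feeds/videos.xml?channel_id=<id>` shapes are the only ones handled here, which is a simplification.

```python
from urllib.parse import urlparse

FEED_PREFIX = "https://www.youtube.com/feeds/videos.xml"


def canonical_feed_url(url):
    """Hypothetical helper: map a channel URL to its feed URL, keep feed URLs."""
    parsed = urlparse(url)
    if parsed.path == "/feeds/videos.xml":
        # Already a feed URL: return it unchanged instead of rewriting it
        # to the channel URL.
        return url
    parts = parsed.path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "channel":
        return f"{FEED_PREFIX}?channel_id={parts[1]}"
    return url


print(canonical_feed_url("https://www.youtube.com/channel/UCabc"))
```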
The best, bestvideo and bestaudio formats can be added to
`preferred_fmt_ids` for use by the Youtube-DL extension with
manage_downloads. However, the built-in support would fail if it
encountered one of these formats. Skipping them allows "Download with
Youtube-DL" to use them in the same way it allows adaptive formats to be
used.
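A sketch of the skipping logic. `usable_format_ids()` and the `available` mapping (itag to stream URL) are hypothetical stand-ins for how the built-in code matches `preferred_fmt_ids` against the formats it found; the youtube-dl-only selectors are skipped rather than treated as errors.

```python
# Format selectors understood by youtube-dl but not by the built-in code.
YTDL_ONLY_FORMATS = {"best", "bestvideo", "bestaudio"}


def usable_format_ids(preferred_fmt_ids, available):
    """Return the preferred format IDs the built-in downloader can handle.

    `available` is a hypothetical mapping of itag -> stream URL.
    """
    result = []
    for fmt_id in preferred_fmt_ids:
        if fmt_id in YTDL_ONLY_FORMATS:
            # Skip instead of failing, so youtube-dl can still use it.
            continue
        if int(fmt_id) in available:
            result.append(int(fmt_id))
    return result


print(usable_format_ids(["bestvideo", 22, "best", 18], {18: "url18", 22: "url22"}))  # [22, 18]
```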
The `get_video_info` URL no longer exists without html5=1, and instead
of raising a 404 Not Found error, http_request() returned an empty page,
which led to a "no formats found" error. The new requests code will
raise the correct error if this happens in the future.
The player response data is still fetched from `get_video_info`, but will
fall back to the `watch` URL if `get_video_info` is eventually removed.
The `watch` URL will fail for anyone in Europe due to it redirecting to
a GDPR cookie consent page.
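The fallback order can be sketched with injected fetchers. `NotFound`, `get_player_response()` and both `fetch_*` callables are hypothetical; a real version would perform the HTTP requests and parse the player response from each reply. The fallback only works because a missing endpoint now raises an error instead of returning an empty page.

```python
class NotFound(Exception):
    """Hypothetical error raised by a fetcher when the endpoint is gone."""


def get_player_response(video_id, fetch_video_info, fetch_watch_page):
    """Try get_video_info first, then fall back to the watch page."""
    try:
        return fetch_video_info(video_id)
    except NotFound:
        # get_video_info removed: use the watch page instead. In Europe this
        # may hit a GDPR cookie consent redirect rather than the video page.
        return fetch_watch_page(video_id)


def gone(video_id):
    raise NotFound(f"get_video_info: 404 for {video_id}")


def watch(video_id):
    return {"source": "watch", "videoId": video_id}


print(get_player_response("abcDEF12345", gone, watch))
```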
Error messages have been shortened by removing the video ID, which could be
removed from the code in the future.
util.urlopen now returns a requests.Response instead of the file-like object from urllib.
Fixed all usages of util.urlopen accordingly; this simplifies getting JSON and detecting the text encoding.
In particular, feedcore (responsible for fetching feeds) is simplified.
This is a first pass and could benefit from better usage of the requests API
(for instance, Sessions to keep connection pools).
TODO: download.py