There's an issue where sometimes for i686-linux and armhf-linux, only a few
package derivations are computed.
This commit tries to simplify the code, and adds some conditional logging for
the guix package, which might help reveal what's going on.
Now that squee cooperates with suspendable ports, this is unnecessary. Use a
connection pool to still support running queries in parallel using multiple
connections.
Currently I'm seeing failures due to guile-gnutls not supporting suspendable
ports (write_wait_fd), so batch the requested outputs to try and avoid this.
In to two thread pools, a default one, and one reserved for essential
functionality.
There are some pages that use slow queries, so this should help stop those
pages block other operations.
Use the new approach of looking up the distribution of the derivations, and
building a non recursive query specifically for this revision. This should
avoid PostgreSQL picking a poor plan for performing the query.
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.
The data looks like this (for a specified system and target):
┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│ 15 │ 2 │
│ 14 │ 3 │
│ 13 │ 3 │
│ 12 │ 3 │
│ 11 │ 14 │
│ 10 │ 25 │
│ 9 │ 44 │
│ 8 │ 91 │
│ 7 │ 1084 │
│ 6 │ 311 │
│ 5 │ 432 │
│ 4 │ 515 │
│ 3 │ 548 │
│ 2 │ 2201 │
│ 1 │ 21162 │
│ 0 │ 22310 │
└───────┴───────┘
Level 0 reflects the number of packages. Level 1 is similar as you have all
the derivations for the package origins. The remaining levels contain less
packages since it's mostly just derivations involved in bootstrapping.
When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that the each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per a revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.
When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL this by using LIMIT's at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.