This is mostly a workaround for the occasional problems with the guix-commits
mailing list, as it can break and then the data service doesn't learn about
new revisions until the problem is fixed.
I think it's still a generally useful feature though: it allows deploying the
data service without it having to consume emails to learn about new revisions,
and it's a step towards some way of notifying the data service when it should
poll.
Now that squee cooperates with suspendable ports, this is unnecessary. Use a
connection pool so that queries can still run in parallel over multiple
connections.
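
As a rough illustration, here's a minimal sketch of the kind of connection pool
this describes. Only connect-to-postgres-paramstring and exec-query come from
squee; the pool itself and the procedure names are illustrative, not the actual
implementation.

  (use-modules (squee)
               (ice-9 q)
               (ice-9 threads))

  (define (make-connection-pool paramstring size)
    ;; A queue of open connections, guarded by a mutex and condition
    ;; variable so callers can wait for a free connection.
    (let ((connections (make-q))
          (mutex (make-mutex))
          (condition (make-condition-variable)))
      (for-each
       (lambda (_)
         (enq! connections (connect-to-postgres-paramstring paramstring)))
       (iota size))
      (lambda (proc)
        ;; Check a connection out, run PROC with it, then return it.
        (let ((conn (with-mutex mutex
                      (while (q-empty? connections)
                        (wait-condition-variable condition mutex))
                      (deq! connections))))
          (dynamic-wind
            (const #t)
            (lambda () (proc conn))
            (lambda ()
              (with-mutex mutex
                (enq! connections conn)
                (signal-condition-variable condition))))))))

  ;; For example:
  ;;   (define with-connection
  ;;     (make-connection-pool "dbname=guix_data_service" 4))
  ;;   (with-connection (lambda (conn) (exec-query conn "SELECT 1")))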
Use two thread pools: a default one, and one reserved for essential
functionality.
There are some pages that use slow queries, so this should help stop those
pages from blocking other operations.
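
As a sketch of the idea (the names and pool sizes here are made up, not the
real code): each pool is a queue of thunks served by a fixed number of worker
threads, with a separate, smaller pool reserved for the essential work.

  (use-modules (ice-9 q)
               (ice-9 threads))

  (define (make-thread-pool size)
    ;; Returns a procedure that queues a thunk for one of SIZE workers.
    (let ((queue (make-q))
          (mutex (make-mutex))
          (condition (make-condition-variable)))
      (for-each
       (lambda (_)
         (call-with-new-thread
          (lambda ()
            (let loop ()
              (let ((thunk (with-mutex mutex
                             (while (q-empty? queue)
                               (wait-condition-variable condition mutex))
                             (deq! queue))))
                (thunk)
                (loop))))))
       (iota size))
      (lambda (thunk)
        (with-mutex mutex
          (enq! queue thunk)
          (signal-condition-variable condition)))))

  ;; A default pool for general page queries, and a reserved pool for
  ;; essential functionality, so slow pages can't starve it.
  (define run-in-default-pool (make-thread-pool 8))
  (define run-in-essential-pool (make-thread-pool 2))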
This might be generally useful, but I've been looking at it because it offers a
way to try to improve query performance when selecting all the derivations
related to the packages in a revision.
The data looks like this (for a specified system and target):
┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘
Level 0 reflects the number of packages. Level 1 is similar, as it has all
the derivations for the package origins. The remaining levels contain fewer
derivations, since they're mostly just derivations involved in bootstrapping.
When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.
When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL of this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
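
As a sketch of what that can look like (the table and column names below are
assumptions about the schema, and the LIMIT values just stand in for the known
bounds), the LIMITs sit where PostgreSQL would otherwise have to guess at row
counts:

  ;; Hypothetical query sketch; the schema details and numbers are
  ;; illustrative, not the actual guix-data-service query.
  (define select-derivations-for-revision
    "
  WITH RECURSIVE all_derivations(id) AS (
      -- Base case: the derivations directly referenced by the packages
      -- in the revision.  The LIMIT tells the planner how many rows to
      -- expect here, instead of letting it estimate.
      SELECT id FROM (
        SELECT package_derivations.derivation_id AS id
        FROM package_derivations
        WHERE package_derivations.revision_id = $1
        LIMIT 50000
      ) AS base
    UNION
      -- Recursive step: walk one level of inputs at a time.
      SELECT derivation_inputs.input_derivation_id
      FROM derivation_inputs
      INNER JOIN all_derivations
        ON derivation_inputs.derivation_id = all_derivations.id
  )
  SELECT id
  FROM all_derivations
  -- A LIMIT on the outer query bounds the estimate for anything joining
  -- against this result.
  LIMIT 100000;
  ")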
I think the idle connections associated with idle threads still take up
memory, so especially now that you can configure an arbitrary number of
threads (and thus connections), it's good to close them regularly.
The server part of the guix-data-service doesn't work well as a Guix service,
since it often fails to start if the migrations take any time at all.
To address this, start the server before running the migrations, and serve the
pages that work without the database, plus a general 503 response. Once the
migrations have completed, switch to the normal behaviour.
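
A rough sketch of that behaviour (not the actual code; run-database-migrations
and normal-handler are just stand-ins): requests get a general 503 response
until a flag flips, after which the normal handler takes over.

  (use-modules (ice-9 atomic)
               (ice-9 threads)
               (web response)
               (web server))

  (define migrations-complete? (make-atomic-box #f))

  (define (run-database-migrations)
    ;; Stand-in for running the real migrations.
    (sleep 5))

  (define (normal-handler request body)
    ;; Stand-in for the real controller.
    (values '((content-type . (text/plain))) "hello"))

  (define (handler request body)
    (if (atomic-box-ref migrations-complete?)
        (normal-handler request body)
        (values (build-response
                 #:code 503
                 #:headers '((content-type . (text/plain))))
                "Database migrations are in progress")))

  ;; Start serving immediately; flip the flag once migrations are done.
  (call-with-new-thread
   (lambda ()
     (run-database-migrations)
     (atomic-box-set! migrations-complete? #t)))

  (run-server handler)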
This was good in that it avoided having to deal with long-running connections,
but opening a connection probably takes some time, and these changes are a
step towards offloading the PostgreSQL queries to other threads, so they don't
block the threads running fibers.
Allow build status information to be submitted by POST request. This required
some changes to the builds and build_status tables, as, for example, the
Cuirass build id may not be available and the derivation may not be known yet,
so just record the derivation file name.
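
For illustration only, submitting a build status by POST could look something
like this; the URL path and JSON field names here are made up, not the
service's actual API. Note that only the derivation file name is sent, since
the derivation itself may not be known to the data service yet.

  (use-modules (web client)
               (web uri)
               (json))

  (define (submit-build-status url derivation-file-name status)
    ;; Hypothetical client-side helper; endpoint and field names are
    ;; illustrative, not the real API.
    (http-post
     (string->uri url)
     #:headers '((content-type . (application/json)))
     #:body (scm->json-string
             `(("derivation" . ,derivation-file-name)
               ("status" . ,status)))))

  ;; (submit-build-status "https://data.example.org/build-statuses"
  ;;                      "/gnu/store/xxxxxxxx-hello-2.12.drv"
  ;;                      "succeeded")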
This'll help a service manager (like the shepherd) know when the service is
ready, which, at the moment, means the database migrations have happened.
This runs Sqitch on startup, which should make managing the database easier,
as you just have to restart the service with this option, and the database
should be updated if necessary.
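
For reference, the core of that can be as simple as invoking sqitch from the
startup code. This is only a sketch, and passing no arguments to sqitch deploy
is an assumption; the real invocation presumably points it at the right
database.

  (define (run-sqitch-deploy)
    ;; system* returns the raw exit status; status:exit-val extracts the
    ;; exit code, or #f if sqitch was killed by a signal.
    (let ((code (status:exit-val (system* "sqitch" "deploy"))))
      (unless (and code (zero? code))
        (error "sqitch deploy failed" code))))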
This is a service designed to provide information about Guix. At the
moment, this initial prototype gathers up information about packages,
the associated metadata and derivations.
The initial primary use case is to compare two different revisions of
Guix, detecting which packages are new, no longer present, updated or
otherwise different.
It's based on the Mumi project [1].
[1]: https://git.elephly.net/software/mumi.git