mirror of git://git.savannah.gnu.org/guix/data-service.git synced 2023-12-14 03:23:03 +01:00
Commit graph

1258 commits

Author SHA1 Message Date
Christopher Baines e39c9da028 Store the distribution of derivations related to packages
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.

The data looks like this (for a specified system and target):

┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘

Level 0 reflects the number of packages. Level 1 is similar, as you have all
the derivations for the package origins. The remaining levels contain fewer
packages since it's mostly just derivations involved in bootstrapping.

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL of this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
2023-03-09 08:29:39 +00:00
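
The level/count distribution described in e39c9da028 can be illustrated with a small sketch. This is not the data service's code (which is Guile and SQL); it is a Python illustration that assumes a derivation's level is the depth at which it is first reached when walking inputs from the package derivations. The function name, the `inputs` mapping and the derivation ids are all hypothetical.

```python
from collections import deque


def level_distribution(roots, inputs):
    """Count the derivations first reached at each depth, starting from roots.

    roots  -- iterable of derivation ids at level 0 (hypothetical ids)
    inputs -- dict mapping a derivation id to the ids of its inputs
    """
    seen = set(roots)
    queue = deque((drv, 0) for drv in seen)
    counts = {}
    while queue:
        drv, level = queue.popleft()
        counts[level] = counts.get(level, 0) + 1
        for dep in inputs.get(drv, ()):
            # Only count a derivation the first time it is seen, so each
            # level holds the *new* derivations found at that depth.
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, level + 1))
    return counts


# Tiny made-up graph: two packages sharing a compiler input.
example_inputs = {
    "pkg-a": ["src-a", "gcc"],
    "pkg-b": ["src-b", "gcc"],
    "gcc": ["bootstrap"],
}
distribution = level_distribution(["pkg-a", "pkg-b"], example_inputs)
```

Under these assumptions, the per-level counts (or their sum) are the kind of numbers that could be supplied as LIMIT values to keep the planner's row estimates realistic.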
Christopher Baines 55059558e7 Avoid logging when there are no processes to wait for 2023-03-09 08:24:01 +00:00
Christopher Baines e9ccb66225 Fix counting derivations when there are lots 2023-03-09 08:17:14 +00:00
Christopher Baines a6f239fb8a Add an extra index on blocked_builds
I think this might help with queries that don't use the build_server_id.
2023-03-05 10:21:14 +00:00
Christopher Baines 7e7cd3f5a9 Tweak the comparison query
This seems to generate better plans.
2023-03-05 10:18:30 +00:00
Christopher Baines 784c1da9ae Remove peek 2023-03-05 10:18:22 +00:00
Christopher Baines bf41c6ebb1 Set current-guix-package when computing system test derivations
This is a bit ugly, but might speed up computing derivations for system tests.
2023-02-28 10:51:51 +00:00
Christopher Baines 2d96fbff48 Speed up deleting blocked_builds entries 2023-02-27 22:52:43 +00:00
Christopher Baines 1bce38a69d Move the delete-unreferenced-derivations advisory lock
To better prevent two processes running at the same time.
2023-02-27 22:48:54 +00:00
Leo Famulari 8c2f97eef8 Comparing package derivations: Fix inconsistent verbiage.
Harmonize "Build change" options between the selection menu and the
documentation

* guix-data-service/web/compare/html.scm (compare/package-derivations):
Replace "Still broken" with "Still failing" in the "Build change" help text.

Signed-off-by: Christopher Baines <mail@cbaines.net>
2023-02-17 17:10:52 +00:00
Christopher Baines f68822cad2 Include some useful numbers on the package derivations comparison
As it's frequently useful to know how many packages/builds some change has
affected.
2023-02-15 15:48:45 +00:00
Christopher Baines 1266d3d336 Remove redundant postgresql connection when deleting derivations 2023-02-14 20:59:21 +00:00
Christopher Baines ebbcf36dc4 Delete blocked_builds entries when deleting derivations 2023-02-14 20:10:44 +00:00
Christopher Baines 5874c4ee37 Delete git_branches entries
When deleting data for a branch.
2023-02-14 19:57:30 +00:00
Christopher Baines 9872367c01 Avoid errors dropping partition tables if they don't exist 2023-02-13 20:10:23 +00:00
Christopher Baines 078516e0ab Improve dropping package_derivation_by_guix_revision_range partitions 2023-02-13 19:26:44 +00:00
Christopher Baines 6be113f99d Adjust render procedures to not use procedures for responses
The newer Guile Fibers web server will use the chunked transfer encoding when
a procedure is used and the content length is unspecified. This is good for
large responses, but unnecessary here. Also, there's a bug with the charset, so
these changes respond with correctly encoded bytevectors to avoid that.
2023-02-09 11:49:41 +00:00
Christopher Baines 0ce5af2c59 Tweak behaviour when the response body is a procedure
Newer versions of Guile Fibers will now use chunked encoding when a procedure
is used (and no content length is set). This is good, but not always what is
wanted, and there's also an issue with the port encoding.

This commit switches to responding with a string/bytevector when more
appropriate, plus explicitly setting the port encoding where that's needed.
2023-02-09 10:39:24 +00:00
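
The idea behind 6be113f99d and 0ce5af2c59, encoding the body up front so the content length is known and the charset is explicit, can be sketched generically. This is a Python illustration only, not the Guile Fibers web server API; `build_response` and the header layout are assumptions made for the example.

```python
def build_response(body, charset="utf-8"):
    """Encode a string body up front so its byte length is known before sending.

    Returning bytes together with an explicit Content-Length means a server
    has no reason to fall back to chunked transfer encoding, and the charset
    declared in Content-Type matches the bytes actually sent.
    """
    payload = body.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        # The length is counted in bytes, not characters.
        "Content-Length": str(len(payload)),
    }
    return headers, payload


headers, payload = build_response("héllo")
```

Here the five-character string encodes to six bytes in UTF-8, which is why the length has to be taken from the encoded payload rather than from the string.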
Christopher Baines 3ba8418656 Allow skipping processing system tests
Generating system test derivations is difficult, since you generally need to
do potentially expensive builds for the system you're generating the system
tests for. You might not want to disable grafts for instance because you might
be trying to test whatever the test is testing in the context of grafts being
enabled.

I'm looking at skipping the system tests on data.guix.gnu.org, because they're
not used and quite expensive to compute.
2023-02-08 14:56:48 +00:00
Christopher Baines 9e9fc1ba04 Skip some metrics that apply for each branch
As data.qa.guix.gnu.org has lots of branches and 100,000+ metrics, and this is
causing Prometheus to time out fetching the metrics.

I'm not sure there's much value in these metrics, so cut them out for now.
2023-02-02 15:04:20 +01:00
Christopher Baines d29ef3ed9b Enable database connection instrumentation in the server 2023-01-01 12:43:19 +00:00
Christopher Baines 05c437d26a Support instrumenting the number of database connections
Since this is now quite dynamic, it's useful to have a metric for it.
2023-01-01 12:43:06 +00:00
Christopher Baines 926cb2a5e1 Pull the metrics registry out of the controller
This will allow for instrumenting low level database functionality, before
anything starts using the database.
2023-01-01 12:27:34 +00:00
Christopher Baines 7b69611755 Expose metrics from pg_stats 2022-11-29 11:09:55 +00:00
Christopher Baines 9f05f5f4f9 Only sometimes attempt manually computing n_distinct values
For derivation_inputs.
2022-11-29 10:36:46 +00:00
Christopher Baines 6ada1cb845 Guard against divide by 0 in update-derivation-outputs-statistics 2022-11-28 13:17:20 +00:00
Christopher Baines 1a0c5599eb Do derivation inputs and outputs housekeeping at the end of each job
This should help with query performance, as the recursive queries using
derivation_inputs and derivation_outputs are particularly sensitive to the
n_distinct values for these tables.
2022-11-28 11:36:12 +00:00
Christopher Baines 38b3657013 Use advisory locks to avoid deadlocks during data deletion
In the case where multiple data deleting processes end up running at the same
time.
2022-11-28 10:26:46 +00:00
Christopher Baines ad93a780d3 Improve the inferior cleanup when computing package derivations 2022-11-24 12:37:49 +00:00
Christopher Baines ff6f87a3b9 Skip the derivation linter
And remove the chunking of derivation lint warnings.

The derivation linter computes the derivation for each package's supported
systems, but there are two problems with the approach. By doing this for each
package in turn, it forces inefficient uses of caches, since most of the
cached data is only relevant to a single system. More importantly though,
because the work of checking one package is dependent on its supported
systems, it's unpredictable how much work will happen, and this will tend to
increase as more packages support more systems.

I think especially because of this last point, it's not worth attempting to
keep running the derivation linter at the moment, because it doesn't seem
sustainable. I can't see a way to run it that's futureproof and won't break
at some point in the future when packages in Guix support more systems.
2022-11-24 12:37:49 +00:00
Christopher Baines 7ae1c97b92 Drop the thread pool idle seconds
To hopefully bring down the memory usage from idle connections.
2022-11-24 12:37:45 +00:00
Christopher Baines 2cf187f10b Fix calling insert-blocked-builds 2022-11-20 15:44:30 +00:00
Christopher Baines e87a8124bf Render a branch not found page if the branch doesn't exist 2022-11-19 10:07:17 +00:00
Christopher Baines 91f0fbdeb5 Fix quasiquoting 2022-11-19 10:06:55 +00:00
Christopher Baines 205f020950 Better guard against exceptions in the build event handlers 2022-11-19 09:46:16 +00:00
Christopher Baines ca1e4819b6 Fix closing thread postgresql connections 2022-11-17 16:32:04 +00:00
Christopher Baines 9fc5821180 Include more information about invalid query parameters
In the /compare response.

This should enable qa.guix.gnu.org to detect when the base revision for a
comparison is unknown.
2022-11-17 16:18:28 +00:00
Christopher Baines cc61bb5f13 Drop the chunk size when gathering lint warnings
To try and bring the peak memory usage down.
2022-11-14 09:26:59 +00:00
Christopher Baines ab7df4c6e5 Include blocked_builds information in comparison responses
This will make it easier to tell when a scheduled build is yet to start, and
can't start due to a missing dependency.
2022-11-14 09:24:49 +00:00
Christopher Baines 8294accffe Remove Build status field from blocking builds page
As this is unused.
2022-11-12 11:54:50 +00:00
Christopher Baines c46ee47632 Make backfilling blocked_builds a bit smarter
And drop the chunk size.
2022-11-12 11:53:14 +00:00
Christopher Baines ed114265cd Handle deleting from blocked_builds when builds are scheduled
As scheduling a build might unblock others.
2022-11-12 11:42:26 +00:00
Christopher Baines b9305d81a4 View scheduled builds like succeeded builds in terms of blocking
This means that an output is considered not to be blocking if it has a
scheduled build, just as if it has a succeeded build. Also, scheduling builds
will unblock blocked builds.

This is helpful as it reduces noise for blocking builds.
2022-11-12 11:33:37 +00:00
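
The blocking rule described in b9305d81a4 can be sketched as a small predicate. This is a Python illustration rather than the data service's Guile/SQL implementation; the status strings, the function name, and the treatment of an output with no builds as blocking are assumptions for the example.

```python
# Statuses that mean an output should not count as blocking (assumed names).
NON_BLOCKING_STATUSES = {"scheduled", "succeeded"}


def output_is_blocking(build_statuses):
    """Return True when no build of the output is scheduled or succeeded.

    build_statuses -- iterable of status strings for the builds of one output.
    An output with no builds at all is treated as blocking here, which is an
    assumption of this sketch.
    """
    return not any(status in NON_BLOCKING_STATUSES for status in build_statuses)
```

For instance, an output whose only build failed would be blocking, while the same output stops being blocking as soon as a new build is scheduled for it.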
Christopher Baines 7731c6f340 Tweak backfilling the blocked builds 2022-11-12 10:57:53 +00:00
Christopher Baines 8e9ab68d14 Use latest_build_status rather than build_status
In various places in the blocked-builds module.
2022-11-12 10:57:26 +00:00
Christopher Baines a34bf4defc Spawn specific PostgreSQL connections for the blocked_builds updates
So that the queries don't get cancelled by the statement timeout.
2022-11-12 10:46:46 +00:00
Christopher Baines fc5f562731 Add index on derivation_outputs id and derivation_id fields
This might help with doing recursive queries on the derivations graph.
2022-11-12 10:42:04 +00:00
Christopher Baines fb9d99a076 Add extended statistics on package_derivations
This helps row count estimates when filtering on system_id and target.
2022-11-12 10:40:43 +00:00
Christopher Baines 48d8ee885a Have insert-blocked-builds cache when the partitions exist
To make it more efficient.
2022-11-11 11:29:45 +00:00
Christopher Baines 0f22e3ab40 Rework insert-blocked-builds to make it more efficient
This also fixes a typo in the partition name.
2022-11-11 11:29:37 +00:00