mirror of git://git.savannah.gnu.org/guix/data-service.git synced 2023-12-14 03:23:03 +01:00
82 commits

Author SHA1 Message Date
Christopher Baines 7251c7d653 Stop using a pool of threads for database operations
Now that squee cooperates with suspendable ports, this is unnecessary. Use a
connection pool to still support running queries in parallel using multiple
connections.
2023-07-10 18:56:31 +01:00
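A connection pool of the kind described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the guix-data-service implementation: the `ConnectionPool` class and `make_connection` helper are hypothetical names, `sqlite3` stands in for PostgreSQL so the sketch runs anywhere, and a blocking queue is used where the real service relies on suspendable ports.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool of database connections (illustrative only)."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Block until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the query failed.
            self._idle.put(conn)


def make_connection():
    # Hypothetical helper: sqlite3 stands in for a PostgreSQL connection.
    return sqlite3.connect(":memory:", check_same_thread=False)


pool = ConnectionPool(make_connection, size=2)

with pool.connection() as conn:
    answer = conn.execute("SELECT 1 + 1").fetchone()[0]
```

With two connections in the pool, two queries can run at the same time, and a third caller waits (or times out) until one of the connections is returned.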
Christopher Baines 29d49ba31a Detach the database setup from the main guix-data-service process
This will allow restarting them independently, leaving it up to the operator
to ensure that all processes are compatible.
2023-06-09 16:11:06 +01:00
Christopher Baines 5c9ec28cb5 Query for outputs when build events arrive
This will keep the substitute information more up to date.
2023-06-09 16:11:06 +01:00
Christopher Baines 688f4cd79d Set request timeouts for the thread pools
The request timeout should ensure that the operations don't back up if the
thread pool is overloaded.
2023-04-27 14:58:47 +02:00
Christopher Baines 9f080524bc Split the thread pool used for database connections
Into two thread pools: a default one, and one reserved for essential
functionality.

There are some pages that use slow queries, so this should help stop those
pages blocking other operations.
2023-04-27 10:31:09 +02:00
Christopher Baines 519f0c6f67 Defer backfilling derivation distribution counts until later
After the migrations have run.
2023-03-09 09:39:47 +00:00
Christopher Baines e39c9da028 Store the distribution of derivations related to packages
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.

The data looks like this (for a specified system and target):

┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘

Level 0 reflects the number of packages. Level 1 is similar, as you have all
the derivations for the package origins. The remaining levels contain fewer
derivations, since they mostly consist of derivations involved in bootstrapping.

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL of this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
2023-03-09 08:29:39 +00:00
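The idea of capping each level with its known count can be sketched as below. This is only an illustration under assumptions: the table names `package_derivations` and `derivation_inputs`, the `build_query` helper, and the tiny three-level graph are all hypothetical, levels are assumed to be numbered contiguously from 0, and `sqlite3` is used so the example runs without a PostgreSQL server. The query actually used by guix-data-service may be structured quite differently.

```python
import sqlite3

# Hypothetical per-level counts for a tiny graph, in the same shape as the
# level/count table in the commit message above.
LEVEL_COUNTS = {0: 3, 1: 2, 2: 1}


def build_query(level_counts):
    """Build one CTE per level, each capped with LIMIT at its known count."""
    levels = sorted(level_counts)
    ctes = []
    for level in levels:
        limit = int(level_counts[level])
        if level == 0:
            body = (
                "SELECT derivation_id AS id "
                f"FROM package_derivations LIMIT {limit}"
            )
        else:
            previous = f"level_{level - 1}"
            body = (
                "SELECT DISTINCT derivation_inputs.input_id AS id "
                f"FROM {previous} "
                "JOIN derivation_inputs "
                f"ON derivation_inputs.derivation_id = {previous}.id "
                f"LIMIT {limit}"
            )
        ctes.append(f"level_{level} AS ({body})")
    union = " UNION ".join(f"SELECT id FROM level_{level}" for level in levels)
    return "WITH " + ", ".join(ctes) + " " + union + " ORDER BY id"


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE package_derivations (derivation_id INTEGER);
    CREATE TABLE derivation_inputs (derivation_id INTEGER, input_id INTEGER);
    INSERT INTO package_derivations VALUES (1), (2), (3);
    INSERT INTO derivation_inputs VALUES (1, 4), (2, 4), (3, 5), (4, 6), (5, 6);
""")

derivation_ids = [row[0] for row in db.execute(build_query(LEVEL_COUNTS))]
```

Because each LIMIT equals the number of rows the level is expected to produce, it does not change the result; its only purpose is to give the planner an upper bound on the row count at each step.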
Christopher Baines 3ba8418656 Allow skipping processing system tests
Generating system test derivations is difficult, since you generally need to
do potentially expensive builds for the system you're generating the system
tests for. You might not want to disable grafts, for instance, because you might
be trying to test whatever the test is testing in the context of grafts being
enabled.

I'm looking at skipping the system tests on data.guix.gnu.org, because they're
not used and quite expensive to compute.
2023-02-08 14:56:48 +00:00
Christopher Baines 7ae1c97b92 Drop the thread pool idle seconds
To hopefully bring down the memory usage from idle connections.
2022-11-24 12:37:45 +00:00
Christopher Baines d06230fcf4 Close postgresql connections when the thread pool thread is idle
I think the idle connections associated with idle threads are still taking up
memory, so especially now that you can configure an arbitrary number of
threads (and thus connections), I think it's good to close them regularly.
2022-10-23 11:28:37 +01:00
Christopher Baines ff77bbea7e Make it possible to increase the number of thread pool threads
And double the default to 16.
2022-10-02 15:08:18 +01:00
Christopher Baines 8e23d38660 Handle migrations and server startup better
The server part of the guix-data-service doesn't work great as a guix service,
since it often fails to start if the migrations take any time at all.

To address this, start the server before running the migrations, and serve the
pages that work without the database, plus a general 503 response. Once the
migrations have completed, switch to the normal behaviour.
2022-06-17 13:13:21 +01:00
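The startup behaviour described in the commit above can be sketched as a request handler that checks a readiness flag. This is a hedged illustration, not the service's actual code: `STATIC_PATHS`, `handle_request` and `run_migrations` are hypothetical names, and the set of pages that work without the database is invented for the example.

```python
import threading

# Hypothetical set of pages that need no database access.
STATIC_PATHS = {"/", "/README"}

# Set once the migrations have completed.
migrations_done = threading.Event()


def handle_request(path):
    """Return an (HTTP status, body) pair for a request path."""
    if path in STATIC_PATHS:
        return 200, "static page"
    if not migrations_done.is_set():
        # The database is not ready yet, so ask the client to retry later.
        return 503, "Service Unavailable: database migrations in progress"
    return 200, f"database-backed page for {path}"


def run_migrations():
    # Placeholder for the real migration step; only flips the flag here.
    migrations_done.set()
```

The server can begin accepting connections immediately, and calling `run_migrations()` (typically from a separate thread) switches every later request over to the normal behaviour.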
Christopher Baines d4bb0ffaaa Fix more issues with the git_commits introduction 2022-05-23 22:49:51 +01:00
Christopher Baines 8beab2511c Query substitutes for latest processed revisions periodically
This is a step towards having up to date substitute availability data.
2021-11-16 19:08:46 +00:00
Christopher Baines d1a2a7125c Fix a regression with running sqitch
Introduced in 0dc05982cd.
2021-07-11 12:40:48 +01:00
Christopher Baines b4188bda9d Run sqitch in the change mode
Since this rolls back migrations less, which is good when the rollback bit
isn't always implemented.
2021-07-04 10:43:13 +01:00
Christopher Baines 0dc05982cd Try to adapt the PostgreSQL paramstring to use with sqitch 2021-06-16 13:44:00 +01:00
Christopher Baines 2a8a574f4a Allow customising the pg_dump command used
As this
2021-01-03 19:05:41 +00:00
Christopher Baines 375a6a37dc Support not querying pending builds
As this can take some time.
2020-11-01 22:52:53 +00:00
Christopher Baines f485423d5a Allow only fetching builds for a specific system 2020-11-01 22:49:49 +00:00
Christopher Baines 6a7f6b5a0e Fix create small backup issue with latest_build_status 2020-10-23 20:01:43 +01:00
Christopher Baines 3225766207 Make it easier to get to a repl 2020-10-10 13:44:37 +01:00
Christopher Baines 18b6dd9e6d Stop opening a PostgreSQL connection per request
This was good in that it avoided having to deal with long running connections,
but it probably takes some time to open the connection, and these changes are
a step towards offloading the PostgreSQL queries to other threads, so they
don't block the threads for fibers.
2020-10-03 09:22:29 +01:00
Christopher Baines 39b5df04eb Remove development code from the process job script 2020-09-28 08:29:20 +01:00
Christopher Baines 033858410b Add a JSON page for repository branches 2020-09-27 16:32:56 +01:00
Christopher Baines fb180e1ebd Replace debug-set! with setenv COLUMNS
As that actually seems to work.
2020-09-26 16:42:18 +01:00
Christopher Baines 53341c70fc Change the locale codeset representation
From the normalized one, to the one actually contained within glibc. Recent
versions of glibc also contain symlinks linking the normalized codeset to the
locales with the .UTF-8 ending, but older ones do not.

Maybe handling codeset normalisation for queries would be good, but the locale
values ending in .UTF-8 are more compatible and allow the code to be
simplified. For querying, maybe there should be a locales table which handles
different representations.
2020-09-26 11:45:57 +01:00
Christopher Baines e38db9eed9 Set the locale at the start of the process jobs script
This might help with the odd [1] errors regarding PostgreSQL queries.

1: invalid byte sequence for encoding "UTF8":
2020-09-20 11:11:03 +01:00
Christopher Baines a0e098a6ce Increase the stack trace width when processing jobs
As this might result in more useful error messages.
2020-09-20 10:59:22 +01:00
Christopher Baines b6754c8a4c Add a lookup_builds field to the build_servers table
This is to allow for build servers where only the substitutes should be
queried, and it shouldn't be assumed that they're running Cuirass.
2020-05-24 17:02:53 +01:00
Christopher Baines 9c72fc23dc Move around --no-tablespaces
Turns out, at the moment, this is ineffective when combined with the archive
formats, like the custom format in use. Therefore, move it to the pg_restore
command, where hopefully it'll work.
2020-05-16 08:42:00 +01:00
Christopher Baines 796c129a36 Don't include tablespace assignments in the backup dump
This is a compromise, as this won't help restoring the backup in situations
where you want tablespaces, but I'm currently viewing tablespaces as a deployment
concern, so maybe the right thing to do is exclude them. This approach will at
least keep the same behaviour in terms of restoring the backups locally.

This will fix the small dump creation process on data.guix.gnu.org, which is
currently broken because of the tablespace assignments when trying to restore
the backups.
2020-05-14 20:49:46 +01:00
Christopher Baines 6baef6ae25 Split out querying of build servers and substitute servers
These are related things, but somewhat separate. This change should make it
easier to deal with changes regarding querying build servers, and querying
substitute servers.
2020-05-03 13:23:43 +01:00
Christopher Baines a0263a0eae Set a statement timeout of 60 seconds for web requests
This will help stop queries running for an unnecessarily long time, longer
than NGinx will wait for example.
2020-04-24 09:00:20 +01:00
Christopher Baines 5081a64c1f Rebuild the package derivation ranges table for the small backup
This is better than just deleting the entries that don't match up with the
remaining revisions, but also not very useful for local development (due to
the lack of data).
2020-03-31 20:46:18 +01:00
Christopher Baines d1c243f7fd Give the temporary database more working memory
In the hope that this makes the script faster.
2020-03-26 20:21:47 +00:00
Christopher Baines 3017765f0c Use EXPLAIN ANALYZE for the creation of tmp_derivations
In the create-small-backup script, as this is quite a slow part, it's useful
to get more information.
2020-03-26 20:21:14 +00:00
Christopher Baines 9a79a5d747 Handle a couple more tables in create-small-backup
derivation_output_details_sets, and derivations_by_output_details_set. This
required moving around some of the code.
2020-03-26 20:20:29 +00:00
Christopher Baines d0eff9da5d Use the --no-comments option to pg_dump
Hopefully this will help with the pg_restore in the create-small-backup
script:

  pg_restore: [archiver (db)] Error while PROCESSING TOC:
  pg_restore: [archiver (db)] Error from TOC entry 2875; 0 0 COMMENT EXTENSION plpgsql
  pg_restore: [archiver (db)] could not execute query: ERROR:  must be owner of extension plpgsql
      Command was: COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
2020-03-25 20:47:53 +00:00
Christopher Baines 8af7130239 Handle channel instances in create-small-backup
Otherwise this table is empty.
2020-03-25 18:27:01 +00:00
Christopher Baines b99854924a Handle system test derivations in create-small-backup
Otherwise this table is empty.
2020-03-25 18:26:33 +00:00
Christopher Baines ca0d3ee754 Stop using package_versions_by_guix_revision_range
It's been replaced by the package_derivations_by_guix_revision_range table.
2020-03-24 20:44:57 +00:00
Christopher Baines cf4082dbeb Avoid failures related to renice and ionice
These parts of the backup scripts are optional, so don't fail if they don't
work.
2020-03-20 20:40:33 +00:00
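The "optional step" pattern from the commit above can be sketched as follows. The backup scripts themselves are not shown in this log, so this is an assumption-laden illustration in Python rather than the scripts' own language; `run_optional` is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_optional(command):
    """Run a best-effort command; report failure instead of raising."""
    try:
        subprocess.run(command, check=True, capture_output=True)
        return True
    except (OSError, subprocess.CalledProcessError) as error:
        # A missing binary or a non-zero exit status only produces a warning.
        print(f"warning: optional step {command[0]} failed: {error}",
              file=sys.stderr)
        return False


if __name__ == "__main__":
    pid = str(os.getpid())
    # Lower CPU and I/O priority if possible, and carry on regardless.
    run_optional(["renice", "-n", "19", "-p", pid])
    run_optional(["ionice", "-c", "3", "-p", pid])
```

The backup continues at normal priority when either tool is missing or refuses the request, which matches the intent of treating these steps as optional.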
Christopher Baines ded4df6632 Move and improve the "starting the server" message
Move it after the output relating to narinfo signing, and include the host.
2020-03-14 13:14:29 +00:00
Christopher Baines a03e1601de Improve handling of errors
Adjust the previously unused error page code, and start to use it. Only show
the error if configured to do so, to avoid leaking secret information.
2020-03-14 12:46:02 +00:00
Christopher Baines baeae56de4 Don't use TRUNCATE CASCADE in the create small backup script
As it makes it clearer what tables will be truncated.
2020-03-13 18:38:42 +00:00
Christopher Baines 6ce96ad55b Trim the derivation output details table in the small data dump 2020-03-13 18:38:29 +00:00
Christopher Baines b64e6b19c2 Trim derivation source file tables in the small data dump 2020-03-13 18:37:46 +00:00
Christopher Baines 77caafb019 Add scripts for generating database dumps 2020-03-02 21:44:29 +00:00
Christopher Baines 65f2f21d3a Support customising the latest branch revision max processes
This makes it possible to set a higher or lower value depending on what you
want.
2020-02-28 20:58:21 +00:00