Commit Graph

202 Commits

Author SHA1 Message Date
Christopher Baines e39c9da028 Store the distribution of derivations related to packages
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.

The data looks like this (for a specified system and target):

┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘

Level 0 reflects the number of packages. Level 1 is similar, as it contains
all the derivations for the package origins. The remaining levels contain
fewer derivations, since they're mostly just derivations involved in
bootstrapping.

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a large
overestimation of the number of derivations per revision. This in turn can
lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to tell PostgreSQL this by using LIMIT clauses at various points in
the query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
2023-03-09 08:29:39 +00:00
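A minimal sketch of how a level/count table like the one above could be derived, assuming "level" means the depth at which a derivation is first reached when walking inputs breadth-first from the package derivations. The function name `level_distribution` and the dictionary-based graph are hypothetical and only for illustration; the actual commit works against the database.

```python
def level_distribution(roots, inputs):
    """Count the derivations first reached at each depth from the roots.

    roots:  iterable of root derivation names (level 0).
    inputs: dict mapping a derivation name to the names of its inputs.
    """
    frontier = list(dict.fromkeys(roots))  # de-duplicate, keep order
    seen = set(frontier)
    counts = {}
    level = 0
    while frontier:
        counts[level] = len(frontier)
        next_frontier = []
        for drv in frontier:
            for inp in inputs.get(drv, ()):
                if inp not in seen:  # only count derivations new at this level
                    seen.add(inp)
                    next_frontier.append(inp)
        frontier = next_frontier
        level += 1
    return counts


# Hypothetical toy graph: two packages sharing a compiler input.
graph = {
    "pkg-a": ["src-a", "gcc"],
    "pkg-b": ["src-b", "gcc"],
    "gcc": ["bootstrap"],
}
distribution = level_distribution(["pkg-a", "pkg-b"], graph)
```

Per-level counts of this kind are the sort of numbers that could then be supplied as LIMIT values at the matching points in the query.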
Christopher Baines e9ccb66225 Fix counting derivations when there are lots 2023-03-09 08:17:14 +00:00
Christopher Baines 784c1da9ae Remove peek 2023-03-05 10:18:22 +00:00
Christopher Baines bf41c6ebb1 Set current-guix-package when computing system test derivations
This is a bit ugly, but might speed up computing derivations for system tests.
2023-02-28 10:51:51 +00:00
Christopher Baines 3ba8418656 Allow skipping processing system tests
Generating system test derivations is difficult, since you generally need to
do potentially expensive builds for the system you're generating the system
tests for. You might not want to disable grafts, for instance, because you
might be trying to test whatever the test is testing in the context of grafts
being enabled.

I'm looking at skipping the system tests on data.guix.gnu.org, because they're
not used and quite expensive to compute.
2023-02-08 14:56:48 +00:00
Christopher Baines 9f05f5f4f9 Only sometimes attempt manually computing n_distinct values
For derivation_inputs.
2022-11-29 10:36:46 +00:00
Christopher Baines 1a0c5599eb Do derivation inputs and outputs housekeeping at the end of each job
This should help with query performance, as the recursive queries using
derivation_inputs and derivation_outputs are particularly sensitive to the
n_distinct values for these tables.
2022-11-28 11:36:12 +00:00
Christopher Baines ad93a780d3 Improve the inferior cleanup when computing package derivations 2022-11-24 12:37:49 +00:00
Christopher Baines ff6f87a3b9 Skip the derivation linter
And remove the chunking of derivation lint warnings.

The derivation linter computes the derivation for each of a package's
supported systems, but there are two problems with this approach. By doing
this for each package in turn, it forces inefficient use of caches, since most
of the cached data is only relevant to a single system. More importantly
though, because the work of checking one package depends on its supported
systems, it's unpredictable how much work will happen, and this will tend to
increase as more packages support more systems.

I think especially because of this last point, it's not worth attempting to
keep running the derivation linter at the moment, because it doesn't seem
sustainable. I can't see a way to run it that's future-proof and won't break
at some point when packages in Guix support more systems.
2022-11-24 12:37:49 +00:00
Christopher Baines cc61bb5f13 Drop the chunk size when gathering lint warnings
To try and bring the peak memory usage down.
2022-11-14 09:26:59 +00:00
Christopher Baines 95064d39a3 Log heap size when computing system tests 2022-11-06 08:53:04 +01:00
Christopher Baines 1e2826e095 Add more granular logging for computing system test derivations 2022-11-06 08:47:58 +01:00
Christopher Baines 640386a84d Insert guix revision lint warnings in chunks
To avoid long running queries.
2022-09-17 08:53:23 +02:00
Christopher Baines 35cf9ba1bc Chunk inserting guix revision package derivation entries 2022-09-15 16:25:41 +02:00
Christopher Baines 076331325a Log more information about heap size when loading derivation info
To better understand the memory usage when this is happening.
2022-09-05 14:23:10 +01:00
Christopher Baines b3d59c650a Use much smaller chunks when trying to run the derivation linter
Since larger chunks still ran into inferior memory usage problems.
2022-09-05 14:22:38 +01:00
Christopher Baines aa8c9dbffa Compute lint warnings for packages in chunks
In an attempt to reduce the peak memory usage, and avoid running into the:

  Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS

issue.
2022-09-05 08:40:27 +01:00
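The chunking idea in the commit above can be sketched as follows. This is an illustration only, not the project's code: `chunks` is a hypothetical helper, and the chunk size used in the example is arbitrary.

```python
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Process a hypothetical list of package names a few at a time, so that
# only one chunk's worth of intermediate results is held at once.
package_names = ["a", "b", "c", "d", "e", "f", "g"]
batches = list(chunks(package_names, 3))
```

Smaller chunks trade some per-chunk overhead for a lower peak in the amount of data held at any one time.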
Christopher Baines 4b8846a861 Remove cross derivation targets that don't make sense
This matches the previous behaviour without using the platform data.
2022-05-27 10:09:09 +01:00
Christopher Baines fb8353559f Take advantage of the new (guix platform) module
This means there's less reliance on the hardcoded lists of systems and targets
and mappings between them.
2022-05-26 00:24:55 +01:00
Christopher Baines d4bb0ffaaa Fix more issues with the git_commits introduction 2022-05-23 22:49:51 +01:00
Christopher Baines e5cb793d4e Raise a clearer exception when a linter crashes 2022-05-23 19:19:57 +01:00
Christopher Baines 198b6ef719 Only clear the %store-table when it's defined 2022-05-17 12:06:09 +01:00
Christopher Baines 5727703d84 Clear cached store connections when fetching lint warnings
As I'm seeing the inferior process crash with [1] just after fetching the
derivation lint warnings.

This change appears to help, although it's probably just a workaround. When
there are more packages/derivations, the caches might need clearing while
fetching the derivation lint warnings, or this will need to be split across
multiple processes.

1: Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
2022-05-13 12:03:43 +01:00
Christopher Baines ff116d5e64 Clear out cached store connections in the load revision inferior
These cached store connections have caches associated with them that take up
lots of memory, leading to the inferior crashing. This change seems to help.
2022-05-07 09:55:13 +01:00
Christopher Baines 9607bcedfe Move the builds.derivation_output_details_set_id update
To the end of the main revision processing transaction.

Currently, I think there are issues when this query does update some builds,
as those rows in the build table remain locked until the end of the
transaction. This then causes build event submission to hang. Moving this part
of the revision loading process to the end of the transaction should help to
mitigate this.
2022-04-16 18:47:54 +01:00
Christopher Baines 097e22ab5e Close the load revision inferior prior to inserting data
This means that the lock can be acquired after closing the inferior, freeing
the large amount of memory that the inferior process is probably using.
2022-03-11 13:27:55 +00:00
Christopher Baines 0f07826a20 Extract out the code for starting an inferior 2022-03-11 11:22:08 +00:00
Christopher Baines 8379427cbb Process each system target pair individually
As the cross targets take quite some time.
2022-03-11 11:12:18 +00:00
Christopher Baines 9db755f27d Disable value history in the inferior repl
This might help reduce memory usage a little.
2022-03-11 11:11:53 +00:00
Christopher Baines 0e3f65062a Compute more cross derivations 2022-03-11 10:07:08 +00:00
Christopher Baines fe556f4a4d Deduplicate inferior packages including replacements
Previously, duplicates could creep through if the duplicate wasn't exported,
and only found as a replacement. Now they're filtered out.

This isn't ideal, as duplicates aren't always mistakes. It would still be
useful to capture these packages, but having multiple entries for the same
name+version causes the comparison functionality to break.
2022-03-04 14:22:10 +00:00
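A minimal sketch of de-duplicating by name+version, as described in the commit above. The `Package` type and the rule of keeping the first entry seen are assumptions made for illustration; the commit message does not say which duplicate is kept.

```python
from collections import namedtuple

# Hypothetical stand-in for an inferior package record.
Package = namedtuple("Package", ["name", "version", "location"])


def deduplicate_packages(packages):
    """Keep one package per (name, version) pair: the first one encountered."""
    unique = {}
    for package in packages:
        key = (package.name, package.version)
        if key not in unique:  # later duplicates are dropped (assumed rule)
            unique[key] = package
    return list(unique.values())


packages = [
    Package("hello", "2.12", "gnu/packages/base.scm"),
    Package("hello", "2.12", "replacement"),
    Package("hello", "2.10", "gnu/packages/base.scm"),
]
deduplicated = deduplicate_packages(packages)
```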
Christopher Baines 0cc749a1fa Simplify deduplicate-inferior-packages
Use the a-version and b-version variables, rather than calling the functions
again.
2022-03-04 14:20:41 +00:00
Maxime Devos 8ab72e633a Include the nl_NL.UTF-8 locale when processing revisions
It's 100% translated according to
<https://translate.fedoraproject.org/projecs/guix/guix/nl/>.

* guix-data-service/model/package-metadata.scm
  (locales): Add nl_NL.utf-8

Signed-off-by: Christopher Baines <mail@cbaines.net>
2022-03-02 21:05:30 +00:00
Christopher Baines 3a90798567 Address a few issues in the load new guix revision tests 2022-03-02 18:23:26 +00:00
Christopher Baines 4a9d45aa16 Skip dropping the log part sequence if there's a lock
So that the job completes. The sequence can be deleted later.
2022-02-21 13:02:57 +00:00
Christopher Baines a0a7d66b1d Move logging cleanup tasks to after the transaction commits
As I think some operations (like the database backup) can block the DROP
SEQUENCE bit, so at least this approach means that the main transaction should
commit and then the sequence is eventually dropped.
2022-02-19 09:54:39 +00:00
Christopher Baines 17167ef3e4 Change how package supported systems are handled
This code is a bit tricky, since it should be compatible with old and new guix
revisions. I think these changes stop computing package derivations for
invalid systems, while hopefully not breaking anything.
2022-02-18 12:21:08 +00:00
Christopher Baines 3840f588e5 Improve logging for system test derivation issues 2022-01-09 10:11:53 +00:00
Christopher Baines af209170f7 Track package replacements
Start at least looking for package replacements, and storing the
details (particularly the derivation). I'm looking at doing this so that build
servers using the Guix Data Service can build these derivations.
2021-07-11 11:57:05 +01:00
Christopher Baines 1a21bc40a8 Pass #:system to channel-instances->manifest
This is better than setting the %current-system, since more of the process
will run as native code.
2021-06-09 10:59:31 +01:00
Christopher Baines da0ee9dff0 Use filter-map rather than filter and map when processing linters
I guess this is a good change in general, but it also seems to avoid a long
stack which, when a linter crashes and the inferior tries to return the
exception details, apparently hangs the inferior/client as the reply isn't
written/read.
2021-05-16 20:54:07 +01:00
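For readers unfamiliar with filter-map, the pattern the commit above switches to can be sketched in Python. This only illustrates the single-pass idea; it says nothing about the stack behaviour described in the commit. Scheme's filter-map drops results that are #f, and in this sketch `None` stands in for #f, which is an assumption of the illustration.

```python
def filter_map(func, items):
    """Apply func to each item, keeping only the results that are not None."""
    results = []
    for item in items:
        value = func(item)
        if value is not None:
            results.append(value)
    return results


# Square the odd numbers and drop the even ones in a single pass.
squares_of_odds = filter_map(lambda n: n * n if n % 2 else None, [1, 2, 3, 4])
```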
Christopher Baines 410f58cb43 Fix load revision jobs not failing if the extraction of data fails 2021-03-25 17:57:48 +00:00
Christopher Baines 07f903abaf Fix dropping the job lock 2021-02-05 11:07:07 +00:00
Christopher Baines 02b75d857a Reformat queries in the load-new-guix-revision module 2021-02-05 09:16:08 +00:00
Christopher Baines 570c667222 Tweak exception handling when loading revisions 2021-02-03 12:53:53 +00:00
Christopher Baines 643753ff46 Better handle retries for jobs
This was broken when the guix_revisions entry started being added before the
final commit.
2021-02-03 10:35:56 +00:00
Christopher Baines 7fbcb3a3c2 Store channel instance derivations in a separate transaction
This means that these derivations are stored, even if a later part of the
process fails. Having the channel instance derivations stored might help work
out why the failure occurred, or better display information about it.
2021-02-02 23:36:56 +00:00
Christopher Baines ea7331ad25 Don't ignore all system tests if computing one derivation fails 2021-01-14 20:45:03 +00:00
Christopher Baines bd8390673e Fix squee upgrade issues in the load-new-guix-revision module 2021-01-02 11:13:30 +00:00
Christopher Baines 64a4058cce Start to add compatibility with squee returning #f for null values
While maintaining compatibility for older versions of squee.
2021-01-02 10:06:27 +00:00