Commit Graph

90 Commits

Author SHA1 Message Date
Christopher Baines 10bad53ad5 Support polling git repositories for new branches/revisions
This is mostly a workaround for the occasional problems with the guix-commits
mailing list, as it can break and then the data service doesn't learn about
new revisions until the problem is fixed.

I think it's still a generally good feature though, and allows deploying the
data service without it consuming emails to learn about new revisions, and is
a step towards integrating some kind of way of notifying the data service to
poll.
2023-10-09 22:19:02 +01:00
Christopher Baines 1461aa037f Make database changes to help with deleting entries in the nars table 2023-08-01 14:12:48 +01:00
Christopher Baines e39c9da028 Store the distribution of derivations related to packages
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.

The data looks like this (for a specified system and target):

┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘

Level 0 reflects the number of packages. Level 1 is similar as you have all
the derivations for the package origins. The remaining levels contain less
packages since it's mostly just derivations involved in bootstrapping.

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that the each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per a revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL this by using LIMIT's at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
2023-03-09 08:29:39 +00:00
Christopher Baines a6f239fb8a Add an extra index on blocked_builds
I think this might help with queries that don't use the build_server_id.
2023-03-05 10:21:14 +00:00
Christopher Baines fc5f562731 Add index on derivation_outputs id and derivation_id fields
This might help with doing recursive queries on the derivations graph.
2022-11-12 10:42:04 +00:00
Christopher Baines fb9d99a076 Add extended statistics on package_derivations
This helps row count estimates when filtering on system_id and target.
2022-11-12 10:40:43 +00:00
Christopher Baines 1fb291be40 Add support for incrementally tracking blocked builds
This will hopefully provide a less expensive way of finding out if a scheduled
build is probably blocked by other builds failing or being canceled.

By working this out when the build events are recieved, it should be more
feasible to include information about whether builds are likely blocked or not
in various places (e.g. revision comparisons).
2022-11-11 10:35:09 +00:00
Christopher Baines e3062abf33 Speed up finding the locales for a revision 2022-09-14 10:26:03 +01:00
Christopher Baines ad39492cf0 Support per repository configuration for fetching substitutes
This is so that data.qa.guix.gnu.org can be configured only to query the
branches from the main repository.
2022-09-09 12:39:38 +01:00
Christopher Baines d19eb07138 Add an index on git_commits
And also remove the duplicates that have crept in.
2022-06-17 12:54:43 +01:00
Christopher Baines 3b5950024b Add an index on package_derivations_by_guix_revision_range
This helps render the package version range related pages.
2022-06-17 12:51:10 +01:00
Christopher Baines 64be52844e Partition the package_derivations_by_guix_revision_range table
And create a proper git_branches table in the process.

I'm hoping this will help with slow deletions from the
package_derivations_by_guix_revision_range table in the case where there are
lots of branches, since it'll separate the data for one branch from another.

These migrations will remove the existing data, so
rebuild-package-derivations-table will currently need manually running to
regenerate it.
2022-05-23 19:10:25 +01:00
Christopher Baines 5cefa250d8 Add an index on derivation_output_details.id
Where hash is not null, since this helps with the fixed output package
derivations query.
2021-07-11 15:25:12 +01:00
Christopher Baines af209170f7 Track package replacements
Start at least looking for package replacements, and storing the
details (particularly the derivation). I'm looking at doing this so that build
servers using the Guix Data Service can build these derivations.
2021-07-11 11:57:05 +01:00
Canan Talayhan 9c1b989d20 Add index for package_metadata
This index speeds up inserting new entries to the package_metadata table.

Signed-off-by: Christopher Baines <mail@cbaines.net>
2021-05-27 21:26:56 +01:00
Christopher Baines f4a87ec059 Add some indexes
This is from reading the slow query log on data.guix.gnu.org.
2021-05-17 18:40:45 +01:00
Christopher Baines b430c632b7 Create a table for systems
And use it for the systems in the derivations and package derivations tables.

The primary motivation here is to allow quickly working out what systems the
database contains, and having a small table with just the right data seems a
good way to do that.
2021-04-23 11:14:51 +01:00
Christopher Baines 7fbcb3a3c2 Store channel instance derivations in a separate transaction
This means that these derivations are stored, even if a later part of the
process fails. Having the channel instance derivations stored might help work
out why the failure occurred, or better display information about it.
2021-02-02 23:36:56 +00:00
Christopher Baines 6e390c9b9d Increase statistics targets for the derivation_inputs table
It has a lot of rows, and this seems to help the query planning for
select-required-builds-that-failed, as I think it helps PostgreSQL better
estimate the row counts that different parts of the query will produce.
2020-12-27 10:37:06 +00:00
Christopher Baines 67f1ace9eb Add index to speed up deleting revisions
This is necessary to quickly delete entries from package_derivaitons.
2020-10-27 16:59:09 +00:00
Christopher Baines 6a04d474c0 Regenerate the latest_build_status table
As it doesn't handle wierdness for statuses, like builds being canceled with a
timestamp of 0.
2020-10-21 19:46:27 +01:00
Christopher Baines 83884ed2ea Add a latest_build_status table
This will avoid many queries trying to figure out what the latest build status
is, which will hopefuly simplify queries as well as improving performance.
2020-10-13 19:31:43 +01:00
Christopher Baines 9f9c23010e Support storing whether to authenticate commits by git repository
To allow using the Guix Data Service when commits aren't authenticated.
2020-10-07 18:50:13 +01:00
Christopher Baines e2a7705d3d Add an index for derivation_sources.derivation_source_file_id
As this speeds up deleting derivation_source_files.
2020-10-02 20:15:23 +01:00
Christopher Baines 841f5fb186 Change a constraint to add ON DELETE CASCADE
I've not used these in many places, to try and avoid hiding deleting data, but
in this case, this will allow more easily deleting the derivation source file
nars, by just deleting the derivation_source_files table entry.
2020-10-02 20:15:10 +01:00
Christopher Baines 6e0e33addf Change the autovacuum config for some tables
Looking at data for the the patches deployment of the Guix Data Service, these
tables look like they might benefit from vacuuming/analyzing more often, so
adjust the configuration so this will hopefully happen.
2020-10-01 22:30:39 +01:00
Christopher Baines 53341c70fc Change the locale codeset representation
From the normalized one, to the one actually contained within glibc. Recent
versions of glibc also contain symlinks linking the normalized codeset to the
locales with the .UTF-8 ending, but older ones do not.

Maybe handling codeset normalisation for queries would be good, but the locale
values ending in .UTF-8 are more compatible and allow the code to be
simplified. For querying, maybe there should be a locales table which handles
different representations.
2020-09-26 11:45:57 +01:00
Christopher Baines c7048abe31 Change a package_descriptions index to use MD5 of the description
To avoid the index data being too large.

This was first seen in guix commit 1bb4fd64b7bbe5a17eda6f0ca8105283c038f7c8:

  psql-query-error (fatal-error PGRES_FATAL_ERROR ERROR: index row size 2808
  exceeds maximum 2712 for index "package_descriptions_locale_description_key"
  HINT: Values larger than 1/3 of a buffer page cannot be indexed.
2020-08-23 14:42:32 +01:00
Danjela Lura ee613cdb30 Add the package_metadata_tsvectors table
Signed-off-by: Christopher Baines <mail@cbaines.net>
2020-08-12 08:53:35 +01:00
Christopher Baines 5dc9ed65ff Add a index to derivation_output_details_sets
This helps with queries like:

  SELECT *
  FROM derivation_output_details_sets
  WHERE ARRAY[5563227] <@ derivation_output_details_ids;
2020-07-04 08:59:29 +01:00
Christopher Baines 0242bc6a2e Remove the constraint on having one derivation build per server
As the Guix Build Coordinator can build one derivation many times.
2020-07-01 09:35:40 +01:00
Christopher Baines 5ee8207565 Avoid duplicate build status entries
This effectively makes adding build status entries idempotent to some degree.
2020-07-01 09:35:29 +01:00
Christopher Baines ddbe4c2192 Support storing the id's representing builds in other services
Like Cuirass, or the Guix Data Service. This isn't so necessary with Cuirass,
as the derivation/output are unique identifiers, but for the Guix Build
Coordinator, one derivation/output can be built many times, with potentially
differing results.
2020-06-27 14:05:51 +01:00
Christopher Baines 1d48885e6a Make build_status.timestamp nullable
The Guix Build Coordinator doesn't record timestamps for events
currently. This is something to fix, but allow for build statuses without
timestamps as they're not necessary.
2020-06-27 14:05:51 +01:00
Danjela Lura 8d9a4d105c Get the translated package synopsis and descriptions into the database
Signed-off-by: Christopher Baines <mail@cbaines.net>
2020-06-19 16:03:32 +01:00
Danjela Lura f67bea719d Get the translated lint checker descriptions into the database
Signed-off-by: Christopher Baines <mail@cbaines.net>
2020-06-07 20:17:03 +01:00
Christopher Baines 916b594473 Make nar_urls file sizes optional 2020-06-03 06:56:59 +01:00
Christopher Baines b6754c8a4c Add a lookup_builds field to the build_servers table
This is to allow for build servers where only the substitutes should be
queried, and it shouldn't be assumed that they're running Cuirass.
2020-05-24 17:02:53 +01:00
Christopher Baines 111576e3f9 Remove odd package derivations with mismatched systems
This follows on from 79993bfaef, disassociating
the relevant derivations from the revisions.
2020-04-24 21:47:06 +01:00
Christopher Baines 776d0462e9 Fix a missing ; in a Sqitch migration 2020-03-29 12:05:00 +01:00
Christopher Baines b30904e1af Make jobs unique by commit
Given that a commit, regardless of what repository it comes from should
contain the same exactly the same data, just track jobs by commit.
2020-03-28 19:59:35 +00:00
Christopher Baines 0c4e6a2fe4 Create an index on narinfo_fetch_records
This greatly improves the performance of the derivation-outputs page.
2020-03-25 19:11:03 +00:00
Christopher Baines ca0d3ee754 Stop using package_versions_by_guix_revision_range
It's been replaced by the package_derivations_by_guix_revision_range table.
2020-03-24 20:44:57 +00:00
Christopher Baines c44297b615 Generate and store system test derivations for all supported systems
Rather than just the native system. I'm not quite sure of the value here, as I
guess system tests should behave the same regardless of the way the software
is compiled, but this seems like it could be useful, and being explicit about
the system the derivation is for is good.
2020-03-20 08:28:55 +00:00
Christopher Baines e5a6680808 Make a constraint deferrable
To allow for deferring it in a transaction when deleting derivations.
2020-02-16 22:28:37 +00:00
Christopher Baines dfccbf911c Update the build_servers_build_config values
Change target to '' if it matches system, as this matches the updated way of
storing derivations.
2020-02-13 23:11:36 +00:00
Christopher Baines 9be7dbac0b Start storing channel instance derivations
These are the ones that relate to Guix pull.
2020-02-11 08:56:24 +00:00
Christopher Baines c31c9575e0 Support excluding and including branches from a repository
So that you can have the Guix Data Service only pay attention to some of the
branches.
2020-02-08 12:03:41 +00:00
Christopher Baines f1989e8758 Rework cross derivations support
Stop using the system values as targets, and remove package derivation entries
where this is the case.

Switch the non-cross derivation case to have a target of "", rather than
matching the system, as this makes more sense, and is more consistent now that
the target values no longer match the system values.

Hardcode some more correct target values, and use these instead. Hopefully
this can be better integrated with Guix in the future.

This commit also includes a migration attempting to shrink some indexes.
2020-02-08 11:16:08 +00:00
Christopher Baines a3d84b1aec Add missing migration 2020-02-03 22:57:57 +00:00