Commit Graph

48 Commits

Author SHA1 Message Date
Christopher Baines 9f102dbd39 Add code to delete nars entries 2023-08-01 14:13:10 +01:00
Christopher Baines 7495085f63 Delete unreferenced derivations in batches
To avoid a long blocking query.
2023-08-01 10:16:31 +01:00
Christopher Baines bbc53deb1f Rewrite deleting unreferenced derivations
Use fibers more, leaning in on the non-blocking use of Squee for parallelism.
2023-07-25 17:57:00 +01:00
Christopher Baines 7251c7d653 Stop using a pool of threads for database operations
Now that squee cooperates with suspendable ports, this is unnecessary. Use a
connection pool to still support running queries in parallel using multiple
connections.
2023-07-10 18:56:31 +01:00
Christopher Baines 742949cc97 Improve data deletion 2023-07-01 12:01:13 +01:00
Christopher Baines 47c482bdcc Set lock_timeout for some data deletion transactions
As these can cause deadlocks. This will probably cause errors, so some
retrying will need to be added.
2023-05-09 08:55:09 +01:00
Christopher Baines 4fa7a3601e Include distribution counts table in data deletion 2023-04-07 11:21:28 +01:00
Christopher Baines 2d96fbff48 Speed up deleting blocked_builds entries 2023-02-27 22:52:43 +00:00
Christopher Baines 1bce38a69d Move the delete-unreferenced-derivations advisory lock
To better prevent two processes running at the same time.
2023-02-27 22:48:54 +00:00
Christopher Baines 1266d3d336 Remove redundant postgresql connection when deleting derivations 2023-02-14 20:59:21 +00:00
Christopher Baines ebbcf36dc4 Delete blocked_builds entries when deleting derivations 2023-02-14 20:10:44 +00:00
Christopher Baines 5874c4ee37 Delete git_branches entries
When deleting data for a branch.
2023-02-14 19:57:30 +00:00
Christopher Baines 9872367c01 Avoid errors dropping partition tables if they don't exist 2023-02-13 20:10:23 +00:00
Christopher Baines 078516e0ab Improve dropping package_derivation_by_guix_revision_range partitions 2023-02-13 19:26:44 +00:00
Christopher Baines 38b3657013 Use advisory locks to avoid deadlocks during data deletion
In the case where multiple data deleting processes end up running at the same
time.
2022-11-28 10:26:46 +00:00
Christopher Baines 39487cd7e6 Improve deleting derivations
Drop the batch size to get rid of warnings about memory usage and improve the
logging by adding duration information.
2022-07-08 20:55:58 +01:00
Christopher Baines 22c2ed2fa7 Fix ambiguous id column in delete-guix-revisions query 2022-06-16 12:46:32 +01:00
Christopher Baines 754f64718f Fix DELETE query in delete-revisions-from-branch 2022-06-16 12:38:51 +01:00
Christopher Baines be45e4251e Fix ambiguous id column in delete-from-git-commits 2022-06-16 12:30:08 +01:00
Christopher Baines 71aaf1016b Remove duplicate AND from delete-from-git-commits query 2022-06-16 12:25:47 +01:00
Christopher Baines 64be52844e Partition the package_derivations_by_guix_revision_range table
And create a proper git_branches table in the process.

I'm hoping this will help with slow deletions from the
package_derivations_by_guix_revision_range table in the case where there are
lots of branches, since it'll separate the data for one branch from another.

These migrations will remove the existing data, so
rebuild-package-derivations-table will currently need manually running to
regenerate it.
2022-05-23 19:10:25 +01:00
Christopher Baines 971a474f65 Update delete-unreferenced-derivations
To delete from latest_build_status as well.
2020-10-13 20:33:07 +01:00
Christopher Baines f02c245652 Add another guard clause in to the data deletion code
I've seen this error [1], which may relate to the derivation-output-details-id
not being a number, so this check should confirm whether there is an issue.

1: Throw to key `psql-query-error' with args `(fatal-error "PGRES_FATAL_ERROR" "ERROR:  invalid input syntax for integer: \"\"\n")'.
2020-10-10 13:34:54 +01:00
Christopher Baines 2c463fcdab Guard against derivation IDs that aren't numbers
I saw an error suggesting that something came back that wasn't a number, and
this should give a more informative error.
2020-10-09 19:27:04 +01:00
Christopher Baines 062397e82b Just use map rather than par-map& for deleting derivations
As I think par-map& is probably no faster.
2020-10-08 08:20:03 +01:00
Christopher Baines 936fda57c5 Make the derivation deletion batch size configurable 2020-10-08 07:52:03 +01:00
Christopher Baines b540abaeba Reduce the derivation deletion batch size 2020-10-08 07:49:28 +01:00
Christopher Baines f68166514f Actually delete more of the data for a revision
Previously the package_derivations table wasn't considered, which would mean
derivations would still be referenced. This commit fixes that, along with also
deleting unreferenced entries in some linter related tables.
2020-10-04 15:11:21 +01:00
Christopher Baines 48673b32cb Fix delete-unreferenced-derivations 2020-10-04 13:23:15 +01:00
Christopher Baines a24d3e934d Extract out the ability to delete a range of commits
Some revisions have got disassociated from branches, probably because they
were associated with multiple branches in the first place. This should allow
deleting them.
2020-10-04 12:18:57 +01:00
Christopher Baines e2e55c69de Rework the shortlived PostgreSQL specific connection channel
In to a generic thing more like (ice-9 futures). Including copying some bits
from the (ice-9 threads) module and adapting them to work with this fibers
approach, rather than futures. The advantage being that using fibers channels
doesn't block the threads being used by fibers, whereas futures would.
2020-10-03 21:32:46 +01:00
Christopher Baines 470573b318 Delete derivation_source_files that are unreferenced
This will also delete unreferenced derivation_source_file_nars.
2020-10-02 20:15:23 +01:00
Christopher Baines 54654417a3 Delete derivations in parallel
In an attempt to make this faster.
2020-10-01 19:15:32 +01:00
Christopher Baines 16600b1a43 Remove the deleting derivations progress output
As this is harder to do when deleting derivations in parallel.
2020-10-01 19:14:56 +01:00
Christopher Baines fb4c7ecd4c Delete derivations through a channel
Not much different from before, but this will allow parallelising things.
2020-10-01 19:14:11 +01:00
Christopher Baines 3330f034a4 Remove a now redundant part of the maybe-delete-derivation query
As this is covered by the big query selecting the derivation ids.
2020-09-30 20:34:33 +01:00
Christopher Baines d844b325e2 Stop recursing now that derivation deletion selection is smarter
As this probably won't help with performance.
2020-09-30 20:07:41 +01:00
Christopher Baines 47af6c9661 Attempt to speed up derivation deletion
Stop querying for the file-name, as it's unused. Rather than fetching all ids,
then looking at each to see if it can be deleted, do some imperfect but not
too slow checks in the initial query.
2020-09-30 19:38:56 +01:00
Christopher Baines 02681d7e7a Fix delete builds for derivation output details set 2020-09-27 16:21:51 +01:00
Christopher Baines 5b13ee2251 Delete builds for unreferenced derivations 2020-09-27 11:11:02 +01:00
Christopher Baines 52a23a5333 Further data deletion improvements 2020-09-27 11:10:47 +01:00
Christopher Baines 65e8bf3f8d Add delete-revisions-from-branch-except-most-recent-n 2020-09-26 19:38:56 +01:00
Christopher Baines 992a0af63e Split off delete-revisions-from-branch from delete-data-for-branch
To support not deleting all of the revisions.
2020-09-26 18:23:21 +01:00
Christopher Baines f11421824d Add a helper procedure to delete data for deleted branches 2020-05-23 21:05:44 +01:00
Christopher Baines ca0d3ee754 Stop using package_versions_by_guix_revision_range
It's been replaced by the package_derivations_by_guix_revision_range table.
2020-03-24 20:44:57 +00:00
Christopher Baines 9178bd51a9 Add a function to delete unreferenced derivations 2020-02-16 22:29:25 +00:00
Christopher Baines b087cfca67 Define the code to delete data from non-master branches properly 2020-02-16 10:59:38 +00:00
Christopher Baines 773e5a9c38 Add a module to handle deleting data
This, along with the way of specifying which branches are processed, is a way
to manage the data stored within the Guix Data Service.

This only goes so far: it doesn't delete derivations, but it does delete some
of the information related to a revision.
2020-02-15 11:36:31 +00:00