Yet again...
This makes the channel-derivations for each system accessible within the
load-new-guix-revision procedure, in preparation for storing them in the
database.
Stop using the system values as targets, and remove package derivation entries
where this is the case.
Switch the non-cross derivation case to have a target of "", rather than
matching the system, as this makes more sense, and is more consistent now that
the target values no longer match the system values.
Hardcode some more correct target values, and use these instead. Hopefully
this can be better integrated with Guix in the future.
This commit also includes a migration attempting to shrink some indexes.
In an attempt to stop the derivations from being garbage collected between the
time they're generated, and when they're potentially read by the Guix Data
Service.
Previously this would just compare on the version if the name was the same,
but there are package definitions that share the name and version (itstool is
one example).
To try and make this more stable, to avoid weird errors, and unstable
comparisons between revisions, use the line number when deduplicating
packages.
I think this might be happening when packages are filtered out as
duplicates (by name and version), but then a reference to a duplicate occurs
somewhere, like in a lint warning.
This seems to work better for both generating the non-cross and cross
derivations. Previously, using the package-transitive-supported-systems
approach didn't generate some cross derivations.
Use the second argument to package-transitive-supported-systems to correctly
identify the different bootstrap path for non x86_64 and i686-linux. The
previous implementation did work, but only up until a merge of core-updates
changed the bootstrap approach.
Previously, they were nix-base32-string encoded, but the representation in the
derivations is base16, so it doesn't make sense to use a different
representation in the database.
Therefore, add some code that runs before the start of each job to convert the
data in the database. It was easier to do this in Guile with the existing
support for working with these bytevector representations. After some
migration period, the code for converting the old hashes can be removed.
Previously, all the entries for the branch were deleted, but not, only the
entries for the branch, that are present in the revision that was loaded will
be deleted. This is more efficient, as it avoids deleting and recreating
entries in the table that shouldn't have changed.
There's already the package_versions_by_guix_revision_range table, but I think
it would be also useful to be able to see how derivations change over time.
Between b13b9384bc43bf93c754c037956c8ef9a99c2b41 and
601171a9bc7ca6e4acb932895a07c0ca9aedfdac, this method failed, so catch the
error to allow loading the affected revisions.
Ideally this would go in to the database somehow as well, but the only idea I
have for that is pass in a pipe, and then spawn a thread to read from the
other end of that pipe in a loop to send the output to the database.
That hasn't been written yet, so just send the output to stderr for now.
Add a new event 'retry', and run jobs where the number of retry events is
greater or equal to the number of failure events.
Also add an index to the git_branches table to make the finding jobs query a
bit faster.
This isn't new information, but derived from information already in the
database. It's collected here to make querying faster.
The table is updated when each new revision is entered.
Add tests around the package module, extract out the use of the
inferior-package record assessors so that they aren't part of the tests, and
switch across the package-metadata module to use
insert-missing-data-and-return-all-ids.
To associate a set of lint checkers with a specific revision. While there is
the association through the lint warnings, that only works for checkers where
there are lint warnings for that revision.
Therefore, to allow finding all the checkers for a specific revision, also
associate them directly with the revision.
When loading data from an inferior Guix, first build it's latest version of
glibc-locales, and include that in the environment from the inferior.
This improves locale support, which is currently relevant for extracting lint
warnings.
Without this change, you'd only be able to switch to locales supported by the
glibc-utf8-locales package, assuming that the right version is available.
This commit adds the relevant tables and code to store lint warnings in the
database.
Currently, only lint checkers which don't require access to the network will
be run, as this allows the processing to happen without network access. Also,
this functionality won't work in older versions of Guix which don't expose the
lint warnings in a compatible way.
Reserve some capacity to process revisions which are the tip of a branch. This
should reduce the time between new revisions appearing, and then being
processed.
This should speed up processing new revisions, reduce latency between finding
out about new revisions and processing them, as well as help manage memory
usage, by processing each job in a process that then exits.
This is working towards running the jobs in parallel. Each job looks at the
records in the database, and adds missing ones. If other jobs, running in
different transactions insert the same missing records at the same time, this
could cause an error.
Therefore, to just avoid this problem, lock before inserting the data. This
will allow the jobs to be processed in parallel, and it shouldn't have too
much of an effect on performance, as the slow bit is outside of the
transaction.
This is in preparation for running jobs in parallel. The channels code in Guix
uses a cached copy of the Git repository. Multiple jobs can't concurrently
access this without causing issues, so use an advisory lock to ensure that
only one job is using the repository at a time.
To better separate the code that needs to happen after a lock has been
acquired to allow concurrently loading revisions without concurrent insertion
issues.
Previously, the query for the jobs page was really slow, as it checked the
load_new_guix_revision_job_log_parts table for each job, doing a sequential
scan through the potentially large table.
Adding an index didn't seem to help, as the query planner would belive the
query could return loads of rows, where actually, all that needed checking is
whether a single row existed with a given job_id.
To avoid adding the index to the load_new_guix_revision_job_log_parts table,
and fighting with the query planner, this commit changes the
load_new_guix_revision_job_logs table to include a blank entry for jobs which
are currently being processed. This is inserted at the start of the job, and
then updated at the end to combine and replace all the parts.
This all means that the jobs page should render quickly now.
Replace the Guile-side HTML escaping with a less complete, but hopefully
faster PostgreSQL side HTML escaping approach.
Also, allow reading part of the log, by default, the last 1,000,000
characters, as this should render quickly.
So that it can easily be shown through the web interface. There's two tables
being used. One which temporarily stores the output as it's output while the
job is running, and other which stores the whole log once the job has
finished.
Conventionally, you'd process the oldest job in a queue, but at the moment, it
would be more useful to have recent information more promptly, and fill in the
historical gaps later. I'm not sure this'll always be the case, but for now,
flip the order in which jobs are processed.
And use it for the hosting the inferiors, rather than computing the guix
package at runtime. This simplifies the behaviour when the Guix Data Service
is deployed as a Guix package.
Create a new events table for the new guix revision jobs, and update this when
processing a job starts, as well as finished with success or failure.
Additionally, remove the dependnency on open-inferior/container, as this
functionality isn't merged in to Guix master yet.
Previously, the records for jobs would be deleted. It's useful to know when
jobs were inserted in to the database, as well as when they succeeded (if they
have). This change also makes it possible to keep track of jobs that have
failed, as they won't be deleted.
And display this on the package page.
This uses a couple of new tables, and an additional field in the
package_metadata table.
Currently, the order of the licenses in the package definition isn't stored,
as I'm not sure the order in the list is significant.
Rather than just storing the URL in the guix_revisions and
load_new_guix_revision_jobs tables. This will help when storing more
information like tags and branches in the future.
Split the derivations up in to some groups, and run
invalidate-derivation-caches! inbetween to try and reduce the memory
usage.
Also make a couple of other changes to reduce memory usage or protect
against errors.
Compute all derivations at once in the inferior, avoiding round trips
to hopefully speed it up. Close the inferior earlier to free up
memory, and add more debugging output.
A large proportion of these changes relate to changing the way
packages relate to derivations. Previously, a package at a given
revision had a single derivation. This was OK, but didn't account for
multiple architectures.
Therefore, these changes mean that a package has multiple derivations,
depending on the system of the derivation, and the target system.
There are multiple changes, small and large to the web interface as
well. More pages link to each other, and the visual display has been
improved somewhat.
Currently, I think the desired commit can be missing, if patches come
in gradually, and the series changes after the first laminar job has
been run. Therefore, try to ignore some errors and just delete the
job.
These changes mean that more information about derivations is
recorded. There are a number of corresponding changes in the database
schema that are not tracked in the repository unfortunately.
This is a service designed to provide information about Guix. At the
moment, this initial prototype gathers up information about packages,
the associated metadata and derivations.
The initial primary use case is to compare two different revisions of
Guix, detecting which packages are new, no longer present, updated or
otherwise different.
It's based on the Mumi project.
[1]: https://git.elephly.net/software/mumi.git