Reserve some capacity to process revisions which are the tip of a branch. This
should reduce the time between new revisions appearing, and then being
processed.
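
One way to picture the reserved capacity is an admission check like the
sketch below. The function name, the parameters and the idea of counting
slots are all assumptions made for illustration, not the scheduler the
service actually uses.

```python
def can_start(job_is_branch_tip, running, max_jobs, reserved_for_tips):
    """Decide whether a job may start now (hypothetical admission rule).

    Jobs for branch tips may use every slot. Other jobs must leave
    `reserved_for_tips` slots free, so a new tip never waits behind
    a backlog of older revisions.
    """
    if job_is_branch_tip:
        return running < max_jobs
    return running < max_jobs - reserved_for_tips
```

With four slots and two reserved, a backlog job is refused once two jobs are
running, while a branch tip job is still admitted.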
This should speed up processing new revisions, reduce latency between finding
out about new revisions and processing them, as well as help manage memory
usage, by processing each job in a process that then exits.
This is working towards running the jobs in parallel. Each job looks at the
records in the database, and adds missing ones. If other jobs, running in
different transactions, insert the same missing records at the same time, this
could cause an error.
Therefore, to avoid this problem, take a lock before inserting the data. This
will allow the jobs to be processed in parallel, and it shouldn't have too
much of an effect on performance, as the slow part is outside of the
transaction.
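
The locking pattern can be sketched with an in-process analogy. This uses
Python threads and threading.Lock in place of database transactions and a
database lock, and every name in it (records, slow_compute, run_job) is
hypothetical. It only illustrates why the check-then-insert step is guarded
while the slow work stays outside the lock.

```python
import threading

# Hypothetical stand-in for a database table of records.
records = set()
insert_lock = threading.Lock()

def slow_compute(job_id):
    # The slow part runs outside the lock, so jobs still overlap here.
    return {"pkg-a", "pkg-b", f"pkg-{job_id}"}

def run_job(job_id):
    wanted = slow_compute(job_id)
    with insert_lock:
        # Looking for missing records and inserting them is only safe
        # while no other job can insert the same records.
        missing = wanted - records
        records.update(missing)
    return missing

threads = [threading.Thread(target=run_job, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each shared record ends up stored once, regardless of which job got to it
first.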
This is in preparation for running jobs in parallel. The channels code in Guix
uses a cached copy of the Git repository. Multiple jobs can't concurrently
access this without causing issues, so use an advisory lock to ensure that
only one job is using the repository at a time.
This better separates the code that needs to run after a lock has been
acquired, to allow loading revisions concurrently without concurrent
insertion issues.
Previously, the query for the jobs page was really slow, as it checked the
load_new_guix_revision_job_log_parts table for each job, doing a sequential
scan through the potentially large table.
Adding an index didn't seem to help, as the query planner would believe the
query could return loads of rows, when actually, all that needed checking was
whether a single row existed with a given job_id.
To avoid adding the index to the load_new_guix_revision_job_log_parts table,
and fighting with the query planner, this commit changes the
load_new_guix_revision_job_logs table to include a blank entry for jobs which
are currently being processed. This is inserted at the start of the job, and
then updated at the end to combine and replace all the parts.
This all means that the jobs page should render quickly now.
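
A minimal sketch of the placeholder approach, using SQLite so that it runs
standalone. The table names come from the description above, but the column
names (job_id, contents) and the helper functions are assumptions, and the
real service uses PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE load_new_guix_revision_job_log_parts (
    job_id INTEGER, contents TEXT);
CREATE TABLE load_new_guix_revision_job_logs (
    job_id INTEGER PRIMARY KEY, contents TEXT);
""")

def start_job(job_id):
    # The blank entry marks the job as currently being processed.
    conn.execute(
        "INSERT INTO load_new_guix_revision_job_logs VALUES (?, NULL)",
        (job_id,))

def log_part(job_id, text):
    conn.execute(
        "INSERT INTO load_new_guix_revision_job_log_parts VALUES (?, ?)",
        (job_id, text))

def finish_job(job_id):
    # Combine the parts into the blank entry, then remove the parts.
    parts = conn.execute(
        "SELECT contents FROM load_new_guix_revision_job_log_parts"
        " WHERE job_id = ? ORDER BY rowid", (job_id,)).fetchall()
    conn.execute(
        "UPDATE load_new_guix_revision_job_logs SET contents = ?"
        " WHERE job_id = ?", ("".join(p[0] for p in parts), job_id))
    conn.execute(
        "DELETE FROM load_new_guix_revision_job_log_parts WHERE job_id = ?",
        (job_id,))

def has_log(job_id):
    # A primary key lookup, with no scan of the parts table.
    row = conn.execute(
        "SELECT 1 FROM load_new_guix_revision_job_logs WHERE job_id = ?",
        (job_id,)).fetchone()
    return row is not None
```

The jobs page then only needs the cheap lookup in has_log, whether the job
is still running or has finished.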
Replace the Guile-side HTML escaping with a less complete, but hopefully
faster, PostgreSQL-side HTML escaping approach.
Also, allow reading just part of the log. By default, this is the last
1,000,000 characters, as this should render quickly.
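
The effect of reading only the end of the log can be sketched as below. Note
that this swaps in Python's html.escape for illustration; it is not the
PostgreSQL-side escaping described above, and the function and constant
names are hypothetical.

```python
import html

DEFAULT_TAIL_CHARS = 1_000_000

def render_log_tail(log, limit=DEFAULT_TAIL_CHARS):
    """Return the last `limit` characters of `log`, escaped for HTML."""
    # Slice first, so only the part that will be shown gets escaped.
    tail = log[-limit:] if limit > 0 else ""
    return html.escape(tail)
```

Slicing before escaping keeps the work proportional to the part of the log
that is displayed, not to the size of the whole log.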
So that it can easily be shown through the web interface. There are two
tables being used: one which temporarily stores the output as it's produced
while the job is running, and another which stores the whole log once the job
has finished.
Conventionally, you'd process the oldest job in a queue, but at the moment, it
would be more useful to have recent information more promptly, and fill in the
historical gaps later. I'm not sure this'll always be the case, but for now,
flip the order in which jobs are processed.
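
As a rough sketch of the flipped order, assuming jobs carry ids that increase
as they are queued (the Job class and next_job function are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Job:
    id: int
    commit: str

def next_job(pending):
    """Pick the most recently queued job, or None if nothing is pending."""
    # Assumption: a higher id means the job was queued more recently.
    return max(pending, key=lambda job: job.id, default=None)
```

Older jobs are still picked eventually, once nothing newer is waiting.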
And use it for hosting the inferiors, rather than computing the guix
package at runtime. This simplifies the behaviour when the Guix Data Service
is deployed as a Guix package.
Create a new events table for the new guix revision jobs, and update this when
processing a job starts, as well as when it finishes with success or failure.
Additionally, remove the dependency on open-inferior/container, as this
functionality isn't merged into Guix master yet.
Previously, the records for jobs would be deleted. It's useful to know when
jobs were inserted into the database, as well as when they succeeded (if they
have). This change also makes it possible to keep track of jobs that have
failed, as they won't be deleted.
And display this on the package page.
This uses a couple of new tables, and an additional field in the
package_metadata table.
Currently, the order of the licenses in the package definition isn't stored,
as I'm not sure the order in the list is significant.
Rather than just storing the URL in the guix_revisions and
load_new_guix_revision_jobs tables. This will help when storing more
information like tags and branches in the future.
Split the derivations up into some groups, and run
invalidate-derivation-caches! in between to try and reduce the memory
usage.
Also make a couple of other changes to reduce memory usage or protect
against errors.
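
The grouping can be sketched as follows. The process and invalidate_caches
callables are hypothetical stand-ins; in the service itself the invalidation
step is invalidate-derivation-caches! and the code is Guile, not Python.

```python
def process_in_groups(derivations, group_size, process, invalidate_caches):
    """Process `derivations` in fixed-size groups, clearing caches as we go."""
    for start in range(0, len(derivations), group_size):
        group = derivations[start:start + group_size]
        process(group)
        # Drop cached data before the next group, so memory use is
        # bounded by the group size and not by the whole list.
        invalidate_caches()
```

A smaller group size trades some speed for a lower peak memory usage.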
Compute all derivations at once in the inferior, avoiding round trips
to hopefully speed it up. Close the inferior earlier to free up
memory, and add more debugging output.
A large proportion of these changes relate to changing the way
packages relate to derivations. Previously, a package at a given
revision had a single derivation. This was OK, but didn't account for
multiple architectures.
Therefore, these changes mean that a package has multiple derivations,
depending on the system of the derivation, and the target system.
There are multiple changes, small and large, to the web interface as
well. More pages link to each other, and the visual display has been
improved somewhat.
Currently, I think the desired commit can be missing, if patches come
in gradually, and the series changes after the first laminar job has
been run. Therefore, try to ignore some errors and just delete the
job.
These changes mean that more information about derivations is
recorded. There are a number of corresponding changes in the database
schema that are not tracked in the repository unfortunately.
This is a service designed to provide information about Guix. At the
moment, this initial prototype gathers up information about packages,
the associated metadata and derivations.
The initial primary use case is to compare two different revisions of
Guix, detecting which packages are new, no longer present, updated or
otherwise different.
It's based on the Mumi project [1].
[1]: https://git.elephly.net/software/mumi.git
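
The revision comparison described above can be sketched as a comparison of
two mappings. Representing a revision as a dictionary from package name to
version is an assumption made for this example, and compare_revisions is a
hypothetical name. The prototype itself also takes metadata and derivations
into account.

```python
def compare_revisions(base, target):
    """Compare two revisions, each a dict of package name -> version."""
    base_names, target_names = set(base), set(target)
    return {
        # Present only in the target revision.
        "new": sorted(target_names - base_names),
        # Present only in the base revision.
        "removed": sorted(base_names - target_names),
        # Present in both, but with a different version.
        "changed": sorted(name for name in base_names & target_names
                          if base[name] != target[name]),
    }
```

For example, comparing {"hello": "2.10", "gawk": "5.0"} with
{"hello": "2.12", "grep": "3.3"} reports grep as new, gawk as removed and
hello as changed.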