Reserve some capacity to process revisions which are the tip of a branch. This
should reduce the time between new revisions appearing, and then being
processed.
There are some revisions of Guix which take forever to process (or days at
least). To avoid jobs being processed forever, kill them after they've been
running for a while (default 24 hours).
This should speed up processing new revisions, reduce latency between finding
out about new revisions and processing them, as well as help manage memory
usage, by processing each job in a process that then exits.
This allows easily processing an individual job by id. This may be useful to
use manually, but also when processing jobs in parallel, as forking doesn't
work well with the libpq library used by squee.
This is working towards running the jobs in parallel. Each job looks at the
records in the database, and adds missing ones. If other jobs, running in
different transactions insert the same missing records at the same time, this
could cause an error.
Therefore, to just avoid this problem, lock before inserting the data. This
will allow the jobs to be processed in parallel, and it shouldn't have too
much of an effect on performance, as the slow bit is outside of the
transaction.
This is in preparation for running jobs in parallel. The channels code in Guix
uses a cached copy of the Git repository. Multiple jobs can't concurrently
access this without causing issues, so use an advisory lock to ensure that
only one job is using the repository at a time.
To better separate the code that needs to happen after a lock has been
acquired to allow concurrently loading revisions without concurrent insertion
issues.
Previously, the query for the jobs page was really slow, as it checked the
load_new_guix_revision_job_log_parts table for each job, doing a sequential
scan through the potentially large table.
Adding an index didn't seem to help, as the query planner would belive the
query could return loads of rows, where actually, all that needed checking is
whether a single row existed with a given job_id.
To avoid adding the index to the load_new_guix_revision_job_log_parts table,
and fighting with the query planner, this commit changes the
load_new_guix_revision_job_logs table to include a blank entry for jobs which
are currently being processed. This is inserted at the start of the job, and
then updated at the end to combine and replace all the parts.
This all means that the jobs page should render quickly now.
Replace the Guile-side HTML escaping with a less complete, but hopefully
faster PostgreSQL side HTML escaping approach.
Also, allow reading part of the log, by default, the last 1,000,000
characters, as this should render quickly.
So that it can easily be shown through the web interface. There's two tables
being used. One which temporarily stores the output as it's output while the
job is running, and other which stores the whole log once the job has
finished.
It isn't very useful, as you have to know the commits. Now that there's Git
branch information, it should be possible to access this in a more useful way.
To the compare/derivations page. Previously, the compare/derivations page was
comparing more than the derivations, notably the package metadata. This change
avoids that, and also reduces the information overload on the compare page.
Conventionally, you'd process the oldest job in a queue, but at the moment, it
would be more useful to have recent information more promptly, and fill in the
historical gaps later. I'm not sure this'll always be the case, but for now,
flip the order in which jobs are processed.
So that they can also be used for the /branch/foo/latest-processed-revision
pages. The content is the same, but the title, link, and some of the links on
the page are different.