Fixes a bug whereby the sender ID bytevector returned by
‘receive-message’ could be modified behind our back (when the C msg
object is reclaimed) by the time it is passed as #:recipient to
‘send-message’.
As a result, ‘zmq-send’ would be passed an invalid peer ID and would
thus silently drop the message; the other end, ‘cuirass remote-worker’,
would wait for a reply to its ‘worker-request-work’ message that would
never come, thus remaining idle forever.
The bug has been there most likely forever, though it might have become
more frequent with the refactoring in
445198e2a0.
Thanks to Nicolas Dandrimont for putting me on the right track!
* src/cuirass/remote.scm (receive-message): Call ‘bytevector-copy’ on
the result of ‘zmq-message-content’.
* src/cuirass/scripts/remote-server.scm (zmq-start-proxy): Set
ZMQ_ROUTER_MANDATORY on BUILD-SOCKET.
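The aliasing problem described above can be reproduced without ZeroMQ. The sketch below is only an illustration: ‘message-buffer’ is a hypothetical stand-in for the memory owned by the C msg object, and ‘bytevector-fill!’ simulates that memory being reused after reclamation.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical stand-in for the memory owned by the C message object.
(define message-buffer (u8-list->bytevector '(1 2 3 4)))

;; Without a copy, the "sender ID" is the very same object as the buffer.
(define aliased-id message-buffer)

;; With 'bytevector-copy', the sender ID has storage of its own.
(define copied-id (bytevector-copy message-buffer))

;; Simulate the buffer being reclaimed and overwritten.
(bytevector-fill! message-buffer 0)

(display (bytevector->u8-list aliased-id)) (newline) ; prints (0 0 0 0)
(display (bytevector->u8-list copied-id)) (newline)  ; prints (1 2 3 4)
```

Only the copied ID still holds the original bytes, which is why copying at the ‘receive-message’ boundary keeps the value passed as #:recipient valid.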
Fixes a regression introduced in
8fa6bcf1b9.
* src/cuirass/base.scm (jobset-registry): Honor SPEC’s period when it is
non-zero, passing it to ‘spawn-jobset-monitor’.
This is a followup to c4743b5472.
* src/cuirass/database.scm (with-db-worker-thread): Rename to…
(with-db-connection): … this.
Adjust users.
* src/cuirass/metrics.scm: Likewise.
That way, the /api/queue endpoint returns something that matches actual
scheduling.
* src/cuirass/database.scm (db-get-builds): Adjust ordering for
‘status+submission-time’.
Previously, build jobs would use the default #:max-build-jobs
and #:build-cores specified by guix-daemon. This would typically lead
each worker to use as many cores as available, leading to unreasonable
over-commitment.
With this change, each worker is assigned a fraction of the build cores.
Because it’s a static policy, it may lead to resource waste, but avoids
the problem mentioned above.
* src/cuirass/scripts/remote-worker.scm (run-build): Add #:parallelism
and pass it to ‘set-build-options*’.
(run-command): Add #:parallelism and pass it to ‘run-build’.
(start-worker): Add #:parallelism and pass it to ‘run-command’.
(worker-management-thunk): Pass #:parallelism to ‘start-worker’.
* src/cuirass/remote.scm (set-build-options*): Add #:build-cores and
pass it to ‘set-build-options’, along with #:max-build-jobs.
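As an illustration of what a static split can look like, here is a minimal sketch. The procedure name ‘cores-per-worker’ and the even-division formula are assumptions made for this example; the message above only states that each worker gets a fraction of the build cores.

```scheme
;; Hypothetical helper: divide TOTAL-CORES evenly among WORKERS,
;; giving each worker at least one core.
(define (cores-per-worker total-cores workers)
  (max 1 (quotient total-cores workers)))

(display (cores-per-worker 16 4)) (newline) ; prints 4
(display (cores-per-worker 2 4)) (newline)  ; prints 1
```

A split of this kind never over-commits when there are at least as many cores as workers, at the cost of leaving cores unused whenever some workers are idle.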
* src/cuirass/scripts/remote-server.scm (%fetch-queue-size): Remove.
(fetch-worker): Define ‘queue-size’ and spawn a fiber to log its size.
(spawn-periodic-updates-fiber): Remove reference to ‘%fetch-queue-size’.
* src/cuirass/scripts/remote-server.scm (zmq-fetch-workers-endpoint)
(zmq-fetch-worker-socket, start-fetch-worker): Remove.
(fetch-worker, spawn-fetch-worker): New procedures.
(zmq-start-proxy): Add ‘fetch-worker’ parameter. Remove
‘fetch-socket’. Send message on ‘fetch-worker’ instead of
‘fetch-socket’.
(cuirass-remote-server): Call ‘spawn-fetch-worker’ only once. Pass the
result to ‘zmq-start-proxy’.
Previously, ‘close-port’ could throw, leading the calling fiber to
terminate prematurely.
* src/cuirass/remote.scm (send-log): Ignore errors when closing
COMPRESSED from the ‘catch’ handler.
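The pattern can be sketched as follows. ‘call-with-quiet-cleanup’ is a hypothetical helper written for this example, not the code in ‘send-log’; the cleanup thunk stands in for the ‘close-port’ call.

```scheme
;; Hypothetical helper: run THUNK; if it throws, run CLEANUP and
;; return #f.
(define (call-with-quiet-cleanup thunk cleanup)
  (catch #t
    thunk
    (lambda _
      ;; CLEANUP may itself throw (as 'close-port' can); swallow that
      ;; error so that it does not escape the handler.
      (false-if-exception (cleanup))
      #f)))

(define result
  (call-with-quiet-cleanup
   (lambda () (error "transfer failed"))
   (lambda () (error "close failed"))))

(display result) (newline) ; prints #f
```

Without ‘false-if-exception’, the second error would propagate out of the handler and terminate the caller.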
Previously, builds would all get the default priority, meaning that jobset
priority would effectively be ignored when scheduling builds via
‘db-get-pending-build’.
* src/cuirass/scripts/evaluate.scm (user-alists->builds): Change
‘specification-name’ parameter to ‘spec’. Set the ‘priority’ field of <build>.
(inferior-evaluation): Adjust call accordingly.
Previously, for workers supporting multiple systems, this would pick a
system at random and return #f if no build was pending for it, even if
pending builds were available for one of the other systems supported by
the worker. Thus, it would practically divide throughput by N for a
worker supporting N systems.
* src/cuirass/scripts/remote-server.scm (random-seed, shuffle): New
procedures.
(pop-build): Change to return a build for *any* of the systems supported
by WORKER.
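A minimal sketch of the idea, under stated assumptions: ‘first-pending-build’ and the PENDING alist (mapping a system string to its list of pending builds) are hypothetical stand-ins for the database lookup, and this ‘shuffle’ is a plain Fisher-Yates shuffle that may differ from the one added here.

```scheme
(use-modules (srfi srfi-1))

;; Fisher-Yates shuffle: return a list with the elements of LST in
;; random order.
(define (shuffle lst)
  (let ((vec (list->vector lst)))
    (let loop ((i (- (vector-length vec) 1)))
      (when (> i 0)
        (let* ((j (random (+ i 1)))
               (tmp (vector-ref vec i)))
          (vector-set! vec i (vector-ref vec j))
          (vector-set! vec j tmp)
          (loop (- i 1)))))
    (vector->list vec)))

;; Hypothetical lookup: try each of SYSTEMS in random order and return
;; the first pending build found, or #f if none of them has any.
(define (first-pending-build systems pending)
  (any (lambda (system)
         (let ((builds (or (assoc-ref pending system) '())))
           (and (pair? builds) (car builds))))
       (shuffle systems)))

(define pending '(("aarch64-linux" . ("build-1"))))

(display (first-pending-build '("x86_64-linux" "aarch64-linux") pending))
(newline) ; prints build-1
```

Whatever order the shuffle produces, the build is found, because every supported system is tried before giving up.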
* src/cuirass/base.scm (%build-remote?): Remove.
(restart-builds): Add BUILDER argument; send it the build request.
(build-packages): Remove.
(local-builder, spawn-local-builder)
(remote-builder, spawn-remote-builder): New procedures.
(jobset-evaluator): Add BUILDER argument. Inline some of what was in
‘build-packages’. Send build request to BUILDER.
(spawn-jobset-evaluator): Add #:builder and honor it.
* src/cuirass/scripts/register.scm (cuirass-register): Remove reference
to ‘%build-remote?’. Cap THREADS at 8. Define ‘builder’ and pass it to
‘spawn-jobset-evaluator’ and ‘restart-builds’.
* src/cuirass/base.scm (start-evaluation, jobset-evaluator)
(spawn-jobset-evaluator): New procedures.
(jobset-monitor): Add #:evaluator. Replace inline evaluation with a
message to EVALUATOR.
(spawn-jobset-monitor): Add #:evaluator and honor it.
(jobset-registry, spawn-jobset-registry): Likewise.
* src/cuirass/scripts/register.scm (cuirass-register): Call
‘spawn-jobset-evaluator’ and pass it to ‘spawn-jobset-registry’.
* src/cuirass/http.scm (url-handler): Add 'bridge' parameter.
In "/admin/specification/add" route, write to BRIDGE.
(run-cuirass-server): Add #:bridge-socket-file-name. When true, open
connection to the bridge. Pass it to 'url-handler'.
* tests/http.scm ("cuirass-run"): Pass #:bridge-socket-file-name to
'run-cuirass-server'.
Fixes a bug introduced in c445d2d642 where
<build-product> records would end up being passed to ‘scm->json’.
* src/cuirass/http.scm (build->hydra-build): Convert <build-product>
records to alists.
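The conversion can be sketched as follows. The record type ‘<product>’, its fields, and ‘product->alist’ are hypothetical names chosen for this example; they are not the actual <build-product> definition.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical record type standing in for <build-product>.
(define-record-type <product>
  (make-product type path)
  product?
  (type product-type)
  (path product-path))

;; Convert a record to an alist, a shape that JSON serializers
;; typically accept, unlike an opaque record.
(define (product->alist product)
  `((type . ,(product-type product))
    (path . ,(product-path product))))

(display (product->alist (make-product "archive" "/tmp/example.tar")))
(newline) ; prints ((type . archive) (path . /tmp/example.tar))
```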
Division by zero could happen in the unlikely case of a successful
evaluation containing zero jobs.
* src/cuirass/templates.scm (specifications-table)[summary->percentage]:
Check whether TOTAL is zero before dividing.
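The guard amounts to something like the sketch below. The procedure ‘percentage’ and its two-argument signature are assumptions made for this example, as is the choice of returning 0 for an empty evaluation.

```scheme
;; Hypothetical helper: percentage of SUCCEEDED jobs out of TOTAL,
;; returning 0 instead of dividing when TOTAL is zero.
(define (percentage succeeded total)
  (if (zero? total)
      0
      (round (* 100 (/ succeeded total)))))

(display (percentage 3 4)) (newline) ; prints 75
(display (percentage 0 0)) (newline) ; prints 0
```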
Previously we’d pass #f to ‘eval-summary’ for jobsets that do not have
any successful evaluation.
* src/cuirass/templates.scm (specifications-table): Correctly handle
‘spec->latest-eval-ok’ returning #f.
Previously it would only return checkouts changed compared to the
previous evaluation of that jobset.
* src/cuirass/http.scm (evaluation->json-object): Use ‘latest-checkouts’
instead of ‘evaluation-checkouts’.
* tests/http.scm (evaluations-query-result): Adjust accordingly.
Previously we’d get 500 for anything other than a symbol.
* src/cuirass/http.scm (specification->json-object): Handle build types
other than symbols describing a subset.
* src/cuirass/database.scm (db-get-evaluations-absolute-summary): Extend
SQL query and fill out all the <evaluation-summary> fields.
* tests/database.scm ("db-get-evaluation-absolute-summary"): Check ‘status’.
("db-get-evaluations-absolute-summary"): Expect three results.
Fixes a bug introduced in c445d2d642
whereby we’d call ‘evaluation-badges’ with #f as its first argument, due
to the lack of a summary for LAST-EVAL.
* src/cuirass/templates.scm (specifications-table): When
LAST-EVAL-STATUS-OK? is false, call ‘broken-evaluation-badge’.
Fixes a regression introduced in
77bf78ecf7.
* src/cuirass/http.scm (url-handler): In “/search/latest” and
“/search/latest/PRODUCT-TYPE”, use symbols instead of keywords as the
second argument to ‘assoc-ref’.
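The difference is easy to reproduce in isolation. In the sketch below, the alist PARAMS and its keys are hypothetical; the point is only that a keyword never matches a symbol key.

```scheme
;; Hypothetical query parameters, keyed by symbols.
(define params '((query . "hello") (type . "archive")))

;; A symbol key matches.
(display (assoc-ref params 'query)) (newline)  ; prints hello

;; A keyword is a distinct object from the symbol of the same name,
;; so the lookup fails.
(display (assoc-ref params #:query)) (newline) ; prints #f
```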
* examples/random-manifest.scm (random-computed-file): Add ‘dependency’
parameter and honor it.
<top level>: Replace ‘unfold’ call with a loop; pass ‘dependency’
argument to ‘random-computed-file’.