With commit 55af0f70c0, GC roots created by ‘cuirass remote-worker’
would no longer be deleted (unless it’s running on the same machine as
‘cuirass remote-server’).
* src/cuirass/scripts/remote-worker.scm (cuirass-remote-worker): Add
call to ‘spawn-gc-root-cleaner’.
* src/cuirass/base.scm (delete-old-gc-roots): Add #:check-database? and
honor it.
(spawn-gc-root-cleaner): Likewise.
This is a follow-up to 55af0f70c0.
* src/cuirass/store.scm (%gc-root-ttl): Remove.
* src/cuirass/scripts/register.scm (cuirass-register): Remove references
to ‘%gc-root-ttl’.
* src/cuirass/scripts/remote-server.scm (%options): Warn about ‘--ttl’
having no effect. Remove reference to ‘%gc-root-ttl’.
* src/cuirass/scripts/remote-worker.scm (%options): Warn about ‘--ttl’
having no effect. Remove reference to ‘%gc-root-ttl’.
Previously, build jobs would use the default #:max-build-jobs
and #:build-cores specified by guix-daemon. This would typically lead
each worker to use as many cores as available, leading to unreasonable
over-commitment.
With this change, each worker is assigned a fraction of the build cores.
Because it’s a static policy, it may lead to resource waste, but avoids
the problem mentioned above.
* src/cuirass/scripts/remote-worker.scm (run-build): Add #:parallelism
and pass it to ‘set-build-options*’.
(run-command): Add #:parallelism and pass it to ‘run-build’.
(start-worker): Add #:parallelism and pass it to ‘run-command’.
(worker-management-thunk): Pass #:parallelism to ‘start-worker’.
* src/cuirass/remote.scm (set-build-options*): Add #:build-cores and
pass it to ‘set-build-options’, along with #:max-build-jobs.
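As a rough illustration of the static policy described above, the
per-worker share could be computed along these lines.  This is only a
sketch: 'cores-per-worker' is a hypothetical helper, and the rounding
rule (integer division, never less than one core) is an assumption, not
necessarily what 'worker-management-thunk' does.

```scheme
;; Hypothetical helper, not part of Cuirass: split TOTAL-CORES evenly
;; among WORKERS, never returning less than one core per worker.
(define (cores-per-worker total-cores workers)
  (max 1 (quotient total-cores workers)))

;; Example: the share that each of 4 workers would get on this machine.
;; 'current-processor-count' is a core Guile procedure.
(define parallelism
  (cores-per-worker (current-processor-count) 4))
```

The resulting value is what would be passed down as #:parallelism and
eventually reach 'set-build-options*' as #:build-cores.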
Currently 'remote-worker' doesn't actually use this information, but it
can't hurt.
* src/cuirass/scripts/remote-server.scm (read-worker-exp): Pass #:system
to 'build-request-message'.
* src/cuirass/scripts/remote-worker.scm (run-command): Display SYSTEM in
log message.
Fixes a regression introduced in 0dbd460cf1 whereby 'cuirass
remote-worker' would keep spinning and displaying "warning: low disk
space" when that condition is met.
* src/cuirass/scripts/remote-worker.scm (start-worker): Call 'sleep' on
low disk space.
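The fix amounts to backing off between checks instead of looping
immediately.  A minimal sketch, assuming the disk check and the pause are
passed in as procedures ('wait-for-disk-space' and its arguments are
hypothetical names, and the 60-second default is an arbitrary choice):

```scheme
;; Sketch: wait until disk space is available again.  LOW-DISK-SPACE? is
;; a thunk; PAUSE is a one-argument procedure taking a number of seconds.
;; Both are injected so the loop can be exercised without touching the
;; file system.  Return the number of times we had to wait.
(define* (wait-for-disk-space low-disk-space?
                              #:key (pause sleep) (seconds 60))
  (let loop ((waits 0))
    (if (low-disk-space?)
        (begin
          (format (current-error-port) "warning: low disk space~%")
          (pause seconds)               ;back off instead of spinning
          (loop (+ waits 1)))
        waits)))
```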
Fixes a regression introduced in de8586080e, whereby 'cuirass
remote-worker' would block forever in 'avahi-browse-service-thread'
because nobody would get the message on MANAGEMENT-CHANNEL.
* src/cuirass/scripts/remote-worker.scm (cuirass-remote-worker): Wrap
'avahi-browse-service-thread' in 'call-with-new-thread'.
This turns 'cuirass remote-worker' into a fiberized program instead of a
multi-process program (previously 'cuirass remote-worker' would create
one child process per actual "worker").
* src/cuirass/remote.scm (send-log): Pass SOCK_CLOEXEC | SOCK_NONBLOCK
to 'socket'. Remove 'select' call.
* src/cuirass/scripts/remote-worker.scm (spawn-worker-ping): Replace
'call-with-new-thread' by 'spawn-fiber'.
(start-worker): Replace 'primitive-fork' by 'spawn-fiber'.
(worker-management-thunk): New procedure.
(%worker-pids, add-to-worker-pids!): Remove.
(signal-handler): Adjust accordingly.
(cuirass-remote-worker): Define 'management-channel'. Spawn
a fiber running 'worker-management-thunk'. Create workers by sending
message to MANAGEMENT-CHANNEL.
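The channel-based design can be sketched with Guile-Fibers as follows.
This is an illustration under assumptions, not the actual code: the
message shapes ('start-worker' and 'stop') and the reply channel are
invented for the example, and the real 'worker-management-thunk' takes
different arguments.

```scheme
(use-modules (fibers) (fibers channels) (ice-9 match))

;; Return a thunk that reads requests from CHANNEL and spawns one fiber
;; per 'start-worker' request.  Message shapes are hypothetical.
(define (worker-management-thunk channel)
  (lambda ()
    (let loop ()
      (match (get-message channel)
        (('start-worker name reply)
         ;; Each "worker" is now a fiber rather than a child process.
         (spawn-fiber
          (lambda ()
            (put-message reply (string-append name ": started"))))
         (loop))
        ('stop #t)))))

(define (demo)
  (run-fibers
   (lambda ()
     (let ((management-channel (make-channel))
           (reply (make-channel)))
       (spawn-fiber (worker-management-thunk management-channel))
       ;; Create a worker by sending a message to MANAGEMENT-CHANNEL.
       (put-message management-channel
                    (list 'start-worker "worker-1" reply))
       (let ((result (get-message reply)))
         (put-message management-channel 'stop)
         result)))))
```

Since 'get-message' suspends the current fiber until someone sends on the
channel, a fiber must be running the management thunk before any caller
waits on it.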
This hides serialization/deserialization, assembly of message parts, and
the actual send/receive operation behind 'send-message' and
'receive-message'.
* src/cuirass/remote.scm (zmq-remote-address)
(zmq-message-string, zmq-read-message): Remove.
(send-message, receive-message): New procedures.
* src/cuirass/remote.scm (build-request-message):
(no-build-message, build-started-message)
(build-failed-message, build-succeeded-message)
(worker-ping, worker-ready-message)
(worker-request-work-message)
(worker-request-info-message, server-info-message): Remove 'format'
call and return an sexp instead.
* src/cuirass/scripts/remote-server.scm (read-worker-exp):
Add #:peer-address. Change 'msg' to 'sexp'.
(need-fetching?): Remove call to 'zmq-read-message'. Remove
inappropriate use of 'else' keyword.
(run-fetch): Remove call to 'zmq-read-message'. Use 'receive-message'
instead of 'zmq-message-receive*' & co.
(zmq-start-proxy): Use 'receive-message' and 'send-message' instead of
'zmq-message-receive*', 'zmq-message-send' & co. Pass #:peer-address to
'read-worker-exp'.
* src/cuirass/scripts/remote-worker.scm (run-command): Remove call to
'zmq-read-message'.
(spawn-worker-ping)[ping]: Use 'send-message'.
(start-worker): Use 'send-message' and 'receive-message' instead of
the whole shebang.
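To illustrate what hiding serialization behind a pair of procedures can
look like, here is a small sketch.  It is not the Cuirass implementation:
'send-sexp', 'receive-sexp' and 'example-build-request' are hypothetical
names, the transport is abstracted as plain procedures instead of ZeroMQ
sockets, and the message layout is illustrative only.

```scheme
(use-modules (rnrs bytevectors))

;; Serialize SEXP to a UTF-8 bytevector, the form in which it would
;; travel on the wire.
(define (sexp->bytevector sexp)
  (string->utf8
   (call-with-output-string
     (lambda (port) (write sexp port)))))

;; Inverse of 'sexp->bytevector'.
(define (bytevector->sexp bv)
  (call-with-input-string (utf8->string bv) read))

;; SEND is a one-argument procedure standing in for the socket write;
;; RECEIVE is a thunk standing in for the socket read.
(define (send-sexp send sexp)
  (send (sexp->bytevector sexp)))

(define (receive-sexp receive)
  (bytevector->sexp (receive)))

;; Hypothetical message constructor that returns an sexp rather than a
;; string built with 'format'.
(define* (example-build-request drv #:key system)
  `(build (drv ,drv) (system ,system)))
```

Callers then exchange sexps only and can destructure them with 'match',
without parsing strings or assembling message parts themselves.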
* src/cuirass/scripts/remote-server.scm (%options): Fix program name
passed as a 'show-version-and-exit' argument.
* src/cuirass/scripts/remote-worker.scm (%options): Likewise.
Previously, a non-local exit (such as an uncaught exception) in the
child process would cause it to execute the same code as its parent.
* src/cuirass/scripts/remote-worker.scm (start-worker): Wrap child body
in 'dynamic-wind'.
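The idiom can be sketched as below.  'call-in-child' is a hypothetical
helper written for this example, and the exit codes are arbitrary; the
point is that the unwind handler guarantees the child terminates instead
of returning into code meant for the parent.

```scheme
(use-modules (ice-9 match))

;; Fork and run THUNK in the child; return the child's PID in the parent.
(define (call-in-child thunk)
  (match (primitive-fork)
    (0
     (dynamic-wind
       (const #t)
       (lambda ()
         (thunk)
         ;; 'primitive-exit' terminates the process immediately, so on
         ;; success the unwind handler below never runs.
         (primitive-exit 0))
       (lambda ()
         ;; Reached only on a non-local exit, such as an uncaught
         ;; exception: make sure the child never resumes the parent's
         ;; code.
         (primitive-exit 1))))
    (pid pid)))
```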
This is a follow-up of 1fb4b0ac12 that tried to work around the
remote-worker hangs by introducing a non-blocking read.
That solution was problematic because when the server is unresponsive, the
request-work requests are queued on the worker. When the server comes back
online, the queued requests are all sent to the server.
Use instead the ZMQ_PROBE_ROUTER option, which causes the server to send an
empty bootstrap message to the worker when a connection is established. This
empty message unblocks the workers that were hanging on the request-work
response.
* src/cuirass/scripts/remote-server.scm (zmq-start-proxy): Set the
ZMQ_PROBE_ROUTER option on the build socket.
* src/cuirass/scripts/remote-worker.scm (start-worker): Ignore the bootstrap
message when reading server info.  However, when receiving a bootstrap message
while waiting for a request-work response, keep going.
* src/cuirass/scripts/remote-worker.scm (show-help, %options): Adapt them.
(%minimum-disk-space): Define it before %default-options in order to use it to
set the default value. Use a 5GiB threshold because the image builds that
frequently fail for lack of space require a lot more than 100MiB.
This helps ensure workers don't pick up builds that are likely to fail
due to ENOSPC.
* src/cuirass/scripts/remote-worker.scm (show-help, %options): Add
'--minimum-disk-space' option.
(%default-options): Add 'minimum-disk-space'.
(%minimum-disk-space): New variable.
(low-disk-space?): New procedure.
(start-worker): Call 'request-work' only when 'low-disk-space?' returns #f.
(cuirass-remote-worker): Parameterize %MINIMUM-DISK-SPACE.
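The check itself is a simple comparison against a parameter.  A minimal
sketch, assuming the amount of free space is supplied by the caller as a
number of bytes (the one-argument signature is an assumption made to keep
the example self-contained; the 5 GiB default is the value mentioned
above):

```scheme
;; Threshold, in bytes, below which the worker should not request work.
(define %minimum-disk-space
  (make-parameter (* 5 (expt 2 30))))   ;5 GiB

;; Return #t if FREE-BYTES is below the configured threshold.
(define (low-disk-space? free-bytes)
  (< free-bytes (%minimum-disk-space)))

;; A command-line option can override the default for the extent of the
;; program, here with a 1 GiB threshold:
(parameterize ((%minimum-disk-space (expt 2 30)))
  (low-disk-space? (* 2 (expt 2 30))))
```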
When the worker sends a request-work message to the server, it then waits
indefinitely for a response. If the server receives the request but dies
before answering, the worker can be blocked forever.
* src/cuirass/remote.scm (EAGAIN-safe): New macro.
(zmq-get-msg-parts-bytevector/no-wait): New procedure.
* src/cuirass/scripts/remote-worker.scm (start-worker): Use the above
procedure so as not to wait indefinitely for the server response.
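One possible shape for such a macro and a bounded wait built on top of it
is sketched here.  The names 'eagain-safe' and 'receive/no-wait' are
hypothetical, and the sketch assumes that the non-blocking read reports
EAGAIN through a 'system-error' exception; the ZeroMQ bindings may signal
it differently.

```scheme
;; Evaluate EXP and return its value, or #f if it fails with EAGAIN,
;; that is, if the operation would have blocked.  Re-throw other errors.
(define-syntax-rule (eagain-safe exp)
  (catch 'system-error
    (lambda () exp)
    (lambda args
      (if (= EAGAIN (system-error-errno args))
          #f
          (apply throw args)))))

;; Call the non-blocking thunk RECEIVE up to ATTEMPTS times, calling the
;; thunk PAUSE between attempts.  Return the first non-#f result, or #f
;; when all attempts are exhausted.
(define (receive/no-wait receive attempts pause)
  (let loop ((n attempts))
    (and (> n 0)
         (or (eagain-safe (receive))
             (begin
               (pause)
               (loop (- n 1)))))))
```

When the result is #f, the caller can send its request again instead of
staying blocked on a reply that will never arrive.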
* src/cuirass/scripts/remote-worker.scm (run-build): If the worker was not
able to send the build logs, report it, dump the build logs and keep
things going.
This allows selecting the substitutes-urls that the remote-worker should use.
* src/cuirass/remote.scm (set-build-options*): Take a list of substitutes-urls
as argument.
* src/cuirass/scripts/remote-server.scm (add-to-store): Adapt it.
* src/cuirass/scripts/remote-worker.scm (%options, %default-options): Add a
new substitutes-urls option.
(%substitute-urls): New parameter.
(run-build): If the remote-server uses its own publish server, add it to the
list of substitute servers, otherwise only use the provided substitute
servers.
(cuirass-remote-worker): Honor the substitutes-urls argument.
* doc/cuirass.texi (Invoking the cuirass remote-worker): Document it.
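The selection rule described for 'run-build' boils down to the following
sketch.  'effective-substitute-urls' is a hypothetical name, PUBLISH-URL
is assumed to be #f when the remote-server does not run its own publish
server, and the URLs in the usage example are placeholders.

```scheme
;; Return the list of substitute servers a build should use: the
;; server's own publish URL first, when there is one, followed by the
;; user-provided SUBSTITUTE-URLS.
(define (effective-substitute-urls substitute-urls publish-url)
  (if publish-url
      (cons publish-url substitute-urls)
      substitute-urls))

;; Usage, with placeholder URLs:
(effective-substitute-urls '("https://substitutes.example.org")
                           "http://server.example.org:5558")
```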
* src/cuirass/scripts/remote-server.scm (%options, %default-options): Add a
no-publish argument.
(show-help): Document it.
(cuirass-remote-server): Honor it.
* src/cuirass/scripts/remote-worker.scm (start-worker): Do not call
publish-url if the publish-port is false.
* src/cuirass/remote.scm (avahi-service->server): Ditto.
* doc/cuirass.texi (Invoking the cuirass remote-server): Document it.
Add a TTL argument and use it to register GC roots for the successfully built
items.
* src/cuirass/scripts/remote-worker.scm (show-help): Add a TTL argument.
(%options): Ditto.
(%default-options): Ditto.
(run-build): Register GC roots for the successfully built derivation outputs.
(remote-worker): Add a TTL argument.