45 Commits

Author SHA1 Message Date
Ludovic Courtès 0a9776e57d
remote-worker: Use a separate GC root directory.
* src/cuirass/scripts/remote-worker.scm (cuirass-remote-worker): Change
‘%gc-root-directory’.
2023-10-25 20:11:18 +02:00
Ludovic Courtès 476324286b
remote-worker: Periodically delete old GC roots.
With commit 55af0f70c0, GC roots created
by ‘cuirass remote-worker’ would no longer be deleted (unless it’s
running on the same machine as ‘cuirass remote-server’).

* src/cuirass/scripts/remote-worker.scm (cuirass-remote-worker): Add
call to ‘spawn-gc-root-cleaner’.
* src/cuirass/base.scm (delete-old-gc-roots): Add #:check-database? and
honor it.
(spawn-gc-root-cleaner): Likewise.
2023-10-25 20:10:39 +02:00
Ludovic Courtès 505ce3f8dc
Switch from SRFI-11 to SRFI-71.
* src/cuirass/base.scm, src/cuirass/http.scm,
src/cuirass/scripts/remote-worker.scm, src/cuirass/zabbix.scm: Use
SRFI-71 instead of SRFI-11.
2023-10-25 19:09:40 +02:00
Ludovic Courtès 2eb3e13580
remote-worker: Discard log anytime ‘send-log’ throws.
* src/cuirass/scripts/remote-worker.scm (run-build): Discard build logs
when ‘send-log’ throws, no matter which exception is thrown.  Improve
logging.
2023-10-25 18:29:21 +02:00
Ludovic Courtès 4cc37f540c
remote-worker: Ignore the return value of ‘build-derivations&’.
* src/cuirass/scripts/remote-worker.scm (run-build): Ignore the return
value of ‘build-derivations&’ since it’s always #t.
2023-10-25 18:28:20 +02:00
Ludovic Courtès ab3265bad0
store: Remove ‘%gc-root-ttl’ parameter.
This is a followup to 55af0f70c0.

* src/cuirass/store.scm (%gc-root-ttl): Remove.
* src/cuirass/scripts/register.scm (cuirass-register): Remove references
to ‘%gc-root-ttl’.
* src/cuirass/scripts/remote-server.scm (%options): Warn about ‘--ttl’
having no effect.  Remove reference to ‘%gc-root-ttl’.
* src/cuirass/scripts/remote-worker.scm (%options): Warn about ‘--ttl’
having no effect.  Remove reference to ‘%gc-root-ttl’.
2023-10-21 18:44:30 +02:00
Ludovic Courtès f08cd30afb
remote-worker: Using ‘ceiling-quotient’ for build parallelism.
* src/cuirass/scripts/remote-worker.scm (worker-management-thunk): Use
‘ceiling-quotient’ instead of ‘quotient’.
2023-09-28 14:39:59 +02:00
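The effect of the two roundings can be illustrated with a small Python sketch (Cuirass itself is written in Guile Scheme). It assumes, hypothetically, that each worker's share is the machine's core count divided by the number of workers; the function names are illustrative and do not appear in the source.

```python
def cores_per_worker_floor(total_cores: int, workers: int) -> int:
    """Truncating division, like Scheme's `quotient` for positive inputs."""
    return total_cores // workers


def cores_per_worker_ceiling(total_cores: int, workers: int) -> int:
    """Division rounded up, like Scheme's `ceiling-quotient`."""
    return -(-total_cores // workers)
```

Under that assumption, 8 cores shared by 3 workers give 2 cores each when truncating and 3 when rounding up; 2 cores shared by 4 workers give 0 when truncating and 1 when rounding up, so rounding up never assigns a worker zero cores.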
Ludovic Courtès 980ef61098
remote-worker: Statically determine build process parallelism.
Previously, build jobs would use the default #:max-build-jobs
and #:build-cores specified by guix-daemon.  This would typically lead
each worker to use as many cores as available, leading to unreasonable
over-commitment.

With this change, each worker is assigned a fraction of the build cores.
Because it’s a static policy, it may lead to resource waste, but avoids
the problem mentioned above.

* src/cuirass/scripts/remote-worker.scm (run-build): Add #:parallelism
and pass it to ‘set-build-options*’.
(run-command): Add #:parallelism and pass it to ‘run-build’.
(start-worker): Add #:parallelism and pass it to ‘run-command’.
(worker-management-thunk): Pass #:parallelism to ‘start-worker’.
* src/cuirass/remote.scm (set-build-options*): Add #:build-cores and
pass it to ‘set-build-options’, along with #:max-build-jobs.
2023-09-26 15:20:52 +02:00
Ludovic Courtès c9914f90ea
remote-worker: Clarify ‘--systems’ argument.
* src/cuirass/scripts/remote-worker.scm (show-help): Clarify SYSTEMS.
2023-09-22 00:14:45 +02:00
Ludovic Courtès b0f93551bc
Move store and GC helpers from (cuirass base) to (cuirass store).
* src/cuirass/base.scm (default-gc-root-directory, %gc-root-directory)
(%gc-root-ttl, gc-roots, gc-root-expiration-time)
(register-gc-root, register-gc-roots)
(non-blocking-port, ensure-non-blocking-store-connection)
(with-store/non-blocking, process-build-log, build-derivations&): Move
to…
* src/cuirass/store.scm: … here.  New file.
* src/cuirass/scripts/remote-server.scm: Adjust accordingly.
* src/cuirass/scripts/remote-worker.scm: Likewise.
* src/cuirass/scripts/register.scm: Likewise.
* Makefile.am (dist_pkgmodule_DATA): Add ‘src/cuirass/store.scm’.
2023-09-13 19:00:12 +02:00
Ludovic Courtès b09737967b
remote-server: Specify 'system' in build request messages.
Currently 'remote-worker' doesn't actually use this information, but it
can't hurt.

* src/cuirass/scripts/remote-server.scm (read-worker-exp): Pass #:system
to 'build-request-message'.
* src/cuirass/scripts/remote-worker.scm (run-command): Display SYSTEM in
log message.
2023-08-24 10:54:14 +02:00
Ludovic Courtès dfd1ff4582
remote-worker: Sleep after reporting low disk space.
Fixes a regression introduced in
0dbd460cf1 whereby 'cuirass remote-worker'
would keep spinning and displaying "warning: low disk space" when that
condition is met.

* src/cuirass/scripts/remote-worker.scm (start-worker): Call 'sleep' on
low disk space.
2023-08-23 12:57:23 +02:00
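The control flow described by this fix and by 0dbd460cf1 can be sketched in Python as one loop iteration. This is an illustration only: the function name, the callback parameters, and the use of `None` for the 'no-build response are hypothetical, and sleeping for the request period on low disk space is an assumption (the 30-second value comes from commit 1341725f2c).

```python
REQUEST_PERIOD = 30  # seconds; the value named in commit 1341725f2c


def worker_iteration(low_disk_space, request_work, run_build, sleep):
    """Run one iteration of a hypothetical worker loop and report what happened."""
    if low_disk_space():
        # Without this sleep the loop would spin, repeating the warning.
        sleep(REQUEST_PERIOD)
        return "low-disk-space"
    reply = request_work()
    if reply is None:  # stands in for the 'no-build response
        sleep(REQUEST_PERIOD)
        return "no-build"
    # A build was assigned: run it and ask again without sleeping.
    run_build(reply)
    return "built"
```

Passing the callbacks in as arguments keeps the sketch independent of any real server or file system.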
Ludovic Courtès bb659b02ad
remote-worker: Browse Avahi services in a separate thread.
Fixes a regression introduced in
de8586080e, whereby 'cuirass
remote-worker' would block forever in 'avahi-browse-service-thread'
because nobody would get the message on MANAGEMENT-CHANNEL.

* src/cuirass/scripts/remote-worker.scm (cuirass-remote-worker): Wrap
'avahi-browse-service-thread' in 'call-with-new-thread'.
2023-08-22 22:26:43 +02:00
Ludovic Courtès 0dbd460cf1
remote-worker: Sleep only after a 'no-build response.
* src/cuirass/scripts/remote-worker.scm (run-command): Move 'no-build
clause to...
(start-worker): ... here.  Sleep only after 'no-build'.
2023-07-15 15:48:41 +02:00
Ludovic Courtès de8586080e
remote-worker: Fiberize.
This turns 'cuirass remote-worker' into a fiberized program instead of a
multi-process program (previously 'cuirass remote-worker' would create
one child process per actual "worker").

* src/cuirass/remote.scm (send-log): Pass SOCK_CLOEXEC | SOCK_NONBLOCK
to 'socket'.  Remove 'select' call.
* src/cuirass/scripts/remote-worker.scm (spawn-worker-ping): Replace
'call-with-new-thread' by 'spawn-fiber'.
(start-worker): Replace 'primitive-fork' by 'spawn-fiber'.
(worker-management-thunk): New procedure.
(%worker-pids, add-to-worker-pids!): Remove.
(signal-handler): Adjust accordingly.
(cuirass-remote-worker): Define 'management-channel'.  Spawn
a fiber running 'worker-management-thunk'.  Create workers by sending
message to MANAGEMENT-CHANNEL.
2023-07-14 01:38:10 +02:00
Ludovic Courtès 9f3f625c1c
remote-worker: Add missing argument to 'log-info'.
Fixes a regression introduced in
b498ff8f75.

* src/cuirass/scripts/remote-worker.scm (cuirass-remote-worker): Add
missing argument to 'log-info'.
2023-07-13 17:35:33 +02:00
Ludovic Courtès 22ce3b9de2
remote-worker: Clarify that an empty bytevector is expected when connecting.
* src/cuirass/scripts/remote-worker.scm (start-worker)[read-server-info]:
Replace 'empty' variable with a literal empty bytevector.
2023-07-05 16:18:29 +02:00
Ludovic Courtès 445198e2a0
remote: Simplify interface to send and receive messages.
This hides serialization/deserialization, assembly of message parts, and
the actual send/receive operation behind 'send-message' and
'receive-message'.

* src/cuirass/remote.scm (zmq-remote-address)
(zmq-message-string, zmq-read-message): Remove.
(send-message, receive-message): New procedures.
* src/cuirass/remote.scm (build-request-message):
(no-build-message, build-started-message)
(build-failed-message, build-succeeded-message)
(worker-ping, worker-ready-message)
(worker-request-work-message)
(worker-request-info-message, server-info-message): Remove 'format'
call and return an sexp instead.
* src/cuirass/scripts/remote-server.scm (read-worker-exp):
Add #:peer-address.  Change 'msg' to 'sexp'.
(need-fetching?): Remove call to 'zmq-read-message'.  Remove
inappropriate use of 'else' keyword.
(run-fetch): Remove call to 'zmq-read-message'.  Use 'receive-message'
instead of 'zmq-message-receive*' & co.
(zmq-start-proxy): Use 'receive-message' and 'send-message' instead of
'zmq-message-receive*', 'zmq-message-send' & co.  Pass #:peer-address to
'read-worker-exp'.
* src/cuirass/scripts/remote-worker.scm (run-command): Remove call to
'zmq-read-message'.
(spawn-worker-ping)[ping]: Use 'send-message'.
(start-worker): Use 'send-message' and 'receive-message' instead of
the whole shebang.
2023-07-01 00:11:02 +02:00
Ludovic Courtès 1e5b87b0a6
remote: Remove 'zmq-' prefix from our own message bindings.
* src/cuirass/remote.scm (zmq-build-request-message)
(zmq-no-build-message, zmq-build-started-message)
(zmq-build-failed-message, zmq-build-succeeded-message)
(zmq-worker-ping, zmq-worker-ready-message)
(zmq-worker-request-work-message, zmq-worker-request-info-message):
Strip 'zmq-' prefix from the name.
(zmq-server-info): Rename to...
(server-info-message): ... this.
* src/cuirass/scripts/remote-server.scm: Adjust accordingly.
* src/cuirass/scripts/remote-worker.scm: Likewise.
(worker-ping): Rename to...
(spawn-worker-ping): ... this.
2023-06-29 22:36:17 +02:00
Ludovic Courtès b498ff8f75
remote: Add more logging.
* src/cuirass/scripts/remote-server.scm (read-worker-exp)[update-worker!]:
Add 'log-debug' call.
* src/cuirass/scripts/remote-worker.scm (start-worker): Add 'log-info'
calls.
(cuirass-remote-worker): Likewise.
* tests/remote.scm (start-worker, start-server): Set 'CUIRASS_LOGGING_LEVEL'.
2023-06-29 11:44:56 +02:00
Ludovic Courtès 480ae5440d
Show the right program name in '--version'.
* src/cuirass/scripts/remote-server.scm (%options): Fix program name
passed as a 'show-version-and-exit' argument.
* src/cuirass/scripts/remote-worker.scm (%options): Likewise.
2023-06-28 23:41:56 +02:00
Ludovic Courtès 9fb6f21d29
remote-worker: Prevent non-local exits in child processes.
Previously, a non-local exit (such as an uncaught exception) in the
child process would cause it to execute the same code as its parent.

* src/cuirass/scripts/remote-worker.scm (start-worker): Wrap child body
in 'dynamic-wind'.
2022-11-23 15:23:52 +01:00
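The hazard can be shown with a Python analogy, in which `try`/`finally` plays the role that 'dynamic-wind' plays in the Scheme code. The helper name `run_in_child` is hypothetical; the sketch assumes a POSIX system and Python 3.9 or later (for `os.waitstatus_to_exitcode`).

```python
import os


def run_in_child(body):
    """Fork, run BODY in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            body()
            status = 0
        finally:
            # Always terminate the child here, even if BODY raised, so the
            # exception cannot unwind into code meant for the parent.
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(wait_status)
```

Without the `finally` clause, an exception raised by `body` would propagate out of `run_in_child` inside the child, which would then continue running the caller's code as if it were the parent.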
Mathieu Othacehe 1341725f2c
remote-worker: Increase the request period to 30 seconds
* src/cuirass/scripts/remote-worker.scm (%request-period): Increase it to
reduce the server pressure.
2022-11-20 18:32:44 +01:00
Mathieu Othacehe fc1641381d
remote-worker: Prevent a dead-hang on server disconnection.
This is a follow-up of 1fb4b0ac12 that tried to
work around the remote-worker hangs by introducing a non-blocking read.

This solution was problematic because when the server was unresponsive, the
request-work requests were queued on the worker. When the server came back
online, the requests were all sent to the server.

Instead, use the ZMQ_PROBE_ROUTER option, which causes the server to send an
empty bootstrap message to the worker when a connection is established. This
empty message will unlock the workers that were hanging on the request-work
response.

* src/cuirass/scripts/remote-server.scm (zmq-start-proxy): Set the
ZMQ_PROBE_ROUTER option on the build socket.
* src/cuirass/scripts/remote-worker.scm (start-worker): Ignore the bootstrap
message when reading server info.  However, when receiving a bootstrap message
while waiting for a request-work response, keep going.
2022-11-20 18:31:51 +01:00
Mathieu Othacehe 553f107d37
Revert "remote-worker: Do not block on request-work response."
This reverts commit 1fb4b0ac12.
2022-11-20 18:31:51 +01:00
Mathieu Othacehe 06745e45ad
remote-worker: Use GiB thresholds.
* src/cuirass/scripts/remote-worker.scm (show-help, %options): Adapt them.
(%minimum-disk-space): Define it before %default-options in order to use it to
set the default value. Use a 5GiB threshold because image builds that are
frequently failing due to the lack of space require a lot more than 100MiB.
2022-11-20 18:31:50 +01:00
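A threshold check of this kind can be sketched in Python. The 5 GiB default comes from the commit message; the function names, and the idea of measuring free space with `shutil.disk_usage`, are assumptions made for illustration and are not taken from the Scheme source.

```python
import shutil

GIB = 2 ** 30  # bytes in one gibibyte


def low_disk_space(free_bytes: int, minimum_gib: float = 5) -> bool:
    """Return True when FREE_BYTES is below the threshold given in GiB."""
    return free_bytes < minimum_gib * GIB


def path_is_low_on_space(path: str, minimum_gib: float = 5) -> bool:
    """Apply the same check to the file system that holds PATH."""
    return low_disk_space(shutil.disk_usage(path).free, minimum_gib)
```

With these definitions, a machine with only 100 MiB free counts as low on space, while one with exactly 5 GiB free does not.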
Mathieu Othacehe 54630d87a1
remote-worker: Fix a typo.
This is a follow-up of: 228b4a4f72.

* src/cuirass/scripts/remote-worker.scm (low-disk-space?): Fix a typo.
2022-11-20 18:16:51 +01:00
Ludovic Courtès 228b4a4f72
remote-worker: Do not request work when disk space is low.
This helps ensure workers don't pick up builds that are likely to fail
due to ENOSPC.

* src/cuirass/scripts/remote-worker.scm (show-help, %options): Add
'--minimum-disk-space' option.
(%default-options): Add 'minimum-disk-space'.
(%minimum-disk-space): New variable.
(low-disk-space?): New procedure.
(start-worker): Call 'request-work' only when 'low-disk-space?' returns #f.
(cuirass-remote-worker): Parameterize %MINIMUM-DISK-SPACE.
2022-11-18 16:32:21 +01:00
Mathieu Othacehe 1fb4b0ac12
remote-worker: Do not block on request-work response.
When the worker sends a request-work message to the server, it then waits
indefinitely for a response. If the server receives the request but dies
before answering, the client can be blocked forever.

* src/cuirass/remote.scm (EAGAIN-safe): New macro.
(zmq-get-msg-parts-bytevector/no-wait): New procedure.
* src/cuirass/scripts/remote-worker.scm (start-worker): Use the above
procedure so as not to wait for the server response indefinitely.
2022-11-12 11:50:51 +01:00
Mathieu Othacehe b9c36654cc
remote-worker: Catch send-log errors.
* src/cuirass/scripts/remote-worker.scm (run-build): If the worker was not
able to send the build logs, report it, dump the build logs and keep
things going.
2021-12-17 15:07:29 +01:00
Mathieu Othacehe 4c2b45216e
Introduce log levels.
* src/cuirass/logging.scm (log-info, log-debug, log-warning, log-error): New procedures.
* src/cuirass/base.scm: Introduce log levels.
* src/cuirass/database.scm: Ditto.
* src/cuirass/http.scm: Ditto.
* src/cuirass/metrics.scm: Ditto.
* src/cuirass/notification.scm: Ditto.
* src/cuirass/remote.scm: Ditto.
* src/cuirass/scripts/register.scm: Ditto.
* src/cuirass/scripts/remote-server.scm: Ditto.
* src/cuirass/scripts/remote-worker.scm: Ditto.
* src/cuirass/scripts/web.scm: Ditto.
* src/cuirass/utils.scm: Ditto.
* src/cuirass/watchdog.scm: Ditto.
2021-12-06 14:15:41 +01:00
Mathieu Othacehe da377832ce
remote-worker: Use the log-message procedure.
2021-12-01 12:55:03 +01:00
Mathieu Othacehe 14b21fa06b
remote-worker: Include worker name in logs.
* src/cuirass/scripts/remote-worker.scm (run-command): Include worker name in
logs.
2021-11-29 13:57:52 +01:00
Mathieu Othacehe bee8156fe3
remote-worker: Fix typo.
* src/cuirass/scripts/remote-worker.scm (run-command): Fix typo.
2021-11-29 13:52:50 +01:00
Mathieu Othacehe 545a19765d
remote-worker: Fix typo.
* src/cuirass/scripts/remote-worker.scm (start-worker): Fix typo.
2021-11-29 13:52:07 +01:00
Mathieu Othacehe 5bde9287b2
remote-worker: Fix typo.
* src/cuirass/scripts/remote-worker.scm (start-worker): Fix typo.
2021-11-29 13:50:36 +01:00
Mathieu Othacehe 6a2263c80a
remote-worker: Add more logs.
* src/cuirass/scripts/remote-worker.scm: Add more logs.
2021-11-29 13:36:08 +01:00
Mathieu Othacehe 91e8b2ec2c
remote-worker: Add a substitutes-urls option.
This allows selecting the substitutes-urls that the remote-worker should use.

* src/cuirass/remote.scm (set-build-options*): Take a list of substitutes-urls
as argument.
* src/cuirass/scripts/remote-server.scm (add-to-store): Adapt it.
* src/cuirass/scripts/remote-worker.scm (%options, %default-options): Add a
new substitutes-urls option.
(%substitute-urls): New parameter.
(run-build): If the remote-server uses its own publish server, add it to the
list of substitute servers, otherwise only use the provided substitute
servers.
(cuirass-remote-worker): Honor the substitutes-urls argument.
* doc/cuirass.texi (Invoking the cuirass remote-worker): Document it.
2021-08-12 14:16:54 +02:00
Mathieu Othacehe 830817aaac
remote-server: Add a no-publish argument.
* src/cuirass/scripts/remote-server.scm (%options, %default-options): Add a
no-publish argument.
(show-help): Document it.
(cuirass-remote-server): Honor it.
* src/cuirass/scripts/remote-worker.scm (start-worker): Do not call
publish-url if the publish-port is false.
* src/cuirass/remote.scm (avahi-service->server): Ditto.
* doc/cuirass.texi (Invoking the cuirass remote-server): Document it.
2021-08-12 10:54:34 +02:00
Mathieu Othacehe bfea9ab4b9
remote-worker: Create the GC root directory.
* src/cuirass/scripts/remote-worker.scm (remote-worker): Create the GC root directory.
2021-06-03 11:58:00 +02:00
Mathieu Othacehe 6d7cb31931
remote-worker: Lower the default TTL to 1d.
* src/cuirass/scripts/remote-worker.scm (%default-options): Lower the default
TTL to 1d.
2021-05-26 10:23:31 +02:00
Mathieu Othacehe df2e945005
remote-worker: Add a TTL argument.
Add a TTL argument and use it to register GC roots for the successfully built
items.

* src/cuirass/scripts/remote-worker.scm (show-help): Add a TTL argument.
(%options): Ditto.
(%default-options): Ditto.
(run-build): Register GC roots for the successfully built derivation outputs.
(remote-worker): Add a TTL argument.
2021-05-19 09:49:38 +02:00
Mathieu Othacehe f97bf6b75f
worker: Fix REQUEST_PERIOD read.
* src/cuirass/scripts/remote-worker (%request-period): Fix period read.
2021-04-14 12:00:11 +02:00
Mathieu Othacehe b645f4eb0c
Add remote building tests.
2021-03-22 18:29:07 +01:00
Mathieu Othacehe 43d29317d9
Use a single Cuirass binary.
2021-03-22 18:29:06 +01:00
Renamed from src/cuirass/remote-worker.scm