#+TITLE: Guix Infrastructure Handbook
This handbook is intended for sysadmin volunteers taking care of the infrastructure powering the Guix website, substitutes and other services offered via https://guix.gnu.org/.
The machines involved are registered in file:../hydra/machines.rec.
* Berlin
Berlin is the main machine, which hosts the website (https://guix.gnu.org/), the MUMI issue tracker (https://issues.guix.gnu.org/), runs the build farm (https://ci.guix.gnu.org/) and serves the cached substitutes. It is graciously provided by the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) and hosted at their datacenter in Berlin, hence its name.
** Specifications
Dell PowerEdge R7425 server with the following specifications:
- 2x AMD EPYC 7451 24-Core processors
- Storage Area Network (SAN) of 100 TiB
- SAN connected to two QLogic QLE2692 16G Fibre Channel adapters (qla2xxx)
- PERC H730P RAID/HBA disk controller with 8 slots
- 2x 1 TB hard drives in a RAID 1 configuration (attached to the PERC)
- 188 GiB of memory
The machine can be remotely administered via iDRAC, the Dell server management platform.
Its configuration is defined in file:../hydra/berlin.scm. A second machine, known as node 129, is intended to become a fallback for Berlin; it is deployed from Berlin via file:../hydra/deploy-node-129.scm.
** Boot device
The PowerEdge R7425 firmware works best in UEFI mode. The boot device consists of two 931 GB rotational disks attached to the PERC controller card and configured in RAID 1. It holds the UEFI partition as well as another partition for /boot. This separate boot device is necessary because the SAN is not visible to GRUB.
** SSH access to Berlin and node 129
The following ~/.ssh/config snippets can be defined to access the
Berlin machine:

#+begin_src conf
Host berlin
  HostName berlin.guix.gnu.org
  DynamicForward 8022
  ForwardAgent yes
#+end_src
The ~DynamicForward~ on port 8022 is explained in the iDRAC web page
access section below, while ~ForwardAgent~ makes your SSH agent
credentials available on Berlin for deploying to node 129.
For node 129, you can use:
#+begin_src conf
Host hydra-guix-129
  HostName 141.80.181.41
  DynamicForward 8022
#+end_src
** iDRAC web page access
The Dell iDRAC management suite offers a web interface for common actions such as rebooting a machine, changing parameters or checking its current status. The iDRAC page of Berlin can be accessed at https://141.80.167.225, while node 129's page can be accessed at https://141.80.167.229. Because the iDRAC web interface can only be accessed from within the MDC network, it is necessary to configure an HTTP proxy. This can be accomplished via OpenSSH's SOCKS proxy support. For it to work, two things are needed:
- A ~DynamicForward~ directive on your SSH host, as shown in the
  snippets from the SSH access to Berlin and node 129 section above.
- A proxy auto-configuration (PAC) file to configure your browser to
  relay requests for specific domains through the SOCKS proxy.
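To quickly check that the SOCKS proxy works without involving a
browser, a curl invocation along these lines should do (assuming an
active connection to ~berlin~ and a curl built with SOCKS support):

#+begin_src sh
# socks5h:// resolves the host name through the proxy; -k skips TLS
# verification, as iDRAC certificates are typically self-signed.
curl -k --proxy socks5h://localhost:8022 https://141.80.167.225/
#+end_src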
For GNU IceCat, the PAC file can be defined as below and placed, for
example, at ~/.mozilla/proxy.pac. Then navigate to the IceCat
Settings -> General -> Network Settings (at the very bottom), tick the
"Automatic proxy configuration URL" checkbox, and input the PAC file
URI in the associated text box, e.g.:
/home/maxim/.mozilla/proxy.pac. Click the "Reload" button for it to
take effect.
#+begin_src js
function FindProxyForURL(url, host) {
    if (isInNet(dnsResolve(host), "141.80.167.0", "255.255.255.0")) {
        return "SOCKS localhost:8022; DIRECT";
    } else {
        return "DIRECT";
    }
}
#+end_src
After that, navigating to https://141.80.167.229 should display the
iDRAC login page, as long as you have an active connection to either
~berlin~ or ~hydra-guix-129~.
** iDRAC serial console access to Berlin
iDRAC also provides access to a server's serial console, which can be
very handy to debug boot problems (before an SSH server is available).
The iDRAC main console interfaces are reachable via specific IPs
private to the MDC network, so it is necessary to proxy jump through
Berlin or node 129 to reach them, as shown in the ~/.ssh/config
configuration snippets below:
#+begin_src conf
Host hydra-guix-129-idrac
  ProxyJump berlin
  HostName 141.80.167.229
  User guix

Host berlin-idrac
  ProxyJump hydra-guix-129
  HostName 141.80.167.225
  User guix
#+end_src
You may notice that we don't proxy jump through ~berlin~ itself to access its iDRAC interface, because this wouldn't work if berlin is not currently running. For the same reason, the iDRAC interface of node 129 is reached by proxy jumping through ~berlin~.
After having connected to the iDRAC interface, the serial console can
be entered by typing the ~console com2~ command at the ~racadm>>~
prompt. To exit, press ~C-\~.
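A minimal session, using the host alias defined above, looks like this
(console output elided):

#+begin_example
$ ssh berlin-idrac
racadm>> console com2
... serial console output; press C-\ to exit ...
#+end_example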
** Repairing a non-bootable Guix System via a PXE booted image
One way to fix a non-bootable Guix System is to boot a different GNU/Linux system, mount the partitions, and make changes to them. This is made possible for Berlin and node 129 by having their boot mode fall back to a network (PXE) boot, and by using the serial console to navigate the boot menus.
Pressing F12 as suggested during boot to reach PXE doesn't seem to
work. The most reliable way I've found is to change the Boot Settings
in a persistent fashion by entering the System Setup (F2) at boot:

System Setup -> System BIOS -> Boot Settings -> UEFI Boot Settings

Leave only the PXE Device checkbox enabled, then press ESC, ESC, ESC,
Yes, OK, ESC and Yes to save and exit. The PXE boot typically succeeds
on the second reboot, which it attempts automatically after failing
once.
The images are made available by the MDC infrastructure team via
Cobbler, and only a few of the available images are bootable (sadly,
Guix System is not one of them). One image which works and has Btrfs
support is "Ubuntu-22.04-server-amd64". Upon selecting that entry and
pressing RET, a sub-menu should appear, containing
"Ubuntu-22.04-server-amd64-GuixFarm". Before booting it, you need to
adjust its ~linux~ kernel arguments at the GRUB boot menu to add
~console=ttyS0,115200~ in order to see the serial output. There is a
convenient way to turn on SSH at the installer screen, which you can
connect to from the ~hydra-guix-129~ or ~berlin~ machines.
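Concretely, at the GRUB menu you can press ~e~ on the entry, append
the console argument to the line starting with ~linux~, then boot with
~C-x~. The kernel path below is hypothetical, not the actual Cobbler
entry:

#+begin_example
linux /images/Ubuntu-22.04-server-amd64/vmlinuz console=ttyS0,115200
#+end_example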
You can then mount the file systems and modify /boot/grub/grub.cfg or
anything else. If you need to reconfigure the machine, you can refer
to info:guix#Chrooting to chroot into an existing system, except
you'll need to use the
~--substitute-urls=https://bordeaux.guix.gnu.org~ option to avoid
blocking on attempts to fetch substitutes from https://ci.guix.gnu.org
in vain. If the reconfiguration hangs, you may also need to use
~--no-grafts~.
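Put together, the reconfiguration from within the chroot might look
like this; the configuration file path is an assumption, so adjust it
to the machine's actual one:

#+begin_src sh
# Run inside the chroot set up as per info:guix#Chrooting.
guix system reconfigure /etc/config.scm \
    --substitute-urls=https://bordeaux.guix.gnu.org \
    --no-grafts
#+end_src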
To allow connecting to a root shell from a remote machine
(e.g. ~berlin~), set ~PermitRootLogin~ to ~yes~ in
/etc/ssh/sshd_config, set a password for the root user via the
~passwd~ command, then run ~systemctl restart sshd~.
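A minimal sketch of those steps, run as root on the PXE-booted system
(the sed pattern assumes the stock commented-out directive):

#+begin_src sh
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
passwd root                # set a root password interactively
systemctl restart sshd     # pick up the new configuration
#+end_src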
** Scribbled Notes
To replicate ~node-129~'s file system under /mnt, use:
#+begin_src sh
mount -o subvol=@root /dev/mapper/mpathb /mnt
mount -o subvol=@cache /dev/mapper/mpathb /mnt/var/cache
mount -o subvol=@home /dev/mapper/mpathb /mnt/home
mount /dev/sda3 /mnt/boot/
mount /dev/sda2 /mnt/boot/efi
mount /dev/sdb2 /mnt/boot/efi2/
#+end_src
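From there, the usual bind mounts allow chrooting into the replicated
system; this is a generic sketch, with info:guix#Chrooting remaining
the authoritative reference:

#+begin_src sh
for d in /proc /sys /dev; do
    mount --bind "$d" "/mnt$d"   # expose the pseudo file systems
done
chroot /mnt /bin/sh
#+end_src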
* Btrfs file system
Because it is not susceptible to the EXT4 inode exhaustion problem and offers zstd compression, which can almost double the effective capacity of a storage device at little computational cost, Btrfs is the current file system of choice for GNU/Linux-based Guix System build machines.
** Btrfs compression and mount options
To get the most out of Btrfs, enabling zstd compression is
recommended. When using RAID arrays, it can also be useful to use the
~degraded~ mount option, otherwise the RAID could fail to assemble at
boot if any drive that is part of the array has a problem. Here's an
alist of recommended mount options, taken from
file:../hydra/deploy-node-129.scm for a build machine where high
availability is preferred over data safety (~degraded~):
#+begin_src scheme
(define %common-btrfs-options '(("compress" . "zstd")
                                ("space_cache" . "v2")
                                "degraded"))
#+end_src
** Btrfs balance mcron job
To keep the file system operating without manual intervention, a
balance job should run periodically to ensure the unallocated space (a
Btrfs-specific concept) remains in check with the actual free space.
Otherwise, the system could report ~ENOSPC~ even when common utilities
such as ~df -h~ report plenty of free space. To view the amount of
available unallocated space, the ~btrfs filesystem usage /~ command
can be used.
The following mcron job example is taken from the file:../hydra/deploy-node-129.scm machine configuration:
#+begin_src scheme
(define btrfs-balance-job
  ;; Re-allocate chunks which are using less than 5% of their chunk
  ;; space, to regain Btrfs 'unallocated' space.  The usage is kept
  ;; low (5%) to minimize wear on the SSD.  Runs at 5 AM every 3 days.
  #~(job '(next-hour-from (next-day (range 1 31 3)) '(5))
         (lambda ()
           (system* #$(file-append btrfs-progs "/bin/btrfs")
                    "balance" "start" "-dusage=5" "/"))
         "btrfs-balance"))
#+end_src
* Problems/solutions knowledge base
** The boot fails with a kernel panic and qla2xxx-related errors
Here's an example:
#+begin_example
[   51.266790] Call Trace:
[   51.266792]  <TASK>
[   51.266794]  _raw_spin_lock_irqsave+0x46/0x60
[   51.266799]  qla2xxx_dif_start_scsi_mq+0x2b7/0xe60 [qla2xxx 124f4fec4ef588623af420625c6af8b5bcce53fd]
[   51.266823]  qla2xxx_mqueuecommand+0x222/0x2d0 [qla2xxx 124f4fec4ef588623af420625c6af8b5bcce53fd]
[   51.266838]  qla2xxx_queuecommand+0x1a1/0x3d0 [qla2xxx 124f4fec4ef588623af420625c6af8b5bcce53fd]
[   51.266852]  scsi_queue_rq+0x390/0xc00
[   51.266857]  __blk_mq_try_issue_directly+0x176/0x1e0
[   51.266861]  blk_mq_plug_issue_direct.constprop.0+0x93/0x180
[   51.266865]  blk_mq_flush_plug_list+0x23d/0x2a0
[   51.266868]  __blk_flush_plug+0xed/0x130
[   51.266872]  blk_finish_plug+0x31/0x50
[   51.266874]  read_pages+0x1f5/0x300
[   51.266879]  page_cache_ra_unbounded+0x131/0x180
[   51.266882]  force_page_cache_ra+0xc7/0x100
[   51.266885]  page_cache_sync_ra+0x34/0x90
[   51.266887]  filemap_get_pages+0x127/0x700
[   51.266893]  filemap_read+0xde/0x420
[   51.266898]  blkdev_read_iter+0xbd/0x1e0
[   51.266901]  new_sync_read+0x13e/0x1c0
[   51.266905]  vfs_read+0x151/0x1a0
[   51.266908]  ksys_read+0x73/0xf0
[   51.266911]  __x64_sys_read+0x1e/0x30
[   51.266913]  do_syscall_64+0x60/0xc0
[   51.266919]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   51.266922] RIP: 0033:0x4e73de
[   51.266924] Code: 0f 1f 40 00 48 c7 c2 bc ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[   51.266926] RSP: 002b:00007ffc403f39e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   51.266928] RAX: ffffffffffffffda RBX: 0000000001a98738 RCX: 00000000004e73de
[   51.266929] RDX: 0000000000000100 RSI: 0000000001a98748 RDI: 0000000000000006
[   51.266930] RBP: 0000000001a51bc0 R08: 0000000001a98720 R09: 0000000001a3ef10
[   51.266932] R10: 0000000000000007 R11: 0000000000000246 R12: 000009ffffffe000
[   51.266933] R13: 0000000000000100 R14: 0000000001a98720 R15: 0000000001a51c10
[   51.266936]  </TASK>
[   54.246148] NMI watchdog: Watchdog detected hard LOCKUP on cpu 64
#+end_example
Solution: This is indicative of a failure of one of the backing devices of the SAN (Storage Area Network) array. Ensure multipath is in use to mount the SAN (TBD), which adds resiliency against this problem, and report the issue to Ricardo Wurmus/the SIMB infrastructure department.