7.5 KiB
Free and Happy Users — Reproducible and User-Controlled Package Management in HPC with GNU Guix
- Introduction
- Common HPC Practice Hinders Reproducibility
- Related Work
- Functional Package Managers
- Use Cases
- Limitations & Challenges
- Conclusion
- References
- COMMENT Emacs stuff
Introduction
Idea
- HPC cluster distros are old, not customizable, and sysadmin policy is often conservative ("don't touch anything")
- users resort to manual or "overlay" package management -> badly affects reproducibility
- solutions being developed don't guarantee reproducibility, prevent sharing among users, which in turn prevents reproducibility
- use of Guix helps maximize reproducibility while allowing sharing among users, both on the same cluster on in different environments
- we present concrete use cases
-
define reproducibility:
- for one user at different times
- among users of a cluster at a given time
- among users on different machines, at different times
Common HPC Practice Hinders Reproducibility
-
reproducible research approaches don't address this
- "A Universal Identifier for Computational Results" doesn't take reproducibility into account
- "Effective Reproducible Research with Org-Mode and Git" (Section 4) suggests either a vague approach (record version numbers of a few packages) or a heavyweight approach (use VMs)
- some give up too easily: cf. Reproducibility responsibilities in the HPC arena
-
single HPC users may find themselves unable to reproduce their own results
- sysadmin-managed "modules", can disappear/change anytime
- thus, keeping binaries in $HOME isn't enough
-
cluster users can hardly share with other users on the same cluster
- copy binaries? fragile, and prevents customization
- hard to have shared deployments
-
HPC users can hardly share the software with other clusters
- different cluster means different software (compilers, libraries, etc.)
- makes it difficult to assess the impact of a given architecture on the software
Related Work
historical package managers
-
dpkg, apt, rpm, etc.
- not flexible (hard to write or customize recipes)
- binaries uploaded by maintainers (optional)
- package databases cannot be composed in the case of rpm (see email from Ricardo)
unprivileged package management & customizability
- GNU Stow
-
Conda & EasyBuild
- co-existence of versions/builds via different prefixes
- EasyBuild: lots of efforts spent on ad hoc module naming convention
- recipes are Python objects, eases customization
- EasyBuild is mostly for "HPC support teams"
-
- highly customizable from the command line + API
- thought with HPC-style usage in mind
- uses external tools, which prevents reproducibility (example)
- Gitian, reproducible.debian.net
- Vesta
- Google's Bezel
at the other end of the spectrum: VMs & containers
-
full deployments: G5K/Kadeploy
- slow, does not compose
- imposes a significant burden on users: choosing and preparing complete images
-
VMs (cf. "SHARE: a web portal for creating and sharing executable research papers")
- heavyweight
- unsuitable for HPC
-
containers (Docker et al.)
- unavailable to unprivileged users in HPC
- heavyweight, coarse-grain, not composable
- yet: "Immutable Application Containers" by "HPC Advisory Council" (cf. http://doc.qnib.org/HPCAC2015.pdf)
- vulnerabilities at DockerHub
Functional Package Managers
- Nix & Guix
-
reproducibility is goal #1
- containers, etc.
- (cite that paper on influence of env. vars on OpenMP performance?)
- reproducibility implies free software
- usual stuff
Use Cases
-
reproducible profiles: different levels
- symbolic: 'guix package –manifest'
- binary: 'guix archive –export'
-
reproducibility
- users choose when and what to upgrade
- whole DAG can be saved/restored anytime
- contrast with sysadmin-managed packages & modules
-
storage resources shared
- contrast with Spack, EasyBuild, etc.
- build environment tightly controlled (containers)
- rollback, etc.
-
different levels of reproducibility:
- exact: a specific branch of Guix
- symbolic: an externally-maintained package set
- workflow: publish Guix branch or external package set
-
deployment of complex stacks
- example1: bioinfo at U. Berlin
- example2: OpenLab deliverable D.29 (GNUnet)
- describe specific challenges…
-
deployment + customization of the software stack (off topic?)
- example: MORSE
- 12+ actively developed, tightly-integrated packages
- people want to be able to the specific part they work on (hwloc, StarPU, solver, etc.) while still being able to deploy the whole stack
- makes it easy to assess the performance impact of a specific part of the stack (e.g., StarPU)
- 'guix environment'
-
"active/executable papers" (?)
- integrate with Skribilo or Org-mode (cf. "The Collage Authoring Environment", 2011)
Limitations & Challenges
- needs to be installed by cluster sysadmin
-
remaining sources of non-determinism
- cpuid, /proc/cpuinfo, etc.
- profile-driven optimization
- build system non-determinism ("make -j" with broken makefiles)
- non-determinism due to scheduling (cf. "Determinism and Reproducibility in Large-Scale HPC Systems")
-
numerical library tuning (ATLAS, etc.)
- configured on the build machine, which may undermine reproducibility (see above)
- binaries become non-portable
- tweaking the recipe of say, ATLAS, means rebuilding a large part of the DAG
-
no proprietary software
- common in HPC (GPUs, linear algebra)
- but this is a strength: reproducible science cannot be built on black boxes, and experimentation needs the ability to fiddle with the software
- no "virtual dependencies" like "mpi", "runtime system" à la Spack
- no command-line interface (yet) to tweak the DAG à la Spack
-
software "archeology" is limited
- reusing specific, old versions of compilers or libraries means rewriting those recipes (they may have never existed in Guix itself since it's relatively young)
-
use of Guix on all cluster nodes?
- daemon, substitutes, network access, etc.
- numerical reproducibility? (cf. "Designing Bit-Reproducible Portable High-Performance Applications")
Conclusion
- functional package management & Guix make users happy
References
COMMENT Emacs stuff
LocalWords: reproducibility workflow