* Problems
- depending on binaries for building compilers and/or build systems is bad for trust
- the scale of the problem is getting larger as time passes: we started with just having GCC as a problem, but as time passed we got more and more new languages that are self-hosted (i.e. need an older version of themselves to be built).
- having a long chain of builds means that build systems have to be kept alive in modern environments.
- Go has its own linker, so to keep the bootstrap working patches have to be backported
- later versions of GCC are written in C++
- if you're going to maintain an old version of GCC just for a bootstrap chain, you'll have to backport support for new architectures. Otherwise our bootstrap chain is forever tied to x86 (PowerPC users cannot bootstrap on their own).
- generated C code is not source; it may make portability possible, but it's not trustworthy
- bootstrapping is not a *functional* feature, so the value isn't immediately obvious (very much like reproducible builds)
- the whole toolchain has bootstrapping issues (including linker and kernel…)
* Ideas
** Consensus
- languages with multiple implementations are great, because diversity makes bootstrapping easier. For single-implementation languages we need alternative ways to get started (even if it's inefficient).
** Rather than depend on more binary blobs, throw more CPU time at it, e.g. by emulating an x86 CPU with QEMU and working from there.
** Need to reach out to compiler developers: make sure that there's a non-self-hosted path to build the first compiler; find cooperative people in compiler projects to “bootstrap” a bootstrapping project
** try to depend only on the smallest C compiler possible
- e.g. [[https://www.landley.net/code/tinycc/][tinycc]], [[http://pcc.ludd.ltu.se/][pcc]], [[https://www.landley.net/qcc/][qcc]]
- coreboot folks have a simple C compiler, romcc
** register https://bootstrappable.org, collect stories there!
*** motivation: collect examples of backdoored compilers
- toy example: https://manishearth.github.io/blog/2016/12/02/reflections-on-rusting-trust/
- Ken Thompson: "Reflections on Trusting Trust"
- "Defending Against Compiler-Based Backdoors"
https://blog.regehr.org/archives/1241
- PoC||GTFO
https://www.alchemistowl.org/pocorgtfo/pocorgtfo08.pdf
- TODO: need more!
*** Examples
- e.g. the bootstrap chain for GCC > GCJ > IcedTea 6 > IcedTea 7 in Guix
- GNU Make doesn't need make to be built, just a shell script
- Cook
- Guile Scheme: includes an interpreter written in C.
- Bazel needs Bazel to build itself, *but* you can build a minimal variant of Bazel with a shell script that runs =javac= on all Java sources, etc.
- Ant needs itself, but can be built with plain Java
*** best practices
- don't throw old code away (to allow for a bootstrapping chain)
- have an alternative implementation backend (e.g. written in C or in a language that traces back to C eventually) — simplifies porting
*** Call for Action!
- target it to different audiences: “if you're a compiler writer, do this…”, “if you're a free software dev, consider …”
* Notes for the manifesto
- Don't give "bad" examples, since we don't want to piss off upstreams. Only give "good" examples.
* Homepage / overview of problem
(short summary from the problem - 2-3 concise sentences based on the Problems section?)
(try to clarify what bootstrapping actually is)
To have trust in our computing platforms, we should be able to follow the
bootstrapping process - how each part was produced from source - to then feel
confident it is built on good foundations.
(more detail on the intended outcomes and benefits)
(1. trust/security - most powerful/appealing motivation, mention this one first)
We want to draw attention to the need for an auditable, repeatable process for
bootstrapping programming languages, compilers, pieces of the toolchain and whole
distributions.
(2. easier porting (new platforms? languages?) - secondary benefit, important but fewer people are interested)
Another benefit would be that it becomes easier to port these things to new hardware
platforms.
(Motivation / benefits could become a separate section if it gets too big)
Compilers are often written in the language they are compiling.
This creates a chicken-and-egg problem that leads users and
distributors to rely on opaque, pre-built binaries of those
compilers that they use to build newer versions of the compiler.
We believe that those opaque binaries are a threat to user
security and user freedom since they are not auditable; we believe
the number of bootstrap binaries should be minimized.
* Best Practices (incl. examples, success stories?)
** For compiler writers...
If you're working on a compiler that is written in a language other than
the one it's compiling, you're all set!
If your compiler is written in the language that it's compiling
(“self-hosted”), it probably falls in one of the following categories.
If other implementations of this programming language exist, please make
sure your compiler can be built with one of these. Examples include:
- The Go programming language has two implementations: [[https://golang.org/][the reference
implementation]] is self-hosted, and that in [[https://gcc.gnu.org][GCC]] is written in C++.
(TODO: check if we can build one with the other) Furthermore,
version 1.4 of the reference implementation was written in a
different language and can be used to build version 1.5.
- Common Lisp has several implementations. Notably [[http://www.clisp.org/][GNU CLISP]] is
written in C and can be used to build self-hosted implementations
such as [[http://www.sbcl.org/][SBCL]].
If your compiler targets a language for which no other implementation
exists, then please consider maintaining a (minimal) implementation of
the language written in a different language. Most likely this
implementation exists, or existed at the point the programming language
was created. Maintaining this alternate implementation has a cost;
however, this cost should be minimal if this alternate implementation is
used routinely to build the compiler, and if this implementation is kept
simple—it does not need to be optimized.
Examples include:
- [[https://www.gnu.org/software/guile/][GNU Guile]], a Scheme implementation with a self-hosted compiler,
relies on a [[https://git.savannah.gnu.org/cgit/guile.git/tree/libguile/eval.c][Scheme interpreter written in C]] for bootstrapping
purposes.
Please let us know if you'd like to add your compiler to this list!
** For build system writers...
Build systems sometimes have chicken-and-egg problems: they may
need a version of themselves to get built. If you are developing
a build system, this can be avoided. We recommend that you
provide an alternative way to build your build system.
Examples include:
- [[https://www.gnu.org/software/make/][GNU Make]] does not require a make implementation. It can be built
using a [[https://git.savannah.gnu.org/cgit/make.git/tree/build.template][shell script]].
- [[https://ant.apache.org/][Apache Ant]] can bootstrap with a [[https://git-wip-us.apache.org/repos/asf?p=ant.git;a=blob;f=bootstrap.sh;h=60b6ece03ce78716bc036a44226f4934b541f326;hb=HEAD][shell script]]
that only relies on the Java compiler.
- [[https://bazel.build/][Bazel]] does not require Bazel to build itself but
can be bootstrapped with a [[https://github.com/bazelbuild/bazel/blob/master/compile.sh][shell script]].
- [[https://buckbuild.com/][Buck]] does not require Buck to build itself. Instead, it can be
built using [[https://github.com/facebook/buck/blob/master/build.xml][Ant]].
Unlike compilers, build systems do not need a full alternative
implementation of a language in order to bootstrap. A slow and
inefficient build written as a shell script, or driven by another,
older build system (Ant, GNU Make), can produce a minimal version of
the build system, which is then used to build the complete version.
** For distros
It is unavoidable that distributions use some binaries as part of
their bootstrap chain. However, distributions should endeavour to
provide traceability and automated reproducibility for such
binaries. This means that:
- It should be clear where the binary came from and how it was
produced.
- Users can reproduce the binary to verify that it has not been
tampered with.
For example, a distribution might use a binary package of GCC to build
GCC from source. This bootstrap binary is in most cases built from a
previous revision of the distribution's GCC package. Thus, the
distribution can label the binary with something like "this package
was built by running <command> on revision <hash> of the
distribution's package repository." A user can then easily reproduce
the binary by fetching the specified sources and running the specified
command. This build will in most cases depend on a previous generation
of bootstrap binaries. Thus, we get a chain of verifiable bootstrap
binaries stretching back in time.
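The verification step can be sketched as follows. This is only an
illustration, assuming the distribution publishes a SHA-256 digest
next to the provenance label and that the build is bit-for-bit
reproducible. The record layout and the names =ProvenanceRecord=,
=sha256_of= and =verify_binary= are hypothetical and do not come from
any distribution's tooling.

#+begin_src python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance label for one bootstrap binary."""
    command: str   # command that produced the binary
    revision: str  # revision of the distribution's package repository
    sha256: str    # digest of the binary the distribution shipped


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_binary(rebuilt: bytes, record: ProvenanceRecord) -> bool:
    """Return True if a locally rebuilt binary matches the shipped one."""
    return sha256_of(rebuilt) == record.sha256


# Stand-in bytes; a real check would read the shipped and rebuilt files.
shipped = b"pretend compiler binary"
record = ProvenanceRecord("<command>", "<hash>", sha256_of(shipped))

print(verify_binary(shipped, record))         # True
print(verify_binary(shipped + b"!", record))  # False
#+end_src

A digest comparison like this only says something when the build is
reproducible; otherwise an honest rebuild would also produce a
different digest.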
Bootstrap binaries may also come from upstream. This would typically
be the case when a language is first added to a distribution. In this
case, it may not be obvious how the binary can be reproduced, but the
distribution should at least clearly label the provenance of the
binary, e.g. "this binary was downloaded from
https://upstream-compiler.example.org/upstream-compiler-20161211-x86_64-linux.tar.xz".
TODO: provide an example of how we do this / are going to do this in
Nixpkgs / Guix / ...?
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=062134985802d85066418f6ee2f327122166a567
* Collaboration projects
** Continued maintenance of the GNU Compiler for Java (GCJ)
Until recently the latest Java Development Kit (JDK) could be
bootstrapped in a chain starting with GCJ (the GNU Compiler for Java)
and the IcedTea build system. GCJ was deleted from the GNU Compiler
Collection in October 2016, so it is now unclear how to bootstrap the
JDK in future. To ensure that the JDK can be built from sources
without the need for an existing installation of the OpenJDK we
propose to continue maintaining GCJ.
** Collectively maintaining GCC 4.7
The C and C++ compilers of the GNU Compiler Collection make up the
foundation of many free software distributions. Current versions of
GCC are written in C++, which means that a C++ compiler is needed to
build them from source. GCC 4.7 was the last version of the collection
that could be built with a plain C compiler, and a C compiler is much
simpler to provide than a C++ compiler. We
propose to collectively maintain a subset of GCC 4.7 to ensure that we
can build the foundation of free software distributions starting with
a simple C compiler (such as TinyCC, pcc, etc.).
* Who are we? Which projects are participating?
* Buy-in
This is nice, but what are the actual benefits of “bootstrappable”
implementations?
** For users
As a user, bootstrappable implementations, together with [[https://reproducible-builds.org][reproducible
builds]], provide confidence that you are running the code you expect to
be running. The source code is auditable by the developer community,
which in turn provides reassurance that the code you're running does
not have backdoors.
** For distributors
Bootstrappable implementations provide clear provenance tracking: the
dependency graph of your distribution packages shows how each binary was
obtained.
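As a rough sketch of what such a graph makes possible, the function
below walks a package's build inputs and collects the nodes that have
no inputs of their own, i.e. the opaque bootstrap binaries everything
else rests on. The graph representation, the function name
=bootstrap_roots= and the package names are hypothetical, loosely
modelled on the GCC > GCJ > IcedTea chain mentioned in these notes.

#+begin_src python
def bootstrap_roots(graph, package):
    """Return the bootstrap binaries that `package` transitively needs.

    `graph` maps each package name to the list of packages required to
    build it. A package with no build inputs is treated as a pre-built
    bootstrap binary.
    """
    seen = set()
    roots = set()
    stack = [package]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        inputs = graph.get(node, [])
        if not inputs:
            roots.add(node)
        stack.extend(inputs)
    return roots


# Hypothetical dependency graph, for illustration only.
graph = {
    "icedtea-7": ["icedtea-6"],
    "icedtea-6": ["gcj"],
    "gcj": ["gcc"],
    "gcc": ["gcc-bootstrap-binary"],
    "gcc-bootstrap-binary": [],
}

print(bootstrap_roots(graph, "icedtea-7"))  # {'gcc-bootstrap-binary'}
#+end_src

The smaller the returned set, the less a user has to take on trust.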
** For developers
If you are a compiler writer, making your compiler bootstrappable from a
different language will simplify the development process (no need to
carry large pre-built binaries around). It will also make it easier to
port the compiler to a different platform for which no bootstrap
binaries exist yet.
* Next Steps
- Try building GCC using GCC 4.7: this already works (we used GCC 4.7 some months ago in Guix, but updated later for unrelated reasons)
- Try building GCC 4.7 with TinyCC