and copy it to the client. This has two benefits:
1) Avoids spamming the master with dozens of md5 processes when
dosetupnode is spawned for all client machines at once
2) Avoids silly copy attempts on disconnected nodes, for which the file
would just be copied onto itself
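A minimal sketch of the idea, assuming the file in question is the
bindist tarball and that ${pb}/${arch} names the usual build tree (the
log does not show the actual paths):

    # master: compute the checksum once per dosetupnode run
    md5 -q ${pb}/${arch}/bindist.tar > ${pb}/${arch}/bindist.tar.md5
    # client: compare against the precomputed checksum before copying
    if [ "$(md5 -q bindist.tar 2>/dev/null)" != "$(cat bindist.tar.md5)" ]; then
        scp master:${pb}/${arch}/bindist.tar .
    fi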
portbuild.conf, builds will each be done in a separate swap-backed md.
This dramatically improves build performance since
* Transactions are not all written to disk, so disk bandwidth is not
a bottleneck
* Multiple builds do not contend with each other for the same set of
filesystem locks and other per-device resources
The size of the md devices is controlled by the md_size variable. '2g'
seems to be a good size.
Currently we mdconfig -u each device after each port build, since
otherwise dirty blocks accumulate and the md eventually uses the full
amount of backing store (2g in the above example). This is a problem
if there is insufficient swap backing to accommodate them all.
XXX This should be made configurable to avoid the performance penalty on
systems that do have enough swap backing
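A sketch of the per-build md lifecycle described above; ${md_size}
comes from portbuild.conf as noted, the other names are assumptions:

    # attach a swap-backed md; mdconfig prints the unit name, e.g. "md5"
    unit=$(mdconfig -a -t swap -s ${md_size})
    newfs -U /dev/${unit} > /dev/null
    mount /dev/${unit} ${chroot}
    # ... build the port inside ${chroot} ...
    umount ${chroot}
    # detach the device so its dirty blocks are released back to swap
    mdconfig -d -u ${unit}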
around silly failures in some ports when it is present
* Add a footer to completed builds that is recognised by pdispatch to
retry truncated builds (see the sketch after this list)
* If ALWAYS_KEEP_DISTFILES is not present in the environment or port
makefile, then clean up DISTDIR after the build finishes, to prevent
collection of distfiles for this port.
* Finish flipping the switch on -noplistcheck - this is activated by
passing in the NOPLISTCHECK environment variable instead of
using PLISTCHECK in the opposite case
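The footer text itself is not quoted in this log; a hypothetical
sketch of the mechanism:

    # client side: mark the log as complete
    echo "build ended at $(date)" >> ${logfile}
    # pdispatch side: a log without the footer was truncated, so retry
    if ! tail -1 ${logfile} | grep -q "^build ended at"; then
        retry=1
    fi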
* Remove the unused -nodummy function
* Pass in ALWAYS_KEEP_DISTFILES when -distfiles is set
* Finish flipping the switch on -noplistcheck - this is activated by
passing in the NOPLISTCHECK environment variable instead of
using PLISTCHECK in the opposite case
* Always pull in the distfiles from the client if they exist (needed
for forthcoming ports tree changes to satisfy GPL license requirements)
* If the build did not complete "cleanly", e.g. it was interrupted by a
network outage or client machine panic, then retry it until it succeeds
instead of just leaving a dirty truncated log
* Finish flipping the switch on -noplistcheck; set NOPLISTCHECK instead
of PLISTCHECK in the opposite case (see the sketch after this list)
* Clean up the distfiles/ directory when starting a build, so it is not
contaminated by old distfiles
* Remove the commented-out code that backed up the old distfiles/
directory; it is just too big for that to be practical.
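A sketch of the inverted switch mentioned above (the dispatch-side
variable names are assumptions, not taken from the scripts):

    # before the flip: PLISTCHECK=1 was passed unless -noplistcheck was given
    # after the flip: NOPLISTCHECK=1 is passed only when -noplistcheck is given
    if [ ${noplistcheck} = 1 ]; then
        buildenv="${buildenv} NOPLISTCHECK=1"
    fi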
look for processes holding open references within the FS and kill
them, then use regular umount. This is necessary now that devfs
cannot be force-unmounted, and has the benefit that processes can't
hang around holding references to files between port builds.
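A sketch of that cleanup sequence, assuming ${mnt} is the chroot mount
point:

    # list PIDs with files open under ${mnt} (PID is fstat's third column)
    pids=$(fstat -f ${mnt} | tail -n +2 | awk '{print $3}' | sort -u)
    [ -n "${pids}" ] && kill -9 ${pids}
    # a regular umount now succeeds; devfs can no longer be force-unmounted
    umount ${mnt}/dev
    umount ${mnt}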
* Preliminary work to support using ccache to accelerate builds.
* Reduce the possibility of error by testing for an executable
ldconfig inside the chroot before attempting to run it (e.g. it may not
be there if the chroot was not completely initialized)
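The guard itself is tiny; a sketch with an assumed ${chroot} variable:

    # only run ldconfig if the chroot was populated far enough to have it
    [ -x ${chroot}/sbin/ldconfig ] && chroot ${chroot} /sbin/ldconfig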
fetch from ftp-master and pointyhat; they'll just get timeouts.
Instead, each machine is expected to set up its own MASTER_SITE_*
variables in etc/make.conf via a bindist-${hostname}.tar file.
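For example, a bindist-${hostname}.tar might install something like
the following into etc/make.conf (the mirror URL is a placeholder):

    # fetch distfiles from a mirror reachable from this client
    MASTER_SITE_OVERRIDE?= \
        ftp://ftp.example.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR}/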
Approved by: portmgr (self)
on a disconnected client, without running the time-consuming rsyncs.
This is useful when a build is interrupted and needs to be restarted.
* After we have cleaned up the machine, reset the queue counter by using
pollmachine -queue. This has a race condition if other builds are being
dispatched to the machine (e.g. builds on another branch):
getmachine can claim a directory and increment the counter, then the
machine is polled and finds e.g. 0 chroots in use, and resets the
counter to 0, then claim-chroot is run and the build dispatched, with
the counter now off-by-one. This could be fixed by running
claim-chroot with the .lock held, but this turns out to be too
time-consuming. A two-level lock approach might also fix this
efficiently.
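A sketch of the reset step and the window it leaves open (pollmachine's
internals are assumed; only the paths follow the description above):

    # pollmachine -queue: count chroots in use and reset the counter
    n=$(count_chroots_in_use ${machine})    # hypothetical helper
    lockf ${pb}/${arch}/queue/.lock \
        sh -c "echo ${n} > ${pb}/${arch}/queue/${machine}"
    # race: getmachine may already have incremented the counter for a
    # build whose claim-chroot has not yet run, so ${n} undercounts by one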
same time, assuming that the admin has already built the INDEX and
INDEX.old in advance.
* Adapt to the new method of calculating build concurrency, by summing
the value of ${maxjobs} listed in every portbuild.${machine} file (see
the sketch after this list)
* Support 5-exp builds
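One way the summing might look, assuming each portbuild.${machine}
file sets maxjobs=N in shell syntax:

    total=0
    for f in ${pb}/${arch}/portbuild.*; do
        [ ${f} = ${pb}/${arch}/portbuild.conf ] && continue
        n=$(awk -F= '$1 == "maxjobs" {print $2}' ${f})
        total=$((total + ${n:-0}))
    done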
(i.e. if the package lists a dependency on the relevant package in the
PACKAGE_BUILDING case). This allows packages that require an
available DISPLAY to again build (with some forthcoming fixes to
existing ports).
Improve the reporting of detected filesystem anomalies (extra files
left behind after deinstallation, changes to and removal of
pre-existing files)
synchronously instead of probabilistically scheduling jobs, which
means that the job load on a machine never exceeds a desired
threshold, and we can preferentially use faster machines when they are
available. This has a dramatic effect on package build throughput,
although I don't yet have precise measurements of the performance
improvements.
Specifically, the changes are (a condensed sketch of the claim
protocol follows this list):
* Introduce the new variable maxjobs in portbuild. This replaces the
build scheduling weights previously listed in the mlist file, which
now changes format to list the build machines only, ranked in order of
preference for job dispatches (i.e. faster machines first).
* The ${arch}/queue directory is used to list machines available for
jobs (file content is the number of jobs currently running on the
machine). Changes to files in this directory are serialized using
lockf on the .lock file.
* Claim a machine with the getmachine script, with the .lock held.
This picks the machine with the fewest number of jobs running,
preferring the one listed highest in the mlist file when multiple
machines have equal load. The job counter is incremented, and the file
removed if
the counter reaches ${maxjobs} for that machine. If all machines are
busy, sleep for 15 seconds and retry.
* After we have claimed a machine, we run claim-chroot on it to claim
an empty chroot, as before. If the claim fails, release the job from
the queue with the releasemachine script and retry after a 15 second
wait.
* When the build is finished, decrement the job counter with the
releasemachine script, with .lock held.
* The checkmachines script now exists only to poll the load averages
for admin convenience (every 2 minutes), and to detect unreachable
machines by pinging them. When a machine cannot be reached, remove its
entry in the
queue directory to stop further job dispatches to it. This needs more
work to deal with reinitialization of machines after they become
available again.
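A condensed sketch of the claim side of this protocol; the real
getmachine script is not shown in the log, so treat the details as
assumptions. It runs with the .lock held (e.g. under lockf) and prints
the chosen machine, or nothing if all are busy:

    best=""; min=999999
    for m in $(cat ${pb}/${arch}/mlist); do         # preference order
        [ -f ${pb}/${arch}/queue/${m} ] || continue # full or unreachable
        n=$(cat ${pb}/${arch}/queue/${m})
        if [ ${n} -lt ${min} ]; then                # strict < keeps the
            min=${n}; best=${m}                     # earlier entry on ties
        fi
    done
    [ -z "${best}" ] && exit 1                      # caller sleeps 15s, retries
    . ${pb}/${arch}/portbuild.${best}               # defines ${maxjobs}
    if [ $((min + 1)) -ge ${maxjobs} ]; then
        rm ${pb}/${arch}/queue/${best}              # at capacity: stop dispatch
    else
        echo $((min + 1)) > ${pb}/${arch}/queue/${best}
    fi
    echo ${best}

releasemachine is the inverse under the same lock: decrement the
counter, recreating the queue file if it had been removed at capacity.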
Additional changes to this file:
* Exit if passed a null package name, to avoid badness later on
* Send a nag-mail if pkg-plist errors are detected in the build
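Both guards are small; a sketch (the error flag and mail recipient are
placeholders):

    [ -z "${pkgname}" ] && exit 1       # null package name: bail out early
    # ... after the build completes ...
    if [ ${plisterrors} = 1 ]; then
        mail -s "plist errors in ${pkgname}" ${mailto} < ${logfile}
    fi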
/rescue/mount -t linprocfs, so assume that the i386 build hosts have
statically-built copies of the necessary binaries in /sbin, until this is
fixed.
Create /usr/X11R6 inside the chroot so that mtree has something to do, since
this directory is otherwise orphaned.