Algorithm
Rewrite of the core int32/avx2 implementation for (1) higher speed and
(2) reduced memory consumption. Stack allocation is now at most a few
kilobytes, even for gigantic arrays.
Internally, the sorting algorithm is now mostly bitonic to simplify
indexing, although odd-even speedups are still applied when
convenient. Lanes are complemented to take the down-up decision out of
the inner loops.
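
To illustrate why complementing helps, here is a minimal sketch, not taken from djbsort: a descending compare-exchange is obtained from an ascending one by complementing both values before and after. The helper names are hypothetical, the branch is for readability only (it is not constant-time), and two's complement is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: two's complement, so ~x == -x-1 reverses signed order. */
static_assert(~0 == -1, "two's complement assumed");

/* Hypothetical helper: ascending compare-exchange, smaller value ends
   up in *a. Written with a branch for clarity only. */
void minmax_up(int32_t *a, int32_t *b)
{
  if (*b < *a) { int32_t t = *a; *a = *b; *b = t; }
}

/* Descending compare-exchange built only from the ascending one:
   complement, sort up, complement back. */
void minmax_down_via_complement(int32_t *a, int32_t *b)
{
  *a = ~*a; *b = ~*b;
  minmax_up(a, b);
  *a = ~*a; *b = ~*b;
}
```

If a lane is kept complemented across many steps, only the ascending operation has to appear in the loop body, and the complement is paid once on the way in and once on the way out.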
As in previous djbsort versions, data is sorted first in vector lanes
and then transposed for final merges, reducing the overall number of
vector permutations. Unlike previous versions, transposition is done
in-place. The transposition in this version is a bit-reversal of the
outer 6 bits (the bottom 3 bits and the top 3 bits) that leaves the
intermediate bits alone. Non-power-of-2 array sizes are handled by an extra, more
traditional, merge step.
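
One possible reading of that index permutation, as a sketch only: treat the top 3 and bottom 3 index bits as a single 6-bit string and reverse it, so that bit k trades places with bit nbits-1-k for k = 0, 1, 2. The function name and this interpretation are assumptions, not djbsort code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: position of index i, in an array of 2^nbits elements,
   after reversing the outer 6 index bits. Bit k is swapped with bit
   nbits-1-k for k = 0,1,2; all bits in between are left unchanged. */
size_t outer6_bitreverse(size_t i, unsigned nbits)
{
  assert(nbits >= 6 && nbits <= 8 * sizeof(size_t));
  for (unsigned k = 0; k < 3; k++) {
    unsigned lo = k, hi = nbits - 1 - k;
    size_t diff = ((i >> lo) ^ (i >> hi)) & 1; /* 1 if the two bits differ */
    i ^= (diff << lo) | (diff << hi);          /* swap them by flipping both */
  }
  return i;
}
```

Under this reading the map is its own inverse, so it can be applied in place by swapping each element with the element at its image index.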
Sizes 2, 3, 4, 5, 6, 7, 8, 16, 32 are now special-cased. Non-power-of-2
sizes below 256 are padded to the next power of 2.
Portable implementations: The out-of-place int32/portable1 and
int32/portable2 implementations are now gone; the in-place
int32/portable3 and int32/portable4 implementations remain.
C API
float32_sort is now supported. The arithmetic in the reduction from
float32 to int32 is int32 31-bit right shift, uint32 1-bit right shift,
xor; this is slightly more efficient than the reduction from float32 to
uint32 from 2001 Herf.
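
The three operations named above can be sketched as follows. The function name is hypothetical, and the sketch assumes IEEE 754 binary32 floats and an arithmetic right shift on signed integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(int32_t), "32-bit float assumed");

/* Hypothetical helper: map float32 bits to an int32 key whose signed
   order matches the floating-point order. */
int32_t float32_to_sortable_int32(float f)
{
  int32_t x;
  memcpy(&x, &f, sizeof x);            /* reinterpret the bits */
  int32_t sign = x >> 31;              /* int32 31-bit right shift: -1 if
                                          negative, else 0 (assumes an
                                          arithmetic shift) */
  uint32_t mask = (uint32_t)sign >> 1; /* uint32 1-bit right shift:
                                          0x7fffffff or 0 */
  return x ^ (int32_t)mask;            /* xor: flip the low 31 bits of
                                          negative inputs */
}
```

Applying the same three operations to a key restores the original bits, so the mapping can be undone after sorting. In this sketch -0.0 maps to -1 and +0.0 maps to 0, so negative zero sorts before positive zero.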
Compiling
Tests now have more variation (without much slowdown): the uint32 test
cases now deviate from int32 in more than the sign; float32 uses
floating-point numbers that aren't integers; int32 does more loops for
small cases, and some larger cases.
Internals
API for 2-input sorting is now a MINMAX macro operating on two
inputs in place.
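
For illustration, an in-place two-input macro of this shape could look like the portable sketch below. The name MINMAX_SKETCH and the 64-bit subtraction are assumptions for the example, not the djbsort definition; an arithmetic right shift is assumed.

```c
#include <assert.h>
#include <stdint.h>

static_assert(((int64_t)-1 >> 63) == -1, "arithmetic right shift assumed");

/* Hypothetical sketch: after the macro runs, a <= b. Both arguments
   must be int32_t lvalues; they are updated in place. */
#define MINMAX_SKETCH(a, b) \
  do { \
    /* negative exactly when b < a; cannot overflow in 64 bits */ \
    int64_t diff_ = (int64_t)(b) - (int64_t)(a); \
    /* a^b if the pair is out of order, else 0 */ \
    int32_t swap_ = (int32_t)(diff_ >> 63) & ((a) ^ (b)); \
    (a) ^= swap_; \
    (b) ^= swap_; \
  } while (0)
```

The source contains no data-dependent branch or array index, but whether the compiled code stays branch-free depends on the compiler.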
Better inline assembly from Jason Donenfeld for 2-input sorting: more
flexibility in the compiler's register allocation.
The package version number is now automatically copied to version.c as
the implementation version number for implementations that don't provide
version.c.
Verification
minmax now supports more peephole optimizations for complemented bitonic
sorting and for padding: xor(s,xor(s,t)) ⇒ t; xor(-1,s) ⇒ invert(s);
Reverse(Reverse(s)) ⇒ s; signedmin(invert(s),invert(t)) ⇒
invert(signedmax(s,t)); signedmax(invert(s),invert(t)) ⇒
invert(signedmin(s,t)); invert(s)[high:low] ⇒ invert(s[high:low]);
s[bits-1:0] ⇒ s; s[high:low][high2:low2] ⇒ s[high2+low:low2+low];
Concat(...)[high:low] ⇒ ...[high-pos:low-pos] when possible;
Reverse(s)[high:low] ⇒ Reverse(s[...]) when possible; eliminate
signedmin/signedmax when one input is the minimum or maximum constant.
verifymany now includes the implementation version number on
verified lines.
machine, not on the machine preparing a binary package." (Also: "The
issues are explained in Section 8 of
https://pqcrypto.eu.org/deliverables/d2.4.pdf.")
For this to work, we install the source tree (with built objects) to
${PREFIX}/share/djbsort. Then we run tests, install to ${PREFIX}/include
and ${PREFIX}/lib, and check the installed files against pseudo-PLIST.
This means pkg_add(1) will fail if no C compiler is present, which is
unusual behavior for pkg_add but perhaps not entirely unreasonable for a
C library.
Bump PKGREVISION.
djbsort is a software library for sorting arrays of integers. It
provides the following features:
- Speed: djbsort holds new speed records for in-memory sorting.
- Security: djbsort is designed to be safe for cryptographic contexts.
- Verification: djbsort includes tools to automatically verify correctness.
These features are not separate options: there is a single sorting
function that is simultaneously fast, secure, and verified.
Packaged for wip by schmonz@