- Change size_needed to expect 1 hash instead of 2. 2 hashes is when
Monero added tx pruning and they include a secondary hash for those
purposes. At Loki we don't quite support pruning yet as blocks are
needed for the Service Node network to operate.
- replace the gross cmake shell script tools being used to create the data file with
cmake code and a single configure_file
- change checkpoint data to string_view
- tools::sha256sum() had two different overloads that did very different
things, yet presents two overloads: one that takes a pointer+size, the
other that takes a string lvalue ref, and yet the string lvalue
reference is *entirely* different (it reads a file). Append _str and
_file suffixes to make it less dangerous.
This adds a static dependency script for libraries like boost, unbound,
etc. to cmake, invokable with:
cmake .. -DBUILD_STATIC_DEPS=ON
which downloads and builds static versions of all our required
dependencies (boost, unbound, openssl, ncurses, etc.). It also implies
-DSTATIC=ON to build other vendored deps (like miniupnpc, lokimq) as
static as well.
Unlike the contrib/depends system, this is easier to maintain (one
script using nicer cmake with functions instead of raw Makefile
spaghetti code), and isn't concerned with reproducible builds -- this
doesn't rebuild the compiler, for instance. It also works with the
existing build system so that it is simply another way to invoke the
cmake build scripts but doesn't require any external tooling.
This works on Linux, Mac, and Windows.
Some random comments on this commit (for preserving history):
- Don't use target_link_libraries on imported targets. Newer cmake is
fine with it, but Bionic's cmake doesn't like it but seems okay with
setting the properties directly.
- This rebuilds libzmq and libsodium, even though there is some
provision already within loki-core to do so: however, the existing
embedded libzmq fails with the static deps because it uses libzmq's
cmake build script, which relies on pkg-config to find libsodium which
ends up finding the system one (or not finding any), rather than the one
we build with DownloadLibSodium. Since both libsodium and libzmq are
faily simple builds it seemed easiest to just add them to the cmake
static build rather than trying to shoehorn the current code into the
static build script.
- Half of the protobuf build system ignores CC/CXX just because Google,
and there's no documentation anywhere except for a random closed bug
report about needing to set these other variables (CC_FOR_BUILD,
CXX_FOR_BUILD) instead, but you need to. Thanks Google.
- The boost build is set to output very little because even the minimum
-d1 output level spams ~15k lines of output just for the headers it
installs.
- updating to latest loki-mq (1.0.0 + various linking fixes)
- BUILD_SHARED_LIBS was being handled very strangely; make it a full
option instead (defaulting to off) that a cmake invoker can specify, as
per cmake recommendations.
- travis ci tweaks/changes:
- Add a static bionic build
- Simplify cmake argument code
- Add `--version` invocation for lokid and loki-wallet-cli to test
that the binaries were linked properly.
- always build an embedded sodium statically; if we do it dynamically
and an older system one exists we are going to have trouble.
- don't force epee and blocks to be static; rather they get controlled
by the above BUILD_SHARED_LIBS, just like all the other internal
libraries.
- use some PkgConfig:: imported targets rather than bunch-of-variables.
The archaic (i.e. decade old) cmake usage here really got in the way of
trying to properly use newer libraries (like lokimq), so this undertakes
overhauling it considerably to make it much more sane (and significantly
reduce the size).
I left more of the architecture-specific bits in the top-level
CMakeLists.txt intact; most of the efforts here are about properly
loading dependencies, specifying dependencies and avoiding a whole pile
of cmake antipatterns.
This bumps the required cmake version to 3.5, which is what xenial comes
with.
- extensive use of interface libraries to include libraries,
definitions, and include paths
- use Boost::whatever instead of ${Boost_WHATEVER_LIBRARY}. The
interface targets are (again) much better as they also give you any
needed include or linking flags without needing to worry about them.
- don't list header files when building things. This has *never* been
correct cmake usage (cmake has always known how to wallet_rpc_headers
the headers that .cpp files include to know about build changes).
- remove the loki_add_library monstrosity; it breaks target names and
makes compiling less efficient because the author couldn't figure out
how to link things together.
- make loki_add_executable take the output filename, and set the output
path to bin/ and install to bin because *every single usage* of
loki_add_executable was immediately followed by setting the output
filename and setting the output path to bin/ and installing to bin.
- move a bunch of crap that is only used in one particular
src/whatever/CMakeLists.txt into that particular CMakeLists.txt instead
of the top level CMakeLists.txt (or src/CMakeLists.txt).
- Remove a bunch of redundant dependencies; most of them look like they
were just copy-and-pasted in, and many more aren't needed (since they
are implied by the PUBLIC linking of other dependencies).
- Removed `die` since it just does a FATAL_ERROR, but adds color (which
is useless since CMake already makes FATAL_ERRORs perfectly visible).
- Change the way LOKI_DAEMON_AND_WALLET_ONLY works to just change the
make targets to daemon and simplewallet rather than changing the build
process (this should make it faster, too, since there are various other
things that will be excluded).
Bockchain:
1. Optim: Multi-thread long-hash computation when encountering groups of blocks.
2. Optim: Cache verified txs and return result from cache instead of re-checking whenever possible.
3. Optim: Preload output-keys when encoutering groups of blocks. Sort by amount and global-index before bulk querying database and multi-thread when possible.
4. Optim: Disable double spend check on block verification, double spend is already detected when trying to add blocks.
5. Optim: Multi-thread signature computation whenever possible.
6. Patch: Disable locking (recursive mutex) on called functions from check_tx_inputs which causes slowdowns (only seems to happen on ubuntu/VMs??? Reason: TBD)
7. Optim: Removed looped full-tx hash computation when retrieving transactions from pool (???).
8. Optim: Cache difficulty/timestamps (735 blocks) for next-difficulty calculations so that only 2 db reads per new block is needed when a new block arrives (instead of 1470 reads).
Berkeley-DB:
1. Fix: 32-bit data errors causing wrong output global indices and failure to send blocks to peers (etc).
2. Fix: Unable to pop blocks on reorganize due to transaction errors.
3. Patch: Large number of transaction aborts when running multi-threaded bulk queries.
4. Patch: Insufficient locks error when running full sync.
5. Patch: Incorrect db stats when returning from an immediate exit from "pop block" operation.
6. Optim: Add bulk queries to get output global indices.
7. Optim: Modified output_keys table to store public_key+unlock_time+height for single transaction lookup (vs 3)
8. Optim: Used output_keys table retrieve public_keys instead of going through output_amounts->output_txs+output_indices->txs->output:public_key
9. Optim: Added thread-safe buffers used when multi-threading bulk queries.
10. Optim: Added support for nosync/write_nosync options for improved performance (*see --db-sync-mode option for details)
11. Mod: Added checkpoint thread and auto-remove-logs option.
12. *Now usable on 32-bit systems like RPI2.
LMDB:
1. Optim: Added custom comparison for 256-bit key tables (minor speed-up, TBD: get actual effect)
2. Optim: Modified output_keys table to store public_key+unlock_time+height for single transaction lookup (vs 3)
3. Optim: Used output_keys table retrieve public_keys instead of going through output_amounts->output_txs+output_indices->txs->output:public_key
4. Optim: Added support for sync/writemap options for improved performance (*see --db-sync-mode option for details)
5. Mod: Auto resize to +1GB instead of multiplier x1.5
ETC:
1. Minor optimizations for slow-hash for ARM (RPI2). Incomplete.
2. Fix: 32-bit saturation bug when computing next difficulty on large blocks.
[PENDING ISSUES]
1. Berkely db has a very slow "pop-block" operation. This is very noticeable on the RPI2 as it sometimes takes > 10 MINUTES to pop a block during reorganization.
This does not happen very often however, most reorgs seem to take a few seconds but it possibly depends on the number of outputs present. TBD.
2. Berkeley db, possible bug "unable to allocate memory". TBD.
[NEW OPTIONS] (*Currently all enabled for testing purposes)
1. --fast-block-sync arg=[0:1] (default: 1)
a. 0 = Compute long hash per block (may take a while depending on CPU)
b. 1 = Skip long-hash and verify blocks based on embedded known good block hashes (faster, minimal CPU dependence)
2. --db-sync-mode arg=[[safe|fast|fastest]:[sync|async]:[nblocks_per_sync]] (default: fastest:async:1000)
a. safe = fdatasync/fsync (or equivalent) per stored block. Very slow, but safest option to protect against power-out/crash conditions.
b. fast/fastest = Enables asynchronous fdatasync/fsync (or equivalent). Useful for battery operated devices or STABLE systems with UPS and/or systems with battery backed write cache/solid state cache.
Fast - Write meta-data but defer data flush.
Fastest - Defer meta-data and data flush.
Sync - Flush data after nblocks_per_sync and wait.
Async - Flush data after nblocks_per_sync but do not wait for the operation to finish.
3. --prep-blocks-threads arg=[n] (default: 4 or system max threads, whichever is lower)
Max number of threads to use when computing long-hash in groups.
4. --show-time-stats arg=[0:1] (default: 1)
Show benchmark related time stats.
5. --db-auto-remove-logs arg=[0:1] (default: 1)
For berkeley-db only. Auto remove logs if enabled.
**Note: lmdb and berkeley-db have changes to the tables and are not compatible with official git head version.
At the moment, you need a full resync to use this optimized version.
[PERFORMANCE COMPARISON]
**Some figures are approximations only.
Using a baseline machine of an i7-2600K+SSD+(with full pow computation):
1. The optimized lmdb/blockhain core can process blocks up to 585K for ~1.25 hours + download time, so it usually takes 2.5 hours to sync the full chain.
2. The current head with memory can process blocks up to 585K for ~4.2 hours + download time, so it usually takes 5.5 hours to sync the full chain.
3. The current head with lmdb can process blocks up to 585K for ~32 hours + download time and usually takes 36 hours to sync the full chain.
Averate procesing times (with full pow computation):
lmdb-optimized:
1. tx_ave = 2.5 ms / tx
2. block_ave = 5.87 ms / block
memory-official-repo:
1. tx_ave = 8.85 ms / tx
2. block_ave = 19.68 ms / block
lmdb-official-repo (0f4a036437)
1. tx_ave = 47.8 ms / tx
2. block_ave = 64.2 ms / block
**Note: The following data denotes processing times only (does not include p2p download time)
lmdb-optimized processing times (with full pow computation):
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 1.25 hours processing time (--db-sync-mode=fastest:async:1000).
2. Laptop, Dual-core / 4-threads U4200 (3Mb) - 4.90 hours processing time (--db-sync-mode=fastest:async:1000).
3. Embedded, Quad-core / 4-threads Z3735F (2x1Mb) - 12.0 hours processing time (--db-sync-mode=fastest:async:1000).
lmdb-optimized processing times (with per-block-checkpoint)
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 10 minutes processing time (--db-sync-mode=fastest:async:1000).
berkeley-db optimized processing times (with full pow computation)
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 1.8 hours processing time (--db-sync-mode=fastest:async:1000).
2. RPI2. Improved from estimated 3 months(???) into 2.5 days (*Need 2AMP supply + Clock:1Ghz + [usb+ssd] to achieve this speed) (--db-sync-mode=fastest:async:1000).
berkeley-db optimized processing times (with per-block-checkpoint)
1. RPI2. 12-15 hours (*Need 2AMP supply + Clock:1Ghz + [usb+ssd] to achieve this speed) (--db-sync-mode=fastest:async:1000).