Check that lns_end_height is non-zero, to see if it's valid to process
blocks in the LNS subsystem. Otherwise the start_height of LNS can
potentially override the overall start height even though LNS won't
process any of the blocks, causing the message that multiple blocks are
being processed into Loki subsystems on every block.
The wallet does something funky with the key storage such that the
values have changed even after decrypting the wallet keys. The ed keys
are different from when we originally derived them, so for now, just
re-derive them.
Call service_node_list's block-added hook function manually instead of
hooking the hook. There's a common operation between the two subsystems,
Loki Name Service and Service Node List, in which they generate state
based off incoming blocks from the blockchain.
Upon profiling, the slowest part of this is retrieving the blocks from
the DB in batches of 1000, rather than the processing of the blocks in
the subsystems. So flatten out the hierarchy so that we can retrieve a
batch and simultaneously process the blocks in both subsystems on the
same fetch request.
A complete removal of the hook system, giving us more flexibility at the
callsite, can be done in a later PR to avoid too many unnecessary
changes in the LNS PR.
* Creates a default address for the sweep_all command in the cli wallet
Previously the sweep_all command required the user to pass in the wallet
address for it to function. This commit allows no address to be passed
in, in which case the software will use the same address that the
address command produces.
* Rename variable to addr, only call wallet if needed and update usage statements
* modify locked blocks parameters after making address optional
* removed log file and single sweep description
* updated the help descriptions for the 3 affected commands
Previously the loki-wallet-rpc print function set the bright parameter in set_console_color(int color, bool bright) to true when its emphasis parameter was false. This changes the bright flag so it mirrors the emphasis flag.
This enables optional support for systemd notifications, which allows
lokid to be run via `Type=notify` so that it can better signal its
status to systemd, and enables systemd watchdog handling to restart the
daemon if something goes wrong.
Enabled here are:
- systemd watchdog ping every 10s
- systemd status update every 10s, so that `systemctl status loki-node`
gives you a status line such as:
Status: "Height: 450085, SN: active, proof: 15m12s, storage: 3m7s, lokinet: 27s"
- initialization notification so that systemd can wait for
and report on initialization status rather than just that the process
has launched.
- shutdown notification
All of these require changing the service type to `Type=notify` in the
`[Service]` section of the systemd service file; enabling the watchdog
also requires adding a `WatchdogSec=5min` line in the `[Service]`
section.
The systemd support is optional and requires the libsystemd-dev package
to build (it is probably not feasible at all for a static build).
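A unit file using these features might look like this (paths and the exact timer values are illustrative, apart from the `Type=notify` and `WatchdogSec=` settings described above):

```ini
[Unit]
Description=Loki daemon

[Service]
# Required for all of the notification features described above:
Type=notify
# Required only for the watchdog handling; systemd restarts lokid if it
# misses watchdog pings for this long:
WatchdogSec=5min
Restart=on-failure
ExecStart=/usr/bin/lokid --non-interactive

[Install]
WantedBy=multi-user.target
```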
cryptonote_protocol_handler calls `pool.get_blink(hash)` while already
holding a blink shared lock, which should have been
`pool.get_blink(hash, true)` to avoid `get_blink` trying to take its own
lock.
That double lock is undefined behaviour and can cause a deadlock on the
mutex, although it appears rare that it actually does. If it does,
however, this eventually backs up into vote relaying during the idle
loop, which then stalls the idle loop so we stop sending out uptime
proofs (since that is also in the idle loop).
A simple fix here is to add the `true` argument, but on reconsideration
this extra argument to take or not take a lock is messy and error prone,
so this commit instead removes the second argument entirely and instead
documents which call must and must not hold a lock, getting rid of the
three methods (get_blink, has_blink, and add_existing_blink) that had
the `have_lock` argument. This ends up having only a small impact on
calling code - the vast majority of callers already hold a lock, and the
few that don't are easily adjusted.
This fixes two issues: first when running with --disable-rpc-long-poll
the long poll isn't present, so the list of pool txes will always be
empty, which means unconfirmed_txs will get cleared on refresh because
they don't appear to be in the node's pool. This makes us likely to
wind up with double spending failures on subsequent txes since we don't
know about the outgoing unconfirmed_tx anymore.
Secondly, it seems like there's a potential race condition here even
when long polling is enabled that can cause the same failure depending
on the timing of polling in the long poll thread (for example, if it
hits the 30s cooldown from a MAX_CONNECTIONS response), so just fixing
this for the no-long-polling case doesn't seem sufficient.
The previous (< v6.1.1) code didn't have this issue because the tx
construction and refreshing new pool data were synchronous. (There is a
potential race condition if refresh requests span the node finding a new
block between the block refresh and the pool tx refresh, but that's
already handled in the code by requiring two refreshes before setting it
as unspent).
This reverts the old synchronous fetch-pool-txes behaviour on refresh so
that there is no window of opportunity with long polling for us to
prematurely treat unconfirmed_txs as failed/unspent.
set_daemon got changed to also set the long polling daemon, but that
later got removed while the extra (now unnecessary) logic remained. This
(mostly) reverts it back to the simpler pre-rpc-long-poll code.
The lock here was also taken later than it should have been: elsewhere
it guards changes to m_daemon_address and m_daemon_login, but in the
change above it had been moved so that it no longer guarded them.
x25519 -> pubkey cache entries weren't being updated on new but
unchanged proofs, so would eventually expire and cause SNs to stop
authenticating new quorumnet connections as being from SNs.
When updating the checkpoint table in place, something was done
incorrectly (or some bug was hit) such that querying MDB_LAST on the
checkpoint table does not return the latest expected checkpoint.
Pulling out all the old checkpoints, generating a new table, and
reinserting them resolves this.
Handle errors better when long polling is disabled instead of endlessly
spamming logs.
Avoid lock contention when set_daemon is called. Instead of immediately
affecting the long polling thread (which could be engaging the mutex
until RPC timeout, meaning the program stalls for that duration), update
the address on the next iteration of the long polling thread.
Wallets now handle daemons that disable long polling better, by
sleeping.
// NOTE: Compiling with a minimum Mac OSX API of 10.13 and generating
// a binary_archive with a one_shot_read_buffer causes the following
// snippet to fail:
explicit binary_archive(stream_type &s) : base_type(s) {
stream_type::pos_type pos = stream_.tellg();
stream_.seekg(0, std::ios_base::end);
eof_pos_ = stream_.tellg();
stream_.seekg(pos);
}
// In particular
// stream_.seekg(0, std::ios_base::end);
// eof_pos_ = stream_.tellg();
// A seekg followed by a tellg returns 0, with no state flags set on the
// io stream. So use the old method that copies the blob into a
// stringstream instead for APPLE targets.
`tools::wallet2::rpc_long_poll_timeout` was a static member declaration
without a definition, which isn't allowed before C++17 (although can
work depending on compiler optimizations). Adding the definition in
wallet2.cpp isn't really an option (it would make core depend on the
wallet), so just move it to a constexpr static global (which is allowed
without a definition, even before C++17) in `rpc/` instead.
This is really useful for the blink test suite as it lets us trigger a
resync (which normally only runs every 60s). In particular where we
have a (test) situation like this:
A - B - C
where we want to take down B and bring it up again but want to be sure
that new things learned from A get seen right away by C: if B does a
resync with C *before* it does a resync with A then C wouldn't get the
sync updates for a full minute, while if we force B to sync then force C
to sync we can ensure quick propagation for the test suite.
`--regtest` didn't work in some edge cases, this fixes various things:
- the genesis block wasn't accepted because it needed to be v7, not
vMax
- reduce initial uptime proof delay to 5s in regtest mode
- add --regtest flag to the wallet so that it can talk to a daemon in
--regtest mode.
This also adds two new mining options, available via rpc:
- slow_mining - this avoids the RandomX initialization. It is much
slower, but for regtest with fixed difficulty of 1 that is perfectly
fine.
- `num_blocks` - instruct the miner to mine for the given number of
blocks, then stop. (This can overmine if mining with multiple
threads at a low difficulty, but that's fine).
While we're syncing it's not uncommon to receive some mempool blinks
that we can't validate yet: the inputs may refer to outputs that we
don't know about yet, and we may not be able to construct the blink
quorum yet. We don't want to cut off our peers if they sent something
just because we can't handle it yet, so don't drop_connection in such a
case.
If a peer sends something invalid (e.g. a block containing a tx that
conflicts with a blink) we don't want to immediately close the
connection because the peer may be able to recover by rolling back, but
in order to do that it needs to be able to receive our blinks, which
(probably) won't happen if it gets instantly closed. So require *two*
attempts to close before we actually close the p2p connection.
This can occur when syncing if we get a blink tx before the blocks that
let us determine the quorum. Just ignore it at this point; we'll pick
it up at the next once-per-minute sync run.
Blink txes were not being properly passed in/out of the RPC wallet.
This adds the necessary bits both to submit a blink and to get a blink
submission status back from the daemon.
This replaces the horrible, horrible, badly misused templated
once_a_time_seconds and once_a_time_milliseconds with a `periodic_task`
that works the same way but takes parameters as constructor arguments
instead of template parameters.
It also makes various small improvements:
- uses std::chrono::steady_clock instead of ifdef'ing platform dependent
timer code.
- takes a std::chrono duration rather than a template integer and
scaling parameter.
- timers can be reset to trigger on the next invocation, and this is
thread-safe.
- timer intervals can be changed at run-time.
This all then gets used to reset the proof timer immediately upon
receiving a ping (initially or after expiring) from storage server and
lokinet so that we send proofs out faster.
Blockchain::prepare_handle_incoming_blocks locks m_tx_pool, but uses a
local RAII lock on the blockchain object itself, then also starts a
batch. Blockchain::cleanup_handle_incoming_blocks then also takes out a
local RAII blockchain lock, then cleans up the batch.
But the lmdb layer is retarded in that it throws an exception if any
thread tries to write to the database while a batch is active in another
thread, and so the blockchain lock is *also* used to guard writes.
Holding an open batch but *not* holding the blockchain lock then hits
this exception if a write arrives in another thread at just the right
time.
This is, of course, terrible design at multiple layers, but this close
to release I am reluctant to make more drastic fixes.
Other small changes here:
- All the locks in `blockchain.cpp` now use tools::unique_lock or
tools::unique_locks rather than the nasty epee macro. This also
reduces the likelihood of accidental deadlock because the dual
txpool-blockchain locks are now taken out simultaneously via std::lock
rather than sequentially.
- Removed a completely useless "if (1)". Git blame shows that there was
previously a condition here, but apparently the upstream monero author
who changed it was afraid of removing the `if` keyword.
- Reduced the sleep in the loop that waits for a batch to 100ms from
1000ms because sleeping for a full second for a fairly light test is
insane.
- boost isn't happy calling boost::lock() on the tx pool or blockchain
object because the lock/unlock/try_lock methods are const, and so the
workaround of using boost::lock (because std::lock and
std::shared_timed_mutex are broken on the macOS SDK 10.11 that we use
for mac builds) now requires extra workarounds. Joy.
The MacOSX 10.11 SDK we use is broken AF: it claims to support C++14,
but only the headers were upgraded, not the library itself, so using
std::shared_timed_mutex just results in a linking failure.
Upgrading the SDK is a huge pain (I tried, failed, and gave up), so for
now temporarily switch to boost::shared_mutex until we sort out the
macOS build disaster.
The sodium and zmq libs weren't using the same variable name, which
would end up linking to the system libsodium even when we meant to link
to the static one from contrib/depends.
quorum_vote_t's were serialized as blob data, which is highly
non-portable (probably isn't the same on non-amd64 arches) and broke
between 5.x and 6.x because `signature` is aligned now (which changed
its offset and thus broke 5.x <-> 6.x vote transmission).
This adds a hack to write votes into a block of memory compatible with
AMD64 5.x nodes up until HF14, then switches to a new command that fully
serializes starting at the hard fork (after which we can remove the
backwards compatibility stuff added here).
Checkpoint votes internally use a circular buffer, but that's rather
difficult to read for the `print_sn` output; this changes print_sn to
de-circularize the listed votes.
If adding a block fails (triggering the "Block added hook signalled
failure" error message) the service node list doesn't get reset, which
immediately leads to a bad service node winner (because the winner was
already incremented and not popped off).
This updates it to call the blockchain detached hooks to do the cleanup.
It also changes around loki::defer a little bit to rename the internal
class to `deferred` and make it cancellable (by calling `.cancel()`).
`loki::defer` is repurposed as a free function to get a named `deferred`
object given a lambda, which is needed to be able to call `cancel()` on
it. (The LOKI_DEFER macro still works as is).
The argument order felt wrong: switch it to (NAME, TYPE, FUNC)
Since register_command is meant to be called statically there is little
purpose in being able to accept anything other than a function pointer
(i.e. a direct function or capture-less lambda), so drop std::function<>
for command callbacks to avoid the virtual call overhead.
Changes commands from two types (quorum & public) to three:
- anyone -> service node (SNNetwork::command_type::public_)
- service node -> anyone (SNNetwork::command_type::response)
- service node -> service node (command_type::quorum)
Previously quorum commands could be issued to non-SN nodes, but that
should not be allowed (and the code would dereference a nullptr if that
happened).
macOS's std::lock() is broken in that it internally calls non-namespaced
function `try_lock` leading to an ADL conflict with boost::try_lock when
any of the arguments is a `boost::whatever`. `boost::lock` will do the
job for now.
The m_blockchain lock added in #975 was causing deadlocks because we
have ordered `m_blockchain -> ... -> m_sn_mutex` lock sequences but
`proof.store()` was adding a `m_sn_mutex -> ... -> m_blockchain`
sequence when called when receiving an uptime proof. Fix it by also
taking out an m_blockchain lock where we take out the `m_sn_mutex`.
auto locks = tools::unique_locks(mutex1, mutex2, ...);
gives you a tuple of unique_locks and obtains the locks atomically.
auto lock = tools::unique_lock(lock1);
is essentially the same as:
std::unique_lock<decltype(lock1)> lock{lock1};
but less ugly (and extends nicely to the plural version).
* Simplify and avoid uninitialized value warning
Rearranging/simplifying this code slightly to avoid gcc giving a
possibly-uninitialized value use on the dereference that follows this
changed code.
* More simplification: don't need optional