While we're syncing it's not uncommon to receive some mempool blinks
that we can't validate yet: the inputs may refer to outputs that we
don't know about yet, and we may not be able to construct the blink
quorum yet. We don't want to cut off our peers if they sent something
just because we can't handle it yet, so don't drop_connection in such a
case.
If a peer sends something invalid (i.e. a block containing a tx that
conflicts with a blink) we don't want to immediately close it because
the peer may be able to recover by rolling back, but in order to do that
it needs to be able to receive our blinks which (probably) won't happen
if it gets instantly close. So require *two* attempts to close before
we actually close the p2p connection.
This replaces the horrible, horrible, badly misused templated
once_a_time_seconds and once_a_time_milliseconds with a `periodic_task`
that works the same way but takes parameters as constructor arguments
instead of template parameters.
It also makes various small improvements:
- uses std::chrono::steady_clock instead of ifdef'ing platform dependent
timer code.
- takes a std::chrono duration rather than a template integer and
scaling parameter.
- timers can be reset to trigger on the next invocation, and this is
thread-safe.
- timer intervals can be changed at run-time.
This all then gets used to reset the proof timer immediately upon
receiving a ping (initially or after expiring) from storage server and
lokinet so that we send proofs out faster.
The MacOSX 10.11 SDK we use is broken AF: it lies about supporting
C++14, but really only upgraded the headers but not the library itself,
so using std::shared_timed_mutex just results in a linking failure.
Upgrading the SDK is a huge pain (I tried, failed, and gave up), so for
now temporarily switch to boost::shared_mutex until we sort out the
macOS build disaster.
quorum_vote_t's were serialized as blob data, which is highly
non-portable (probably isn't the same on non-amd64 arches) and broke
between 5.x and 6.x because `signature` is aligned now (which changed
its offset and thus broke 5.x <-> 6.x vote transmission).
This adds a hack to write votes into a block of memory compatible with
AMD64 5.x nodes up until HF14, then switches to a new command that fully
serializes starting at the hard fork (after which we can remove the
backwards compatibility stuff added here).
The argument order felt wrong: switch it to (NAME, TYPE, FUNC)
Since register_command is meant to be called statically there is little
purpose in being able to accept a non-function-pointer callback (i.e.
direct function or capture-less lambda), so drop using std::function<>'s
for command callbacks to avoid the virtual call overhead.
Changes commands from two types (quorum & public) to three:
- anyone -> service node (SNNetwork::command_type::public_)
- service node -> anyone (SNNetwork::command_type::response)
- service node -> service node (command_type::quorum)
Previously quorum commands could be issued to non-SN nodes, but that
should not be allowed (and the code would dereference a nullptr if that
happened).
Checkpoint votes were only going over quorumnet, which was quite broken
as they need to go out universally: random nodes piece together the
individual checkpoints to create checkpoints on their own, and so need
to see all of the votes. With the current code only service nodes that
participated in the specific quorum ever saw the checkpoint votes.
This commit returns checkpoint vote distribution to distribution over
p2p.
We could, in theory, do checkpoint vote collection over quorumnet and
then relay the set of votes as a pack to reduce p2p traffic a bit, but
this wouldn't be a huge optimization since we'd still have to distribute
all the votes over p2p at some point anyway, so leaving that as a future
optimization for now.
(Obligations votes, in contrast, don't need to be distributed at all --
if a vote results in a state change, the quorum members themselves can
produce and distribute the state change tx).
This extracts uptime proof data entirely from service node states,
instead storing (some) proof data as its own first-class object in the
code and backed by the database. We now persistently store:
- timestamp
- version
- ip & ports
- ed25519 pubkey
and update it every time a proof is received. Upon restart, we load all
the proofs from the database, which means we no longer lose last proof
received times on restart, and always have the most recently received ip
and port information (rather than only having whatever it was in the
most recently stored state, which could be ~12 blocks ago). It also
means we don't need to spam the network with proofs for up to half an
hour after a restart because we now know when we last sent (and
received) our own proof before restarting.
This separation required some restructuring: most notably changing
`state_t` to be constructed with a `service_node_list` pointer, which it
needs both directly and for BlockchainDB access. Related to this is
also eliminating the separate BlockchainDB pointer stored in
service_node_list in favour of just accessing it via m_blockchain (which
now has a new `has_db()` method to detect when it has been initialized).
We really don't need to worry about confirming a bl_good or bl_bad:
those are only sent once a blink quorum has itself verified the
signature. While one rogue SN could technically lie, there isn't
anything at risk if that happens: the recipient still does signature
validation, and the lie will be corrected at the next wallet refresh.
This keeps confirmations for bl_nostart, however: those can occur
spuriously if one node is in a bad state, so continue not relaying those
back unless we get a majority saying the same thing.
Fixes a bug where a blink TX submission could end up getting no
bl_good/bl_bad reply from the quorum nodes it submitted to if they had
already received the tx (i.e. from another quorum member) and the tx was
already approved. There was code here to partially handle it by
deferring the reply into the signature handling code, but that code only
fired if an arriving signature flipped the result from unknown to
approved/rejected. If it was *already* approved before it arrived at
the blink member the reply wouldn't get sent.
This was causing intermittent timeouts when sending because the
submitting node is waiting for 3 confirming replies, but was randomly
getting 1-4 based on the timing of submissions and quorum approval.
This reverts the parts of upstream commit
6cf56682bc related to doing rdtsc timing
on x86-64 because it is incredibly broken:
- rdtsc is not a reliable duration timer on x86-64 on all: among other
problems, CPU frequencies are not even close to constant, nor are
returned values reliable across threads.
- As if the unreliability wasn't bad enough, this code also spends a
**full second** in a busy loop to try to measure the number of rdtsc
ticks per wallclock nanosecond, and does this during static
initialization. Thus every invocation of every binary wastes a full
second a CPU to calibrate some timer ratio value that isn't even
remotely reliable in the first place.
The second one is particularly annoying: trying to run `--help` or
invoke an rpc command adds a full second of 100% CPU usage delay.
Before this commit:
$ time ./bin/lokid --help >/dev/null
real 0m1.018s
user 0m1.014s
sys 0m0.004s
and after:
$ time ./bin/lokid --help >/dev/null
real 0m0.013s
user 0m0.008s
sys 0m0.004s
- Fix assert to use version_t::_count in service_node_list.cpp
- Incoporate staking changes from LNS which revamp construct tx to
derive more of the parameters from hf_version and type. This removes
extraneous parameters that can be derived elsewhere.
Also delegate setting loki_construct_tx_params into wallet2 atleast,
so that callers into the wallet logic, like rpc server, simplewallet
don't need bend backwards to get the HF version whereas the wallet has
dedicated functions for determining the HF.
rpc/instanciations.cpp is a huge compiler job because it includes two
separate huge template instanciations [sic] in it. Splitting it apart
into two separate compilation units makes compilation more
parallelizable and requires less ram for the individual job.
The split also revealed a few missing headers in epee for logging macros.
Rename handle_incoming_txs to handle_parsed_txs, and re-add a
handle_incoming_txs that does the parse+handle_parsed call with the
appropriate lock held for code that doesn't care about being able to
split it into two steps.
Clarifies the locking requirements with a comment in the code, and adds
an incoming tx lock over the blink receiving process in
cryptonote_protocol_handler.inl.
Also fixes a bug in the weight calculation happening because the
blobdata pointer was being invalidated when going through
the singular `handle_parsed_tx()`.
This adds support for bumping conflicting transactions from the mempool
upon receipt of a blink tx if those conflicting txs are not themselves
blinks.
This commit chops the handle_incoming_txs effectively in half into a
parse_incoming_txs phase, followed by a handle_incoming_txs phase which
is needed so that we can first parse the txes then, assuming they parse
successfully, can flag them as blink txes if blink info was included
with them, so that the actual insertion becomes forcing.
There is a stub here for also allowing a rollback of the mutable part of
the chain; implementation of actually doing that rollback will follow in
another commit.
To make it obvious (and keep it consistent) with all the other
blink-related methods having "blink" in their name.
Also fix grammar in a related debug message.
- actually include the blink hashes in the core sync data
- fix cleanup to delete heights in (0, immutable] instead of [0,
immutable); we want to keep 0 because it signifies the mempool, and we
only need blocks after immutable, not immutable itself.
- fixed NOTIFY_REQUEST_GET_TXS to handle mempool txes properly (it was
only searching the chain and ignored missed_txs, but missed_txs are ones
we need to look up in the mempool)
- Add a method to tx_pool (needed for the above) to grab multiple txes
by hash (essentially a multi-tx version of `get_transaction()`), and
change get_transaction() to use it with a single value.
- Added various debugging statements
- Added a bunch of comments to each condition of the preliminary blink
data check condition.
- Don't abort blink addition on a single signature failure: if there are
enough valid signatures we should still accept it.
- Check for blink signature approval when receiving blink signatures;
it's not enough to know that all were added successfully, we also have
to ask the blink tx if it is approved (which does additional checks on
subquorum counts) once we add them all.
This adds blink tx synchronization: when doing a periodic sync with
other nodes each node includes a map of {HEIGHT, HASH} pairs where
HEIGHT is a mined, post-immutable block height and HASH is an xor of all
the tx hashes mined into that block.
If upon receipt the node disagrees about what it thinks the HASH should
be, it can request a list of txes for one or more disagreeing heights,
for which it gets a list of tx hashes of all blink txes mined into that
block.
If it is then missing any of those TXes, this adds an ability to request
that the remote send TXes via the existing NOTIFY_NEW_TRANSACTIONS
mechanism, but with an added flag (so that these don't get relayed).
This originally was going to request the TXes via the existing
NOTIFY_REQUEST_GET_OBJECTS, which has a `txs` field, but unfortunately
any txs passed to it are completely ignored; it is *only* usable for
block synchronization. As such I renamed it and removed the `txs` field
to make the responsibility/capability clearer.
Everything that uses these structs already either zero initializes or
doesn't need to zero initialize, so the epee stuff is just annoying and
provides no benefit (plus some of the additions don't use the epee
misdirection layer already).
- Simplify the algorithm (while keeping the results the same)
- Don't use a return value for a function that doesn't have a meaningful
return value
- Clarify and correct documentation of what it does
None of this code is used or called or invoked anywhere, yet complicates
things by throwing in unneeded multiple inheritance, a unique_ptr that
is never touched, and compiling functions (and an extra .cpp file) that
are never called which are full of FIXMEs and TODOs and probably don't
work. None of this has been touched in years, either.
Purge it.
This was leftover code from a previous implementation approach (burned
fee checking for blink moved into tx_pool_options, which are set up for
blink inside the add_new_blink call just below this).
This adds the ability for check_fee() to also check the burn amount.
This requires passing extra info through `add_tx()` (and the various
things that call it), so I took the:
bool keeped_by_block, bool relayed, bool do_not_relay
argument triplet, moved it into a struct in tx_pool.h, then added the other fee
options there (along with some static factory functions for generating the
typical sets of option).
The majority of this commit is chasing that change through the codebase and
test suite.
This is used by blink but should also help LNS and other future burn
transactions to verify a burn amount simply when adding the transation to the
mempool. It supports a fixed burn amount, a burn amount as a multiple of the
minimum tx fee, and also allows you to increase the minimum tx fee (so that,
for example, we could require blink txes to pay miners 250% of the usual
minimum (unimportant) priority tx fee.
- Removed a useless core::add_new_tx() overload that wasn't used anywhere.
Blink-specific changes:
(I'd normally separate these into a separate commit, but they got interwoven
fairly heavily with the above change).
- changed the way blink burning is specified so that we have three knobs for
fee adjustment (fixed burn fee; base fee multiple; and required miner tx fee).
The fixed amount is currently 0, base fee is 400%, and require miner tx fee is
simply 100% (i.e. no different than a normal transaction). This is the same as
before this commit, but is changing how they are being specified in
cryptonote_config.h.
- blink tx fee, burn amount, and miner tx fee (if > 100%) now get checked
before signing a blink tx. (These fee checks don't apply to anyone else --
when propagating over the network only the miner tx fee is checked).
- Added a couple of checks for blink quorums: 1) make sure they have reached
the blink hf; 2) make sure the submitted tx version conforms to the current hf
min/max tx version.
- print blink fee information in simplewallet's `fee` output
- add "typical" fee calculations in the `fee` output:
[wallet T6SCwL (has locked stakes)]: fee
Current fee is 0.000000850 loki per byte + 0.020000000 loki per output
No backlog at priority 1
No backlog at priority 2
No backlog at priority 3
No backlog at priority 4
Current blink fee is 0.000004250 loki per byte + 0.100000000 loki per output
Estimated typical small transaction fees: 0.042125000 (unimportant), 0.210625000 (normal), 1.053125000 (elevated), 5.265625000 (priority), 0.210625000 (blink)
where "small" here is the same tx size (2500 bytes + 2 outputs) used to
estimate backlogs.
This catches any exception thrown in the inner quorumnet blink code and
sets it in the promise if it occurs, which propagates it out to
core_rpc_server to catch and deal with.
- Adds blink signature synchronization and storage through the regular
p2p network
- Adds wallet support (though this is still currently buggy and needs
additional fixes - it sees the tx when it arrives in the mempool but
isn't properly updating when the blink tx gets mined.)
There are a bunch trivial forwarding wrappers in cryptonote_core that
simply call the same method in the pool, and blink would require adding
several more. Instead of all of these existing (and new) wrappers, just
directly expose the tx_pool reference so that anything with a `core`
reference can access and call to the mempool directly.
There is nothing gained at all by trying to pad the data to an even
multiple of 1024 but it adds quite a lot of complexity (which would get
worse with blink). Just adding some 0-1023 random bytes prevents
traffic analysis just as well and considerably simplifies the code.
This adds a thread-local, pre-seeded rng at `tools::rng` (to avoid the
multiple places we are creating + seeding such an RNG currently).
This also moves the portable uniform value and generic shuffle code
there as well as neither function is specific to service nodes and this
seems a logical place for them.
This is the bulk of the work for blink. There is two pieces yet to come
which will follow shortly, which are: the p2p communication of blink
transactions (which needs to be fully synchronized, not just shared,
unlike regular mempool txes); and an implementation of fee burning.
Blink approval, multi-quorum signing, cli wallet and node support for
submission denial are all implemented here.
This overhauls and fixes various parts of the SNNetwork interface to fix
some issues (particularly around non-SN communication with SNs, which
wasn't working).
There are also a few sundry FIXME's and TODO's of other minor details
that will follow shortly under cleanup/testing/etc.
Currently we store it as various different things: 3 separate ints, 2
u16s, 3 separate u16s, and a vector of u16s. This unifies all version
values to a `std::array<uint16_t,3>`.
- LOKI_VERSION_{MAJOR,MINOR,PATCH} are now just LOKI_VERSION
- The previous LOKI_VERSION (C-string of the version) is now renamed
LOKI_VERSION_STR
A related change included here is that the restricted RPC now returns
the major version in the get_info rpc call instead of an empty string
(e.g. "5" instead of ""). There is almost certainly enough difference
in the RPC results to distinguish major versions already so this doesn't
seem like it actually leaks anything significant.
For some reason they use GLOB for sources but this is strongly
discouraged. This commit explicitly lists the files, just like every
other CMakeLists.txt in the project.
This adds vote relaying via quorumnet.
- The main glue between existing code and quorumnet code is in
cryptonote_protocol/quorumnet.{h,cpp}
- Uses function pointers assigned at initialization to call into the
glue without making cn_core depend on p2p
- Adds ed25519 and quorumnet port to uptime proofs and serialization.
- Added a second uptime proof signature using the ed25519 key to that
SNs are proving they actually have it (otherwise they could spoof
someone else's pubkey).
- Adds quorumnet port, defaulting to zmq rpc port + 1. quorumnet
listens on the p2p IPv4 address (but broadcasts the `public_ip` to the
network).
- Derives x25519 when received or deserialized.
- Converted service_info_info::version_t into a scoped enum (with the
same underlying type).
- Added contribution_t::version_t scoped enum. It was using
service_info_info::version for a 0 value which seemed rather wrong.
Random small details:
- Renamed internal `arg_sn_bind_port` for consistency
- Moved default stagenet ports from 38153-38155 to 38056-38058, and add the default stagenet
quorumnet port at 38059 (this keeps the range contiguous; otherwise the next stagenet default port
would collide with the testnet p2p port).
Neither of these have a place in modern C++11; boost::value_initialized
is entirely superseded by `Type var{};` which does value initialization
(or default construction if a default constructor is defined). More
problematically, each `boost::value_initialized<T>` requires
instantiation of another wrapping templated type which is a pointless
price to pay the compiler in C++11 or newer.
Also removed is the AUTO_VAL_INIT macro (which is just a simple macro
around constructing a boost::value_initialized<T>).
BOOST_FOREACH is a similarly massive pile of code to implement
C++11-style for-each loops. (And bizarrely it *doesn't* appear to fall
back to C++ for-each loops even when under a C++11 compiler!)
This removes both entirely from the codebase.
This generates a ed25519 keypair (and from it derives a x25519 keypair)
and broadcasts the ed25519 pubkey in HF13 uptime proofs.
This auxiliary key will be used both inside lokid (starting in HF14) in
places like the upcoming quorumnet code where we need a standard
pub/priv keypair that is usable in external tools (e.g. sodium) without
having to reimplement the incompatible (though still 25519-based) Monero
pubkey format.
This pulls it back into HF13 from the quorumnet code because the
generation code is ready now, and because there may be opportunities to
use this outside of lokid (e.g. in the storage server and in lokinet)
before HF14. Broadcasting it earlier also allows us to be ready to go
as soon as HF14 hits rather than having to wait for every node to have
sent a post-HF14 block uptime proof.
For a similar reason this adds a placeholder for the quorumnet port in
the uptime proof: currently the value is just set to 0 and ignored, but
allowing it to be passed will allow upgraded loki 6.x nodes to start
sending it to each other without having to wait for the fork height so
that they can start using it immediately when HF14 begins.