- Alternative pulse blocks must be verified against the quorum they belong to.
This updates alt_block_added hook in Service Node List to check the new Pulse
invariants and on passing allow the alt block to be stored into the DB until
enough blocks have been checkpointed.
- New reorganization behaviour for the Pulse hard fork. Currently reorganization
rules work by preferring chains with greater cumulative difficulty and or
a chain with more checkpoints. Pulse blocks introduces a 'fake' difficulty to
allow falling back to PoW and continuing the chain with reasonable difficulty.
If we fall into a position where we have an alt chain of mixed Pulse blocks
and PoW blocks, difficulty is no longer a valid metric to compare blocks (a
completely PoW chain could have much higher cumulative difficulty if hash
power is thrown at it vs Pulse chain with fixed difficulty).
So starting in HF16 we only reorganize when 2 consecutive checkpoints prevail
on one chain. This aligns with the idea of a PoS network that is
governed by the Service Nodes. The chain doesn't essentially recover until
Pulse is re-enabled and Service Nodes on that chain checkpoint the chain
again, causing the PoW chain to switch over.
- Generating Pulse Entropy no longer does a confusing +-1 to the height dance
and always begins from the top block. It now takes a block instead of a height
since the blocks may be on an alternative chain or the main chain. In the
former case, we have to query the alternative DB table to grab the blocks to
work.
- Removes the developer debug hashes in code for entropy.
- Adds core tests to check reorganization works
Changes naming scheme
- Block Producer: The Service Node/Miner reponsible for generating the block
- Block Leader: The Service Node that is at the top of the Service Node
List queue. They are also by default on pulse round 0, the block producer.
If that round fails, then the block producer is changed as per the LIP-6
with multi-block seeding.
This commit also re-writes validate miner TX to repeat some of the
verification code. This appears to be clearer than having multiple
variables that change the verification loop in small ways which was more
error prone than I'd like to tolerate.
With Service Node List now generating 2 coinbase rewards, block_winner
is ambiguous, so, distinguish the reward generated for nodes at the top
of the queued list as- queued_winner.
- Previously we generated quorums based off the previous quorum. To
incorporate the round, we can no longer pre-compute the quorum.
Instead it must be calculated when the block arrives, just before the
Service Node List is changed.
- Add tests to check round changing works
Allows us to insert custom pulse data inbetween the begin and end pairs
by separating the SN list update step (which relies on the block) from
the block miner TX/TX list construction step.
- Entropy is taken from the blockchain height, (i.e. if top block is block
100, then the blockchain height is 101). When the Core Tests is generating
a block in the tests, we haven't added the block to the DB yet, so when
db.height() is called in the tests, the height returned is 100.
Whereas when the Blockchain is receiving a block and on validation,
block 100, has been added to the DB already (added to DB, then we do
Service Node List validation). So the DB returns a blockchain height of
101. The 2 inconsistencies causes differences in the source entropy
for the blockchain.
- Begin moving code to allow creating blocks with custom pulse data in
tests by introducing loki_create_block_params. The workflow is to get
loki_chain_generator to return you the parameters to form the next
valid block. The user can modify the parameters to other valid
blocks/invalid blocks.
- Change the stable sort to a normal sort to avoid requiring the caller
side of the API needing to implicitly know that the list given must
be sorted lexicographically.
This replaces the NIH epee http server which does not work all that well
with an external C++ library called uWebSockets. Fundamentally this
gives the following advantages:
- Much less code to maintain
- Just one thread for handling HTTP connections versus epee's pool of
threads
- Uses existing LokiMQ job server and existing thread pool for handling
the actual tasks; they are processed/scheduled in the same "rpc" or
"admin" queues as lokimq rpc calls. One notable benefit is that "admin"
rpc commands get their own queue (and thus cannot be delayed by long rpc
commands). Currently the lokimq threads and the http rpc thread pool
and the p2p thread pool and the job queue thread pool and the dns lookup
thread pool and... are *all* different thread pools; this is a step
towards consolidating them.
- Very little mutex contention (which has been a major problem with epee
RPC in the past): there is one mutex (inside uWebSockets) for putting
responses back into the thread managing the connection; everything
internally gets handled through (lock-free) lokimq inproc sockets.
- Faster RPC performance on average, and much better worst case
performance. Epee's http interface seems to have some race condition
that ocassionally stalls a request (even a very simple one) for a dozen
or more seconds for no good reason.
- Long polling gets redone here to no longer need threads; instead we
just store the request and respond when the thread pool, or else in a
timer (that runs once/second) for timing out long polls.
---
The basic idea of how this works from a high level:
We launch a single thread to handle HTTP RPC requests and response data.
This uWebSockets thread is essentially running an event loop: it never
actually handles any logic; it only serves to shuttle data that arrives
in a request to some other thread, and then, at some later point, to
send some reply back to that waiting connection. Everything is
asynchronous and non-blocking here: the basic uWebSockets event loop
just operates as things arrive, passes it off immediately, and goes back
to waiting for the next thing to arrive.
The basic flow is like this:
0. uWS thread -- listens on localhost:22023
1. uWS thread -- incoming request on localhost:22023
2. uWS thread -- fires callback, which injects the task into the LokiMQ job queue
3. LMQ main loop -- schedules it as an RPC job
4. LMQ rpc thread -- Some LokiMQ thread runs it, gets the result
5. LMQ rpc thread -- Result gets queued up for the uWS thread
6. uWS thread -- takes the request and starts sending it
(asynchronously) back to the requestor.
In more detail:
uWebSockets has registered has registered handlers for non-jsonrpc
requests (legacy JSON or binary). If the port is restricted then admin
commands get mapped to a "Access denied" response handler, otherwise
public commands (and admin commands on an unrestricted port) go to the
rpc command handler.
POST requests to /json_rpc have their own handler; this is a little
different than the above because it has to parse the request before it
can determine whether it is allowed or not, but once this is done it
continues roughly the same as legacy/binary requests.
uWebSockets then listens on the given IP/port for new incoming requests,
and starts listening for requests in a thread (we own this thread).
When a request arrives, it fires the event handler for that request.
(This may happen multiple times, if the client is sending a bunch of
data in a POST request). Once we have the full request, we then queue
the job in LokiMQ, putting it in the "rpc" or "admin" command
categories. (The one practical different here is that "admin" is
configured to be allowed to start up its own thread if all other threads
are busy, while "rpc" commands are prioritized along with everything
else.) LokiMQ then schedules this, along with native LokiMQ "rpc." or
"admin." requests.
When a LMQ worker thread becomes available, the RPC command gets called
in it and runs. Whatever output it produces (or error message, if it
throws) then gets wrapped up in jsonrpc boilerplate (if necessary), and
delivered to the uWebSockets thread to be sent in reply to that request.
uWebSockets picks up the data and sends whatever it can without
blocking, then buffers whatever it couldn't send to be sent again in a
later event loop iteration once the requestor can accept more data.
(This part is outside lokid; we only have to give uWS the data and let
it worry about delivery).
---
PR specifics:
Things removed from this PR:
1. ssl settings; with this PR the HTTP RPC interface is plain-text. The
previous default generated a self-signed certificate for the server on
startup and then the client accepted any certificate. This is actually
*worse* than unencrypted because it is entirely MITM-readable and yet
might make people think that their RPC communication is encrypted, and
setting up actual certificates is difficult enough that I think most
people don't bother.
uWebSockets *does* support HTTPS, and we could glue the existing options
into it, but I'm not convinced it's worthwhile: it works much better to
put HTTPS in a front-end proxy holding the certificate that proxies
requests to the backend (which can then listen in restricted mode on
some localhost port). One reason this is better is that it is much
easier to reload and/or restart such a front-end server, while
certificate updates with lokid require a full restart. Another reason
is that you get an error page instead of a timeout if something is wrong
with the backend. Finally we also save having to generate a temporary
certificate on *every* lokid invocation.
2. HTTP Digest authentication. Digest authentication is obsolete (and
was already obsolete when it got added to Monero). HTTP-Digest was
originally an attempt to provide a password authentication mechanism
that does not leak the password in transit, but still required that the
server know the password. It only has marginal value against replay
attacks, and is made entirely obsolete by sending traffic over HTTPS
instead. No client out there supports Digest but *not* Basic auth, and
so given the limited usefulness it seems pointless to support more than
Basic auth for HTTP RPC login.
What's worse is that epee's HTTP Digest authentication is a terrible
implementation: it uses boost::spirit -- a recursive descent parser
meant for building complex language grammars -- just to parse a single
HTTP header for Digest auth. This is a big load of crap that should
never have been accepted upstream, and that we should get rid of (even
if we wanted to support Digest auth it takes less than 100 lines of code
to do it when *not* using a recursive descent parser).
REGISTER_CALLBACK and REGISTER_CALLBACK_METHOD do *exactly the same
thing*, and both take a different and unnecessary argument. DRY it out
and make it use a lambda instead of boost::bind.
A huge amount of this is repetitive:
- `boost::get<T>(variant)` becomes `std::get<T>(variant)`
- `boost::get<T>(variant_ptr)` becomes `std::get_if<T>(variant_ptr)`
- `variant.type() == typeid(T)` becomes `std::holds_alternative<T>(variant)`
There are also some simplifications to visitors using simpler stl
visitors, or (simpler still) generic lambdas as visitors.
Also adds boost serialization serializers for std::variant and
std::optional.
- De-const a bunch of things that need to be non-const now
- Stick lns_db_ in a shared_ptr because it isn't copyable anymore but
the chain generator needs to remain copyable for fork testing.
- Renames generic_key->generic_owner
- Move generic_owner and generic_signature out of crypto.h because they
aren't really crypto items, rather composition of crypto primitives.
generic_owner also needs access to account_public_address, while that is
just 2 public keys, I've decided to include cryptonote_basic.h into
tx_extra.h instead of crypto.h.
- Some generic_owner helper functions were moved into
cryptonote_basic/format_utils as they need to avoid circular
dependencies between cryptonote_core/cryptonote_basic had I included
generic_owner/generic_signature into loki_name_system.h
- Utilise the normal serialize macros since tx_extra.h already includes
the serializing headers.
We allow arbitrary reorganizations for the LNS DB by storing the
previous TXID that purchased the LNS record. When reorganizing, we
iterate back through the blockchain looking for the first LNS entry that
is before the reorganize height and revert the LNS entry to that record.
We want to allow people to buy LNS entries on behalf of other users. If
this is the case we don't need signatures to verify that the purchaser
knows the secret key. What we actually want in this scenario is that,
there's a LNS entry, and people can voluntarily pay to renew/buy that.
This extracts uptime proof data entirely from service node states,
instead storing (some) proof data as its own first-class object in the
code and backed by the database. We now persistently store:
- timestamp
- version
- ip & ports
- ed25519 pubkey
and update it every time a proof is received. Upon restart, we load all
the proofs from the database, which means we no longer lose last proof
received times on restart, and always have the most recently received ip
and port information (rather than only having whatever it was in the
most recently stored state, which could be ~12 blocks ago). It also
means we don't need to spam the network with proofs for up to half an
hour after a restart because we now know when we last sent (and
received) our own proof before restarting.
This separation required some restructuring: most notably changing
`state_t` to be constructed with a `service_node_list` pointer, which it
needs both directly and for BlockchainDB access. Related to this is
also eliminating the separate BlockchainDB pointer stored in
service_node_list in favour of just accessing it via m_blockchain (which
now has a new `has_db()` method to detect when it has been initialized).