- Previously, Pulse quorum validators recorded participation between
each other, so failures might not manifest in a decommission until
several common nodes aligned and agreed to vote off the node.
Voting now occurs when a block arrives: validators that participated in
the generation of the block are marked. This is information shared
between all nodes syncing the chain, so decommissions are more readily
agreed upon and acted on immediately in the Obligation quorums.
Currently, when we need to look up a block by height, we do:
1. get block hash for the given height
2. look up block by hash
This hits the lmdb layer and does:
3. look up height from hash in hashes-to-height table
4. look up block from height in blocks table
which is pointless. This commit adds a `get_block_by_height()` that
avoids the extra runaround, and converts code doing height lookups to
use the new method.
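
The difference between the two lookup paths can be sketched with a toy in-memory database. Only `get_block_by_height()` is named in this commit; `ToyBlockDB`, `Block`, and the other method names are hypothetical stand-ins and do not reflect the real lmdb layer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a stored block.
struct Block {
    std::string hash;
    std::uint64_t height;
};

// Toy database with the two tables mentioned above (names are assumptions).
class ToyBlockDB {
public:
    void add_block(const std::string& hash) {
        const std::uint64_t height = blocks_.size();
        blocks_.push_back(Block{hash, height});
        hash_to_height_[hash] = height;
    }

    // Old path, step 1: height -> hash. Returns nullptr if out of range.
    const std::string* get_block_hash_from_height(std::uint64_t height) const {
        return height < blocks_.size() ? &blocks_[height].hash : nullptr;
    }

    // Old path, steps 2-4: hash -> height -> block.
    const Block* get_block_by_hash(const std::string& hash) const {
        auto it = hash_to_height_.find(hash);
        return it == hash_to_height_.end() ? nullptr : &blocks_[it->second];
    }

    // New path: one lookup keyed directly by height.
    const Block* get_block_by_height(std::uint64_t height) const {
        return height < blocks_.size() ? &blocks_[height] : nullptr;
    }

private:
    std::vector<Block> blocks_;                                      // "blocks" table, indexed by height
    std::unordered_map<std::string, std::uint64_t> hash_to_height_;  // "hashes-to-height" table
};
```

Both paths return the same block; the new one simply skips the height -> hash -> height round trip.
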
This updates failures to come in batches of 10 followed by 10
should-be-good blocks.
Blocks with an odd 2nd-last digit (xxx1x, xxx3x, etc.) are the ones
where we add failures.
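
As a minimal sketch, the "odd 2nd-last digit" rule reduces to one integer expression. The function name `in_failure_batch` is made up for illustration and is not taken from the test code.

```cpp
#include <cassert>
#include <cstdint>

// True when the second-to-last decimal digit of `height` is odd,
// i.e. heights 10-19, 30-39, ... fall into a failure batch.
constexpr bool in_failure_batch(std::uint64_t height) {
    return (height / 10) % 2 == 1;
}
```

Heights 10 to 19 are a failure batch, 20 to 29 should be good, 30 to 39 fail again, and so on.
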
- Late messages that arrive fail signature validation and log an error.
In reality this is not always an error: it can mean the Pulse node
finished the round, or realised the round was going to fail, earlier
than another node.
The late-arriving messages refer to the previous round or block and
might actually validate fine, just late. This commit stores the round
history so that we can still validate these old messages and silently
ignore them instead of printing errors.
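
One way to picture the round history is a small bounded buffer of recently finished rounds. This is a simplified sketch: every name here (`RoundId`, `RoundHistory`, `MessageAge`) is hypothetical, and a real history would also need to keep whatever quorum data is required to re-check the signatures, which is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical identifier for a Pulse round: block height plus round number.
struct RoundId {
    std::uint64_t height;
    std::uint8_t round;
    bool operator==(const RoundId& o) const { return height == o.height && round == o.round; }
};

enum class MessageAge { current, late, unknown };

// Bounded history of rounds we have already left (finished or abandoned).
class RoundHistory {
public:
    explicit RoundHistory(std::size_t capacity) : capacity_(capacity) {}

    // Remember a round we are leaving; the oldest entry is dropped when full.
    void remember(const RoundId& id) {
        if (capacity_ == 0) return;
        if (past_.size() == capacity_) past_.pop_front();
        past_.push_back(id);
    }

    // `late` messages can still be validated against history and then
    // silently ignored; only `unknown` ones deserve an error log.
    MessageAge classify(const RoundId& msg, const RoundId& current) const {
        if (msg == current) return MessageAge::current;
        for (const RoundId& old : past_)
            if (old == msg) return MessageAge::late;
        return MessageAge::unknown;
    }

private:
    std::size_t capacity_;
    std::deque<RoundId> past_;
};
```
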
- Previously we submitted just 1 signature, which signed the contents
of the final block. That forced us to delay signature verification: if
we received the message before we were in the final stage, we did not
yet have sufficient data to verify the signature.
This means that when someone in the quorum receives and relays the
message, they can tamper with it and make it invalid (by changing the
round to something invalid, for example), causing other nodes in the
quorum to reject it. Eventually this records that the Service Node
didn't participate in the round and biases Service Nodes towards
decommissioning.
Instead of taking the shortcut and providing only 1 signature, we do the
same thing we do with all the other messages:
1. We sign the contents of the message. This proves that the message
originated from the Service Node it claims to have come from (preventing
any tampering).
2. The 2nd signature is the one that signs the final block and is
included in the block for propagation in the network.
Doing so removes the ability of intermediate relay nodes to tamper with
the message.
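
The two-signature layout can be illustrated with a toy model. Everything below is an assumption made for illustration: `toy_sign` is a keyed `std::hash`, NOT a real signature scheme, and the message layout and function names are invented. The point is only that the message signature can be checked on arrival, while the block signature needs the final block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// TOY keyed hash standing in for a real signature. Not cryptography.
using ToySignature = std::size_t;

inline ToySignature toy_sign(const std::string& key, const std::string& data) {
    return std::hash<std::string>{}(key + "|" + data);
}

// Hypothetical message layout carrying both signatures.
struct SignedBlockMessage {
    std::uint8_t round;
    ToySignature block_signature;    // signs the final block; ends up in the block
    ToySignature message_signature;  // signs the message contents
};

inline std::string message_contents(std::uint8_t round, ToySignature block_signature) {
    return std::to_string(round) + ":" + std::to_string(block_signature);
}

inline SignedBlockMessage make_message(const std::string& key, std::uint8_t round,
                                       const std::string& final_block) {
    SignedBlockMessage m{};
    m.round = round;
    m.block_signature = toy_sign(key, final_block);
    m.message_signature = toy_sign(key, message_contents(m.round, m.block_signature));
    return m;
}

// Can run as soon as the message arrives: needs no block data.
inline bool verify_message(const std::string& key, const SignedBlockMessage& m) {
    return m.message_signature == toy_sign(key, message_contents(m.round, m.block_signature));
}

// Can only run once the final block contents are known locally.
inline bool verify_block_signature(const std::string& key, const SignedBlockMessage& m,
                                   const std::string& final_block) {
    return m.block_signature == toy_sign(key, final_block);
}
```

In this model a relay that changes `round` is caught by `verify_message` immediately, without waiting for the final stage.
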
- A non-participating node might be able to leak its way through
a stage, influence the receive count, and cause participating nodes to
progress but eventually fail and report a nonsensical error saying
that all messages were received but the round still failed.
- If a node is in the Pulse quorum but the locked-in bitset (which
indicates the nodes that are locked in to participate in the round)
does not include it, go to sleep.
Previously the node would continue through the Pulse rounds, but its
messages would be ignored by everyone else in the quorum.
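
A rough sketch of the locked-in check, assuming a 16-bit bitset where bit `i` corresponds to the validator at position `i` in the quorum. The width, the bit order, and all names here are assumptions, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// True if the validator at `quorum_position` is set in the locked-in bitset.
// Assumes at most 16 validators, one bit each (an illustrative choice).
constexpr bool is_locked_in(std::uint16_t validator_bitset, unsigned quorum_position) {
    return quorum_position < 16 && ((validator_bitset >> quorum_position) & 1u) != 0;
}

enum class NextAction { keep_participating, sleep_until_next_round };

// A quorum member that was not locked in stops taking part in this round.
constexpr NextAction after_lock_in(std::uint16_t validator_bitset, unsigned quorum_position) {
    return is_locked_in(validator_bitset, quorum_position)
               ? NextAction::keep_participating
               : NextAction::sleep_until_next_round;
}
```
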
- Alternative pulse blocks must be verified against the quorum they belong to.
This updates the `alt_block_added` hook in the Service Node List to check the
new Pulse invariants and, on passing, allow the alt block to be stored into the
DB until enough blocks have been checkpointed.
- New reorganization behaviour for the Pulse hard fork. Currently reorganization
rules work by preferring chains with greater cumulative difficulty and/or
more checkpoints. Pulse blocks introduce a 'fake' difficulty to
allow falling back to PoW and continuing the chain with reasonable difficulty.
If we fall into a position where we have an alt chain of mixed Pulse blocks
and PoW blocks, difficulty is no longer a valid metric to compare chains (a
completely PoW chain could have much higher cumulative difficulty if hash
power is thrown at it vs a Pulse chain with fixed difficulty).
So starting in HF16 we only reorganize when 2 consecutive checkpoints prevail
on one chain. This aligns with the idea of a PoS network that is
governed by the Service Nodes. The chain essentially doesn't recover until
Pulse is re-enabled and Service Nodes on that chain checkpoint the chain
again, causing the PoW chain to switch over.
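
The HF16 rule can be sketched as a check for a run of consecutive checkpoints. This models a chain as a list of checkpoint slots (one entry per checkpoint interval, `true` if that slot was checkpointed); the representation and the function names are assumptions for illustration, not the actual reorganization code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if `checkpoint_slots` contains at least `required` checkpoints in a row.
inline bool has_consecutive_checkpoints(const std::vector<bool>& checkpoint_slots,
                                        std::size_t required = 2) {
    std::size_t run = 0;
    for (bool checkpointed : checkpoint_slots) {
        run = checkpointed ? run + 1 : 0;  // a missed slot resets the run
        if (run >= required) return true;
    }
    return false;
}

// Under this sketch, cumulative difficulty plays no part in the decision:
// only 2 consecutive checkpoints on the alt chain trigger a switch.
inline bool should_reorganize_to(const std::vector<bool>& alt_chain_checkpoint_slots) {
    return has_consecutive_checkpoints(alt_chain_checkpoint_slots, 2);
}
```
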
- Generating Pulse entropy no longer does a confusing +/-1 dance on the height
and always begins from the top block. It now takes a block instead of a height
since the blocks may be on an alternative chain or the main chain. In the
former case, we have to query the alternative DB table to grab the blocks to
work with.
- Removes the developer debug hashes in code for entropy.
- Adds core tests to check reorganization works
- We could do it earlier, but we need info for producing the payouts.
Adding it earlier and shuffling around more state to store is not worth
it just for an early return to sleep, when we still have to wait for the next
round to start anyway.
- When not a participant in a Pulse round, a node will iterate Pulse
quorums until it is one and then sleep on that round. If the Service
Node is never selected to participate, this can cause the round to
overflow at round > 255 and cause the node to reject any Pulse block,
even if some prior quorum sent it validly.
- Moving the non-participant check down to after the round starts also
puts all the participation checks (is validator, is producer, is
neither) into one spot for improved clarity.
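
The overflow at round > 255 is what you would see if the round counter is an 8-bit unsigned value, which is an assumption here. A small sketch with hypothetical helper names shows the wrap-around and a checked alternative:

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 8-bit arithmetic wraps around: round 255 is followed by round 0.
inline std::uint8_t next_round_wrapping(std::uint8_t round) {
    return static_cast<std::uint8_t>(round + 1);
}

// Checked variant: refuses to advance past the last representable round.
// Returns false (leaving `out` untouched) when `round` is already 255.
inline bool next_round_checked(std::uint8_t round, std::uint8_t& out) {
    if (round == 255) return false;
    out = static_cast<std::uint8_t>(round + 1);
    return true;
}
```

A node that keeps iterating quorums without ever being selected would silently land back on round 0 with the wrapping version.
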
Otherwise the state machine loop only runs once, then on loop end it's
assigned the same value as context.state and terminates.
Setting it first allows the loop to detect when state has changed and
continue running.
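
The ordering issue can be reproduced with a toy state machine. Only `context.state` appears in the commit text; the states, `advance()`, and both loop functions are invented for this sketch.

```cpp
#include <cassert>

enum class State { wait_for_round, prepare, done };

struct Context {
    State state = State::wait_for_round;
};

// Toy transition function: each call moves one step towards `done`.
inline State advance(State s) {
    switch (s) {
        case State::wait_for_round: return State::prepare;
        case State::prepare:        return State::done;
        case State::done:           return State::done;
    }
    return s;
}

// Assigning last_state at the END of the loop body: it always equals
// context.state when the condition is checked, so the loop runs once.
inline int run_assign_last(Context& context) {
    int iterations = 0;
    State last_state = context.state;
    do {
        context.state = advance(context.state);
        ++iterations;
        last_state = context.state;
    } while (last_state != context.state);
    return iterations;
}

// Assigning last_state FIRST: the condition compares the state before and
// after the step, so the loop continues until the state stops changing.
inline int run_assign_first(Context& context) {
    int iterations = 0;
    State last_state = context.state;
    do {
        last_state = context.state;
        context.state = advance(context.state);
        ++iterations;
    } while (last_state != context.state);
    return iterations;
}
```
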