SS wants this, in particular, to be able to do reachability tests.
(Using connect_remote for this was bad with pubkey-based routing ids
because the second connection could replace an existing connection).
It can still be set using `lmq.log_level(...)`, but this can be slightly
more convenient -- and without this, log messages in the constructor are
completely useless.
We really don't *ever* want send to block, no matter how it is called,
since the send is always in the proxy thread. This makes the actual
send call always non-blocking, and adds callbacks that we can invoke on
send failures: either on queue full errors (which might be recoverable),
or both full queue and hard failures (which are generally not
recoverable). These callbacks are both optional: they have to be passed
in using `send_option::queue_full` (if you just want queue full
notifies) or `send_option::queue_failure` (if you want queue full
notifies *and* other send exceptions).
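
A minimal sketch of the idea, not lokimq's implementation: a bounded queue whose send never blocks and reports problems through optional callbacks. `BoundedSender`, `SendCallbacks`, `on_queue_full` and `on_failure` are hypothetical stand-ins for the real socket and for `send_option::queue_full` / `send_option::queue_failure`. The `set_broken` switch exists only to simulate a hard failure.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical callback bundle; both members are optional (may be empty).
struct SendCallbacks {
    std::function<void()> on_queue_full;              // queue-full errors only
    std::function<void(bool queue_full)> on_failure;  // queue-full and hard failures
};

class BoundedSender {
public:
    explicit BoundedSender(std::size_t capacity) : capacity_{capacity} {
        assert(capacity > 0);
    }

    // Simulates a hard (generally unrecoverable) send failure.
    void set_broken(bool broken) { broken_ = broken; }

    // Never blocks: either queues the message or reports why it could not.
    bool try_send(std::string msg, const SendCallbacks& cb = {}) {
        if (broken_) {
            if (cb.on_failure) cb.on_failure(false);
            return false;
        }
        if (queue_.size() >= capacity_) {
            if (cb.on_queue_full) cb.on_queue_full();
            if (cb.on_failure) cb.on_failure(true);
            return false;
        }
        queue_.push_back(std::move(msg));
        return true;
    }

    std::size_t queued() const { return queue_.size(); }

private:
    std::size_t capacity_;
    bool broken_ = false;
    std::deque<std::string> queue_;
};
```

A caller that only cares about back-pressure would set `on_queue_full`; one that wants to hear about every failed send would set `on_failure`.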
For example:

    lmq.request(conn, "some.method", callback, lokimq::request_timeout{5s});

will result in the callback being called with a failure if the response
doesn't arrive within 5s. (If it still arrives, but after the failure
callback, it gets dropped).
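
The timeout behaviour can be modelled with a small table of pending requests. This is a sketch under assumptions, not the library's code: `PendingRequests`, its method names, and the use of a string tag to match replies are all hypothetical. Time is passed in explicitly so the example does not need to sleep.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;
using ReplyCallback = std::function<void(bool success, std::string data)>;

class PendingRequests {
public:
    // Registers a request that must be answered before now + timeout.
    void add(std::string tag, ReplyCallback cb, Clock::time_point now,
             std::chrono::milliseconds timeout) {
        assert(cb);
        pending_[std::move(tag)] = Entry{std::move(cb), now + timeout};
    }

    // Delivers a reply. Returns false if the request is unknown or has
    // already timed out, in which case the reply is simply dropped.
    bool on_reply(const std::string& tag, std::string data) {
        auto it = pending_.find(tag);
        if (it == pending_.end()) return false;
        ReplyCallback cb = std::move(it->second.callback);
        pending_.erase(it);
        cb(true, std::move(data));
        return true;
    }

    // Fails every request whose deadline has passed.
    void expire(Clock::time_point now) {
        std::vector<ReplyCallback> timed_out;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->second.deadline <= now) {
                timed_out.push_back(std::move(it->second.callback));
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        for (auto& cb : timed_out) cb(false, "");
    }

private:
    struct Entry {
        ReplyCallback callback;
        Clock::time_point deadline;
    };
    std::unordered_map<std::string, Entry> pending_;
};
```

Removing the entry before invoking the failure callback is what makes a late reply a no-op: by the time it arrives there is nothing left to match it against.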
The previous 1s default seems on the long side; this reduces it to
250ms. It also makes it a public member so that it can be configured
(which is mainly needed for the test suite, but might be useful for
lokimq-calling code that needs faster or slower connection cleanups).
Having this as a vector seems to cause armhf/gcc-6 to segfault. On
closer inspection there's no good reason this should be a vector in the
first place: it only gets used during new connection handshaking and
isn't in any hot loop, plus the elements are fairly large tuples where
shifting elements is going to be relatively expensive. Thus switching
it to a list everywhere (rather than just on old gcc arm) seems fine.
lokimq.cpp and lokimq.h were getting monolithic; this splits lokimq.cpp
into multiple smaller cpp files by logical purpose for better parallel
compilation ability. It also splits up the lokimq.h header slightly by
moving the ConnectionID and Message types into their own headers.
This makes it much more convenient to use them with a run-time
condition; this simplifies:

    if (should_be_optional)
        lmq.send(..., send_option::optional{});
    else
        lmq.send(...);

to:

    lmq.send(..., send_option::optional{should_be_optional});
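
One way such an option type could carry a run-time flag is sketched below. Everything in the `sketch` namespace is hypothetical, and the example assumes that an "optional" send means "send only if a connection already exists"; the real `send_option::optional` may differ in detail.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Option tag that defaults to "on" but can be switched off at run time.
struct optional {
    bool is_optional;
    explicit optional(bool v = true) : is_optional{v} {}
};

// Returns true if the message would be sent. An optional message is
// skipped when there is no existing connection; otherwise the message is
// sent (the real library would establish a connection first if needed).
inline bool send(bool connected, const std::string& /*msg*/,
                 optional opt = optional{false}) {
    if (!connected && opt.is_optional) return false;
    return true;
}

}  // namespace sketch

// Usage: the flag is passed straight through instead of branching.
inline void demo(bool should_be_optional) {
    assert(sketch::send(false, "ping", sketch::optional{should_be_optional})
           == !should_be_optional);
}
```

Because the constructor argument defaults to `true`, existing call sites that write `optional{}` keep their meaning.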
This allows storing a ConnectionID received in a message callback and
using it later to send another message along the connection without
worrying about a routing id: the ConnectionID will have it if it is
required. Previously you would have had to store the ConnectionID *and*
the routing prefix, and then specify the route as a
send_option::route{}, which was annoying and cumbersome.
This adds a separate category (and reserve count) for "reply jobs",
which are jobs triggered by receiving a reply to a request, or after a
successful connect or unsuccessful timeout. Previously these were
scheduled as regular batch jobs; this schedules them as a new "reply
jobs" category with its own reserved threads count.
This also changes the defaults for batch jobs and reply jobs to be based
on the specified general workers count rather than directly on hardware
concurrency. For example, if you are on a 16-thread CPU but override
general workers from its default of 16 to 4 and don't change batch
workers, you now get 2 reserved batch workers rather than 8. That
constrains the typical number of parallel batch jobs to 4 (i.e. the
general worker limit) rather than letting the batch job limit exceed it.
Similarly for reply jobs, whose reserve is now ceil(general/8) by default.
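
The arithmetic can be written out as a short sketch. The function and struct names are hypothetical. The batch default is taken to be half the general workers, as the 16 -> 8 and 4 -> 2 figures imply; rounding that half up for odd counts is an assumption.

```cpp
#include <cassert>

struct WorkerDefaults {
    unsigned batch;  // reserved batch job threads
    unsigned reply;  // reserved reply job threads
};

// ceil(a / b) for positive integers.
constexpr unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

// Derives the reserved counts from the configured general worker count
// instead of from hardware concurrency.
inline WorkerDefaults default_reserved(unsigned general_workers) {
    assert(general_workers > 0);
    return {ceil_div(general_workers, 2), ceil_div(general_workers, 8)};
}
```

With 16 general workers this gives 8 batch and 2 reply threads; with general workers overridden to 4 it gives 2 and 1.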
The existing code was largely set up for SN-to-SN or client-to-SN
communications, where messages can always get to the right place because
we can always send by pubkey.
This doesn't work when we want general communications with a random
remote address.
This commit overhauls the way loki-mq handles communication in a few
important ways:
- Listening instances no longer pass bind addresses into the
constructor; instead they call `listen_curve()` or `listen_plain()`
before invoking `start()`.
- `listen_curve()` is equivalent to the existing bind support: it
listens on a socket and accepts encrypted handshaked connections from
anyone who already knows the server's public key.
- `listen_plain()` is all new: it sets up a plain text listening socket
over which random clients can connect and talk. End-points aren't
verified, and it isn't encrypted, but if you don't know who you are
talking to then encryption isn't doing anything anyway.
- Connecting to a remote now connects using CURVE encryption or NULL
(plain-text) mode based on whether you provide a remote_pubkey.
For CURVE, the connection will fail if the pubkey does not match.
- `ConnectionID` objects are now returned when connecting to a remote
address; this object is then passed in to send/request/etc. to direct
the message. For SN communication, ConnectionIDs can be created
implicitly from SN pubkey strings, so the existing interface of
`lmq.send(pubkey, ...)` will still work in most cases.
- A ConnectionID is now passed to the ConnectSuccess and ConnectFailure
callbacks. This can be used to uniquely identify which connection
succeeded or failed, and can be queried for whether the remote is a
service node (`.sn()`) and for its pubkey (`.pubkey()`). (Obviously the
service node status is only available when the client can do service
node lookups, and the pubkey() is only non-empty for encrypted
connections).
string_view isn't supposed to be implicitly convertible to std::string
and code would break compiling under c++17 (when our local string_view
is simply a std::string_view typedef).
LMQ_TRACE becomes nothing under a release build, which is good because
many traces are in the proxy hot path.
Also fixes some confusing log level comparison logic by flipping the
order of log levels. Previously trace < debug < info < warn, which was
confusing; that order is now reversed, which makes far more sense (i.e.
a larger LogLevel means more logging).
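
With the reversed ordering, the "should this message be logged?" check becomes a plain comparison. The sketch below uses a hypothetical `Level` enum containing only the four levels named above; the real `LogLevel` enum may have more entries.

```cpp
#include <cassert>

// Larger value = more verbose, so raising the level only adds output.
enum class Level { warn, info, debug, trace };

// A message is emitted when it is no more verbose than the configured level.
constexpr bool should_log(Level configured, Level message) {
    return message <= configured;
}

// Usage: trace-level logging includes debug messages, warn-level does not.
inline void check_ordering() {
    assert(should_log(Level::trace, Level::debug));
    assert(!should_log(Level::warn, Level::debug));
}
```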
Batch jobs scheduled by the proxy thread itself were delayed to the next
poll timeout (because nothing ever gets sent on a socket). Add a
variable to bypass the next poll to handle this case.
This allows making RPC requests with a callback that gets called when
the response comes back. This is essentially a wrapper around doing it
yourself (i.e. by setting up a server-side "request" and client-side
"reply" command where "request" responds with a "reply" command), but
abstracted into lokimq itself as it is likely to be very useful when
integrating client/server connections rather than peer-to-peer
connections.
This overhauls the proposed batch implementation (described in the
README but previously not implemented) and implements it.
Various other minor improvements and code restructuring.
Added a proposed "request" type to the README to be implemented; this is
like a command, but always expects a ["REPLY", TAG, ...] response. The
request invoker provides a callback to invoke when such a REPLY arrives.
This library is adapted from lokid's existing quorumnet code (added in
6.x) used for SN-to-SN communication for quorum voting but generalized
to be usable both there and as a basis for other communication channels
with loki projects (for example: wallet-to-lokid communication; loki-ss
and lokinet internal communication with lokid; loki-ss to loki-ss
communication and message passing; perhaps eventually loki p2p traffic).
This initial release compiles but likely has a few warts and bugs that
need ironing out in the implementation before it is production ready.
Some tests will follow.