Commit Graph

60 Commits

Author SHA1 Message Date
Jason Rhinelander 716d73d196 All sends use dontwait; add send failure callbacks
We really don't *ever* want send to block, no matter how it is called,
since the send is always in the proxy thread.  This makes the actual
send call always non-blocking, and adds callbacks that we can invoke on
send failures: either on queue full errors (which might be recoverable),
or both full queue and hard failures (which are generally not
recoverable).  These callbacks are both optional: they have to be passed
in using `send_option::queue_full` (if you just want queue full
notifies) or `send_option::queue_failure` (if you want queue full
notifies *and* other send exceptions).
2020-03-29 15:21:20 -03:00
Jason Rhinelander 8e1b2dffa5 Catch connect failures
socket.connect() can throw, e.g. if given an invalid connection address;
catch this, log the error, and return a failure condition.
2020-03-29 14:40:21 -03:00
Jason Rhinelander 2493e2abd4 Remove empty file
All the batch implementation code is in jobs.cpp, this file wasn't meant
to be committed originally.
2020-03-29 12:29:38 -03:00
Jason Rhinelander bcca8dd34e Catch errors on internal msgs; support non-blocking sends
When we try to route an internal message ("BYE", "NOT_A_SERVICE_NODE",
etc.) back to the remote from the proxy thread we can end up trying to
send to a disconnected remote, which raises an exception, but this isn't
caught in proxy code: fix this by catching and ignoring it.

This also changes the code to send these messages in "dontwait" mode so
that if we can't queue the message we get (and ignore) an exception
rather than blocking.
2020-03-29 11:34:55 -03:00
Jason Rhinelander fd19f7b183 Trim logged filenames to lokimq/*
Otherwise this includes the full build path which is gross.
2020-03-27 15:17:34 -03:00
Jason Rhinelander 0639bfa629 Avoid segfault on retried SN connection request
When we fail to send to a SN but can retry (e.g. because we had an
incoming connection which no longer works, but can retry an outgoing
connection) we were recursing, but this was resulting in a double-free
of the request callback (since we'd try to take ownership of the
incoming serialized pointer twice).

Rewrite the code to use a loop with single ownership instead.

This also changes the request callback behaviour to fire a failure
callback immediately if we can't send a request; previously you'd have
to wait for a timeout, but that is pointless if we couldn't get the
request out.
2020-03-27 14:59:11 -03:00
Jason Rhinelander a7c669775f Avoid masking ReplyCallback type with template param 2020-03-27 14:48:35 -03:00
Jason Rhinelander 8b6f6f498c Make request timeout configurable
For example:

    lmq.request(conn, "some.method", callback, lokimq::request_timeout{5s});

will result in the callback being called with a failure if the response
doesn't arrive within 5s.  (If it still arrives, but after the failure
callback, it gets dropped).
2020-03-23 22:30:53 -03:00
Jason Rhinelander 75750001ce Reduce connection check interval and make configurable
The previous 1s default seems on the long side; this reduces it to
250ms.  It also makes it a public member so that it can be configured
(which is mainly needed for the test suite, but might be useful for
lokimq-calling code that needs faster or slower connection cleanups).
2020-03-23 22:29:14 -03:00
Jason Rhinelander b97f3442e7 Rename keep-alive -> keep_alive in internal serialization
This makes it consistent with other internal parameter names.
2020-03-23 22:28:23 -03:00
Jason Rhinelander 04e2bf7cf7 Change pending_connects from vector to list
Having this as a vector seems to cause armhf/gcc-6 to segfault.  On
closer inspection there's no good reason this should be a vector in the
first place: it only gets used during new connection handshaking and
isn't in any hot loop, plus the elements are fairly large tuples where
shifting elements is going to be relatively expensive.  Thus switching
it to a list everywhere (rather than just on old gcc arm) seems fine.
2020-03-21 12:56:46 -03:00
Jason Rhinelander 036e871cdb 32-bit warning fix 2020-03-13 19:05:12 -03:00
Jason Rhinelander 49f8ef21f1 Install mapbox-variant and cppzmq headers 2020-03-13 19:05:12 -03:00
Jason Rhinelander e17ca30411 Split up into logical headers and compilation units
lokimq.cpp and lokimq.h were getting monolithic; this splits lokimq.cpp
into multiple smaller cpp files by logical purpose for better parallel
compilation ability.  It also splits up the lokimq.h header slightly by
moving the ConnectionID and Message types into their own headers.
2020-03-13 14:28:21 -03:00
Jason Rhinelander 1c80b61335 Add version to cmake, generate version header 2020-03-13 14:28:05 -03:00
Jason Rhinelander f4fad9c194 Fix problems on outgoing disconnect
This removes two superfluous erases that occur during connection closing
(the proxy_close_connection just above them already removes the element
from `peers`), and also short-circuits the incoming message loop if our
pollitems becomes stale so that we don't try to use a closed connection.

It also fixes a bug in the outgoing connection index that was
decrementing the wrong connection indices, leading to failures when
trying to send on an existing connection after a disconnect.

Also adds a test case (which fails before the changes in this commit) to
test this.
2020-03-10 13:54:41 -03:00
Jason Rhinelander 0ce614ef8b Workarounds for old cmake/gcc 2020-03-05 01:33:02 -04:00
Jason Rhinelander fa7d4a8a42 Silence -Wmismatched-tags warning 2020-03-05 01:19:29 -04:00
Jason Rhinelander 882750b700 Don't use consumed data when recursing proxy_send
Fixes "Internal error: Invalid proxy send command; conn_pubkey or
conn_id missing"
2020-03-04 00:13:01 -04:00
Jason Rhinelander 3cb52df837 Fix conn index error
The conn_index entry wasn't being added for outgoing SN connections.
2020-03-03 23:26:14 -04:00
Jason Rhinelander 88f0a10bd8 Fix infinite loop on idle peer expiry 2020-03-03 18:12:13 -04:00
Jason Rhinelander ea5ff7790d Fix destructor when `start()` hasn't been called 2020-03-03 17:28:53 -04:00
Jason Rhinelander 428ef12506 Add missing `inline` to de-templatized hex funcs 2020-03-03 15:06:39 -04:00
Jason Rhinelander 465b398b10 Use string_view instead of taking string type by template
Fixes the case of using a char *
2020-03-02 18:50:43 -04:00
Jason Rhinelander dcb7e4df0b Add a category command helper class
This allows simplifying:

    lmq.add_category("foo", ...);
    lmq.add_command("foo", "a", ...);
    lmq.add_command("foo", "b", ...);
    lmq.add_request_command("foo", "c", ...);

to:

    lmq.add_category("foo", ...)
        .add_command("a", ...)
        .add_command("b", ...)
        .add_request_command("b", ...)
        ;
2020-03-02 15:11:54 -04:00
Jason Rhinelander a43ee15b58 _sv wasn't being defined inline 2020-03-02 15:11:29 -04:00
Jason Rhinelander b3abcfc9ae Add ""_sv literal that works just like C++17 ""sv 2020-03-02 14:24:07 -04:00
Jason Rhinelander f18f86cf96 Allow `optional` and `incoming` to take a bool
This makes it much more convenience to use them with a run-time
condition; this simplifies:

    if (should_be_optional)
        lmq.send(..., send_option::optional{});
    else
        lmq.send(...);

to:

    lmq.send(..., send_option::optional{should_be_optional});
2020-03-02 14:24:07 -04:00
Jason Rhinelander 0493082040 Add default allow to `listen_plain()`
The default on listen_curve() was supposed to go on both.
2020-03-01 23:54:06 -04:00
Jason Rhinelander 1cf02d0c66 Fix invalid access in peer address debug message 2020-03-01 15:21:09 -04:00
Jason Rhinelander e4f93afafa Fix trace IP address 2020-02-29 16:34:37 -04:00
Jason Rhinelander 1ad68b2605 Wrap HI response in try/catch
It's possible that the client has gone away, in which case we can get an
exception raised from an unroutable request.
2020-02-29 16:06:08 -04:00
Jason Rhinelander 3be632e73e Fix auth_level for remote connections
The wrong key was being set here (deserialization expects auth_level).
2020-02-29 16:04:43 -04:00
Jason Rhinelander 09c487f327 Add ability to use random routing ids for outgoing 2020-02-29 16:03:25 -04:00
Jason Rhinelander ece8870896 Move routing prefix into ConnectionID
This allows storing a ConnectionID received in a message callback and
using it later to send another message along the connection without
worrying about a routing id: the ConnectionID will have it if it is
required.  Previously you would have had to store the ConnectionID *and*
the routing prefix, and then specified the route as a
send_option::route{}, which was annoying and cumbersome.
2020-02-29 16:01:47 -04:00
Jason Rhinelander 28e36a3eaf Make AuthLevel stream printable 2020-02-29 15:16:58 -04:00
Jason Rhinelander 2743e576b2 Distinguish between batch jobs and reply jobs
This adds a separate category (and reserve count) for "reply jobs",
which are jobs triggered by receiving a reply to a request, or after a
successful connect or unsuccessful timeout.  Previously these were
scheduled as regular batch jobs; this schedules them as a new "reply
jobs" category with its own reserved threads count.

This also changes the defaults for batch jobs and reply jobs to be based
on the specified general workers count rather than directly on hardware
concurrency, so that if you are on a 16-thread CPU but override general
workers from its default of 16 to 4 and don't change batch workers you
now get reserved batch workers set to 2 rather than 8 which constrains
the typical parallel batch jobs to 4 (i.e. the general worker limit)
rather than exceeding it with the batch job limit.

Similarly for reply jobs, which is now ceil(general/8) by default.
2020-02-28 17:54:00 -04:00
Jason Rhinelander 57f0ca74da Added support for general (non-SN) communications
The existing code was largely set up for SN-to-SN or client-to-SN
communications, where messages can always get to the right place because
we can always send by pubkey.

This doesn't work when we want general communications with a random
remote address.

This commit overhauls the way loki-mq handles communication in a few
important ways:

- Listening instances no longer pass bind addresses into the
constructor; instead they call `listen_curve()` or `listen_plain()`
before invoking `start()`.

- `listen_curve()` is equivalent to the existing bind support: it
listens on a socket and accepts encrypted handshaked connections from
anyone who already knows the server's public key.

- `listen_plain()` is all new: it sets up a plain text listening socket
over which random clients can connect and talk.  End-points aren't
verified, and it isn't encrypted, but if you don't know who you are
talking to then encryption isn't doing anything anyway.

- Connecting to a remote now connections in CURVE encryption or NULL
(plain-text) encryption based on whether you provide a remote_pubkey.
For CURVE, the connection will fail if the pubkey does not match.

- `ConnectionID` objects are now returned when connecting to a remote
address; this object is then passed in to send/request/etc. to direct
the message.  For SN communication, ConnectionID's can be created
implicitly from SN pubkey strings, so the existing interface of
`lmq.send(pubkey, ...)` will still work in most cases.

- A ConnectionID is now passed to the ConnectSuccess and ConnectFailure
callbacks.  This can be used to uniquely identify which connection
succeeded or failed, and can determine whether the remote is a service
node (`.sn()`) and/or the pubkey (`.pubkey()`).  (Obviously the service
node status is only available when the client can do service node
lookups, and the pubkey() is only non-empty for encrypted connections).
2020-02-28 00:16:43 -04:00
Jason Rhinelander e4d371b026 Fixed string_view c++17 compatibility
string_view isn't supposed to be implicitly convertible to std::string
and code would break compiling under c++17 (when our local string_view
is simply a std::string_view typedef).
2020-02-24 22:20:56 -04:00
Jason Rhinelander c589599892 Fix off-by-one `remotes` access 2020-02-23 23:50:47 -04:00
Jason Rhinelander 99f4333b18 Add gcc 5 constexpr workaround 2020-02-23 16:28:22 -04:00
Jason Rhinelander f2ee3d9b41 constexpr string_view fixes
Pre-C++17 char_traits::compare isn't constexpr so we can't constexpr the
find/rfind methods that use it.

begin() etc, however, can be constexpr (and need to be for some of the
other constexpr methods here that use them).
2020-02-22 23:27:21 -04:00
Jason Rhinelander da96c1ec79 Make string_view a full std::string_view implementation
Adds the missing pieces plus adds a test script.
2020-02-22 21:59:07 -04:00
Jason Rhinelander 03827ac1f7 Properly remove pending requests after they are invoked 2020-02-20 20:42:44 -04:00
Jason Rhinelander ed9af92411 Fix reply_tag off-by-one 2020-02-20 20:23:16 -04:00
Jason Rhinelander de0a5842af Added debugging around pending request creation/removal 2020-02-20 20:13:29 -04:00
Jason Rhinelander 5d30846cee Disable non-working self-connection inproc socket
This was meant to be an optimization but doesn't actually work because
we don't do a ZAP request at all when connecting inproc sockets, so the
metadata we need never gets set.

Remove it so a SN can still connect to itself.
2020-02-17 17:59:01 -04:00
Jason Rhinelander 58e45db996 Fix truncated pubkey
Fix bug from quorumnet transition; quorumnet set the user id to
something like "S:abc..." or "C:abc..." to indicate SN or non-SN, but
now that gets carried in a separate message property but the +2 on the
copying position was still erroneously being used.
2020-02-16 22:42:03 -04:00
Jason Rhinelander b11d2870bd Fix verify pubkey initialization 2020-02-13 00:53:43 -04:00
Jason Rhinelander 98a4aed68f Fix bad string access for pubkey verification 2020-02-13 00:39:00 -04:00