Commit Graph

269 Commits

Author SHA1 Message Date
Jason Rhinelander 2966427cc0 Increase ZMQ socket limit
ZMQ's default is 1024, which we are close to hitting; this changes the
default for LokiMQ to 10000.
2020-04-17 16:13:04 -03:00
Jason Rhinelander 34bbaaf612 Use slower and exponential backoff in reconnection
ZMQ's default reconnection time is 100ms, indefinitely, which seems far
too aggressive, particularly where we have some potential for hundreds
or thousands of connections.

This changes the default to be slightly slower (250ms instead of 100ms)
on the first attempt, and to use exponential backoff doubling the time
between each failed connection attempt up to a max of 5s between
reconnection attempts to calm things down.
2020-04-17 16:09:53 -03:00
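The reconnection schedule in the commit above (250ms at first, doubling per failure, capped at 5s) can be sketched as plain arithmetic. This is a minimal illustration, not LokiMQ code. `reconnect_delay` is a hypothetical name. libzmq provides this kind of backoff through its `ZMQ_RECONNECT_IVL` and `ZMQ_RECONNECT_IVL_MAX` socket options. The sketch ignores any randomization the library may add and shows only the nominal delays.

```cpp
#include <algorithm>
#include <chrono>

using namespace std::chrono_literals;

// Nominal delay before a reconnection attempt, given how many consecutive
// attempts have already failed: 250ms at first, doubling after each failure,
// capped at 5s.
std::chrono::milliseconds reconnect_delay(unsigned failures) {
    constexpr std::chrono::milliseconds initial = 250ms, max_delay = 5000ms;
    std::chrono::milliseconds delay = initial;
    // Stop doubling once the cap is reached so large counts can't overflow.
    for (unsigned i = 0; i < failures && delay < max_delay; ++i)
        delay *= 2;
    return std::min(delay, max_delay);
}
```

With these constants the delays run 250ms, 500ms, 1s, 2s, 4s, then stay at 5s.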
Jason Rhinelander b2518b8eb3 Fix broken idle expiry timeout
Idle time was being calculated as the negative of what it should have
been, so a connection idle for 30s was idle for "-30s", and since -30s is
never greater than the (positive) idle timeout, it would never expire and
get closed.

This was resulting in SNs keeping connections open forever, which was
very likely not helping with connectivity (and probably also responsible
for some of the connection rushes triggering ISP DDOS warnings).
2020-04-17 16:06:54 -03:00
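The sign error fixed above is easy to reproduce in isolation. This is a sketch that assumes each connection records a `last_activity` time point. The function names are hypothetical and are not LokiMQ's. The idle duration must be `now - last_activity`, which is non-negative. The reversed subtraction is never greater than a positive timeout.

```cpp
#include <chrono>

using steady = std::chrono::steady_clock;

// Buggy: produces a negative duration for any connection that has been idle,
// so the comparison can never be true and the connection never expires.
bool expired_buggy(steady::time_point last_activity, steady::time_point now,
                   std::chrono::milliseconds timeout) {
    auto idle = last_activity - now;
    return idle > timeout;
}

// Fixed: idle time is how long ago the last activity happened.
bool expired(steady::time_point last_activity, steady::time_point now,
             std::chrono::milliseconds timeout) {
    auto idle = now - last_activity;
    return idle > timeout;
}
```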
Jason Rhinelander 712662f144 Fix storing reference to temporary
consume_string returns a temporary string; we want consume_string_view
which returns a view into the data being consumed.
2020-04-17 16:05:41 -03:00
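Here is the bug class from the commit above in miniature. A view bound to a function's temporary return value dangles once the statement ends. A function that returns a view into the underlying buffer stays valid for as long as the buffer does. The `consumer` type is a hypothetical stand-in, not LokiMQ's actual consumer; only the two method names come from the message.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical consumer over a buffer that the caller keeps alive.
struct consumer {
    std::string_view data;

    // Returns an owning copy of the next n chars: a temporary unless stored.
    std::string consume_string(std::size_t n) {
        n = std::min(n, data.size());
        std::string out(data.substr(0, n));
        data.remove_prefix(n);
        return out;
    }

    // Returns a view into the underlying buffer: valid as long as it is.
    std::string_view consume_string_view(std::size_t n) {
        n = std::min(n, data.size());
        std::string_view out = data.substr(0, n);
        data.remove_prefix(n);
        return out;
    }
};

// Dangling: the std::string returned here dies at the end of the statement.
//     std::string_view bad = c.consume_string(3);
// Fine: the view points into the consumer's buffer.
//     std::string_view good = c.consume_string_view(3);
```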
Jason Rhinelander 131bc95f65 Fix pre-1.1.0 UNKNOWNCOMMAND detection
1.0.5 sends just ["UNKNOWNCOMMAND"], so the detection here was broken,
which resulted in a warning rather than just a debug log message.
2020-04-14 23:53:19 -03:00
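The commit above only says that a 1.0.5 remote sends a bare one-part `["UNKNOWNCOMMAND"]`. A sketch of a check that accepts that form assumes the message arrives as a vector of parts. It also assumes newer remotes may append extra parts after the token, which is not stated above. The function name is hypothetical. Keying only on the first part matches the bare form as well as any longer one.

```cpp
#include <string>
#include <vector>

// True if the remote reports that it did not recognize our command. Only the
// first part is inspected, so the bare one-part form is matched too.
bool is_unknown_command(const std::vector<std::string>& parts) {
    return !parts.empty() && parts[0] == "UNKNOWNCOMMAND";
}
```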
Jason Rhinelander 3aa63c059d Test suite timing tweaks 2020-04-14 17:40:41 -03:00
Jason Rhinelander 7de36da483 Add ZMTP heartbeating (enabled by default)
ZMTP heartbeating should help keep the connection alive, and should
result in earlier detection of connection failures.
2020-04-14 16:08:54 -03:00
Jason Rhinelander b081cf9331 Add missing SET_SNS proxy handler 2020-04-13 16:11:30 -03:00
Jason Rhinelander 84bd5544cc Move pubkey_set into auth.h header
This allows it to be brought in without the full lokimq.h header.
2020-04-13 13:03:19 -03:00
Jason Rhinelander 3b86eb1341 1.1.0: invocation-time SN auth; failure responses
This changes SN status recognition so that it is checked on each command
invocation rather than on connection.  As this breaks the API quite
substantially (though it doesn't really change the functionality), it
seems suitable to bump the minor version.

This requires a fundamental shift in how the calling application tells
LokiMQ about service nodes: rather than using a callback invoked on
connection, the application now has to call set_active_sns() (or the
more efficient update_active_sns(), if changes are readily available) to
update the list whenever it changes.  LokiMQ then keeps this list
internally and uses it when determining whether to invoke.
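A minimal sketch of that pattern, using hypothetical names (`sn_registry`, `set`, `update`, `is_active`). These only mirror the roles of `set_active_sns()` and `update_active_sns()`, and the real signatures may differ. The `pubkey_set` alias is assumed here to be a set of pubkey strings. The application pushes either the full list or an add/remove delta. The lookup happens when a command is invoked, not when the connection is made.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Assumed alias: a set of service node pubkeys (as strings).
using pubkey_set = std::unordered_set<std::string>;

// Hypothetical registry kept inside the messaging layer.
class sn_registry {
    pubkey_set active_;

public:
    // Replace the whole list, in the role of set_active_sns().
    void set(pubkey_set sns) { active_ = std::move(sns); }

    // Apply a delta, in the role of update_active_sns(): cheaper when the
    // caller already knows which nodes were added and removed.
    void update(const pubkey_set& added, const pubkey_set& removed) {
        for (const auto& pk : removed) active_.erase(pk);
        for (const auto& pk : added) active_.insert(pk);
    }

    // Checked when a command is invoked, not when the connection is made, so
    // status changes after connecting are respected.
    bool is_active(const std::string& pubkey) const { return active_.count(pubkey) > 0; }
};
```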

This release also brings better request responses on errors: when a
request fails, the data argument will now be set to the failure reason,
one of:

- TIMEOUT
- UNKNOWNCOMMAND
- NOT_A_SERVICE_NODE (the remote isn't running in SN mode)
- FORBIDDEN (auth level denies the request)
- FORBIDDEN_SN (SN required and the remote doesn't see us as a SN)

Some of these (UNKNOWNCOMMAND, NOT_A_SERVICE_NODE, FORBIDDEN) were
already sent by remotes, but they weren't associated with the request, so
they would just log a warning while the request still had to time out.

These errors (minus TIMEOUT, plus NO_REPLY_TAG signalling that a command
is a request but didn't include a reply tag) are also sent in response
to regular commands, but they simply result in a log warning showing the
error type and the command that caused the failure when received.
2020-04-12 19:57:19 -03:00
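On the requesting side, the 1.1.0 change above lets a reply callback tell which failure occurred. The sketch below assumes the callback receives a success flag plus the reply data parts, with the reason as the first part. That callback shape and the `describe_failure` helper are assumptions for illustration. The reason strings are the ones listed in the message.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: turn the failure reason carried in `data` into text.
std::string describe_failure(const std::vector<std::string>& data) {
    if (data.empty()) return "unknown failure";
    const std::string& reason = data[0];
    if (reason == "TIMEOUT") return "no reply before the request timeout";
    if (reason == "UNKNOWNCOMMAND") return "remote does not know this command";
    if (reason == "NOT_A_SERVICE_NODE") return "remote is not running in SN mode";
    if (reason == "FORBIDDEN") return "remote's auth level denies the request";
    if (reason == "FORBIDDEN_SN") return "remote does not see us as a service node";
    return "unrecognized failure: " + reason;
}

// Example callback body (assumed shape: success flag plus data parts).
auto on_reply = [](bool success, std::vector<std::string> data) -> std::string {
    return success ? "ok" : describe_failure(data);
};
```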
Jason Rhinelander fb3bf9bd1f Bump version to 1.0.5 2020-04-06 18:16:59 -03:00
Jason Rhinelander 95540ec7d5 Fix pollitems_stale not being set in some cases
This could cause stalls of up to 250ms before we detect an incoming
message.
2020-04-06 13:16:55 -03:00
Jason Rhinelander af42875e97 Made simple_string_view take a char type
This allows (most usefully) a `ustring_view` for viewing unsigned char
strings.
2020-04-03 12:28:50 -03:00
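The idea behind the `simple_string_view` change above is to parameterize a view on its character type, so the same code serves both `char` and `unsigned char` data. This is a stripped-down sketch, not LokiMQ's actual `simple_string_view`. Only the class name and the `ustring_view` alias come from the message. The pointer-plus-length layout is an assumption.

```cpp
#include <cstddef>

// Minimal non-owning view over a contiguous run of CharT.
template <typename CharT>
class simple_string_view {
    const CharT* data_ = nullptr;
    std::size_t size_ = 0;

public:
    constexpr simple_string_view() noexcept = default;
    constexpr simple_string_view(const CharT* data, std::size_t size) noexcept
        : data_{data}, size_{size} {}

    constexpr const CharT* data() const noexcept { return data_; }
    constexpr std::size_t size() const noexcept { return size_; }
    constexpr const CharT& operator[](std::size_t i) const noexcept { return data_[i]; }
};

// View over unsigned char data (e.g. raw key bytes).
using ustring_view = simple_string_view<unsigned char>;
```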
Jason Rhinelander bc49b5e9a0 Expose advanced zmq context setting ability 2020-04-03 12:28:50 -03:00
Jason Rhinelander e3a86aaf71 Add `send_option::outgoing` to force a send on an outgoing connection
SS wants this, in particular, to be able to do reachability tests.
(Using connect_remote for this was bad with pubkey-based routing ids
because the second connection could replace an existing connection).
2020-04-03 01:34:21 -03:00
Jason Rhinelander b9e9f10f29 Reset stale pollitems
This was never being reset to false, which could really hurt performance
(because leaving it set would cause the proxy socket reading loop to
short circuit before reading all available msgs, basically needing one
full proxy loop per incoming message).
2020-04-03 01:34:21 -03:00
Jason Rhinelander d4ffebebbd Change thread count logs to debug from trace 2020-04-03 01:34:21 -03:00
Jason Rhinelander 6ba70923b9 Add job queue check on total workers size
Without this there could be a race condition where a job could create a
new worker during shutdown, and end up causing an assert failure.
2020-03-29 15:43:17 -03:00
Jason Rhinelander 4c470f3e33 Bump version to 1.0.4 2020-03-29 15:21:44 -03:00
Jason Rhinelander bd196d08b8 Allow log level to be specified in constructor
It can still be set using `lmq.log_level(...)`, but this can be slightly
more convenient -- and without this, log messages in the constructor are
completely useless.
2020-03-29 15:21:20 -03:00
Jason Rhinelander b66f653708 Less verbose logging at `info` level
Downgrades a bunch of not-useful-at-info-level debug messages from info
-> debug.  This makes `info` a more useful value for a client that wants
messages about startup/shutdown but not random non-serious
connection-related messages.
2020-03-29 15:21:20 -03:00
Jason Rhinelander 716d73d196 All sends use dontwait; add send failure callbacks
We really don't *ever* want send to block, no matter how it is called,
since the send is always in the proxy thread.  This makes the actual
send call always non-blocking, and adds callbacks that we can invoke on
send failures: either on queue full errors (which might be recoverable),
or both full queue and hard failures (which are generally not
recoverable).  These callbacks are both optional: they have to be passed
in using `send_option::queue_full` (if you just want queue full
notifications) or `send_option::queue_failure` (if you want queue full
notifications *and* other send exceptions).
2020-03-29 15:21:20 -03:00
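The dispatch rules in the commit above can be sketched with a simulated non-blocking send in place of the ZMQ call. The option struct names `queue_full` and `queue_failure` come from the message. Their members, the `send_result` enum, and the nullptr-means-full convention are assumptions, not LokiMQ's definitions.

```cpp
#include <functional>
#include <optional>
#include <string>

// Simulated outcome of a non-blocking (dontwait) send.
enum class send_result { sent, would_block, error };

struct queue_full { std::function<void()> callback; };                      // full queue only
struct queue_failure { std::function<void(const std::string*)> callback; }; // full or hard error

// Route a send outcome to the optional callbacks. queue_failure receives
// nullptr for a full queue and the error text for a hard failure.
void handle_send(send_result r, const std::string& error_what,
                 const std::optional<queue_full>& on_full,
                 const std::optional<queue_failure>& on_failure) {
    if (r == send_result::sent) return;
    if (r == send_result::would_block) {
        if (on_full) on_full->callback();
        else if (on_failure) on_failure->callback(nullptr);
    } else if (on_failure) {
        on_failure->callback(&error_what);
    }
    // With no callback supplied the failure is simply dropped here.
}
```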
Jason Rhinelander 8e1b2dffa5 Catch connect failures
socket.connect() can throw, e.g. if given an invalid connection address;
catch this, log the error, and return a failure condition.
2020-03-29 14:40:21 -03:00
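The catch-log-return pattern from the commit above, as a sketch. `fake_connect` is a hypothetical stand-in for `socket.connect()` so the example needs no ZMQ dependency. cppzmq's `zmq::error_t` derives from `std::exception`, so catching `std::exception` would also cover it.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for socket.connect(): rejects addresses with no scheme.
void fake_connect(const std::string& addr) {
    if (addr.find("://") == std::string::npos)
        throw std::invalid_argument("invalid address: " + addr);
}

// Attempt a connection; if the connect call throws (e.g. on an invalid
// address), log the error and report failure instead of propagating it.
bool try_connect(const std::function<void(const std::string&)>& connect_fn,
                 const std::string& addr) {
    try {
        connect_fn(addr);
        return true;
    } catch (const std::exception& e) {
        std::cerr << "Failed to connect to " << addr << ": " << e.what() << '\n';
        return false;
    }
}
```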
Jason Rhinelander 2493e2abd4 Remove empty file
All the batch implementation code is in jobs.cpp; this file wasn't meant
to be committed originally.
2020-03-29 12:29:38 -03:00
Jason Rhinelander bcca8dd34e Catch errors on internal msgs; support non-blocking sends
When we try to route an internal message ("BYE", "NOT_A_SERVICE_NODE",
etc.) back to the remote from the proxy thread we can end up trying to
send to a disconnected remote, which raises an exception, but this isn't
caught in proxy code: fix this by catching and ignoring it.

This also changes the code to send these messages in "dontwait" mode so
that if we can't queue the message we get (and ignore) an exception
rather than blocking.
2020-03-29 11:34:55 -03:00
Jason Rhinelander 7f9141a4a9 1.0.3 release 2020-03-27 18:55:16 -03:00
Jason Rhinelander fd19f7b183 Trim logged filenames to lokimq/*
Otherwise this includes the full build path which is gross.
2020-03-27 15:17:34 -03:00
Jason Rhinelander 0639bfa629 Avoid segfault on retried SN connection request
When we fail to send to a SN but can retry (e.g. because we had an
incoming connection which no longer works, but can retry an outgoing
connection) we were recursing, but this was resulting in a double-free
of the request callback (since we'd try to take ownership of the
incoming serialized pointer twice).

Rewrite the code to use a loop with single ownership instead.

This also changes the request callback behaviour to fire a failure
callback immediately if we can't send a request; previously you'd have
to wait for a timeout, but that is pointless if we couldn't get the
request out.
2020-03-27 14:59:11 -03:00
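The restructured flow in the commit above replaces recursion with a loop over candidate routes. This sketch makes several assumptions. Routes are tried in order, for example an existing incoming connection and then a fresh outgoing one. The callback lives in a `std::unique_ptr` that is moved out exactly once. Each route is a hypothetical `std::function<bool()>` standing in for the real send. If every route fails, the failure callback fires immediately.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using reply_callback = std::function<void(bool success, std::vector<std::string> data)>;

// Returns true if the request went out; ownership of `cb` then passes to the
// `pending` list (standing in for whatever waits for the reply).
bool send_request(const std::vector<std::function<bool()>>& routes,
                  std::unique_ptr<reply_callback> cb,
                  std::vector<std::unique_ptr<reply_callback>>& pending) {
    for (const auto& try_send : routes) {
        if (try_send()) {
            pending.push_back(std::move(cb)); // ownership taken exactly once
            return true;
        }
        // This route failed: try the next. `cb` is still owned right here, so
        // nothing has been freed or moved twice.
    }
    // No route worked: fail right away rather than waiting for a timeout.
    (*cb)(false, {});
    return false;
}
```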
Jason Rhinelander a7c669775f Avoid masking ReplyCallback type with template param 2020-03-27 14:48:35 -03:00
Jason Rhinelander 9fec81856f 1.0.2 version bump 2020-03-24 11:35:31 -03:00
Jason Rhinelander 8b6f6f498c Make request timeout configurable
For example:

    lmq.request(conn, "some.method", callback, lokimq::request_timeout{5s});

will result in the callback being called with a failure if the response
doesn't arrive within 5s.  (If the response arrives after the failure
callback has fired, it gets dropped.)
2020-03-23 22:30:53 -03:00
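One way to get the timeout behaviour described in the commit above, sketched with hypothetical names that are not LokiMQ's code. Each pending request stores its deadline. A periodic sweep fires the failure callback for expired entries and erases them. A reply whose tag is no longer pending is dropped. The current time is passed in explicitly so the logic is easy to test.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using clock_type = std::chrono::steady_clock;
using callback = std::function<void(bool success, std::vector<std::string> data)>;

struct pending_requests {
    struct entry { clock_type::time_point deadline; callback cb; };
    std::map<std::string, entry> by_tag; // keyed by reply tag

    void add(std::string tag, clock_type::time_point now,
             std::chrono::milliseconds timeout, callback cb) {
        by_tag[std::move(tag)] = entry{now + timeout, std::move(cb)};
    }

    // Called periodically: fail and forget every request past its deadline.
    void expire(clock_type::time_point now) {
        for (auto it = by_tag.begin(); it != by_tag.end();) {
            if (now >= it->second.deadline) {
                auto cb = std::move(it->second.cb);
                it = by_tag.erase(it);
                cb(false, {});
            } else {
                ++it;
            }
        }
    }

    // A reply for an unknown (e.g. already expired) tag is dropped.
    bool reply(const std::string& tag, std::vector<std::string> data) {
        auto it = by_tag.find(tag);
        if (it == by_tag.end()) return false;
        auto cb = std::move(it->second.cb);
        by_tag.erase(it);
        cb(true, std::move(data));
        return true;
    }
};
```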
Jason Rhinelander 75750001ce Reduce connection check interval and make configurable
The previous 1s default seems on the long side; this reduces it to
250ms.  It also makes it a public member so that it can be configured
(which is mainly needed for the test suite, but might be useful for
lokimq-calling code that needs faster or slower connection cleanups).
2020-03-23 22:29:14 -03:00
Jason Rhinelander b97f3442e7 Rename keep-alive -> keep_alive in internal serialization
This makes it consistent with other internal parameter names.
2020-03-23 22:28:23 -03:00
Jason Rhinelander 48d3f261d3 1.0.1 release
- internal data structure change to help armhf/gcc-6
- various test suite fixes
- various build system improvements
2020-03-21 12:57:45 -03:00
Jason Rhinelander 04e2bf7cf7 Change pending_connects from vector to list
Having this as a vector seems to cause armhf/gcc-6 to segfault.  On
closer inspection there's no good reason this should be a vector in the
first place: it only gets used during new connection handshaking and
isn't in any hot loop, plus the elements are fairly large tuples where
shifting elements is going to be relatively expensive.  Thus switching
it to a list everywhere (rather than just on old gcc arm) seems fine.
2020-03-21 12:56:46 -03:00
Jason Rhinelander 98b1bd6930 Add more locks around assertions
Catch2 isn't currently thread safe, so if we hit one of these assertions
while some other thread is doing things such as logging, we might
segfault.
2020-03-21 12:56:13 -03:00
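The workaround in the commit above amounts to wrapping every assertion in a lock on one shared mutex. That way the assertion machinery, which is not thread safe, is only entered by one thread at a time. In this sketch the `check` function is a hypothetical stand-in for a Catch2 `REQUIRE`. No Catch2 API is used, and the shared counter stands in for the framework's internal bookkeeping.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex assert_mutex; // serializes every assertion across threads
int checks_passed = 0;   // stand-in for the framework's shared state

// Stand-in for REQUIRE(cond): shared state is only touched under the lock.
void check(bool cond) {
    std::lock_guard<std::mutex> lock{assert_mutex};
    if (cond) ++checks_passed;
}

// Run `per_thread` checks on each of `n_threads` threads concurrently.
void run_concurrent_checks(int n_threads, int per_thread) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([per_thread] {
            for (int j = 0; j < per_thread; ++j) check(true);
        });
    for (auto& t : threads) t.join();
}
```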
Jason Rhinelander 3a120efb79 Increase test timeouts for arm
These *sometimes* spuriously failed because apparently they weren't quite
long enough for the tests to pass on my Pi 4.
2020-03-21 11:10:07 -03:00
Jason Rhinelander 0a7074c573 Add BUILD_BYPRODUCTS so that ninja build works 2020-03-19 19:54:24 -03:00
Jason Rhinelander a36e53d409 More linking overhaul
- Don't try to use cppzmq, just find libzmq ourselves.
- Allow existing `libzmq` and `sodium` targets to be used to control how
we link to libzmq and/or sodium.
- Use PkgConfig:: targets instead of the older bunch-of-variables
approach (requires cmake >= 3.6).
2020-03-15 01:43:23 -03:00
Jason Rhinelander bc0e6be801 Add sodium dep if embedding static lib when doing a shared build, too 2020-03-14 16:06:58 -03:00
Jason Rhinelander dd088c8ba5 cmake compatibility fix 2020-03-14 15:17:48 -03:00
Jason Rhinelander 3d315ba123 More static build linking fixes
Static linking is a dumpster fire.
2020-03-14 14:34:56 -03:00
Jason Rhinelander dd1a8eeb1d Use the correct variable for shared libs 2020-03-14 02:20:43 -03:00
Jason Rhinelander 1176b946e5 1.0.0 release 2020-03-13 21:08:34 -03:00
Jason Rhinelander ec50ee8cbd Compile libzmq statically if embedding 2020-03-13 21:08:34 -03:00
Jason Rhinelander c4d74a8640 Slightly relax build dep to 4.3
Distros (such as buster) include a patched 4.3.1, which is fine to use.
2020-03-13 19:41:08 -03:00
Jason Rhinelander 036e871cdb 32-bit warning fix 2020-03-13 19:05:12 -03:00
Jason Rhinelander 49f8ef21f1 Install mapbox-variant and cppzmq headers 2020-03-13 19:05:12 -03:00
Jason Rhinelander f7efd7f5c3 Add .gitignore 2020-03-13 17:47:40 -03:00
Jason Rhinelander a4ec2c982b Add and install pkgconfig file 2020-03-13 15:31:43 -03:00