When an optional send gets declined (because we aren't connected) the
"sending would block" warning was still being printed, but shouldn't
be.
ConnectionIDs weren't comparing their routes, which meant that if
external code stored one in a map or set *all* incoming connections on
the same listener would be considered the same connection.
This fixes it by considering the route for equality/hashing, and by
stripping the route internally where we need to map a ConnectionID to a
socket.
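Schematically, the fix looks like this (a simplified sketch; the actual
ConnectionID fields may differ):

    // The route now participates in equality...
    bool operator==(const ConnectionID& a, const ConnectionID& b) {
        return a.id == b.id && a.route == b.route;
    }
    // ...and gets mixed into the std::hash<ConnectionID> specialization:
    size_t hash_value = std::hash<long long>{}(c.id)
                      ^ std::hash<std::string>{}(c.route);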
When we hit the limit on the number of workers the proxy thread would
stop processing incoming messages, sending it into an infinite loop of
death. The check was supposed to use `active_workers()` rather than
`workers.size()`, but even that isn't quite right: we want to *always*
pull all incoming messages off and queue them internally, since
different categories have their own queue sizes (and so we have to pull
each message off to know whether to keep it -- if its category queue
has spare room -- or drop it).
The "only if we have some idle workers" check here fails
catastrophically with a single worker: because of how the loop works,
that one worker is always occupied when this code gets called, and so
connections never get expired at all.
ZMQ's default reconnection time is 100ms, indefinitely, which seems far
too aggressive, particularly where we have some potential for hundreds
or thousands of connections.
This changes the default to be slightly slower (250ms instead of 100ms)
on the first attempt, and to use exponential backoff, doubling the time
between each failed connection attempt up to a maximum of 5s between
reconnection attempts, to calm things down.
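In ZMQ terms this maps onto the reconnect interval socket options; a
minimal sketch using cppzmq (the actual lokimq plumbing differs):

    // ZMQ_RECONNECT_IVL sets the initial retry delay; giving
    // ZMQ_RECONNECT_IVL_MAX a larger value enables exponential backoff
    // that doubles the delay after each failure, capped at the max.
    socket.setsockopt(ZMQ_RECONNECT_IVL, 250);      // ms before first retry
    socket.setsockopt(ZMQ_RECONNECT_IVL_MAX, 5000); // ms cap between retries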
Idle time was being calculated as the negative of what it should have
been, so a connection idle for 30s was treated as idle for "-30s"; and
since -30 is never greater than the idle timeout, the connection would
never expire and get closed.
This was resulting in SNs keeping connections open forever, which was
very likely not helping with connectivity (and probably also responsible
for some of the connection rushes triggering ISP DDOS warnings).
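The fix is just the sign of the subtraction; schematically (not the
literal variable names):

    // Before (broken): `last_activity - now` is negative, so the
    // expiry comparison below could never be true.
    auto idle = now - last_activity;
    if (idle > IDLE_EXPIRY)
        proxy_close_connection(/*...*/);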
This changes the recognition of SN status to be checked per-command
invocation rather than on connection. As this breaks the API quite
substantially (though it doesn't really affect the functionality), it
seems suitable to bump the minor version.
This requires a fundamental shift in how the calling application tells
LokiMQ about service nodes: rather than using a callback invoked on
connection, the application now has to call set_active_sns() (or the
more efficient update_active_sns(), if changes are readily available) to
update the list whenever it changes. LokiMQ then keeps this list
internally and consults it when determining whether a command
invocation should be allowed.
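Usage is something like the following sketch (assuming both calls take
sets of SN pubkey strings; the fetch helper here is hypothetical):

    // Push the complete current list whenever it changes:
    std::unordered_set<std::string> sns = fetch_current_sn_pubkeys(); // hypothetical helper
    lmq.set_active_sns(std::move(sns));

    // Or, if you already know the delta, the cheaper form:
    lmq.update_active_sns(added_pubkeys, removed_pubkeys);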
This release also brings better request responses on errors: when a
request fails, the data argument will now be set to the failure reason,
one of:
- TIMEOUT
- UNKNOWNCOMMAND
- NOT_A_SERVICE_NODE (the remote isn't running in SN mode)
- FORBIDDEN (auth level denies the request)
- FORBIDDEN_SN (SN required and the remote doesn't see us as a SN)
Some of these (UNKNOWNCOMMAND, NOT_A_SERVICE_NODE, FORBIDDEN) were
already sent by remotes, but they weren't tied back to the originating
request: they would only log a warning, and the request would still
have to time out.
These errors (minus TIMEOUT, plus NO_REPLY_TAG, signalling that a
command is a request but didn't include a reply tag) are also sent in
response to regular commands; when received, they simply result in a
log warning showing the error type and the command that caused the
failure.
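A request callback can thus distinguish why it failed; roughly
(handle_failure is a placeholder):

    lmq.request(conn, "some.method",
        [](bool success, std::vector<std::string> data) {
            if (!success)
                // data[0] carries the failure reason, e.g. "TIMEOUT",
                // "UNKNOWNCOMMAND", "FORBIDDEN_SN", ...
                handle_failure(data.empty() ? "(unknown)" : data[0]);
        });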
SS wants this, in particular, to be able to do reachability tests.
(Using connect_remote for this was bad with pubkey-based routing ids
because the second connection could replace an existing connection).
This was never being reset to false, which could really hurt
performance: it being false caused the proxy socket reading loop to
short-circuit before reading all available messages, effectively
requiring one full proxy loop per incoming message.
It can still be set using `lmq.log_level(...)`, but this can be slightly
more convenient -- and without it, log messages in the constructor are
completely useless.
Downgrades a bunch of not-useful-at-info-level debug messages from info
-> debug. This makes `info` a more useful value for a client that wants
messages about startup/shutdown but not random non-serious connection
related messages.
We really don't *ever* want send to block, no matter how it is called,
since the send is always in the proxy thread. This makes the actual
send call always non-blocking, and adds callbacks that we can invoke on
send failures: either on queue full errors (which might be recoverable),
or both full queue and hard failures (which are generally not
recoverable). These callbacks are both optional: they have to be passed
in using `send_option::queue_full` (if you just want queue-full
notifications) or `send_option::queue_failure` (if you want queue-full
notifications *and* other send exceptions).
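Usage looks roughly like this (the callback signatures here are
illustrative rather than exact, and the log helpers are placeholders):

    lmq.send(conn, "some.command", "data",
        // Fires only when the message couldn't be queued (queue full):
        send_option::queue_full{[] { log_queue_full(); }});

    lmq.send(conn, "some.command", "data",
        // Fires on queue-full *and* on other send exceptions:
        send_option::queue_failure{[] { log_send_failed(); }});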
When we try to route an internal message ("BYE", "NOT_A_SERVICE_NODE",
etc.) back to the remote from the proxy thread we can end up trying to
send to a disconnected remote, which raises an exception, but this isn't
caught in proxy code: fix this by catching and ignoring it.
This also changes the code to send these messages in "dontwait" mode so
that if we can't queue the message we get (and ignore) an exception
rather than blocking.
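The pattern, using cppzmq's classic send API:

    try {
        if (!socket.send(msg, ZMQ_DONTWAIT))
            ; // EAGAIN: outgoing queue full; drop the control message
    } catch (const zmq::error_t&) {
        // e.g. the remote has already disconnected; these replies are
        // best-effort, so swallow the error and carry on.
    }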
When we fail to send to a SN but can retry (e.g. because we had an
incoming connection which no longer works, but can retry an outgoing
connection) we were recursing, but this was resulting in a double-free
of the request callback (since we'd try to take ownership of the
incoming serialized pointer twice).
Rewrite the code to use a loop with single ownership instead.
This also changes the request callback behaviour to fire a failure
callback immediately if we can't send a request; previously you'd have
to wait for a timeout, but that is pointless if we couldn't get the
request out.
For example:

    lmq.request(conn, "some.method", callback, lokimq::request_timeout{5s});
will result in the callback being called with a failure if the response
doesn't arrive within 5s. (If it still arrives, but after the failure
callback, it gets dropped).
The previous 1s default seems on the long side; this reduces it to
250ms. It also makes it a public member so that it can be configured
(which is mainly needed for the test suite, but might be useful for
lokimq-calling code that needs faster or slower connection cleanups).
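For example (the member name here is an assumption; check the header
for the actual name):

    using namespace std::chrono_literals;
    lmq.CONN_CHECK_INTERVAL = 50ms;  // assumed member name; speeds up
                                     // connection cleanups, e.g. in tests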
Having this as a vector seems to cause armhf/gcc-6 to segfault. On
closer inspection there's no good reason this should be a vector in the
first place: it only gets used during new connection handshaking and
isn't in any hot loop, plus the elements are fairly large tuples where
shifting elements is going to be relatively expensive. Thus switching
it to a list everywhere (rather than just on old gcc arm) seems fine.
lokimq.cpp and lokimq.h were getting monolithic; this splits lokimq.cpp
into multiple smaller cpp files by logical purpose for better parallel
compilation ability. It also splits up the lokimq.h header slightly by
moving the ConnectionID and Message types into their own headers.
This removes two superfluous erases that occur during connection closing
(the proxy_close_connection just above them already removes the element
from `peers`), and also short-circuits the incoming message loop if our
pollitems becomes stale so that we don't try to use a closed connection.
It also fixes a bug in the outgoing connection index that was
decrementing the wrong connection indices, leading to failures when
trying to send on an existing connection after a disconnect.
Also adds a test case (which fails before the changes in this commit) to
test this.
This makes it much more convenient to use them with a run-time
condition; this simplifies:

    if (should_be_optional)
        lmq.send(..., send_option::optional{});
    else
        lmq.send(...);

to:

    lmq.send(..., send_option::optional{should_be_optional});
This allows storing a ConnectionID received in a message callback and
using it later to send another message along the connection without
worrying about a routing id: the ConnectionID will have it if it is
required. Previously you would have had to store the ConnectionID *and*
the routing prefix, and then specify the route via a
send_option::route{}, which was annoying and cumbersome.
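In practice a handler can now just stash the message's ConnectionID and
reuse it later; a sketch (saved_conn being whatever storage the
application uses):

    // Inside a command handler: keep the ConnectionID; any required
    // route travels along inside it.
    saved_conn = m.conn;

    // Later, from anywhere:
    lmq.send(saved_conn, "some.command", "data");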
This adds a separate category (and reserve count) for "reply jobs",
which are jobs triggered by receiving a reply to a request, or after a
successful connect or unsuccessful timeout. Previously these were
scheduled as regular batch jobs; this schedules them as a new "reply
jobs" category with its own reserved threads count.
This also changes the defaults for batch jobs and reply jobs to be
based on the specified general workers count rather than directly on
hardware concurrency. For example, on a 16-thread CPU, if you override
general workers from the default of 16 down to 4 and don't change batch
workers, you now get 2 reserved batch workers rather than 8; this
constrains typical parallel batch jobs to 4 (i.e. the general worker
limit) rather than letting the batch job limit exceed it.
Similarly for reply jobs, which is now ceil(general/8) by default.
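In code (assuming the thread-count setter keeps its current name):

    lmq.set_general_threads(4);
    // With no explicit batch/reply overrides, the reserved counts now
    // derive from the general count:
    //   batch workers: 4 / 2       = 2   (previously hw_threads / 2 = 8)
    //   reply workers: ceil(4 / 8) = 1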
The existing code was largely set up for SN-to-SN or client-to-SN
communications, where messages can always get to the right place because
we can always send by pubkey.
This doesn't work when we want general communications with a random
remote address.
This commit overhauls the way loki-mq handles communication in a few
important ways (a usage sketch follows the list):
- Listening instances no longer pass bind addresses into the
constructor; instead they call `listen_curve()` or `listen_plain()`
before invoking `start()`.
- `listen_curve()` is equivalent to the existing bind support: it
listens on a socket and accepts encrypted handshaked connections from
anyone who already knows the server's public key.
- `listen_plain()` is all new: it sets up a plain text listening socket
over which random clients can connect and talk. End-points aren't
verified, and it isn't encrypted, but if you don't know who you are
talking to then encryption isn't doing anything anyway.
- Connecting to a remote now uses CURVE encryption or NULL (plain-text)
encryption depending on whether you provide a remote_pubkey. For CURVE,
the connection will fail if the pubkey does not match.
- `ConnectionID` objects are now returned when connecting to a remote
address; this object is then passed in to send/request/etc. to direct
the message. For SN communication, ConnectionID's can be created
implicitly from SN pubkey strings, so the existing interface of
`lmq.send(pubkey, ...)` will still work in most cases.
- A ConnectionID is now passed to the ConnectSuccess and ConnectFailure
callbacks. This can be used to uniquely identify which connection
succeeded or failed, and can determine whether the remote is a service
node (`.sn()`) and/or the pubkey (`.pubkey()`). (Obviously the service
node status is only available when the client can do service node
lookups, and the pubkey() is only non-empty for encrypted connections).
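Put together, the new flow looks roughly like this (addresses, pubkey,
and callback signatures are illustrative):

    // Server:
    LokiMQ server{/*...*/};
    server.listen_curve("tcp://0.0.0.0:5555");  // encrypted; clients need our pubkey
    server.listen_plain("tcp://0.0.0.0:5556");  // unencrypted, unverified
    server.start();

    // Client:
    LokiMQ client{/*...*/};
    client.start();
    ConnectionID c = client.connect_remote("tcp://1.2.3.4:5555",
        [](ConnectionID) { /* connected */ },
        [](ConnectionID, std::string_view why) { /* failed */ },
        server_pubkey);  // omit for a plain-text (NULL) connection
    client.send(c, "category.command", "data");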
string_view isn't supposed to be implicitly convertible to std::string,
and code relying on that would break when compiling under C++17 (where
our local string_view is simply a std::string_view typedef).
Pre-C++17 char_traits::compare isn't constexpr so we can't constexpr the
find/rfind methods that use it.
begin(), etc., however, can be constexpr (and need to be, for some of
the other constexpr methods here that use them).
This was meant to be an optimization but doesn't actually work because
we don't do a ZAP request at all when connecting inproc sockets, so the
metadata we need never gets set.
Remove it so a SN can still connect to itself.
Fixes a bug from the quorumnet transition: quorumnet set the user id to
something like "S:abc..." or "C:abc..." to indicate SN or non-SN
status; that now gets carried in a separate message property, but the
+2 offset on the copying position was still erroneously being applied.
LMQ_TRACE becomes nothing under a release build, which is good because
many traces are in the proxy hot path.
Also fixes some confusing log level comparison logic by flipping the
order of log levels. Previously trace < debug < info < warn, which was
confusing; that order is now reversed, which makes much more sense
(i.e. a larger LogLevel means more logging).
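In spirit the enum is now:

    enum class LogLevel { fatal = 0, error, warn, info, debug, trace };
    // larger value == more verbose, so e.g. `level >= LogLevel::debug`
    // now reads the way you'd expect.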
Batch jobs scheduled by the proxy thread itself were delayed to the next
poll timeout (because nothing ever gets sent on a socket). Add a
variable to bypass the next poll to handle this case.
This allows making RPC requests with a callback that gets called when
the response comes back. This is essentially a wrapper around doing it
yourself (i.e. by setting up a server-side "request" and client-side
"reply" command where "request" responds with a "reply" command), but
abstracted into lokimq itself as it is likely to be very useful when
integrating client/server connections rather than peer-to-peer
connections.
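Both sides, in brief (category and command names are illustrative):

    // Server: a request command replies via the message's reply tag.
    server.add_category("rpc", Access{AuthLevel::none})
          .add_request_command("status", [](Message& m) {
              m.send_reply("OK");
          });

    // Client: the callback fires when the reply arrives.
    client.request(conn, "rpc.status",
        [](bool success, std::vector<std::string> data) { /* ... */ });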
This separates the dict traversal into `next_...` functions that return
a key-value pair, and `consume_...` that return just the value. The
latter is particularly useful when using `skip_until` to position
yourself at a known key.
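With a bt_dict_consumer this looks roughly like:

    lokimq::bt_dict_consumer d{serialized_dict};
    // next_*: returns the key along with the value:
    auto [key, value] = d.next_string();
    // consume_*: just the value; handy after skip_until positions us:
    if (d.skip_until("height")) {
        auto height = d.consume_integer<uint64_t>();
        // ...
    }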
This overhauls the proposed batch implementation (described in the
README but previously not implemented) and implements it.
Various other minor improvements and code restructuring.
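The implemented interface, in outline (`use` is a placeholder):

    lokimq::Batch<int> batch;
    for (int i : {1, 2, 3})
        batch.add_job([i] { return i * i; });  // runs in a worker thread
    batch.completion([](std::vector<lokimq::job_result<int>> results) {
        // invoked once every job has finished; each result yields the
        // job's return value (or rethrows its exception) via .get()
        for (auto& r : results)
            use(r.get());
    });
    lmq.batch(std::move(batch));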
Added a proposed "request" type to the README to be implemented; this is
like a command, but always expects a ["REPLY", TAG, ...] response. The
request invoker provides a callback to invoke when such a REPLY arrives.
This library is adapted from lokid's existing quorumnet code (added in
6.x) used for SN-to-SN communication for quorum voting but generalized
to be usable both there and as a basis for other communication channels
with loki projects (for example: wallet-to-lokid communication; loki-ss
and lokinet internal communication with lokid; loki-ss to loki-ss
communication and message passing; perhaps eventually loki p2p traffic).
This initial release compiles but likely has a few warts and bugs that
need ironing out in the implementation before it is production ready.
Some tests will follow.