Commit Graph

79 Commits

Author SHA1 Message Date
Jason Rhinelander 8f97add30f
Add epoll support for Linux
Each call to zmq::poll is painfully slow when we have many open zmq
sockets, such as when we have 1800 outbound connections (i.e. connected
to every other service node, as service nodes sometimes are and the
Session push notification server *always* is).

In testing on my local Ryzen 5950 system, each return to zmq::poll
incurs about 1.5ms of (mostly system) CPU time with 2000 open
outbound sockets, and so if we're being pelted with a nearly constant
stream of requests (such as happens with the Session push notification
server) we incur massive CPU costs every time we finish processing
messages and go back to wait (via zmq::poll) for more.

In testing a simple ZMQ (no OxenMQ) client/server that establishes 2000
connections to a server, and then has the server send a message back on
a random connection every 1ms, we get atrocious CPU usage: the proxy
thread spends a constant 100% CPU time.  Virtually all of this is in the
poll call itself, though, so we aren't really bottlenecked by how much
can go through the proxy thread: in such a scenario the poll call uses
its CPU then returns right away, we process the queue of messages, and
return to another poll call.  If we have lots of messages received in
that time, though (because messages are coming fast and the poll was
slow) then we process a lot all at once before going back to the poll,
so the main consequences here are that:

1) We use a huge amount of CPU
2) We introduce latency in a busy situation because the CPU has to make
   the poll call (e.g. 1.5ms) before the next message can be processed.
3) If traffic is very bursty then the latency can manifest another
   problem: in the time it takes to poll we could accumulate enough
   incoming messages to overfill our internal per-category job queue,
   which was happening in the SPNS.

(I also tested with 20k connections, and the poll time scaling was
linear: we still processed everything, but in larger chunks because
every poll call took about 15ms, and so we'd have about 15 messages at a
time to process with added latency of up to 15ms).

Switching to epoll *drastically* reduces the CPU usage in two ways:

1) It's massively faster by design: there's a single setup and
   communication of all the polling details to the kernel which we only
   have to do when our set of zmq sockets changes (which is relatively
   rare).
2) We can further reduce CPU time because epoll tells us *which* sockets
   need attention, and so if only 1 connection out of the 2000 sent us
   something we only need to check that single socket for
   messages.  (In theory we can do the same with zmq::poll by querying
   for events available on the socket, but in practice it doesn't
   improve anything over just trying to read from them all).

In my straight zmq test script, using epoll instead reduced CPU usage in
the sends-every-1ms scenario from a constant pegged 100% of a core to an
average of 2-3% of a single core.  (Moreover this CPU usage level didn't
noticeably change when using 20k connections instead of 2k).
2023-09-14 15:03:15 -03:00
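
The two epoll advantages described in the commit above (one-time registration, and being told *which* descriptors are ready) can be illustrated with a minimal, Linux-only sketch. This is not the OxenMQ implementation: `ReadyPoller` is a hypothetical name, and plain file descriptors (e.g. pipe ends) stand in for whatever descriptors the real code registers for its zmq sockets.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

#include <array>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of OxenMQ): registers a fixed set of file
// descriptors with the kernel once, then reports only the readable ones.
class ReadyPoller {
  public:
    explicit ReadyPoller(const std::vector<int>& fds) : epfd_{epoll_create1(0)} {
        if (epfd_ < 0)
            throw std::runtime_error{"epoll_create1 failed"};
        // One-time setup: the kernel remembers the interest list, so it is
        // not re-sent on every wait (unlike a poll()-style call).
        for (int fd : fds) {
            epoll_event ev{};
            ev.events = EPOLLIN;
            ev.data.fd = fd;
            if (epoll_ctl(epfd_, EPOLL_CTL_ADD, fd, &ev) != 0) {
                close(epfd_);
                throw std::runtime_error{"epoll_ctl failed"};
            }
        }
    }
    ~ReadyPoller() { close(epfd_); }
    ReadyPoller(const ReadyPoller&) = delete;
    ReadyPoller& operator=(const ReadyPoller&) = delete;

    // Returns the descriptors that are readable right now (at most 64 per
    // call in this sketch); idle descriptors are never touched.
    std::vector<int> wait(int timeout_ms) {
        std::array<epoll_event, 64> events;
        int n = epoll_wait(epfd_, events.data(), static_cast<int>(events.size()), timeout_ms);
        if (n < 0)
            throw std::runtime_error{"epoll_wait failed"};
        std::vector<int> ready;
        for (int i = 0; i < n; i++)
            ready.push_back(events[i].data.fd);
        assert(ready.size() <= events.size());
        return ready;
    }

  private:
    int epfd_;
};
```

With 2000 registered descriptors and a single one written to, `wait()` returns a one-element vector, so the caller reads from just that descriptor instead of probing all 2000.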
Jason Rhinelander 4f3ee28784
Bump version 2023-07-17 13:50:00 -03:00
Jason Rhinelander ff0e515c51
Fix installed headers
- Remove more deprecated shim headers
- Remove the gone (and newly gone) headers from the install list
- Add missing pubsub.h to install list
2022-10-05 20:26:34 -03:00
Thomas Winget 85437d167b initial implementation of generic pub/sub management
Implements a generic pub/sub system for RPC endpoints to allow clients
to subscribe to things.

patch version bump

tests included and passing
2022-09-28 15:43:45 -04:00
Jason Rhinelander 25f714371b
Remove deprecated code
- Removes the old lokimq name compatibility shims
- Removes the oxenmq::bt* -> oxenc::bt* shim headers
2022-09-28 13:28:48 -03:00
Jason Rhinelander edcde9246a
Fix zmq socket limit setting
MAX_SOCKETS wasn't working properly because ZMQ uses it when the context
is initialized, which happens when the first socket is constructed on
that context.

For OxenMQ, we had several sockets constructed on the context during
OxenMQ construction, which meant the context_t was being initialized
during OxenMQ construction, rather than during start(), and so setting
MAX_SOCKETS would have no effect and you'd always get the default.

This fixes it by making all the member variable zmq::socket_t's
default-constructed, then replacing them with proper zmq::socket_t's
during start() so that we also defer zmq::context_t initialization to
the right place.

A second issue found during testing (also fixed here) is that creating
the socket that worker threads use to communicate with the proxy could
fail if it would violate the zmq max sockets limit, which wound up
throwing an uncaught exception and aborting.  This pre-initializes (but
doesn't connect) the sockets of all potential worker threads during
start() so that the lazily-initialized worker thread will have one
already set up rather than having to create a new one (which could
fail).
2022-08-05 10:40:01 -03:00
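
The ordering problem fixed in the commit above can be modelled with toy types. Everything here is a hypothetical stand-in (`ToyContext`, `ToySocket`, `ToyServer`, and the default limit of 4 are invented for illustration and are not libzmq or OxenMQ APIs); the only behaviour taken from the commit message is that the limit is read when the first socket is constructed on the context.

```cpp
#include <cassert>
#include <stdexcept>

// Toy stand-in for a context whose socket limit is latched when the first
// socket is created on it (the behaviour the commit message describes).
class ToyContext {
  public:
    void set_max_sockets(int n) { requested_max_ = n; }
    int effective_max() const { return latched_ ? latched_max_ : requested_max_; }

  private:
    friend class ToySocket;
    void register_socket() {
        if (!latched_) {  // first socket: the limit is now fixed for good
            latched_max_ = requested_max_;
            latched_ = true;
        }
        if (open_ >= latched_max_)
            throw std::runtime_error{"socket limit reached"};
        ++open_;
    }
    int requested_max_ = 4;  // arbitrary toy default
    int latched_max_ = 0;
    int open_ = 0;
    bool latched_ = false;
};

class ToySocket {
  public:
    ToySocket() = default;  // default-constructed: does not touch any context
    explicit ToySocket(ToyContext& ctx) : ctx_{&ctx} { ctx.register_socket(); }
    bool valid() const { return ctx_ != nullptr; }

  private:
    ToyContext* ctx_ = nullptr;
};

// Members start out default-constructed and are only replaced with real
// sockets in start(), so options set between construction and start() apply.
class ToyServer {
  public:
    ToyContext context;
    void start() {
        control_ = ToySocket{context};
        assert(control_.valid());
    }

  private:
    ToySocket control_;
};
```

In this model, calling `set_max_sockets(100)` on a `ToyServer` before `start()` takes effect, whereas constructing a `ToySocket` on a context first and setting the limit afterwards leaves the default latched.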
Sean Darcy c91e56cf2d adds custom formatter for OMQ structs that have to_string member 2022-08-04 10:50:02 +10:00
Jason Rhinelander b0c3bd4ee9
fix linkage for submodule dep use 2022-05-30 13:28:52 -03:00
Jason Rhinelander 4671af3ca0
Fix use of parent oxenc::oxenc target
oxen-mq's export command errored when using a parent oxenc target in a
submodule oxen-mq; add an intermediate IMPORTED target so that cmake
knows it doesn't have to export the oxenc dependency as well.
2022-05-30 13:07:49 -03:00
Jason Rhinelander 115c5550ca
Bump version & embedded oxenc version 2022-05-24 16:15:39 -03:00
Jason Rhinelander 5c7f6504d2
Fix cmake compilation properties
For some reason using target_compile_features doesn't properly set up
C++17 flags in the generated compile_commands.json, which then breaks
clang-complete.  Switch to use properties instead, which works.
2022-05-12 12:15:30 -03:00
Jason Rhinelander bbdf4af98f
cmake work-around for cmake < 3.21
PkgConfig::xyz won't exist before 3.21 if xyz doesn't require any flags
(which is common for a system-installed header-only library like oxenc).

(CMake bug 22180)
2022-03-30 16:09:40 -03:00
Jason Rhinelander 77c4840273
Fix extra file in header install list 2022-02-07 14:41:51 -04:00
Jason Rhinelander a0a54ed461
Fix static build 2022-02-07 14:38:19 -04:00
Jason Rhinelander 045df9cb9b
Use oxen-encoding and add compatibility shim headers
bt_*, hex, base32z, base64 all moved to oxen-encoding a while ago; this
finishes the move by removing them from oxenmq and instead making oxenmq
depend on oxen-encoding.
2022-01-18 10:30:23 -04:00
Jason Rhinelander fe8a1f4306
Disable IPv6 by default
libzmq's IPv6 support is buggy when also using DNS hostnames: in
particular, if you try to connect to a DNS name that has an IPv6
address, then zmq will *only* try an IPv6 connection, even if the local
client has no IPv6 connectivity, and even if the remote is only
listening on its IPv4 address.

This is much too unreliable to enable by default.
2021-12-02 19:01:21 -04:00
Jason Rhinelander f88691b7e9 Bump version 2021-11-30 14:22:21 -04:00
Jason Rhinelander 39b6d89037 Updates for pyoxenmq 1.0.0
Makes some send/connection options more robust to "do nothing" runtime
values, which the Python wrapper needs.

Also fixes a bunch of doc typos found along the way.

Bump version to 1.2.8 so that new pyoxenmq can build-depend on it.
2021-10-21 22:56:13 -03:00
Jason Rhinelander 560d38d069 Allow disabling -Werror via a cmake option 2021-10-13 19:03:18 -03:00
Jason Rhinelander f553085558 Add support for inproc: requests
inproc support is special in zmq: in particular it completely bypasses
the auth layer, which causes problems in OxenMQ because we assume that a
message will always have auth information (set during initial connection
handshake).

This adds an "always-on" inproc listener and adds a new `connect_inproc`
method for a caller to establish a connection to it.

It also throws exceptions if you try to `listen_plain` or `listen_curve`
on an inproc address, because that won't work for the reasons detailed
above.
2021-08-04 20:15:16 -03:00
Jason Rhinelander 3991f50547 Bump project version in dev branch (for next release) 2021-05-23 10:36:44 -03:00
Jason Rhinelander 99a3f1d840 Bump (and cmake-modernize) version 2021-04-15 15:15:44 -03:00
Jason Rhinelander e3e79e1fb7 Bump version 2021-03-09 15:43:44 -04:00
Jason Rhinelander 86247bc5c7 Add missing header 2021-01-14 21:48:09 -04:00
Jason Rhinelander 396f591fae Remove deprecated string_view compat shim 2021-01-14 15:32:38 -04:00
Jason Rhinelander b49a94fb83 Export compile commands and use ccache by default 2021-01-14 15:32:38 -04:00
Jason Rhinelander 0738695eb9 Add lokimq compatibility headers 2021-01-14 15:32:38 -04:00
Jason Rhinelander 2ae6b96016 Rename LokiMQ to OxenMQ 2021-01-14 15:32:38 -04:00
Jason Rhinelander bd9313bf19 Fix decoding into a std::byte
Decoding into a std::byte output iterator was not working because the
`*out++ = val` assignment doesn't work when the output is std::byte and
val is a char/unsigned char/uint8_t.  Instead we need to explicitly
cast, but figuring out what we have to cast to is a little bit tricky.

This PR makes it work (and bumps the version for this and the is_hex
fix).
2020-12-14 13:05:14 -04:00
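
The casting problem described in the commit above can be sketched as follows. This is one possible approach under stated assumptions, not the code from the commit: `assigned_type` and `copy_chars` are hypothetical names, and only pointer-like iterators and container inserters (such as `std::back_inserter`) are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical trait: the type an output iterator expects to be assigned.
// For pointers and ordinary iterators this is the iterator's value_type.
template <typename It, typename = void>
struct assigned_type {
    using type = typename std::iterator_traits<It>::value_type;
};

// Inserters such as std::back_insert_iterator report value_type = void, so
// take the element type from their container_type instead.  This is the
// "what do we cast to" step that makes the problem a little tricky.
template <typename It>
struct assigned_type<It, std::void_t<typename It::container_type>> {
    using type = typename It::container_type::value_type;
};

// Copies chars to any supported output, casting explicitly so that a
// std::byte destination works (`*out++ = c` alone would not compile there).
template <typename OutputIt>
OutputIt copy_chars(std::string_view in, OutputIt out) {
    using T = typename assigned_type<OutputIt>::type;
    for (char c : in)
        *out++ = static_cast<T>(c);
    return out;
}
```

For example, `copy_chars("ab", std::back_inserter(bytes))` with `std::vector<std::byte> bytes` appends two bytes, and the same call works unchanged for a `std::string` or a raw `char*` destination.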
Jason Rhinelander 178bd4f674 Bump version for 1.2.2 release 2020-11-17 12:42:37 -04:00
Jason Rhinelander b1543513bb Don't install when building as a static subdirectory
This is making lokimq headers & static lib get installed when lokimq is
used as a project subdirectory, which is very annoying.

This adds an option for enabling the install lines, and only enables it
if doing a shared library or a top-level project build.
2020-11-17 12:40:59 -04:00
Jason Rhinelander ec0d44e143 Stable release bump 2020-10-19 23:44:24 -03:00
Jason Rhinelander 8ed529200b macOS 10.12 compatibility
Add var::get/var::visit implementations of std::get/std::visit that get
used if compiling for an old macos target, and use those.

The issue is that on a <10.14 macos target Apple's libc++ is missing
std::bad_variant_access, and so any method that can throw it (such as
std::get and std::visit) can't be used.  This workaround is ugly, but
such is life when you want to support running on Apple platforms.
2020-10-15 16:55:33 -03:00
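
A minimal sketch of the kind of replacement the commit above describes, assuming the goal is simply to avoid any function that can throw std::bad_variant_access. The namespace `var_sketch` and the choice of std::runtime_error are assumptions for illustration; the actual var::get/var::visit implementations may differ, and a visit replacement (not shown) would need the same treatment.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

namespace var_sketch {

// std::get_if is noexcept and returns nullptr on a type mismatch, so it
// never needs std::bad_variant_access; we throw our own exception instead.
template <typename T, typename... Types>
T& get(std::variant<Types...>& v) {
    if (auto* p = std::get_if<T>(&v))
        return *p;
    throw std::runtime_error{"bad variant access"};
}

// Const overload with the same behaviour.
template <typename T, typename... Types>
const T& get(const std::variant<Types...>& v) {
    if (auto* p = std::get_if<T>(&v))
        return *p;
    throw std::runtime_error{"bad variant access"};
}

}  // namespace var_sketch
```

Callers then write `var_sketch::get<int>(v)` where they would otherwise write `std::get<int>(v)`, and catch the substitute exception type on a mismatch.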
Jason Rhinelander 318781a6d4 Update macos build to use 10.14 compatibility 2020-10-15 15:49:54 -03:00
Jason Rhinelander 0ac1d48bc8 Update bundled libzmq version 2020-09-16 11:46:04 -03:00
Jason Rhinelander faeeaa86d4 Add missing headers to installed header list 2020-08-13 12:38:46 -03:00
Jason Rhinelander 4e89dce5b6 Add "C" to languages
Without this CMAKE_C_COMPILER won't be set when building as a standalone
project, and we need that if we build the bundled libzmq.
2020-06-22 13:32:16 -03:00
Jason Rhinelander e072e68d84 Move -isystem hack inside if-found
This was breaking if we didn't find libzmq (or didn't find recent
enough) because the target didn't exist.
2020-05-19 22:55:57 -03:00
Jason Rhinelander e5a8d09127 Link to sodium publicly
The test suite needs this, in particular.
2020-05-15 01:36:01 -03:00
Jason Rhinelander 68c1899cda C++17 changes; replace mapbox with std::variant
Various small C++17 code improvements.

Replace mapbox::variant with std::variant.

Remove the bt_u64 type wrapper; instead we now have `bt_value` which
wraps a variant holding both int64_t and uint64_t, and has constructors
to send signed/unsigned integer types into the appropriate one.
lokimq::get_int checks both as appropriate during extraction.

As a side effect this means we no longer do the uint64_t -> int64_t
conversion on the wire, ever, without needing the wrapper; although this
can break older versions sending large positive integers (i.e. larger
than int64_t max) those weren't actually working completely reliably
with mapbox variant anyway, and the one place using such a value in loki
core (in a checksum) is already fully upgraded across the network
(currently using bt_u64, but always sending a positive value on the
wire).
2020-05-14 20:19:43 -03:00
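
The signed/unsigned handling described in the commit above can be sketched like this. `value_sketch` is a hypothetical, much-reduced stand-in for `bt_value` (the real type holds more alternatives), and this `get_int` only illustrates the "check both as appropriate" idea with range checking; it is not the library's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical reduced value type: signed integers land in the int64_t
// alternative, unsigned integers in the uint64_t alternative.
struct value_sketch {
    std::variant<std::string, std::int64_t, std::uint64_t> v;

    value_sketch(std::string s) : v(std::move(s)) {}

    template <typename T, std::enable_if_t<std::is_integral_v<T> && std::is_signed_v<T>, int> = 0>
    value_sketch(T i) : v(std::in_place_type<std::int64_t>, i) {}

    template <typename T, std::enable_if_t<std::is_integral_v<T> && std::is_unsigned_v<T>, int> = 0>
    value_sketch(T i) : v(std::in_place_type<std::uint64_t>, i) {}
};

// Extracts an integer of type T from either integer alternative, throwing
// std::overflow_error if the stored value does not fit in T.
template <typename T>
T get_int(const value_sketch& val) {
    static_assert(std::is_integral_v<T>, "get_int requires an integer type");
    constexpr auto t_max = static_cast<std::uint64_t>(std::numeric_limits<T>::max());
    if (auto* u = std::get_if<std::uint64_t>(&val.v)) {
        if (*u > t_max)
            throw std::overflow_error{"unsigned value too large for target type"};
        return static_cast<T>(*u);
    }
    if (auto* s = std::get_if<std::int64_t>(&val.v)) {
        if (*s < 0) {
            if constexpr (std::is_unsigned_v<T>) {
                throw std::overflow_error{"negative value for unsigned target type"};
            } else {
                if (*s < std::numeric_limits<T>::min())
                    throw std::overflow_error{"value too small for target type"};
            }
        } else if (static_cast<std::uint64_t>(*s) > t_max) {
            throw std::overflow_error{"value too large for target type"};
        }
        return static_cast<T>(*s);
    }
    throw std::invalid_argument{"value does not hold an integer"};
}
```

Because large unsigned values keep their own alternative, a value above the int64_t maximum round-trips through `get_int<std::uint64_t>` without ever being reinterpreted as a negative number.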
Jason Rhinelander 7b42537801 Require C++17
Removes lokimq::string_view (the type alias is still provided for
backwards compat, but now is always std::string_view).

Bump version (on dev branch) to 1.2.0
2020-05-12 15:33:59 -03:00
Jason Rhinelander 8984dfc4ea Add address parsing/generating class
This class extends the basic ZMQ addresses with addresses that handle
parsing and generating of addresses with embedded curve pubkeys of
various forms, along with a QR-friendly address generator.
2020-05-08 21:42:16 -03:00
Jason Rhinelander 59a41943d4 Add support for setting umask when binding
This is needed to be able to control the permissions of any created ipc
sockets.
2020-05-06 14:52:41 -03:00
Jason Rhinelander 719a9b0b58 1.1.4 2020-04-30 15:12:50 -03:00
Jason Rhinelander 08a11bb9ba Add hack to fix compilation on debian sid 2020-04-28 22:52:18 -03:00
Jason Rhinelander 99bbf8dea9 Bump version (not released yet) 2020-04-23 21:51:52 -03:00
Jason Rhinelander 1a65d7f5e5 Bump version to 1.1.2 2020-04-21 16:59:41 -03:00
Jason Rhinelander 911c66140f Bump version to 1.1.1 2020-04-17 16:19:32 -03:00
Jason Rhinelander 3b86eb1341 1.1.0: invocation-time SN auth; failure responses
This changes the recognition of SN status to be checked on each command
invocation rather than on connection.  As this breaks the API quite
substantially, though doesn't really affect the functionality, it seems
suitable to bump the minor version.

This requires a fundamental shift in how the calling application tells
LokiMQ about service nodes: rather than using a callback invoked on
connection, the application now has to call set_active_sns() (or the
more efficient update_active_sns(), if changes are readily available) to
update the list whenever it changes.  LokiMQ then keeps this list
internally and uses it when determining whether to invoke.

This release also brings better request responses on errors: when a
request fails, the data argument will now be set to the failure reason,
one of:

- TIMEOUT
- UNKNOWNCOMMAND
- NOT_A_SERVICE_NODE (the remote isn't running in SN mode)
- FORBIDDEN (auth level denies the request)
- FORBIDDEN_SN (SN required and the remote doesn't see us as a SN)

Some of these (UNKNOWNCOMMAND, NOT_A_SERVICE_NODE, FORBIDDEN) were
already sent by remotes, but they were not associated with a request, and
so they would only log a warning while the request itself still had to
time out.

These errors (minus TIMEOUT, plus NO_REPLY_TAG signalling that a command
is a request but didn't include a reply tag) are also sent in response
to regular commands, but they simply result in a log warning showing the
error type and the command that caused the failure when received.
2020-04-12 19:57:19 -03:00
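
A caller could map the failure reasons listed in the commit above onto an enum along these lines. This is a sketch under assumptions: the callback shape (a success flag plus a vector of string parts) and the reason being the first element of the data are assumed here, and `RequestResult` and `classify_reply` are hypothetical names rather than library API.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical classification of a request outcome.
enum class RequestResult {
    ok,
    timeout,
    unknown_command,
    not_a_service_node,
    forbidden,
    forbidden_sn,
    unrecognized_failure,
};

// Assumes that on failure the first data element carries the reason string.
RequestResult classify_reply(bool success, const std::vector<std::string>& data) {
    if (success)
        return RequestResult::ok;
    std::string_view reason;
    if (!data.empty())
        reason = data.front();
    if (reason == "TIMEOUT") return RequestResult::timeout;
    if (reason == "UNKNOWNCOMMAND") return RequestResult::unknown_command;
    if (reason == "NOT_A_SERVICE_NODE") return RequestResult::not_a_service_node;
    if (reason == "FORBIDDEN") return RequestResult::forbidden;
    if (reason == "FORBIDDEN_SN") return RequestResult::forbidden_sn;
    // Anything else (including an empty reply) is reported generically.
    return RequestResult::unrecognized_failure;
}
```

A request callback can then branch on the result, for instance retrying on `timeout` but giving up immediately on `forbidden`, instead of treating every failure as a timeout.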
Jason Rhinelander fb3bf9bd1f Bump version to 1.0.5 2020-04-06 18:16:59 -03:00