Commit Graph

65 Commits

Author SHA1 Message Date
Jason Rhinelander caadd35052
epoll: fix hang on heavily loaded sockets
This fixes a hang in the epoll code that triggers on heavy, bursty
connections (such as the live SPNS APNs notifier).

It turns out that side-effects of processing our sockets could leave
other sockets (that we processed earlier in the loop) in a
needs-attention state which we might not notice if we go back to
epoll_wait right away.  zmq::poll apparently takes care of this (and so
is safe to re-poll even in this state), but when we are using epoll we
need to worry about it by always checking for zmq events (which itself
has side effects) and, if we get any, re-enter the loop body immediately
*without* polling to deal with them.
2023-09-15 18:29:23 -03:00
Jason Rhinelander 8f97add30f
Add epoll support for Linux
Each call to zmq::poll is painfully slow when we have many open zmq
sockets, such as when we have 1800 outbound connections (i.e. connected
to every other service node, as services nodes might have sometimes and
the Session push notification server *always* has).

In testing on my local Ryzen 5950 system each time we go back to
zmq::poll incurs about 1.5ms of (mostly system) CPU time with 2000 open
outbound sockets, and so if we're being pelted with a nearly constant
stream of requests (such as happens with the Session push notification
server) we incur massive CPU costs every time we finish processing
messages and go back to wait (via zmq::poll) for more.

In testing a simple ZMQ (no OxenMQ) client/server that establishes 2000
connections to a server, and then has the server send a message back on
a random connection every 1ms, we get atrocious CPU usage: the proxy
thread spends a constant 100% CPU time.  Virtually all of this is in the
poll call itself, though, so we aren't really bottlenecked by how much
can go through the proxy thread: in such a scenario the poll call uses
its CPU then returns right away, we process the queue of messages, and
return to another poll call.  If we have lots of messages received in
that time, though (because messages are coming fast and the poll was
slow) then we process a lot all at once before going back to the poll,
so the main consequences here are that:

1) We use a huge amount of CPU
2) We introduce latency in a busy situation because the CPU has to make
   the poll call (e.g. 1.5ms) before the next message can be processed.
3) If traffic is very bursty then the latency can manifest another
   problem: in the time it takes to poll we could accumulate enough
   incoming messages to overfill our internal per-category job queue,
   which was happening in the SPNS.

(I also tested with 20k connections, and the poll time scaling was
linear: we still processed everything, but in larger chunks because
every poll call took about 15ms, and so we'd have about 15 messages at a
time to process with added latency of up to 15ms).

Switching to epoll *drastically* reduces the CPU usage in two ways:

1) It's massively faster by design: there's a single setup and
   communication of all the polling details to the kernel which we only
   have to do when our set of zmq sockets changes (which is relatively
   rare).
2) We can further reduce CPU time because epoll tells us *which* sockets
   need attention, and so if only 1 connection out of the 2000 sent us
   something we can only bother checking that single socket for
   messages.  (In theory we can do the same with zmq::poll by querying
   for events available on the socket, but in practice it doesn't
   improve anything over just trying to read from them all).

In my straight zmq test script, using epoll instead reduced CPU usage in
the sends-every-1ms scenario from a constant pegged 100% of a core to an
average of 2-3% of a single core.  (Moreover this CPU usage level didn't
noticeably change when using 20k connections instead of 2k).
2023-09-14 15:03:15 -03:00
Jason Rhinelander b8bb10eac5 Redo random string generation
This is probably slightly more efficient (as it avoids going through
uniform_int_distribution), but more importantly, won't trigger some of
Apple's new xcode buggy crap.
2023-04-04 12:16:43 -03:00
Jason Rhinelander ff0e515c51
Fix installed headers
- Remove more deprecated shim headers
- Remove the gone (and newly gone) headers from the install list
- Add missing pubsub.h to install list
2022-10-05 20:26:34 -03:00
Jason Rhinelander 445f214840
Fix a race condition with tagged thread startup
There's a very rare race condition where a tagged thread doesn't seem to
exist when the proxy tries syncing startup with them, and so the proxy
thread hangs in startup.

This addresses it by avoiding looking at the `proxy_thread` variable
(which probably isn't thread safe) in the worker's startup, and
signalling the you-need-to-shutdown condition via a third option for the
(formerly boolean) `tagged_go`.
2022-10-05 19:32:54 -03:00
Thomas Winget 85437d167b initial implementation of generic pub/sub management
Implements a generic pub/sub system for RPC endpoints to allow clients
to subscribe to things.

patch version bump

tests included and passing
2022-09-28 15:43:45 -04:00
Jason Rhinelander 25f714371b
Remove deprecated code
- Removes the old lokimq name compatibility shims
- Removes the oxenmq::bt* -> oxenc::bt* shim headers
2022-09-28 13:28:48 -03:00
Jason Rhinelander edcde9246a
Fix zmq socket limit setting
MAX_SOCKETS wasn't working properly because ZMQ uses it when the context
is initialized, which happens when the first socket is constructed on
that context.

For OxenMQ, we had several sockets constructed on the context during
OxenMQ construction, which meant the context_t was being initialized
during OxenMQ construction, rather than during start(), and so setting
MAX_SOCKETS would have no effect and you'd always get the default.

This fixes it by making all the member variable zmq::socket_t's
default-constructed, then replacing them with proper zmq::socket_t's
during startup() so that we also defer zmq::context_t initialization to
the right place.

A second issue found during testing (also fixed here) is that the socket
worker threads use to communicate to the proxy could fail if the worker
socket creation would violate the zmq max sockets limit, which wound up
throwing an uncaught exception and aborting.  This pre-initializes (but
doesn't connect) all potential worker threads sockets during start() so
that the lazily-initialized worker thread will have one already set up
rather than having to create a new one (which could fail).
2022-08-05 10:40:01 -03:00
Sean Darcy c91e56cf2d adds custom formatter for OMQ structs that have to_string member 2022-08-04 10:50:02 +10:00
Jason Rhinelander ace6ea9d8e
Avoid unnecessary nullptr assignment
We can just leave the dangling pointer value in the `run` object: even
though we just deleted it, there's no need to reset this value because
it will never be used again.  (And even if we did, we don't check
against nullptr anyway so having a nullptr here doesn't make anything
safter than a dangling pointer).

The assignment (into the variant) uses a small amount of CPU (via
std::variant), so better for performance to just leave it dangling.
2022-05-12 12:48:46 -03:00
Jason Rhinelander 62a803f371
Add missing header
This was surely coming in implicitly already, but better to be explicit.
2022-05-12 12:48:15 -03:00
Jason Rhinelander d86ecb3a70
Use fixed vector for idle workers
Use a count + fixed size vector with a separate variable tracking the
size seems to perform slightly better than popping/pushing the vector.
2022-05-12 12:44:54 -03:00
Jason Rhinelander 45791d3a19
Use fixed array for known-small internal messages
Internal messages (control messages, worker messages) are always 3 parts
or less, so we can optimize by using a stack allocated std::array for
those cases rather than needing to continually clear and expand a heap
allocated vector.
2022-05-12 12:42:08 -03:00
Jason Rhinelander b8e4eb148f
Use raw index bytes in worker router
Change the internal worker routing id to be "w" followed by the raw
integer bytes, so that we can just memcpy them into a uint32_t rather
than needing to do str -> integer conversion on each received worker
message.

(This also eliminates a vestigal call into oxenc internals).
2022-05-12 12:38:13 -03:00
Jason Rhinelander fa6de369b2
Change std::queue to std::deque typedef
This shouldn't make any difference with an optimizing compiler, but
makes it easier a bit easier to experiment with different data structures.
2022-05-12 12:32:17 -03:00
Jason Rhinelander 371606cde0
Eliminate useless unordered_set
I don't know what this set was originally meant to be doing, but it
currently does nothing (except adding overhead).

The comment says it "owns" the instances but that isn't really true; the
instances effectively already manage themselves as they pass the pointer
through the communications between proxy and workers.
2022-05-12 12:25:46 -03:00
Jason Rhinelander 3a51713396
Add simpler Job subclass of Batch for simple jobs
This adds a much simpler `Job` implementation of `Batch` that is used
for simple no-return, no-completion jobs (as are initiated via
`omq.job(...)`).

This reduces the overhead involved in constructing/destroying the Batch
instance for these common jobs.
2022-05-12 12:20:51 -03:00
Jason Rhinelander 045df9cb9b
Use oxen-encoding and add compatibility shim headers
bt_*, hex, base32z, base64 all moved to oxen-encoding a while ago; this
finishes the move by removing them from oxenmq and instead making oxenmq
depend on oxen-encoding.
2022-01-18 10:30:23 -04:00
Jason Rhinelander fe8a1f4306
Disable IPv6 by default
libzmq's IPv6 support is buggy when also using DNS hostname: in
particular, if you try to connect to a DNS name that has an IPv6
address, then zmq will *only* try an IPv6 connection, even if the local
client has no IPv6 connectivity, and even if the remote is only
listening on its IPv4 address.

This is much too unreliable to enable by default.
2021-12-02 19:01:21 -04:00
Jason Rhinelander 3b634329ac Fix libc++
libc++ hates the forward declaration, so just include the <future>
header.
2021-11-30 14:29:24 -04:00
Jason Rhinelander 9c022b29de
Merge pull request #69 from jagerman/null-logger
Allow null logger
2021-11-30 14:21:39 -04:00
Jason Rhinelander 4d68868482
Merge pull request #68 from jagerman/start-throws-at-caller
Propagate proxy thread startup exceptions
2021-11-30 14:21:31 -04:00
Jason Rhinelander 430951bf3c
Merge pull request #66 from jagerman/address-hashing
Add std::hash implementation for oxenmq::address
2021-11-30 14:21:24 -04:00
Jason Rhinelander 03749c87f0
Merge pull request #67 from jagerman/ipv6
Enable ipv6 support on sockets
2021-11-30 14:20:43 -04:00
Jason Rhinelander 85d35fa505 Propagate proxy thread startup exceptions
Currently if the proxy thread fails to start (typically because a bind
fails) the exception happens in the proxy thread which is uncatchable by
the caller (and aborts the program).

This makes it nicer by transporting startup exceptions back to the
start() call.
2021-11-30 14:16:17 -04:00
Jason Rhinelander e180187746 Allow null logger
Currently if you pass a nullptr for Logger you get a random
std::bad_function_call called from some random thread the first time a
log message goes out.

This fixes it allow a nullptr that logs nothing.
2021-11-30 14:14:55 -04:00
Jason Rhinelander 375cfab4ce Rebrand variables LMQ -> OMQ
Various things still were using the lmq (or LMQ) names; change them all
to omq/OMQ.
2021-11-30 14:10:47 -04:00
Jason Rhinelander f04bd72a4c Enable ipv6 support on sockets
Without this you cannot bind or connect to IPv6 addresses because,
oddly, libzmq defaults ipv6 to disabled.
2021-11-30 13:50:24 -04:00
Jason Rhinelander 31f64821f8 Add std::hash implementation for oxenmq::address
So that you can store addresses in unordered_sets _maps.
2021-11-28 10:26:35 -04:00
Jason Rhinelander 39b6d89037 Updates for pyoxenmq 1.0.0
Makes some send/connection options more robust to "do nothing" runtime
value, which the Python wrapper needs.

Also found a bunch of doc typos and fixes.

Bump version to 1.2.8 so that new pyoxenmq can build-depend on it.
2021-10-21 22:56:13 -03:00
Jason Rhinelander 504d0d10ea
Merge pull request #52 from jagerman/convert-iterators
Make (and use) iterator approach for encoding/decoding
2021-10-13 18:17:28 -03:00
Jason Rhinelander 0d0ed8efa9 Fix r narrowing initialization warning when uint_fast16_t is small 2021-10-05 12:21:38 -03:00
Jason Rhinelander 02a542b9c6 Simplify iterator initialization & avoid warnings 2021-10-05 12:12:16 -03:00
Jason Rhinelander 9a8adb5bfd Add methods for unpadded base64 construction
The iterator has them; this adds wrapper methods to access them when not
using the iterator directly.
2021-10-01 18:53:05 -03:00
Jason Rhinelander 24dd7a3854 Make (and use) iterator approach for encoding/decoding
This allows for on-the-fly encoding/decoding, and also allows for
on-the-fly transcoding between types without needing intermediate string
allocations (see added test cases for examples).
2021-10-01 18:23:29 -03:00
Jason Rhinelander cd56ad8e08 Expose size calculations; stricter b32z/b64 validity checking
- Add {to,from}_{base64,base32z,hex}_size functions to calculate the
  resulting output size from a given input size.

- Use it internally

- Make b32z and b64 validity checking slightly stricter: currently we
  "accept" some b32z and b64 strings that contain an extra character
  that leave us with 5-7 trailing bits (base32z) or 6 trailing bits
  (base64).  We simply ignore the extra one if decoding, but we
  shouldn't accept it in the "is valid" calls.
2021-10-01 17:54:03 -03:00
Jason Rhinelander 6100802f82
Merge pull request #48 from majestrate/boob-operator-overload-2021-09-24
add operator() overload for defered message that sends reply
2021-09-28 01:32:58 -03:00
Jeff Becker 5a41e84378
add operator() overload for defered message that sends reply 2021-09-24 16:01:31 -04:00
Jason Rhinelander 377932607c Add const 2021-09-07 16:26:45 -03:00
Jason Rhinelander cdd21a9e81 Another workaround for crapple 2021-09-07 02:00:09 -03:00
Jason Rhinelander 977bced84e Apple workaround 2021-09-07 01:16:38 -03:00
Jason Rhinelander 9e3469d968 Add allocation-free bt-list and bt-dict producer
This should allow for b-encoding with better performance and less memory
fragmentation.

Documentation and test suite show how it's used.
2021-09-07 01:12:47 -03:00
Jason Rhinelander 2ac4379fa6 Make {to,from}_{hex/b64/b32} return output iterator
Changes the 3-iterator versions of to_hex, from_b32z, etc. to return the
final output iterator, which allows for much easier in-place "from"
conversion without needing a new string by doing something like:

    std::string data = /* some hex */;
    auto end = oxenmq::from_hex(data.begin(), data.end(), data.begin();
    data.erase(end, data.end());

Returning from the "to" converters is a bit less useful but doing it
anyway for consistency (and because it could still have some use, e.g.
if output is into some fixed buffer it lets you determine how much was
written).
2021-08-20 16:08:33 -03:00
Jason Rhinelander ae884d2f13 Fix backwards logic on overlapping ranges comment 2021-08-20 15:45:03 -03:00
Jason Rhinelander 45f358ab5f Remove debugging 2021-08-13 20:05:21 -03:00
Jason Rhinelander c6ae1faefa Downgrade "worker waiting for" log message to trace 2021-08-06 15:21:07 -03:00
Jason Rhinelander f553085558 Add support for inproc: requests
inproc support is special in zmq: in particular it completely bypasses
the auth layer, which causes problems in OxenMQ because we assume that a
message will always have auth information (set during initial connection
handshake).

This adds an "always-on" inproc listener and adds a new `connect_inproc`
method for a caller to establish a connection to it.

It also throws exceptions if you try to `listen_plain` or `listen_curve`
on an inproc address, because that won't work for the reasons detailed
above.
2021-08-04 20:15:16 -03:00
Jason Rhinelander 1d2246cda8 Remove debug 2021-07-01 01:08:32 -03:00
Jason Rhinelander 9e0d2e24f6 Add a single container version of `send_option::data_parts`
`send_option::data_parts(mycontainer)` is now a shortcut for
`send_option::data_parts(mycontainer.begin(), mycontainer.end())`.
2021-07-01 00:39:35 -03:00
Jason Rhinelander 4a6bb3f702 Fix messages coming back on an outgoing connection
The recent PR that revamped the connection IDs missed a case when
connecting to service nodes where we store the SN pubkey in peers, but
then fail to find the peer when we look it up by connection id.

This adds the required tracking to fix that case (and adds a test that
fails without the fix here).
2021-07-01 00:37:55 -03:00