oxen-mq/oxenmq/oxenmq-internal.h

#pragma once
#include <limits>
#include "oxenmq.h"
// Logging helper: forwards the given level, the current file/line, and the message pieces to the
// in-scope log() call.  Use inside some method, e.g.:
//
//     OMQ_LOG(warn, "bad ", 42, " stuff");
//
#define OMQ_LOG(level, ...) log(LogLevel::level, __FILE__, __LINE__, __VA_ARGS__)
#ifndef NDEBUG
// Same as OMQ_LOG(trace, ...) when not doing a release build; nothing under a release build.
# define OMQ_TRACE(...) log(LogLevel::trace, __FILE__, __LINE__, __VA_ARGS__)
#else
# define OMQ_TRACE(...)
#endif
namespace oxenmq {
constexpr char SN_ADDR_COMMAND[] = "inproc://sn-command";
constexpr char SN_ADDR_WORKERS[] = "inproc://sn-workers";
constexpr char SN_ADDR_SELF[] = "inproc://sn-self";
constexpr char ZMQ_ADDR_ZAP[] = "inproc://zeromq.zap.01";
#ifdef OXENMQ_USE_EPOLL
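// (Presumably sentinel identifiers in the epoll event data, distinguishing the internal command,
// worker, and ZAP sockets from ordinary per-connection sockets; hence the values at the very top
// of the uint64_t range.)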
constexpr auto EPOLL_COMMAND_ID = std::numeric_limits<uint64_t>::max();
constexpr auto EPOLL_WORKER_ID = std::numeric_limits<uint64_t>::max() - 1;
constexpr auto EPOLL_ZAP_ID = std::numeric_limits<uint64_t>::max() - 2;
#endif
/// Destructor for create_message(std::string&&) that zmq calls when it's done with the message.
extern "C" inline void message_buffer_destroy(void*, void* hint) {
delete reinterpret_cast<std::string*>(hint);
}
/// Creates a message without needing to reallocate the provided string data
inline zmq::message_t create_message(std::string&& data) {
auto *buffer = new std::string(std::move(data));
return zmq::message_t{&(*buffer)[0], buffer->size(), message_buffer_destroy, buffer};
}
/// Create a message copying from a string_view
inline zmq::message_t create_message(std::string_view data) {
return zmq::message_t{data.begin(), data.end()};
}
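// Hypothetical usage sketch (payload and make_big_blob below are illustrative names, not part of
// this header): the rvalue overload hands the string's heap buffer to zmq without copying, and
// zmq frees it later via message_buffer_destroy above; the string_view overload copies the bytes
// into the message.
//
//     std::string payload = make_big_blob();                      // some serialized data
//     zmq::message_t owned = create_message(std::move(payload));  // zero-copy; zmq now owns it
//     zmq::message_t copied = create_message(std::string_view{"small literal"});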
template <typename It>
bool send_message_parts(zmq::socket_t& sock, It begin, It end) {
while (begin != end) {
zmq::message_t &msg = *begin++;
if (!sock.send(msg, begin == end ? zmq::send_flags::dontwait : zmq::send_flags::dontwait | zmq::send_flags::sndmore))
return false;
}
return true;
}
template <typename Container>
bool send_message_parts(zmq::socket_t& sock, Container&& c) {
return send_message_parts(sock, c.begin(), c.end());
}
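// Hypothetical usage sketch (sock and the frame contents are illustrative):
//
//     std::array<zmq::message_t, 2> parts{{
//             create_message(std::string_view{"hello"}), create_message(std::string_view{"world"})}};
//     if (!send_message_parts(sock, parts))
//         { /* zmq could not queue the frames (dontwait is always used); handle or drop */ }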
/// Sends a message with an initial route. `msg` and `data` can be empty: if `msg` is empty then
/// the msg frame will be an empty message; if `data` is empty then the data frame will be omitted.
/// All parts are sent with `zmq::send_flags::dontwait`, so this returns false rather than
/// blocking if zmq cannot queue the message.
inline bool send_routed_message(zmq::socket_t& socket, std::string route, std::string msg = {}, std::string data = {}) {
assert(!route.empty());
std::array<zmq::message_t, 3> msgs{{create_message(std::move(route))}};
if (!msg.empty())
msgs[1] = create_message(std::move(msg));
if (!data.empty())
msgs[2] = create_message(std::move(data));
return send_message_parts(socket, msgs.begin(), data.empty() ? std::prev(msgs.end()) : msgs.end());
}
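// Hypothetical usage sketch (router_sock, peer_route, and reply_data are illustrative):
//
//     // Reply to a peer over a ROUTER socket; the frames sent are [route][msg][data]:
//     send_routed_message(router_sock, peer_route, "REPLY", reply_data);
//     // Routed command with no data frame:
//     send_routed_message(router_sock, peer_route, "PING");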
// Sends a (command, optional data) message directly to a socket.  The parts are sent with
// dontwait, so this returns false instead of blocking if the message cannot be accepted by zmq
// (i.e. because the outgoing buffer is full).
inline bool send_direct_message(zmq::socket_t& socket, std::string msg, std::string data = {}) {
std::array<zmq::message_t, 2> msgs{{create_message(std::move(msg))}};
if (!data.empty())
msgs[1] = create_message(std::move(data));
return send_message_parts(socket, msgs.begin(), data.empty() ? std::prev(msgs.end()) : msgs.end());
}
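// Hypothetical usage sketch (dealer_sock and serialized_payload are illustrative).  On a DEALER
// connection the receiving ROUTER prepends the routing frame itself, so only the command (and
// optional data) frames are sent here:
//
//     send_direct_message(dealer_sock, "PING");
//     send_direct_message(dealer_sock, "PUBLISH", serialized_payload);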
// Receive all the parts of a single message from the given socket. Returns true if a message was
// received, false if called with flags=zmq::recv_flags::dontwait and no message was available.
inline bool recv_message_parts(zmq::socket_t& sock, std::vector<zmq::message_t>& parts, const zmq::recv_flags flags = zmq::recv_flags::none) {
do {
zmq::message_t msg;
if (!sock.recv(msg, flags))
return false;
parts.push_back(std::move(msg));
} while (parts.back().more());
return true;
}
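// Hypothetical usage sketch (sock is illustrative).  Parts are appended, so clear the vector
// before reusing it for another message:
//
//     std::vector<zmq::message_t> parts;
//     if (recv_message_parts(sock, parts, zmq::recv_flags::dontwait)) {
//         std::string command = parts[0].to_string();
//         // ... dispatch on `command`; any remaining frames are parts[1], parts[2], ...
//     }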
// Same as above, but using a fixed sized array; this is only used for internal jobs (e.g. control
// messages) where we know the message parts should never exceed a given size (this function does
// not bounds check except in debug builds). Returns the number of message parts received, or 0 on
// read error.
template <size_t N>
inline size_t recv_message_parts(zmq::socket_t& sock, std::array<zmq::message_t, N>& parts, const zmq::recv_flags flags = zmq::recv_flags::none) {
for (size_t count = 0; ; count++) {
assert(count < N);
if (!sock.recv(parts[count], flags))
return 0;
if (!parts[count].more())
return count + 1;
}
}
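// Hypothetical usage sketch (sock is illustrative; 3 is an assumed upper bound on the number of
// frames in the control messages being read):
//
//     std::array<zmq::message_t, 3> control;
//     if (size_t n = recv_message_parts(sock, control, zmq::recv_flags::dontwait); n > 0) {
//         // control[0] .. control[n-1] hold the frames of one received message
//     }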
inline const char* peer_address(zmq::message_t& msg) {
try { return msg.gets("Peer-Address"); } catch (...) {}
return "(unknown)";
}
// Returns a string view of the given message data. It's the caller's responsibility to keep the
// referenced message alive. If you want a std::string instead just call `m.to_string()`
inline std::string_view view(const zmq::message_t& m) {
return {m.data<char>(), m.size()};
}
// Extracts and builds the "send" part of a message for proxy_send/proxy_reply
inline std::list<zmq::message_t> build_send_parts(oxenc::bt_list_consumer send, std::string_view route) {
std::list<zmq::message_t> parts;
if (!route.empty())
parts.push_back(create_message(route));
while (!send.is_finished())
parts.push_back(create_message(send.consume_string()));
return parts;
}
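// Hypothetical usage sketch (data, route, and sock are illustrative): if `data` holds the
// bt-encoded list l5:hello5:worlde (e.g. produced by oxenc::bt_serialize from a list of strings)
// and `route` is a peer routing id, this yields three frames: [route]["hello"]["world"].
//
//     auto parts = build_send_parts(oxenc::bt_list_consumer{data}, route);
//     send_message_parts(sock, parts);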
/// Sends a control message to a specific destination by prefixing the worker name (or identity)
/// then appending the command and optional data (if non-empty). (This is needed when sending the control message
/// to a router socket, i.e. inside the proxy thread).
inline void route_control(zmq::socket_t& sock, std::string_view identity, std::string_view cmd, const std::string& data = {}) {
sock.send(create_message(identity), zmq::send_flags::sndmore);
detail::send_control(sock, cmd, data);
}
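// Hypothetical usage sketch (workers_socket, worker_routing_id, and the "RUN" command are all
// illustrative; this is intended for sockets owned by the proxy thread):
//
//     route_control(workers_socket, worker_routing_id, "RUN");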
}  // namespace oxenmq