Initial LokiMQ release

This library is adapted from lokid's existing quorumnet code (added in
6.x) used for SN-to-SN communication for quorum voting but generalized
to be usable both there and as a basis for other communication channels
with loki projects (for example: wallet-to-lokid communication; loki-ss
and lokinet internal communication with lokid; loki-ss to loki-ss
communication and message passing; perhaps eventually loki p2p traffic).

This initial release compiles but likely has a few warts and bugs that
need ironing out in the implementation before it is production ready.
Some tests will follow.
This commit is contained in:
Jason Rhinelander 2020-02-02 22:39:26 -04:00
commit f3d583c520
10 changed files with 3546 additions and 0 deletions

6
.gitmodules vendored Normal file
View File

@ -0,0 +1,6 @@
[submodule "mapbox-variant"]
path = mapbox-variant
url = https://github.com/mapbox/variant.git
[submodule "cppzmq"]
path = cppzmq
url = https://github.com/zeromq/cppzmq.git

269
README.md Normal file
View File

@ -0,0 +1,269 @@
# LokiMQ - zeromq-based message passing for Loki projects
This C++14 library contains an abstraction layer around ZeroMQ to support integration with Loki
authentication and message passing. It is designed to be usable as the underlying communication
mechanism of SN-to-SN communication ("quorumnet"), the RPC interface used by wallets and local
daemon commands, communication channels between lokid and auxiliary services (storage server,
lokinet), and also provides a local multithreaded job scheduling within a process.
It is not required to use this library to interact with loki components as a client: this is mainly
intended to abstract away much of the server-side handling.
All messages are encrypted (using x25519).
This library makes minimal use of mutexes, and none in the hot paths of the code, instead mostly
relying on ZMQ sockets for synchronization; for more information on this (and why this is generally
much better performing and more scalable) see the ZMQ guide documentation on the topic.
## Basic message structure
LokiMQ messages consist of 1+ part messages where the first part is a string command and remaining
parts are command-specific data.
The command string is one of two types:
`basic` - for basic commands such as authentication (`login`) handled by LokiMQ itself. These
commands may not contain a `.`, and are reserved for LokiMQ itself.
`category.command` - for commands registered by the LokiMQ caller (e.g. lokid). Here `category`
must be at least one character not containing a `.` and `command` may be anything. These categories
and commands are registered according to general function and authentication level (more on this
below). For example, for lokid categories are:
- `system` - is for RPC commands related to the system administration such as mining, getting
sensitive statistics, accessing SN private keys, remote shutdown, etc.
- `sn` - is for SN-to-SN communication such as blink quorum and uptime proof obligation votes.
- `blink` - is for public blink commands (i.e. blink submission) and is only provided by nodes
running as service nodes.
- `blockchain` - is for remote blockchain access such as retrieving blocks and transactions as well
as subscribing to updates for new blocks, transactions, and service node states.
## Command invocation
The application registers categories and registers commands within these categories with callbacks.
The callbacks are passed a LokiMQ::Message object from which the message (plus various connection
information) can be obtained. There is no structure imposed at all on the data passed in subsequent
message parts: it is up to the command itself to deserialize however it wishes (e.g. JSON,
bt-encoded, or any other encoding).
The Message object also provides methods for replying to the caller. Simple replies queue a reply
if the client is still connected. Replies to service nodes can also be "strong" replies: when
replying to a SN that has closed connection with a strong reply we will attempt to reestablish a
connection to deliver the message. In order for this to work the LokiMQ caller must provide a
lookup function to retrieve the remote address given a SN x25519 pubkey.
### Callbacks
Invoked command functions are always invoked with exactly one arguments: a non-const LokiMQ::Message
reference from which the connection info, LokiMQ object, and message data can be obtained. If you
need some extra state data (for example, a reference to some high level object) the LokiMQ object
has an opaque public `void* data` member intended for exactly this purpose.
## Authentication
Each category has access control consisting of three values:
- Auth level, one of:
- None - no authentication required at all, any remote client may invoke this command
- Basic - this requires a basic authentication level (None access is implied)
- Admin - this requires administrative access (Basic access is implied)
- ServiceNode (bool) - if true this requires that the remote connection has proven its identity as
an active service node (via its x25519 key).
- LocalServiceNode (bool) - if true this requires that the local node is running in service node
mode (note that it is *not* required that the local SN be *active*).
Authentication level components are cumulative: for example, a category with Basic auth +
ServiceNode=true + LocalServiceNode=true would only be access if all three conditions are met.
The authentication mechanism works in two ways: defaults based on configuration, and explicit
logins.
Configuration defaults allows controlling the default access for an incoming connection based on its
remote address. Typically this is used to allow connections from localhost (or a unix domain
socket) to automatically be an Admin connection without requiring explicit authentication. This
also allows configuration of how public connections should be treated: for example, a lokid running
as a public RPC server would do so by granting Basic access to all incoming connections.
Explicit logins allow the daemon to specify username/passwords with mapping to Basic or Admin
authentication levels.
Thus, for example, a daemon could be configured to be allow Basic remote access with authentication
(i.e. requiring a username/password login given out to people who should be able to access).
For example, in lokid the categories described above have authentication levels of:
- `system` - Admin
- `sn` - ServiceNode
- `blink` - LocalServiceNode
- `blockchain` - Basic
### Service Node authentication
In order to handle ServiceNode authentication, LokiMQ uses an Allow callback invoked during
connection to determine both whether to allow the connection, and to determine whether the incoming
connection is an active service node.
Note that this status persists for the life of the connection (i.e. it is not rechecked on each
command invocation). If you require stronger protection against being called by
decommissioned/deregistered service nodes from a connection established when the SN was active then
the callback itself will need to verify when invoked.
## Command aliases
Command aliases can be specified. For example, an alias `a.b` -> `c.d` will rewrite any incoming
`a.b` command to `c.d` before handling it. These are applied *before* any authentication is
performed.
The main purpose here is for backwards compatibility either for renaming category or commands, or
for changing command access levels by moving it from one category to another. It's recommended that
such aliases be used only temporarily for version transitions.
## Threads
LokiMQ operates a pool of worker threads to handle jobs. The simplest use just allocates new jobs
to a free worker thread, and we have a "general threads" value to configure how many such threads
are available.
You may, however, also reserve a minimum number of workers per command category. For example, you
could reserve 1 thread for the `sys` category and 2 for the `qnet` category plus 8 general threads.
The general threads will be used most of the time for any categories (including `sys` and `qnet`),
but there will always be at least 1/2 worker threads either currently working on or available for
incoming `system`/`sn` commands. General thread gets used first; only if all general threads are
currently busy *and* a category has unused reserved threads will an additional thread be used.
Note that these actual reserved threads are not exclusive: reserving M of N total threads for a
category simply ensures that no more than (N-M) threads are being used for other categories at any
given time, but the actual jobs may run on any worker thread.
As mentioned above, LokiMQ tries to avoid exceeding the configured general threads value (G)
whenever possible: the only time we will dispatch a job to a worker thread when we have >= G threads
already running is when a new command arrives, the category reserves M threads, and the thread pool
is currently processing fewer than M jobs for that category.
Some examples: assume A and B are commands that take sufficiently long to run that we receive all
commands before the first job is finished. Suppose that A's category reserves 2 threads, B's
category has no reserved threads, and we have 4 general threads configured.
Example 1: commands arrive in order AABBBB. This will not exceed 4 threads: when the third B
arrives there are already 4 jobs running so it gets queued.
Example 2: commands arrive in order AABBAA. This also won't exceed 4 threads: when the third A
arrives there are already 4 jobs running and two of them are already A jobs, so there are no
remaining slots for A jobs.
Example 3: BBBAAA. This won't exceed 5 threads: the first four get started normally. When the
second A arrives there are 4 threads running, but only 1 of them is an A thus there is still a free
slot for A jobs so we start the second A on a fifth thread. The third A, however, has no A jobs
available so gets queued.
Exmaple 4: BBBBBBAAA. At most 6 jobs. The 5th and 6th B's get queued (all general workers are
busy). The first and second get started (there are two unused reserved A slots), the third one gets
queued. The 5th and 6th B's are already interesting on their own: they won't be started until there
are only three active jobs; the third A won't be started until *either* there are only three active
jobs, or one of the other A's finish.
Thus the general thread count should be regarded as the "normal" thread limit and reserved threads
allow an extra burst of thread activity *only if* all general threads are busy with other categories
when a command with reserve threads arrived.
## Internal job queuing
This library supports queuing internal jobs (internally these are in the "" (empty string) category,
which is not externally accessible). These jobs are quite different from ordinary jobs: they have
no authentication and can only be submitted by the program itself to its own worker threads. They
have either no second message part, or else one single message part consists of an opaque void
pointer value. This pointer is passed by value to the registered function, which must take exactly
one `void *` argument.
It is entirely the responsibility of the caller and callee to deal with the `void *` argument,
including construction/destruction/etc. This is very low level but allow the most flexibility. For
example, a caller might do something like:
```C++
// Set up a function that takes ownership:
void hello1(void *data) {
auto* str = static_cast<std::string*>(data);
std::cout << "Hello1 " << *str << "\n";
delete str;
}
LokiMQ::register_task("hello1", &hello1);
// Another function that doesn't take ownership (and handles nullptr):
void hello2(void *data) { //
std::cout << "Hello2 " <<
(data ? *static_cast<std::string*>(data) : "anonymous") <<
"\n";
}
LokiMQ::register_task("hello2", &hello2);
// Later, in the calling code:
const static std::string there{"there"};
void myfunc() {
// ...
lmq.job(&hello1, new std::string{"world"}); // Give up ownership of the pointer
lmq.job(&hello2, &there); // Passing an externally valid pointer
// But don't do this:
//std::string world{"world"};
//lmq.job(&hello2, &world); // Bad: `world` will probably be destroyed
// before the callback gets invoked
}
```
### Dealing with synchronization of jobs
A common pattern is one where a single thread suddenly has some work that can be be parallelized.
We can easily queue all the jobs into the worker thread pool (see above), but the issue then is how
to continue when the work is done. You could (but shouldn't) employ some blocking, locking, mutex +
condition variable monstrosity, but you shouldn't.
Instead LokiMQ provides a mechanism for this by allowing you to submit a batch of jobs with a
completion callback. All jobs will be queued and, when the last one finishes, the finalization
callback will be queued to continue with the task.
From the caller point of view this requires splitting the logic into two parts, a "Before" that sets
up the batch, a "Job" that does the work (multiple times), and an "After" that continues once all
jobs are finished.
For example, the following example shows how you might use it to convert from input values (0 to 49)
to some other output value:
```C++
struct task_data { int input; double result; };
// Called for each job.
void do_my_task(void* in) {
auto& x = *static_cast<task_data*>(in);
x.result = 42.0 * x.input; // Job
}
void start_big_task() {
// ... Before code ...
auto* results = new std::vector<task_data>{50};
lokimq::Batch batch;
for (size_t i = 0; i < results->size(); i++) {
auto* r = (*result)[i];
r->input = i;
batch.add_job(&do_my_task, r);
}
lmq.job(batch, &continue_big_task, results);
// ... to be continued in `continue_big_task` after all the jobs finish
}
// This will be called once all the `do_my_task` calls have completed. (Note that we could be in
// a different thread from the one `start_big_task()` was running in).
void continue_big_task(void* rptr) {
// Put into a unique_ptr to deal with ownership
std::unique_ptr<std::vector<task_data>> results{static_cast<std::vector<int>*>(rptr)};
double sum = 0;
for (auto &r : results) sum += r;
std::cout << "All done, sum = " << sum << "\n";
}
```
This code deliberately does not support blocking to wait for the tasks to finish: if you want such a
bad design you can implement it yourself; LokiMQ isn't going to help you hurt yourself.

1
cppzmq Submodule

@ -0,0 +1 @@
Subproject commit 8d5c9a88988dcbebb72939ca0939d432230ffde1

220
lokimq/bt_serialize.cpp Normal file
View File

@ -0,0 +1,220 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "bt_serialize.h"
namespace lokimq {
namespace detail {
/// Reads digits into an unsigned 64-bit int.
uint64_t extract_unsigned(string_view& s) {
if (s.empty())
throw bt_deserialize_invalid{"Expected 0-9 but found end of string"};
if (s[0] < '0' || s[0] > '9')
throw bt_deserialize_invalid("Expected 0-9 but found '"s + s[0]);
uint64_t uval = 0;
while (!s.empty() && (s[0] >= '0' && s[0] <= '9')) {
uint64_t bigger = uval * 10 + (s[0] - '0');
s.remove_prefix(1);
if (bigger < uval) // overflow
throw bt_deserialize_invalid("Integer deserialization failed: value is too large for a 64-bit int");
uval = bigger;
}
return uval;
}
void bt_deserialize<string_view>::operator()(string_view& s, string_view& val) {
if (s.size() < 2) throw bt_deserialize_invalid{"Deserialize failed: given data is not an bt-encoded string"};
if (s[0] < '0' || s[0] > '9')
throw bt_deserialize_invalid_type{"Expected 0-9 but found '"s + s[0] + "'"};
uint64_t len = extract_unsigned(s);
if (s.empty() || s[0] != ':')
throw bt_deserialize_invalid{"Did not find expected ':' during string deserialization"};
s.remove_prefix(1);
if (len > s.size())
throw bt_deserialize_invalid{"String deserialization failed: encoded string length is longer than the serialized data"};
val = {s.data(), len};
s.remove_prefix(len);
}
// Check that we are on a 2's complement architecture. It's highly unlikely that this code ever
// runs on a non-2s-complement architecture (especially since C++20 requires a two's complement
// signed value behaviour), but check at compile time anyway because we rely on these relations
// below.
static_assert(std::numeric_limits<int64_t>::min() + std::numeric_limits<int64_t>::max() == -1 &&
static_cast<uint64_t>(std::numeric_limits<int64_t>::max()) + uint64_t{1} == (uint64_t{1} << 63),
"Non 2s-complement architecture not supported!");
std::pair<maybe_signed_int64_t, bool> bt_deserialize_integer(string_view& s) {
// Smallest possible encoded integer is 3 chars: "i0e"
if (s.size() < 3) throw bt_deserialize_invalid("Deserialization failed: end of string found where integer expected");
if (s[0] != 'i') throw bt_deserialize_invalid_type("Deserialization failed: expected 'i', found '"s + s[0] + '\'');
s.remove_prefix(1);
std::pair<maybe_signed_int64_t, bool> result;
if (s[0] == '-') {
result.second = true;
s.remove_prefix(1);
}
uint64_t uval = extract_unsigned(s);
if (s.empty())
throw bt_deserialize_invalid("Integer deserialization failed: encountered end of string before integer was finished");
if (s[0] != 'e')
throw bt_deserialize_invalid("Integer deserialization failed: expected digit or 'e', found '"s + s[0] + '\'');
s.remove_prefix(1);
if (result.second) { // negative
if (uval > (uint64_t{1} << 63))
throw bt_deserialize_invalid("Deserialization of integer failed: negative integer value is too large for a 64-bit signed int");
result.first.i64 = -uval;
} else {
result.first.u64 = uval;
}
return result;
}
template struct bt_deserialize<int64_t>;
template struct bt_deserialize<uint64_t>;
void bt_deserialize<bt_value, void>::operator()(string_view& s, bt_value& val) {
if (s.size() < 2) throw bt_deserialize_invalid("Deserialization failed: end of string found where bt-encoded value expected");
switch (s[0]) {
case 'd': {
bt_dict dict;
bt_deserialize<bt_dict>{}(s, dict);
val = std::move(dict);
break;
}
case 'l': {
bt_list list;
bt_deserialize<bt_list>{}(s, list);
val = std::move(list);
break;
}
case 'i': {
auto read = bt_deserialize_integer(s);
val = read.first.i64; // We only store an i64, but can get a u64 out of it via get<uint64_t>(val)
break;
}
case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': {
std::string str;
bt_deserialize<std::string>{}(s, str);
val = std::move(str);
break;
}
default:
throw bt_deserialize_invalid("Deserialize failed: encountered invalid value '"s + s[0] + "'; expected one of [0-9idl]");
}
}
} // namespace detail
bt_list_consumer::bt_list_consumer(string_view data_) : data{std::move(data_)} {
if (data.empty()) throw std::runtime_error{"Cannot create a bt_list_consumer with an empty string_view"};
if (data[0] != 'l') throw std::runtime_error{"Cannot create a bt_list_consumer with non-list data"};
data.remove_prefix(1);
}
/// Attempt to parse the next value as a string (and advance just past it). Throws if the next
/// value is not a string.
string_view bt_list_consumer::consume_string() {
if (data.empty())
throw bt_deserialize_invalid{"expected a string, but reached end of data"};
else if (!is_string())
throw bt_deserialize_invalid_type{"expected a string, but found "s + data.front()};
string_view next{data}, result;
detail::bt_deserialize<string_view>{}(next, result);
data = next;
return result;
}
/// Consumes a value without returning it.
void bt_list_consumer::skip_value() {
if (is_string())
consume_string();
else if (is_integer())
detail::bt_deserialize_integer(data);
else if (is_list())
consume_list_data();
else if (is_dict())
consume_dict_data();
else
throw bt_deserialize_invalid_type{"next bt value has unknown type"};
}
string_view bt_list_consumer::consume_list_data() {
auto start = data.begin();
if (data.size() < 2 || !is_list()) throw bt_deserialize_invalid_type{"next bt value is not a list"};
data.remove_prefix(1); // Descend into the sublist, consume the "l"
while (!is_finished()) {
skip_value();
if (data.empty())
throw bt_deserialize_invalid{"bt list consumption failed: hit the end of string before the list was done"};
}
data.remove_prefix(1); // Back out from the sublist, consume the "e"
return {start, static_cast<size_t>(std::distance(start, data.begin()))};
}
string_view bt_list_consumer::consume_dict_data() {
auto start = data.begin();
if (data.size() < 2 || !is_dict()) throw bt_deserialize_invalid_type{"next bt value is not a dict"};
data.remove_prefix(1); // Descent into the dict, consumer the "d"
while (!is_finished()) {
consume_string(); // Key is always a string
if (!data.empty())
skip_value();
if (data.empty())
throw bt_deserialize_invalid{"bt dict consumption failed: hit the end of string before the dict was done"};
}
data.remove_prefix(1); // Back out of the dict, consume the "e"
return {start, static_cast<size_t>(std::distance(start, data.begin()))};
}
bt_dict_consumer::bt_dict_consumer(string_view data_) {
data = std::move(data_);
if (data.empty()) throw std::runtime_error{"Cannot create a bt_dict_consumer with an empty string_view"};
if (data.size() < 2 || data[0] != 'd') throw std::runtime_error{"Cannot create a bt_dict_consumer with non-dict data"};
data.remove_prefix(1);
}
bool bt_dict_consumer::consume_key() {
if (key_.data())
return true;
if (data.empty()) throw bt_deserialize_invalid_type{"expected a key or dict end, found end of string"};
if (data[0] == 'e') return false;
key_ = bt_list_consumer::consume_string();
if (data.empty() || data[0] == 'e')
throw bt_deserialize_invalid{"dict key isn't followed by a value"};
return true;
}
} // namespace lokimq

788
lokimq/bt_serialize.h Normal file
View File

@ -0,0 +1,788 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <iostream>
#pragma once
#include <vector>
#include <list>
#include <unordered_map>
#include <algorithm>
#include <functional>
#include <cstring>
#include <ostream>
#include <sstream>
#include "string_view.h"
#include "mapbox/variant.hpp"
namespace lokimq {
using namespace std::literals;
/** \file
* LokiMQ serialization for internal commands is very simple: we support two primitive types,
* strings and integers, and two container types, lists and dicts with string keys. On the wire
* these go in BitTorrent byte encoding as described in BEP-0003
* (https://www.bittorrent.org/beps/bep_0003.html#bencoding).
*
* On the C++ side, on input we allow strings, integral types, STL-like containers of these types,
* and STL-like containers of pairs with a string first value and any of these types as second
* value. We also accept std::variants (if compiled with std::variant support, i.e. in C++17 mode)
* that contain any of these, and mapbox::util::variants (the internal type used for its recursive
* support).
*
* One minor deviation from BEP-0003 is that we don't support serializing values that don't fit in a
* 64-bit integer (BEP-0003 specifies arbitrary precision integers).
*
* On deserialization we can either deserialize into a mapbox::util::variant that supports everything, or
* we can fill a container of your given type (though this fails if the container isn't compatible
* with the deserialized data).
*/
/// Exception throw if deserialization fails
class bt_deserialize_invalid : public std::invalid_argument {
using std::invalid_argument::invalid_argument;
};
/// A more specific subclass that is thown if the serialization type is an initial mismatch: for
/// example, trying deserializing an int but the next thing in input is a list. This is not,
/// however, thrown if the type initially looks fine but, say, a nested serialization fails. This
/// error will only be thrown when the input stream has not been advanced (and so can be tried for a
/// different type).
class bt_deserialize_invalid_type : public bt_deserialize_invalid {
using bt_deserialize_invalid::bt_deserialize_invalid;
};
class bt_list;
class bt_dict;
/// Recursive generic type that can fully represent everything valid for a BT serialization.
using bt_value = mapbox::util::variant<
std::string,
int64_t,
mapbox::util::recursive_wrapper<bt_list>,
mapbox::util::recursive_wrapper<bt_dict>
>;
/// Very thin wrapper around a std::list<bt_value> that holds a list of generic values (though *any*
/// compatible data type can be used).
class bt_list : public std::list<bt_value> {
using std::list<bt_value>::list;
};
/// Very thin wrapper around a std::unordered_map<bt_value> that holds a list of string -> generic
/// value pairs (though *any* compatible data type can be used).
class bt_dict : public std::unordered_map<std::string, bt_value> {
using std::unordered_map<std::string, bt_value>::unordered_map;
};
#ifdef __cpp_lib_void_t
using std::void_t;
#else
/// C++17 void_t backport
template <typename... Ts> struct void_t_impl { using type = void; };
template <typename... Ts> using void_t = typename void_t_impl<Ts...>::type;
#endif
namespace detail {
/// Reads digits into an unsigned 64-bit int.
uint64_t extract_unsigned(string_view& s);
inline uint64_t extract_unsigned(string_view&& s) { return extract_unsigned(s); }
// Fallback base case; we only get here if none of the partial specializations below work
template <typename T, typename SFINAE = void>
struct bt_serialize { static_assert(!std::is_same<T, T>::value, "Cannot serialize T: unsupported type for bt serialization"); };
template <typename T, typename SFINAE = void>
struct bt_deserialize { static_assert(!std::is_same<T, T>::value, "Cannot deserialize T: unsupported type for bt deserialization"); };
/// Checks that we aren't at the end of a string view and throws if we are.
inline void bt_need_more(const string_view &s) {
if (s.empty())
throw bt_deserialize_invalid{"Unexpected end of string while deserializing"};
}
union maybe_signed_int64_t { int64_t i64; uint64_t u64; };
/// Deserializes a signed or unsigned 64-bit integer from an input stream. Sets the second bool to
/// true iff the value is int64_t because a negative value was read. Throws an exception if the
/// read value doesn't fit in a int64_t (if negative) or a uint64_t (if positive). Removes consumed
/// characters from the string_view.
std::pair<maybe_signed_int64_t, bool> bt_deserialize_integer(string_view& s);
/// Integer specializations
template <typename T>
struct bt_serialize<T, std::enable_if_t<std::is_integral<T>::value>> {
static_assert(sizeof(T) <= sizeof(uint64_t), "Serialization of integers larger than uint64_t is not supported");
void operator()(std::ostream &os, const T &val) {
// Cast 1-byte types to a larger type to avoid iostream interpreting them as single characters
using output_type = std::conditional_t<(sizeof(T) > 1), T, std::conditional_t<std::is_signed<T>::value, int, unsigned>>;
os << 'i' << static_cast<output_type>(val) << 'e';
}
};
template <typename T>
struct bt_deserialize<T, std::enable_if_t<std::is_integral<T>::value>> {
void operator()(string_view& s, T &val) {
constexpr uint64_t umax = static_cast<uint64_t>(std::numeric_limits<T>::max());
constexpr int64_t smin = static_cast<int64_t>(std::numeric_limits<T>::min()),
smax = static_cast<int64_t>(std::numeric_limits<T>::max());
auto read = bt_deserialize_integer(s);
if (std::is_signed<T>::value) {
if (!read.second) { // read a positive value
if (read.first.u64 > umax)
throw bt_deserialize_invalid("Integer deserialization failed: found too-large value " + std::to_string(read.first.u64) + " > " + std::to_string(umax));
val = static_cast<T>(read.first.u64);
} else {
bool oob = read.first.i64 < smin || read.first.i64 > smax;
if (sizeof(T) < sizeof(int64_t) && oob)
throw bt_deserialize_invalid("Integer deserialization failed: found out-of-range value " + std::to_string(read.first.i64) + " not in [" + std::to_string(smin) + "," + std::to_string(smax) + "]");
val = static_cast<T>(read.first.i64);
}
} else {
if (read.second)
throw bt_deserialize_invalid("Integer deserialization failed: found negative value " + std::to_string(read.first.i64) + " but type is unsigned");
if (sizeof(T) < sizeof(uint64_t) && read.first.u64 > umax)
throw bt_deserialize_invalid("Integer deserialization failed: found too-large value " + std::to_string(read.first.u64) + " > " + std::to_string(umax));
val = static_cast<T>(read.first.u64);
}
}
};
extern template struct bt_deserialize<int64_t>;
extern template struct bt_deserialize<uint64_t>;
template <>
struct bt_serialize<string_view> {
void operator()(std::ostream &os, const string_view &val) { os << val.size(); os.put(':'); os.write(val.data(), val.size()); }
};
template <>
struct bt_deserialize<string_view> {
void operator()(string_view& s, string_view& val);
};
/// String specialization
template <>
struct bt_serialize<std::string> {
void operator()(std::ostream &os, const std::string &val) { bt_serialize<string_view>{}(os, val); }
};
template <>
struct bt_deserialize<std::string> {
void operator()(string_view& s, std::string& val) { string_view view; bt_deserialize<string_view>{}(s, view); val = {view.data(), view.size()}; }
};
/// char * and string literals -- we allow serialization for convenience, but not deserialization
template <>
struct bt_serialize<char *> {
void operator()(std::ostream &os, const char *str) { bt_serialize<string_view>{}(os, {str, std::strlen(str)}); }
};
template <size_t N>
struct bt_serialize<char[N]> {
void operator()(std::ostream &os, const char *str) { bt_serialize<string_view>{}(os, {str, N-1}); }
};
/// Partial dict validity; we don't check the second type for serializability, that will be handled
/// via the base case static_assert if invalid.
template <typename T, typename = void> struct is_bt_input_dict_container : std::false_type {};
template <typename T>
struct is_bt_input_dict_container<T, std::enable_if_t<
std::is_same<std::string, std::remove_cv_t<typename T::value_type::first_type>>::value,
void_t<typename T::const_iterator /* is const iterable */,
typename T::value_type::second_type /* has a second type */>>>
: std::true_type {};
/// Determines whether the type looks like something we can insert into (using `v.insert(v.end(), x)`)
template <typename T, typename = void> struct is_bt_insertable : std::false_type {};
template <typename T>
struct is_bt_insertable<T,
void_t<decltype(std::declval<T>().insert(std::declval<T>().end(), std::declval<typename T::value_type>()))>>
: std::true_type {};
/// Determines whether the given type looks like a compatible map (i.e. has std::string keys) that
/// we can insert into.
template <typename T, typename = void> struct is_bt_output_dict_container : std::false_type {};
template <typename T>
struct is_bt_output_dict_container<T, std::enable_if_t<
std::is_same<std::string, std::remove_cv_t<typename T::value_type::first_type>>::value &&
is_bt_insertable<T>::value,
void_t<typename T::value_type::second_type /* has a second type */>>>
: std::true_type {};
/// Specialization for a dict-like container (such as an unordered_map). We accept anything for a
/// dict that is const iterable over something that looks like a pair with std::string for first
/// value type. The value (i.e. second element of the pair) also must be serializable.
template <typename T>
struct bt_serialize<T, std::enable_if_t<is_bt_input_dict_container<T>::value>> {
using second_type = typename T::value_type::second_type;
using ref_pair = std::reference_wrapper<const typename T::value_type>;
void operator()(std::ostream &os, const T &dict) {
os << 'd';
std::vector<ref_pair> pairs;
pairs.reserve(dict.size());
for (const auto &pair : dict)
pairs.emplace(pairs.end(), pair);
std::sort(pairs.begin(), pairs.end(), [](ref_pair a, ref_pair b) { return a.get().first < b.get().first; });
for (auto &ref : pairs) {
bt_serialize<std::string>{}(os, ref.get().first);
bt_serialize<second_type>{}(os, ref.get().second);
}
os << 'e';
}
};
template <typename T>
struct bt_deserialize<T, std::enable_if_t<is_bt_output_dict_container<T>::value>> {
using second_type = typename T::value_type::second_type;
void operator()(string_view& s, T& dict) {
// Smallest dict is 2 bytes "de", for an empty dict.
if (s.size() < 2) throw bt_deserialize_invalid("Deserialization failed: end of string found where dict expected");
if (s[0] != 'd') throw bt_deserialize_invalid_type("Deserialization failed: expected 'd', found '"s + s[0] + "'"s);
s.remove_prefix(1);
dict.clear();
bt_deserialize<std::string> key_deserializer;
bt_deserialize<second_type> val_deserializer;
while (!s.empty() && s[0] != 'e') {
std::string key;
second_type val;
key_deserializer(s, key);
val_deserializer(s, val);
dict.insert(dict.end(), typename T::value_type{std::move(key), std::move(val)});
}
if (s.empty())
throw bt_deserialize_invalid("Deserialization failed: encountered end of string before dict was finished");
s.remove_prefix(1); // Consume the 'e'
}
};
/// Accept anything that looks iterable; value serialization validity isn't checked here (it fails
/// via the base case static assert).
template <typename T, typename = void> struct is_bt_input_list_container : std::false_type {};
template <typename T>
struct is_bt_input_list_container<T, std::enable_if_t<
!std::is_same<T, std::string>::value &&
!is_bt_input_dict_container<T>::value,
void_t<typename T::const_iterator, typename T::value_type>>>
: std::true_type {};
template <typename T, typename = void> struct is_bt_output_list_container : std::false_type {};
template <typename T>
struct is_bt_output_list_container<T, std::enable_if_t<
!std::is_same<T, std::string>::value &&
!is_bt_output_dict_container<T>::value &&
is_bt_insertable<T>::value>>
: std::true_type {};
/// List specialization
template <typename T>
struct bt_serialize<T, std::enable_if_t<is_bt_input_list_container<T>::value>> {
void operator()(std::ostream& os, const T& list) {
os << 'l';
for (const auto &v : list)
bt_serialize<std::remove_cv_t<typename T::value_type>>{}(os, v);
os << 'e';
}
};
template <typename T>
struct bt_deserialize<T, std::enable_if_t<is_bt_output_list_container<T>::value>> {
using value_type = typename T::value_type;
void operator()(string_view& s, T& list) {
// Smallest list is 2 bytes "le", for an empty list.
if (s.size() < 2) throw bt_deserialize_invalid("Deserialization failed: end of string found where list expected");
if (s[0] != 'l') throw bt_deserialize_invalid_type("Deserialization failed: expected 'l', found '"s + s[0] + "'"s);
s.remove_prefix(1);
list.clear();
bt_deserialize<value_type> deserializer;
while (!s.empty() && s[0] != 'e') {
value_type v;
deserializer(s, v);
list.insert(list.end(), std::move(v));
}
if (s.empty())
throw bt_deserialize_invalid("Deserialization failed: encountered end of string before list was finished");
s.remove_prefix(1); // Consume the 'e'
}
};
/// variant visitor; serializes whatever is contained
class bt_serialize_visitor {
std::ostream &os;
public:
using result_type = void;
bt_serialize_visitor(std::ostream &os) : os{os} {}
template <typename T> void operator()(const T &val) const {
bt_serialize<T>{}(os, val);
}
};
template <typename T>
using is_bt_deserializable = std::integral_constant<bool,
std::is_same<T, std::string>::value || std::is_integral<T>::value ||
is_bt_output_dict_container<T>::value || is_bt_output_list_container<T>::value>;
// General template and base case; this base will only actually be invoked when Ts... is empty,
// which means we reached the end without finding any variant type capable of holding the value.
template <typename SFINAE, typename Variant, typename... Ts>
struct bt_deserialize_try_variant_impl {
void operator()(string_view&, Variant&) {
throw bt_deserialize_invalid("Deserialization failed: could not deserialize value into any variant type");
}
};
template <typename... Ts, typename Variant>
void bt_deserialize_try_variant(string_view& s, Variant& variant) {
bt_deserialize_try_variant_impl<void, Variant, Ts...>{}(s, variant);
}
template <typename Variant, typename T, typename... Ts>
struct bt_deserialize_try_variant_impl<std::enable_if_t<is_bt_deserializable<T>::value>, Variant, T, Ts...> {
void operator()(string_view& s, Variant& variant) {
if ( is_bt_output_list_container<T>::value ? s[0] == 'l' :
is_bt_output_dict_container<T>::value ? s[0] == 'd' :
std::is_integral<T>::value ? s[0] == 'i' :
std::is_same<T, std::string>::value ? s[0] >= '0' && s[0] <= '9' :
false) {
T val;
bt_deserialize<T>{}(s, val);
variant = std::move(val);
} else {
bt_deserialize_try_variant<Ts...>(s, variant);
}
}
};
template <typename Variant, typename T, typename... Ts>
struct bt_deserialize_try_variant_impl<std::enable_if_t<!is_bt_deserializable<T>::value>, Variant, T, Ts...> {
void operator()(string_view& s, Variant& variant) {
// Unsupported deserialization type, skip it
bt_deserialize_try_variant<Ts...>(s, variant);
}
};
template <>
struct bt_deserialize<bt_value, void> {
void operator()(string_view& s, bt_value& val);
};
template <typename... Ts>
struct bt_serialize<mapbox::util::variant<Ts...>> {
void operator()(std::ostream& os, const mapbox::util::variant<Ts...>& val) {
mapbox::util::apply_visitor(bt_serialize_visitor{os}, val);
}
};
template <typename... Ts>
struct bt_deserialize<mapbox::util::variant<Ts...>> {
void operator()(string_view& s, mapbox::util::variant<Ts...>& val) {
bt_deserialize_try_variant<Ts...>(s, val);
}
};
#ifdef __cpp_lib_variant
/// C++17 std::variant support
template <typename... Ts>
struct bt_serialize<std::variant<Ts...>> {
void operator()(std::ostream &os, const std::variant<Ts...>& val) {
mapbox::util::apply_visitor(bt_serialize_visitor{os}, val);
}
};
template <typename... Ts>
struct bt_deserialize<std::variant<Ts...>> {
void operator()(string_view& s, std::variant<Ts...>& val) {
bt_deserialize_try_variant<Ts...>(s, val);
}
};
#endif
template <typename T>
struct bt_stream_serializer {
const T &val;
explicit bt_stream_serializer(const T &val) : val{val} {}
operator std::string() const {
std::ostringstream oss;
oss << *this;
return oss.str();
}
};
template <typename T>
std::ostream &operator<<(std::ostream &os, const bt_stream_serializer<T> &s) {
bt_serialize<T>{}(os, s.val);
return os;
}
} // namespace detail
/// Returns a wrapper around a value reference that can serialize the value directly to an output
/// stream. This class is intended to be used inline (i.e. without being stored) as in:
///
/// std::list<int> my_list{{1,2,3}};
/// std::cout << bt_serializer(my_list);
///
/// While it is possible to store the returned object and use it, such as:
///
/// auto encoded = bt_serializer(42);
/// std::cout << encoded;
///
/// this approach is not generally recommended: the returned object stores a reference to the
/// passed-in type, which may not survive. If doing this note that it is the caller's
/// responsibility to ensure the serializer is not used past the end of the lifetime of the value
/// being serialized.
///
/// Also note that serializing directly to an output stream is more efficient as no intermediate
/// string containing the entire serialization has to be constructed.
///
template <typename T>
detail::bt_stream_serializer<T> bt_serializer(const T &val) { return detail::bt_stream_serializer<T>{val}; }
/// Serializes the given value into a std::string.
///
/// int number = 42;
/// std::string encoded = bt_serialize(number);
/// // Equivalent:
/// //auto encoded = (std::string) bt_serialize(number);
///
/// This takes any serializable type: integral types, strings, lists of serializable types, and
/// string->value maps of serializable types.
template <typename T>
std::string bt_serialize(const T &val) { return bt_serializer(val); }
/// Deserializes the given string view directly into `val`. Usage:
///
/// std::string encoded = "i42e";
/// int value;
/// bt_deserialize(encoded, value); // Sets value to 42
///
template <typename T, std::enable_if_t<!std::is_const<T>::value, int> = 0>
void bt_deserialize(string_view s, T& val) {
return detail::bt_deserialize<T>{}(s, val);
}
/// Deserializes the given string_view into a `T`, which is returned.
///
/// std::string encoded = "li1ei2ei3ee"; // bt-encoded list of ints: [1,2,3]
/// auto mylist = bt_deserialize<std::list<int>>(encoded);
///
template <typename T>
T bt_deserialize(string_view s) {
T val;
bt_deserialize(s, val);
return val;
}
/// Deserializes the given value into a generic `bt_value` type (mapbox::util::variant) which is capable
/// of holding all possible BT-encoded values (including recursion).
///
/// Example:
///
/// std::string encoded = "i42e";
/// auto val = bt_get(encoded);
/// int v = get_int<int>(val); // fails unless the encoded value was actually an integer that
/// // fits into an `int`
///
inline bt_value bt_get(string_view s) {
return bt_deserialize<bt_value>(s);
}
/// Helper functions to extract a value of some integral type from a bt_value which contains an
/// integer. Does range checking, throwing std::overflow_error if the stored value is outside the
/// range of the target type.
///
/// Example:
///
/// std::string encoded = "i123456789e";
/// auto val = bt_get(encoded);
/// auto v = get_int<uint32_t>(val); // throws if the decoded value doesn't fit in a uint32_t
template <typename IntType, std::enable_if_t<std::is_integral<IntType>::value, int> = 0>
IntType get_int(const bt_value &v) {
// It's highly unlikely that this code ever runs on a non-2s-complement architecture, but check
// at compile time if converting to a uint64_t (because while int64_t -> uint64_t is
// well-defined, uint64_t -> int64_t only does the right thing under 2's complement).
static_assert(!std::is_unsigned<IntType>::value || sizeof(IntType) != sizeof(int64_t) || -1 == ~0,
"Non 2s-complement architecture not supported!");
int64_t value = mapbox::util::get<int64_t>(v);
if (sizeof(IntType) < sizeof(int64_t)) {
if (value > static_cast<int64_t>(std::numeric_limits<IntType>::max())
|| value < static_cast<int64_t>(std::numeric_limits<IntType>::min()))
throw std::overflow_error("Unable to extract integer value: stored value is outside the range of the requested type");
}
return static_cast<IntType>(value);
}
/// Class that allows you to walk through a bt-encoded list in memory without copying or allocating
/// memory. It accesses existing memory directly and so the caller must ensure that the referenced
/// memory stays valid for the lifetime of the bt_list_consumer object.
class bt_list_consumer {
protected:
string_view data;
bt_list_consumer() = default;
public:
bt_list_consumer(string_view data_);
/// Copy constructor. Making a copy copies the current position so can be used for multipass
/// iteration through a list.
bt_list_consumer(const bt_list_consumer&) = default;
bt_list_consumer& operator=(const bt_list_consumer&) = default;
/// Returns true if the next value indicates the end of the list
bool is_finished() const { return data.front() == 'e'; }
/// Returns true if the next element looks like an encoded string
bool is_string() const { return data.front() >= '0' && data.front() <= '9'; }
/// Returns true if the next element looks like an encoded integer
bool is_integer() const { return data.front() == 'i'; }
/// Returns true if the next element looks like an encoded list
bool is_list() const { return data.front() == 'l'; }
/// Returns true if the next element looks like an encoded dict
bool is_dict() const { return data.front() == 'd'; }
/// Attempt to parse the next value as a string (and advance just past it). Throws if the next
/// value is not a string.
string_view consume_string();
/// Attempts to parse the next value as an integer (and advance just past it). Throws if the
/// next value is not an integer.
template <typename IntType>
IntType consume_integer() {
if (!is_integer()) throw bt_deserialize_invalid_type{"next value is not an integer"};
string_view next{data};
IntType ret;
detail::bt_deserialize<IntType>{}(next, ret);
data = next;
return ret;
}
/// Consumes a list, return it as a list-like type. This typically requires dynamic allocation,
/// but only has to parse the data once. Compare with consume_list_data() which allows
/// alloc-free traversal, but requires parsing twice (if the contents are to be used).
template <typename T = bt_list>
T consume_list() {
T list;
consume_list(list);
return list;
}
/// Same as above, but takes a pre-existing list-like data type.
template <typename T>
void consume_list(T& list) {
if (!is_list()) throw bt_deserialize_invalid_type{"next bt value is not a list"};
string_view n{data};
detail::bt_deserialize<T>{}(n, list);
data = n;
}
/// Consumes a dict, return it as a dict-like type. This typically requires dynamic allocation,
/// but only has to parse the data once. Compare with consume_dict_data() which allows
/// alloc-free traversal, but requires parsing twice (if the contents are to be used).
template <typename T = bt_dict>
T consume_dict() {
T dict;
consume_dict(dict);
return dict;
}
/// Same as above, but takes a pre-existing dict-like data type.
template <typename T>
void consume_dict(T& dict) {
if (!is_dict()) throw bt_deserialize_invalid_type{"next bt value is not a dict"};
string_view n{data};
detail::bt_deserialize<T>{}(n, dict);
data = n;
}
/// Consumes a value without returning it.
void skip_value();
/// Attempts to parse the next value as a list and returns the string_view that contains the
/// entire thing. This is recursive into both lists and dicts and likely to be quite
/// inefficient for large, nested structures (unless the values only need to be skipped but
/// aren't separately needed). This, however, does not require dynamic memory allocation.
string_view consume_list_data();
/// Attempts to parse the next value as a dict and returns the string_view that contains the
/// entire thing. This is recursive into both lists and dicts and likely to be quite
/// inefficient for large, nested structures (unless the values only need to be skipped but
/// aren't separately needed). This, however, does not require dynamic memory allocation.
string_view consume_dict_data();
};
/// Class that allows you to walk through key-value pairs of a bt-encoded dict in memory without
/// copying or allocating memory. It accesses existing memory directly and so the caller must
/// ensure that the referenced memory stays valid for the lifetime of the bt_dict_consumer object.
class bt_dict_consumer : private bt_list_consumer {
string_view key_;
/// Consume the key if not already consumed and there is a key present (rather than 'e').
/// Throws exception if what should be a key isn't a string, or if the key consumes the entire
/// data (i.e. requires that it be followed by something). Returns true if the key was consumed
/// (either now or previously and cached).
bool consume_key();
/// Clears the cached key and returns it. Must have already called consume_key directly or
/// indirectly via one of the `is_{...}` methods.
string_view flush_key() {
string_view k;
k.swap(key_);
return k;
}
public:
bt_dict_consumer(string_view data_);
/// Copy constructor. Making a copy copies the current position so can be used for multipass
/// iteration through a list.
bt_dict_consumer(const bt_dict_consumer&) = default;
bt_dict_consumer& operator=(const bt_dict_consumer&) = default;
/// Returns true if the next value indicates the end of the dict
bool is_finished() { return !consume_key() && data.front() == 'e'; }
/// Operator bool is an alias for `!is_finished()`
operator bool() { return !is_finished(); }
/// Returns true if the next value looks like an encoded string
bool is_string() { return consume_key() && data.front() >= '0' && data.front() <= '9'; }
/// Returns true if the next element looks like an encoded integer
bool is_integer() { return consume_key() && data.front() == 'i'; }
/// Returns true if the next element looks like an encoded list
bool is_list() { return consume_key() && data.front() == 'l'; }
/// Returns true if the next element looks like an encoded dict
bool is_dict() { return consume_key() && data.front() == 'd'; }
/// Returns the key of the next pair. This does not have to be called; it is also returned by
/// all of the other consume_* methods. The value is cached whether called here or by some
/// other method; accessing it multiple times simple accesses the cache until the next value is
/// consumed.
string_view key() {
if (!consume_key())
throw bt_deserialize_invalid{"Cannot access next key: at the end of the dict"};
return key_;
}
/// Attempt to parse the next value as a string->string pair (and advance just past it). Throws
/// if the next value is not a string.
std::pair<string_view, string_view> consume_string();
/// Attempts to parse the next value as an string->integer pair (and advance just past it).
/// Throws if the next value is not an integer.
template <typename IntType>
std::pair<string_view, IntType> consume_integer() {
if (!is_integer()) throw bt_deserialize_invalid_type{"next bt dict value is not an integer"};
std::pair<string_view, IntType> ret;
ret.second = bt_list_consumer::consume_integer<IntType>();
ret.first = flush_key();
return ret;
}
/// Consumes a string->list pair, return it as a list-like type. This typically requires
/// dynamic allocation, but only has to parse the data once. Compare with consume_list_data()
/// which allows alloc-free traversal, but requires parsing twice (if the contents are to be
/// used).
template <typename T = bt_list>
std::pair<string_view, T> consume_list() {
std::pair<string_view, T> pair;
pair.first = consume_list(pair.second);
return pair;
}
/// Same as above, but takes a pre-existing list-like data type. Returns the key.
template <typename T>
string_view consume_list(T& list) {
if (!is_list()) throw bt_deserialize_invalid_type{"next bt value is not a list"};
bt_list_consumer::consume_list(list);
return flush_key();
}
/// Consumes a string->dict pair, return it as a dict-like type. This typically requires
/// dynamic allocation, but only has to parse the data once. Compare with consume_dict_data()
/// which allows alloc-free traversal, but requires parsing twice (if the contents are to be
/// used).
template <typename T = bt_dict>
std::pair<string_view, T> consume_dict() {
std::pair<string_view, T> pair;
pair.first = consume_dict(pair.second);
return pair;
}
/// Same as above, but takes a pre-existing dict-like data type. Returns the key.
template <typename T>
string_view consume_dict(T& dict) {
if (!is_dict()) throw bt_deserialize_invalid_type{"next bt value is not a dict"};
bt_list_consumer::consume_dict(dict);
return flush_key();
}
/// Attempts to parse the next value as a string->list pair and returns the string_view that
/// contains the entire thing. This is recursive into both lists and dicts and likely to be
/// quite inefficient for large, nested structures (unless the values only need to be skipped
/// but aren't separately needed). This, however, does not require dynamic memory allocation.
std::pair<string_view, string_view> consume_list_data() {
if (data.size() < 2 || !is_list()) throw bt_deserialize_invalid_type{"next bt dict value is not a list"};
return {flush_key(), bt_list_consumer::consume_list_data()};
}
/// Attempts to parse the next value as a string->dict pair and returns the string_view that
/// contains the entire thing. This is recursive into both lists and dicts and likely to be
/// quite inefficient for large, nested structures (unless the values only need to be skipped
/// but aren't separately needed). This, however, does not require dynamic memory allocation.
std::pair<string_view, string_view> consume_dict_data() {
if (data.size() < 2 || !is_dict()) throw bt_deserialize_invalid_type{"next bt dict value is not a dict"};
return {flush_key(), bt_list_consumer::consume_dict_data()};
}
/// Skips ahead until we find the first key >= the given key or reach the end of the dict.
/// Returns true if we found an exact match, false if we reached some greater value or the end.
/// If we didn't hit the end, the next `consumer_*()` call will return the key-value pair we
/// found (either the exact match or the first key greater than the requested key).
///
/// Two important notes:
///
/// - properly encoded bt dicts must have lexicographically sorted keys, and this method assumes
/// that the input is correctly sorted (and thus if we find a greater value then your key does
/// not exist).
/// - this is irreversible; you cannot returned to skipped values without reparsing. (You *can*
/// however, make a copy of the bt_dict_consumer before calling and use the copy to return to
/// the pre-skipped position).
bool skip_until(string_view find) {
while (consume_key() && key_ < find) {
flush_key();
skip_value();
}
return key_ == find;
}
};
} // namespace lokimq

125
lokimq/hex.h Normal file
View File

@ -0,0 +1,125 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#pragma once
#include <string>
#include <array>
#include <iterator>
#include <cassert>
namespace lokimq {
namespace detail {
/// Compile-time generated lookup tables hex conversion
struct hex_table {
char from_hex_lut[256];
char to_hex_lut[16];
constexpr hex_table() noexcept : from_hex_lut{}, to_hex_lut{} {
for (unsigned char c = 0; c < 10; c++) {
from_hex_lut[(unsigned char)('0' + c)] = 0 + c;
to_hex_lut[ (unsigned char)( 0 + c)] = '0' + c;
}
for (unsigned char c = 0; c < 6; c++) {
from_hex_lut[(unsigned char)('a' + c)] = 10 + c;
from_hex_lut[(unsigned char)('A' + c)] = 10 + c;
to_hex_lut[ (unsigned char)(10 + c)] = 'a' + c;
}
}
constexpr char from_hex(unsigned char c) const noexcept { return from_hex_lut[c]; }
constexpr char to_hex(unsigned char b) const noexcept { return to_hex_lut[b]; }
} constexpr hex_lut;
} // namespace detail
/// Creates hex digits from a character sequence.
template <typename InputIt, typename OutputIt>
void to_hex(InputIt begin, InputIt end, OutputIt out) {
for (; begin != end; ++begin) {
auto c = *begin;
*out++ = detail::hex_lut.to_hex((c & 0xf0) >> 4);
*out++ = detail::hex_lut.to_hex(c & 0x0f);
}
}
/// Creates a hex string from an iterable, std::string-like object
template <typename String>
std::string to_hex(const String& s) {
std::string hex;
hex.reserve(s.size() * 2);
to_hex(s.begin(), s.end(), std::back_inserter(hex));
return hex;
}
/// Returns true if all elements in the range are hex characters
template <typename It>
constexpr bool is_hex(It begin, It end) {
for (; begin != end; ++begin) {
if (detail::hex_lut.from_hex(*begin) == 0 && *begin != '0')
return false;
}
return true;
}
/// Returns true if all elements in the string-like value are hex characters
template <typename String>
constexpr bool is_hex(const String& s) { return is_hex(s.begin(), s.end()); }
/// Convert a hex digit into its numeric (0-15) value
constexpr char from_hex_digit(unsigned char x) noexcept {
return detail::hex_lut.from_hex(x);
}
/// Constructs a byte value from a pair of hex digits
constexpr char from_hex_pair(unsigned char a, unsigned char b) noexcept { return (from_hex_digit(a) << 4) | from_hex_digit(b); }
/// Converts a sequence of hex digits to bytes. Undefined behaviour if any characters are not in
/// [0-9a-fA-F] or if the input sequence length is not even. It is permitted for the input and
/// output ranges to overlap as long as out is no earlier than begin.
template <typename InputIt, typename OutputIt>
void from_hex(InputIt begin, InputIt end, OutputIt out) {
using std::distance;
assert(distance(begin, end) % 2 == 0);
while (begin != end) {
auto a = *begin++;
auto b = *begin++;
*out++ = from_hex_pair(a, b);
}
}
/// Converts hex digits from a std::string-like object into a std::string of bytes. Undefined
/// behaviour if any characters are not in [0-9a-fA-F] or if the input sequence length is not even.
template <typename String>
std::string from_hex(const String& s) {
std::string bytes;
bytes.reserve(s.size() / 2);
from_hex(s.begin(), s.end(), std::back_inserter(bytes));
return bytes;
}
}

1261
lokimq/lokimq.cpp Normal file

File diff suppressed because it is too large Load Diff

771
lokimq/lokimq.h Normal file
View File

@ -0,0 +1,771 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#pragma once
#include "zmq.hpp"
#include <string>
#include <list>
#include <queue>
#include <unordered_map>
#include <memory>
#include <functional>
#include <thread>
#include <mutex>
#include <iostream>
#include <chrono>
#include <atomic>
#include "bt_serialize.h"
#include "string_view.h"
namespace lokimq {
using namespace std::literals;
/// Logging levels passed into LogFunc
enum class LogLevel { trace, debug, info, warn, error, fatal };
/// Authentication levels for command categories and connections
enum class AuthLevel {
denied, ///< Not actually an auth level, but can be returned by the AllowFunc to deny an incoming connection.
none, ///< No authentication at all; any random incoming ZMQ connection can invoke this command.
basic, ///< Basic authentication commands require a login, or a node that is specifically configured to be a public node (e.g. for public RPC).
admin, ///< Advanced authentication commands require an admin user, either via explicit login or by implicit login from localhost. This typically protects administrative commands like shutting down, starting mining, or access sensitive data.
};
/// The access level for a command category
struct Access {
/// Minimum access level required
AuthLevel auth = AuthLevel::none;
/// If true only remote SNs may call the category commands
bool remote_sn = false;
/// If true the category requires that the local node is a SN
bool local_sn = false;
};
/// Return type of the AllowFunc: this determines whether we allow the connection at all, and if
/// so, sets the initial authentication level and tells LokiMQ whether the other hand is an
/// active SN.
struct Allow {
AuthLevel auth = AuthLevel::none;
bool remote_sn = false;
};
class LokiMQ;
/// Encapsulates an incoming message from a remote connection with message details plus extra
/// info need to send a reply back through the proxy thread via the `reply()` method. Note that
/// this object gets reused: callbacks should use but not store any reference beyond the callback.
class Message {
public:
LokiMQ& lokimq; ///< The owning LokiMQ object
std::vector<string_view> data; ///< The provided command data parts, if any.
string_view pubkey; ///< The originator pubkey (32 bytes)
bool service_node; ///< True if the pubkey is an active SN (note that this is only checked on initial connection, not every received message)
/// Constructor
Message(LokiMQ& lmq) : lokimq{lmq} {}
// Non-copyable
Message(const Message&) = delete;
Message& operator=(const Message&) = delete;
/// Sends a reply. Arguments are forwarded to send() but with send_option::optional{} added
/// if the originator is not a SN. For SN messages (i.e. where `sn` is true) this is a
/// "strong" reply by default in that the proxy will attempt to establish a new connection
/// to the SN if no longer connected. For non-SN messages the reply will be attempted using
/// the available routing information, but if the connection has already been closed the
/// reply will be dropped.
///
/// If you want to send a non-strong reply even when the remote is a service node then add
/// an explicit `send_option::optional()` argument.
template <typename... Args>
void reply(const std::string& command, Args&&... args);
};
/** The keep-alive time for a send() that results in a establishing a new outbound connection. To
* use a longer keep-alive to a host call `connect()` first with the desired keep-alive time or pass
* the send_option::keep_alive.
*/
static constexpr auto DEFAULT_SEND_KEEP_ALIVE = 30s;
/// Maximum length of a category
static constexpr size_t MAX_CATEGORY_LENGTH = 50;
/// Maximum length of a command
static constexpr size_t MAX_COMMAND_LENGTH = 200;
/**
* Class that handles LokiMQ listeners, connections, proxying, and workers. An application
* typically has just one instance of this class.
*/
class LokiMQ {
private:
/// The global context
zmq::context_t context;
/// A unique id for this LokiMQ instance, assigned in a thread-safe manner during construction.
const int object_id;
/// The x25519 keypair of this connection. For service nodes these are the long-run x25519 keys
/// provided at construction, for non-service-node connections these are generated during
/// construction.
std::string pubkey, privkey;
/// True if *this* node is running in service node mode (whether or not actually active)
bool local_service_node = false;
/// Addresses on which to listen, or empty if we only establish outgoing connections but aren't
/// listening.
std::vector<std::string> bind;
/// The thread in which most of the intermediate work happens (handling external connections
/// and proxying requests between them to worker threads)
std::thread proxy_thread;
/// Will be true (and is guarded by a mutex) if the proxy thread is quitting; guards against new
/// control sockets from threads trying to talk to the proxy thread.
bool proxy_shutting_down = false;
/// Called to obtain a "command" socket that attaches to `control` to send commands to the
/// proxy thread from other threads. This socket is unique per thread and LokiMQ instance.
zmq::socket_t& get_control_socket();
/// Stores all of the sockets created in different threads via `get_control_socket`. This is
/// only used during destruction to close all of those open sockets, and is protected by an
/// internal mutex which is only locked by new threads getting a control socket and the
/// destructor.
std::vector<std::shared_ptr<zmq::socket_t>> thread_control_sockets;
public:
/// Callback type invoked to determine whether the given new incoming connection is allowed to
/// connect to us and to set its initial authentication level.
///
/// @param ip - the ip address of the incoming connection
/// @param pubkey - the x25519 pubkey of the connecting client (32 byte string)
///
/// @returns an `AuthLevel` enum value indicating the default auth level for the incoming
/// connection, or AuthLevel::denied if the connection should be refused.
using AllowFunc = std::function<Allow(string_view ip, string_view pubkey)>;
/// Callback that is invoked when we need to send a "strong" message to a SN that we aren't
/// already connected to and need to establish a connection. This callback returns the ZMQ
/// connection string we should use which is typically a string such as `tcp://1.2.3.4:5678`.
using SNRemoteAddress = std::function<std::string(const std::string& pubkey)>;
/// The callback type for registered commands.
using CommandCallback = std::function<void(Message& message)>;
/// Called to write a log message. This will only be called if the `level` is >= the current
/// LokiMQ object log level. It must be a raw function pointer (or a capture-less lambda) for
/// performance reasons. Takes four arguments: the log level of the message, the filename and
/// line number where the log message was invoked, and the log message itself.
using Logger = std::function<void(LogLevel level, const char* file, int line, std::string msg)>;
/// Explicitly non-copyable, non-movable because most things here aren't copyable, and a few
/// things aren't movable, either. If you need to pass the LokiMQ instance around, wrap it
/// in a unique_ptr or shared_ptr.
LokiMQ(const LokiMQ&) = delete;
LokiMQ& operator=(const LokiMQ&) = delete;
LokiMQ(LokiMQ&&) = delete;
LokiMQ& operator=(LokiMQ&&) = delete;
/** How long to wait for handshaking to complete on external connections before timing out and
* closing the connection. Setting this only affects new outgoing connections. */
std::chrono::milliseconds SN_HANDSHAKE_TIME = 10s;
/** Maximum incoming message size; if a remote tries sending a message larger than this they get
* disconnected. -1 means no limit. */
int64_t SN_ZMQ_MAX_MSG_SIZE = 1 * 1024 * 1024;
/** How long (in ms) to linger sockets when closing them; this is the maximum time zmq spends
* trying to sending pending messages before dropping them and closing the underlying socket
* after the high-level zmq socket is closed. */
std::chrono::milliseconds CLOSE_LINGER = 5s;
private:
/// The lookup function that tells us where to connect to a peer
SNRemoteAddress peer_lookup;
/// Callback to see whether the incoming connection is allowed
AllowFunc allow_connection;
/// The log level; this is atomic but we use relaxed order to set and access it (so changing it
/// might not be instantly visible on all threads, but that's okay).
std::atomic<LogLevel> log_lvl;
/// The callback to call with log messages
Logger logger;
/// Logging implementation
template <typename... T>
void log_(LogLevel lvl, const char* filename, int line, const T&... stuff);
///////////////////////////////////////////////////////////////////////////////////
/// NB: The following are all the domain of the proxy thread (once it is started)!
/// Addresses to bind to in `start()`
std::vector<std::string> bind_addresses;
/// Our listening ROUTER socket for incoming connections (will be left unconnected if not
/// listening).
zmq::socket_t listener;
/// Info about a peer's established connection to us. Note that "established" means both
/// connected and authenticated.
struct peer_info {
/// True if we've authenticated this peer as a service node.
bool service_node = false;
/// The auth level of this peer
AuthLevel auth_level = AuthLevel::none;
/// Will be set to a non-empty routing prefix if if we have (or at least recently had) an
/// established incoming connection with this peer. Will be empty if there is no incoming
/// connection.
std::string incoming;
/// The index in `remotes` if we have an established outgoing connection to this peer, -1 if
/// we have no outgoing connection to this peer.
int outgoing = -1;
/// The last time we sent or received a message (or had some other relevant activity) with
/// this peer. Used for closing outgoing connections that have reached an inactivity expiry
/// time.
std::chrono::steady_clock::time_point last_activity;
/// Updates last_activity to the current time
void activity() { last_activity = std::chrono::steady_clock::now(); }
/// After more than this much inactivity we will close an idle connection
std::chrono::milliseconds idle_expiry;
};
struct pk_hash {
size_t operator()(const std::string& pubkey) const {
size_t h;
std::memcpy(&h, pubkey.data(), sizeof(h));
return h;
}
};
/// Currently peer connections, pubkey -> peer_info
std::unordered_map<std::string, peer_info, pk_hash> peers;
/// different polling sockets the proxy handler polls: this always contains some internal
/// sockets for inter-thread communication followed by listener socket and a pollitem for every
/// (outgoing) remote socket in `remotes`. This must be in a sequential vector because of zmq
/// requirements (otherwise it would be far nicer to not have to synchronize the two vectors).
std::vector<zmq::pollitem_t> pollitems;
/// Properly adds a socket to poll for input to pollitems
void add_pollitem(zmq::socket_t& sock);
/// The number of internal sockets in `pollitems`
static constexpr size_t poll_internal_size = 3;
/// The pollitems location corresponding to `remotes[0]`.
const size_t poll_remote_offset; // will be poll_internal_size + 1 for a full listener (the +1 is the listening socket); poll_internal_size for a remote-only
/// The outgoing remote connections we currently have open along with the remote pubkeys. Each
/// element [i] here corresponds to an the pollitem_t at pollitems[i+1+poll_internal_size].
/// (Ideally we'd use one structure, but zmq requires the pollitems be in contiguous storage).
std::vector<std::pair<std::string, zmq::socket_t>> remotes;
/// Socket we listen on to receive control messages in the proxy thread. Each thread has its own
/// internal "control" connection (returned by `get_control_socket()`) to this socket used to
/// give instructions to the proxy such as instructing it to initiate a connection to a remote
/// or send a message.
zmq::socket_t command{context, zmq::socket_type::router};
/// Router socket to reach internal worker threads from proxy
zmq::socket_t workers_socket{context, zmq::socket_type::router};
/// indices of idle, active workers
std::vector<unsigned int> idle_workers;
/// Maximum number of general task workers, specified during construction
unsigned int general_workers;
/// Maximum number of possible worker threads we can have. This is calculated when starting,
/// and equals general_workers plus the sum of all categories' reserved threads counts. This is
/// also used to signal a shutdown; we set it to 0 when quitting.
unsigned int max_workers;
/// Worker thread loop
void worker_thread(unsigned int index);
/// Does the proxying work
void proxy_loop();
/// Handles built-in primitive commands in the proxy thread for things like "BYE" that have to
/// be done in the proxy thread anyway (if we forwarded to a worker the worker would just have
/// to send an instruction back to the proxy to do it). Returns true if one was handled, false
/// to continue with sending to a worker.
bool proxy_handle_builtin(size_t conn_index, std::vector<zmq::message_t>& parts);
/// Sets up a job for a worker then signals the worker (or starts a worker thread)
void proxy_to_worker(size_t conn_index, std::vector<zmq::message_t>& parts);
/// proxy thread command handlers for commands sent from the outer object QUIT. This doesn't
/// get called immediately on a QUIT command: the QUIT commands tells workers to quit, then this
/// gets called after all works have done so.
void proxy_quit();
/// Common connection implementation used by proxy_connect/proxy_send. Returns the socket
/// and, if a routing prefix is needed, the required prefix (or an empty string if not needed).
/// For an optional connect that fail, returns nullptr for the socket.
std::pair<zmq::socket_t*, std::string> proxy_connect(const std::string& pubkey, const std::string& connect_hint, bool optional, bool incoming_only, std::chrono::milliseconds keep_alive);
/// CONNECT command telling us to connect to a new pubkey. Returns the socket (which could be
/// existing or a new one).
std::pair<zmq::socket_t*, std::string> proxy_connect(bt_dict&& data);
/// DISCONNECT command telling us to disconnect our remote connection to the given pubkey (if we
/// have one).
void proxy_disconnect(const std::string& pubkey);
/// SEND command. Does a connect first, if necessary.
void proxy_send(bt_dict&& data);
/// REPLY command. Like SEND, but only has a listening socket route to send back to and so is
/// weaker (i.e. it cannot reconnect to the SN if the connection is no longer open).
void proxy_reply(bt_dict&& data);
/// ZAP (https://rfc.zeromq.org/spec:27/ZAP/) authentication handler; this is called with the
/// zap auth socket to do non-blocking processing of any waiting authentication requests waiting
/// on it to verify whether the connection is from a valid/allowed SN.
void process_zap_requests(zmq::socket_t& zap_auth);
/// Handles a control message from some outer thread to the proxy
void proxy_control_message(std::vector<zmq::message_t> parts);
/// Closing any idle connections that have outlived their idle time. Note that this only
/// affects outgoing connections; incomings connections are the responsibility of the other end.
void proxy_expire_idle_peers();
/// Closes an outgoing connection immediately, updates internal variables appropriately.
/// Returns the next iterator (the original may or may not be removed from peers, depending on
/// whether or not it also has an active incoming connection).
decltype(peers)::iterator proxy_close_outgoing(decltype(peers)::iterator it);
struct category {
Access access;
std::unordered_map<std::string, CommandCallback> commands;
unsigned int reserved_threads = 0;
unsigned int active_threads = 0;
std::queue<std::list<zmq::message_t>> pending; // FIXME - vector?
int max_queue = 200;
category(Access access, unsigned int reserved_threads, int max_queue)
: access{access}, reserved_threads{reserved_threads}, max_queue{max_queue} {}
};
/// Categories, mapped by category name.
std::unordered_map<std::string, category> categories;
/// For enabling backwards compatibility with command renaming: this allows mapping one command
/// to another in a different category (which happens before the category and command lookup is
/// done).
std::unordered_map<std::string, std::string> command_aliases;
/// Retrieve category and callback from a command name, including alias mapping. Warns on
/// invalid commands and returns nullptrs. The command name will be updated in place if it is
/// aliased to another command.
std::pair<const category*, const CommandCallback*> get_command(std::string& command);
/// Checks a peer's authentication level. Returns true if allowed, warns and returns false if
/// not.
bool proxy_check_auth(const std::string& pubkey, size_t conn_index, const peer_info& peer,
const std::string& command, const category& cat, zmq::message_t& msg);
///
/// End of proxy-specific members
///////////////////////////////////////////////////////////////////////////////////
/// Structure that contains the data for a worker thread - both the thread itself, plus any
/// transient data we are passing into the thread.
struct run_info {
std::string command;
std::string pubkey;
bool service_node = false;
const CommandCallback* callback = nullptr;
std::vector<zmq::message_t> message_parts;
private:
friend class LokiMQ;
std::thread thread;
std::string routing_id;
};
/// Data passed to workers for the RUN command. The proxy thread sets elements in this before
/// sending RUN to a worker then the worker uses it to get call info, and only allocates it
/// once, before starting any workers. Workers may only access their own index and may not
/// change it.
std::vector<run_info> workers;
public:
/**
* LokiMQ constructor. This constructs the object but does not start it; you will typically
* want to first add categories and commands, then finish startup by invoking `start()`.
* (Categories and commands cannot be added after startup).
*
* @param pubkey the public key (32-byte binary string). For a service node this is the service
* node x25519 keypair. For non-service nodes this (and privkey) can be empty strings to
* automatically generate an ephemeral keypair.
*
* @param privkey the service node's private key (32-byte binary string), or empty to generate
* one.
*
* @param service_node - true if this instance should be considered a service node for the
* purpose of allowing "Access::local_sn" remote calls. (This should be true if we are
* *capable* of being a service node, whether or not we are currently actively). If specified
* as true then the pubkey and privkey values must not be empty.
*
* @param bind list of addresses to bind to. Can be any string zmq supports; typically a tcp
* IP/port combination such as: "tcp://\*:4567" or "tcp://1.2.3.4:5678". Can be empty to not
* listen at all.
*
* @param peer_lookup function that takes a pubkey key (32-byte binary string) and returns a
* connection string such as "tcp://1.2.3.4:23456" to which a connection should be established
* to reach that service node. Note that this function is only called if there is no existing
* connection to that service node, and that the function is never called for a connection to
* self (that uses an internal connection instead).
*
* @param allow_incoming is a callback that LokiMQ can use to determine whether an incoming
* connection should be allowed at all and, if so, whether the connection is from a known
* service node. Called with the connecting IP, the remote's verified x25519 pubkey, and the
* called on incoming connections with the (verified) incoming connection
* pubkey (32-byte binary string) to determine whether the given SN should be allowed to
* connect.
*
* @param log a function or callable object that writes a log message. If omitted then all log
* messages are suppressed.
*
* @param general_workers the maximum number of worker threads to start for general tasks.
* These threads can be used for any command, and will be created (up to the limit) on demand.
* Note that individual categories with reserved threads can create threads in addition to the
* amount specified here. The default (0) means std::thread::hardware_concurrency().
*/
LokiMQ( std::string pubkey,
std::string privkey,
bool service_node,
std::vector<std::string> bind,
SNRemoteAddress peer_lookup,
AllowFunc allow_connection,
Logger logger = [](LogLevel, const char*, int, std::string) { },
unsigned int general_workers = 0);
/**
* Destructor; instructs the proxy to quit. The proxy tells all workers to quit, waits for them
* to quit and rejoins the threads then quits itself. The outer thread (where the destructor is
* running) rejoins the proxy thread.
*/
~LokiMQ();
/// Sets the log level of the LokiMQ object.
void log_level(LogLevel level);
/// Gets the log level of the LokiMQ object.
LogLevel log_level() const;
/**
* Add a new command category. This method may not be invoked after `start()` has been called.
* This method is also not thread safe, and is generally intended to be called (along with
* add_command) immediately after construction and immediately before calling start().
*
* @param name - the category name which must consist of one or more characters and may not
* contain a ".".
*
* @param access_level the access requirements for remote invocation of the commands inside this
* category.
*
* @param reserved_threads if non-zero then the worker thread pool will ensure there are at at
* least this many threads either current processing or available to process commands in this
* category. This is used to ensure that a category's commands can be invoked even if
* long-running commands in some other category are currently using all worker threads. This
* can increase the number of worker threads above the `general_workers` parameter given in the
* constructor, but will only do so if the need arised: that is, if a command request arrives
* for a category when all workers are busy and no worker is currently processing any command in
* that category.
*
* @param max_queue is the maximum number of incoming messages in this category that we will
* queue up when waiting for a worker to become available for this category. Once the queue
* for a category exceeds this many incoming messages then new messages will be dropped until
* some messages are processed off the queue. -1 means unlimited, 0 means we will just drop
* messages for this category when no workers are available.
*/
void add_category(std::string name, Access access_level, unsigned int reserved_threads = 0, int max_queue = 200);
/**
* Adds a new command to an existing category. This method may not be invoked after `start()`
* has been called.
*
* @param category - the category name (must already be created by a call to `add_category`)
*
* @param name - the command name, without the `category.` prefix.
*
* @param callback - a callable object which is callable as `callback(zeromq::Message &)`
*/
void add_command(const std::string& category, std::string name, CommandCallback callback);
/**
* Adds a command alias; this is intended for temporary backwards compatibility: if any aliases
* are defined then every command (not just aliased ones) has to be checked on invocation to see
* if it is defined in the alias list. May not be invoked after `start()`.
*
* Aliases should follow the `category.command` format for both the from and to names, and
* should only be called for `to` categories that are already defined. The category name is not
* currently enforced on the `from` name (for backwards compatility with Loki's quorumnet code)
* but will be at some point.
*
* Access permissions for an aliased command depend only on the mapped-to value; for example, if
* `cat.meow` is aliased to `dog.bark` then it is the access permissions on `dog` that apply,
* not those of `cat`, even if `cat` is more restrictive than `dog`.
*/
void add_command_alias(std::string from, std::string to);
/**
* Finish starting up: binds to the bind locations given in the constructor and launches the
* proxy thread to handle message dispatching between remote nodes and worker threads.
*
* You will need to call `add_category` and `add_command` to register commands before calling
* `start()`; once start() is called commands cannot be changed.
*/
void start();
/**
* Try to initiate a connection to the given SN in anticipation of needing a connection in the
* future. If a connection is already established, the connection's idle timer will be reset
* (so that the connection will not be closed too soon). If the given idle timeout is greater
* than the current idle timeout then the timeout increases to the new value; if less than the
* current timeout it is ignored. (Note that idle timeouts only apply if the existing
* connection is an outgoing connection).
*
* Note that this method (along with send) doesn't block waiting for a connection; it merely
* instructs the proxy thread that it should establish a connection.
*
* @param pubkey - the public key (32-byte binary string) of the service node to connect to
* @param keep_alive - the connection will be kept alive if there was valid activity within
* the past `keep_alive` milliseconds. If an outgoing connection already
* exists, the longer of the existing and the given keep alive is used.
* Note that the default applied here is much longer than the default for an
* implicit connect() by calling send() directly.
* @param hint - if non-empty and a new outgoing connection needs to be made this hint value
* may be used instead of calling the lookup function. (Note that there is no
* guarantee that the hint will be used; it is only usefully specified if the
* connection location has already been incidentally determined).
*/
void connect(const std::string& pubkey, std::chrono::milliseconds keep_alive = 5min, const std::string& hint = "");
/**
* Queue a message to be relayed to the SN identified with the given pubkey without expecting a
* reply. LokiMQ will attempt to relay the message (first connecting and handshaking if not
* already connected to the given SN).
*
* If a new connection it established it will have a relatively short (30s) idle timeout. If
* the connection should stay open longer you should call `connect(pubkey, IDLETIME)` first.
*
* Note that this method (along with connect) doesn't block waiting for a connection or for the
* message to send; it merely instructs the proxy thread that it should send. ZMQ will
* generally try hard to deliver it (reconnecting if the connection fails), but if the
* connection fails persistently the message will eventually be dropped.
*
* @param pubkey - the pubkey to send this to
* @param cmd - the first data frame value which is almost always the remote "category.command" name
* @param opts - any number of std::string and send options. Each send option affects
* how the send works; each string becomes a serialized message part.
*
* Example:
*
* lmq.send(pubkey, "hello", "abc", send_option::hint("tcp://localhost:1234"), "def");
*
* sends the command `hello` to the given pubkey, containing additional message parts "abc" and
* "def", and, if not currently connected, using the given connection hint rather than
* performing a connection address lookup on the pubkey.
*/
template <typename... T>
void send(const std::string& pubkey, const std::string& cmd, const T&... opts);
/**
* Similar to the above, but takes an iterator pair of message parts to send after the value.
*
* @param pubkey - the pubkey to send this to
* @param cmd - the value of the first message part (i.e. the remote command)
* @param first - an input iterator to std::string values
* @param last - the beyond-the-end iterator
* @param opts - any number of send options. This may also contain additional message strings
* which will be appended after the `[first, last)` message parts.
*/
template <typename InputIt, typename... T>
void send(const std::string& pubkey, const std::string& cmd, InputIt first, InputIt end, const T&... opts);
/// The key pair this LokiMQ was created with; if empty keys were given during construction then
/// this returns the generated keys.
const std::string& get_pubkey() const { return pubkey; }
const std::string& get_privkey() const { return privkey; }
};
/// Namespace for options to the send() method
namespace send_option {
/// `serialized` lets you serialize once when sending the same data to many peers by constructing a
/// single serialized option and passing it repeatedly rather than needing to reserialize on each
/// send.
struct serialized {
std::string data;
template <typename T>
serialized(const T& arg) : data{lokimq::bt_serialize(arg)} {}
};
/// Specifies a connection hint when passed in to send(). If there is no current connection to the
/// peer then the hint is used to save a call to the SNRemoteAddress to get the connection location.
/// (Note that there is no guarantee that the given hint will be used or that a SNRemoteAddress call
/// will not also be done.)
struct hint {
std::string connect_hint;
hint(std::string connect_hint) : connect_hint{std::move(connect_hint)} {}
};
/// Does a send() if we already have a connection (incoming or outgoing) with the given peer,
/// otherwise drops the message.
struct optional {};
/// Specifies that the message should be sent only if it can be sent on an existing incoming socket,
/// and dropped otherwise.
struct incoming {};
/// Specifies the idle timeout for the connection - if a new or existing outgoing connection is used
/// for the send and its current idle timeout setting is less than this value then it is updated.
struct keep_alive {
std::chrono::milliseconds time;
keep_alive(std::chrono::milliseconds time) : time{std::move(time)} {}
};
}
namespace detail {
// Sends a control message to the given socket consisting of the command plus optional dict
// data (only sent if the data is non-empty).
void send_control(zmq::socket_t& sock, string_view cmd, std::string data = {});
/// Base case: takes a serializable value and appends it to the message parts
template <typename T>
void apply_send_option(bt_list& parts, bt_dict&, const T& arg) {
parts.push_back(lokimq::bt_serialize(arg));
}
/// `serialized` specialization: lets you serialize once when sending the same data to many peers
template <> inline void apply_send_option(bt_list& parts, bt_dict& , const send_option::serialized& serialized) {
parts.push_back(serialized.data);
}
/// `hint` specialization: sets the hint in the control data
template <> inline void apply_send_option(bt_list&, bt_dict& control_data, const send_option::hint& hint) {
control_data["hint"] = hint.connect_hint;
}
/// `optional` specialization: sets the optional flag in the control data
template <> inline void apply_send_option(bt_list&, bt_dict& control_data, const send_option::optional &) {
control_data["optional"] = 1;
}
/// `incoming` specialization: sets the optional flag in the control data
template <> inline void apply_send_option(bt_list&, bt_dict& control_data, const send_option::incoming &) {
control_data["incoming"] = 1;
}
/// `keep_alive` specialization: increases the outgoing socket idle timeout (if shorter)
template <> inline void apply_send_option(bt_list&, bt_dict& control_data, const send_option::keep_alive& timeout) {
control_data["keep-alive"] = timeout.time.count();
}
/// Calls apply_send_option on each argument and returns a bt_dict with the command plus data stored
/// in the "send" key plus whatever else is implied by any given option arguments.
template <typename InputIt, typename... T>
bt_dict send_control_data(const std::string& cmd, InputIt begin, InputIt end, const T &...opts) {
bt_dict control_data;
bt_list parts{{cmd}};
parts.insert(parts.end(), std::move(begin), std::move(end));
#ifdef __cpp_fold_expressions
(detail::apply_send_option(parts, control_data, opts),...);
#else
(void) std::initializer_list<int>{(detail::apply_send_option(parts, control_data, opts), 0)...};
#endif
control_data["send"] = std::move(parts);
return control_data;
}
} // namespace detail
template <typename InputIt, typename... T>
void LokiMQ::send(const std::string& pubkey, const std::string& cmd, InputIt first, InputIt last, const T &...opts) {
bt_dict control_data = detail::send_control_data(cmd, std::move(first), std::move(last), opts...);
control_data["pubkey"] = pubkey;
detail::send_control(get_control_socket(), "SEND", bt_serialize(control_data));
}
template <typename... T>
void LokiMQ::send(const std::string& pubkey, const std::string& cmd, const T &...opts) {
const std::string* no_it = nullptr;
send(pubkey, cmd, no_it, no_it, opts...);
}
template <typename... Args>
void Message::reply(const std::string& command, Args&&... args) {
if (service_node) lokimq.send(pubkey, command, std::forward<Args>(args)...);
else lokimq.send(pubkey, command, send_option::optional{}, std::forward<Args>(args)...);
}
template <typename... T>
void LokiMQ::log_(LogLevel lvl, const char* file, int line, const T&... stuff) {
if (lvl < log_level())
return;
std::ostringstream os;
#ifdef __cpp_fold_expressions
os << ... << stuff;
#else
(void) std::initializer_list<int>{(os << stuff, 0)...};
#endif
logger(lvl, file, line, os.str());
}
}
// vim:sw=4:et

104
lokimq/string_view.h Normal file
View File

@ -0,0 +1,104 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#pragma once
#include <string>
#ifdef __cpp_lib_string_view
#include <string_view>
namespace lokimq { using string_view = std::string_view; }
#else
#include <ostream>
namespace lokimq {
/// Basic implementation of a subset of std::string_view (enough for what we need in lokimq)
class simple_string_view {
const char *data_;
size_t size_;
public:
constexpr simple_string_view() noexcept : data_{nullptr}, size_{0} {}
constexpr simple_string_view(const simple_string_view&) noexcept = default;
simple_string_view(const std::string& str) : data_{str.data()}, size_{str.size()} {}
constexpr simple_string_view(const char* data, size_t size) noexcept : data_{data}, size_{size} {}
simple_string_view(const char* data) : data_{data}, size_{std::char_traits<char>::length(data)} {}
simple_string_view& operator=(const simple_string_view&) = default;
constexpr const char* data() const noexcept { return data_; }
constexpr size_t size() const noexcept { return size_; }
constexpr bool empty() const noexcept { return size_ == 0; }
operator std::string() const { return {data_, size_}; }
const char* begin() const noexcept { return data_; }
const char* end() const noexcept { return data_ + size_; }
constexpr const char& operator[](size_t pos) const { return data_[pos]; }
constexpr const char& front() const { return *data_; }
constexpr const char& back() const { return data_[size_ - 1]; }
int compare(simple_string_view s) const;
constexpr void remove_prefix(size_t n) { data_ += n; size_ -= n; }
constexpr void remove_suffix(size_t n) { size_ -= n; }
void swap(simple_string_view &s) noexcept { std::swap(data_, s.data_); std::swap(size_, s.size_); }
};
inline bool operator==(simple_string_view lhs, simple_string_view rhs) {
return lhs.size() == rhs.size() && 0 == std::char_traits<char>::compare(lhs.data(), rhs.data(), lhs.size());
};
inline bool operator!=(simple_string_view lhs, simple_string_view rhs) {
return !(lhs == rhs);
}
inline int simple_string_view::compare(simple_string_view s) const {
int cmp = std::char_traits<char>::compare(data_, s.data(), std::min(size_, s.size()));
if (cmp) return cmp;
if (size_ < s.size()) return -1;
else if (size_ > s.size()) return 1;
return 0;
}
inline bool operator<(simple_string_view lhs, simple_string_view rhs) {
return lhs.compare(rhs) < 0;
};
inline bool operator<=(simple_string_view lhs, simple_string_view rhs) {
return lhs.compare(rhs) <= 0;
};
inline bool operator>(simple_string_view lhs, simple_string_view rhs) {
return lhs.compare(rhs) > 0;
};
inline bool operator>=(simple_string_view lhs, simple_string_view rhs) {
return lhs.compare(rhs) >= 0;
};
inline std::ostream& operator<<(std::ostream& os, const simple_string_view& s) {
os.write(s.data(), s.size());
return os;
}
using string_view = simple_string_view;
}
#endif

1
mapbox-variant Submodule

@ -0,0 +1 @@
Subproject commit c94634bbd294204c9ba3f5b267a39582a52e8e5a