Browse Source

Initial LokiMQ release

This library is adapted from lokid's existing quorumnet code (added in
6.x) used for SN-to-SN communication for quorum voting but generalized
to be usable both there and as a basis for other communication channels
with loki projects (for example: wallet-to-lokid communication; loki-ss
and lokinet internal communication with lokid; loki-ss to loki-ss
communication and message passing; perhaps eventually loki p2p traffic).

This initial release compiles but likely has a few warts and bugs that
need ironing out in the implementation before it is production ready.
Some tests will follow.
pull/1/head
Jason Rhinelander 2 years ago
commit
f3d583c520
  1. 6
      .gitmodules
  2. 269
      README.md
  3. 1
      cppzmq
  4. 220
      lokimq/bt_serialize.cpp
  5. 788
      lokimq/bt_serialize.h
  6. 125
      lokimq/hex.h
  7. 1261
      lokimq/lokimq.cpp
  8. 771
      lokimq/lokimq.h
  9. 104
      lokimq/string_view.h
  10. 1
      mapbox-variant

6
.gitmodules

@ -0,0 +1,6 @@
[submodule "mapbox-variant"]
path = mapbox-variant
url = https://github.com/mapbox/variant.git
[submodule "cppzmq"]
path = cppzmq
url = https://github.com/zeromq/cppzmq.git

269
README.md

@ -0,0 +1,269 @@
# LokiMQ - zeromq-based message passing for Loki projects
This C++14 library contains an abstraction layer around ZeroMQ to support integration with Loki
authentication and message passing. It is designed to be usable as the underlying communication
mechanism of SN-to-SN communication ("quorumnet"), the RPC interface used by wallets and local
daemon commands, communication channels between lokid and auxiliary services (storage server,
lokinet), and also provides a local multithreaded job scheduling within a process.
It is not required to use this library to interact with loki components as a client: this is mainly
intended to abstract away much of the server-side handling.
All messages are encrypted (using x25519).
This library makes minimal use of mutexes, and none in the hot paths of the code, instead mostly
relying on ZMQ sockets for synchronization; for more information on this (and why this is generally
much better performing and more scalable) see the ZMQ guide documentation on the topic.
## Basic message structure
LokiMQ messages consist of 1+ part messages where the first part is a string command and remaining
parts are command-specific data.
The command string is one of two types:
`basic` - for basic commands such as authentication (`login`) handled by LokiMQ itself. These
commands may not contain a `.`, and are reserved for LokiMQ itself.
`category.command` - for commands registered by the LokiMQ caller (e.g. lokid). Here `category`
must be at least one character not containing a `.` and `command` may be anything. These categories
and commands are registered according to general function and authentication level (more on this
below). For example, for lokid categories are:
- `system` - is for RPC commands related to the system administration such as mining, getting
sensitive statistics, accessing SN private keys, remote shutdown, etc.
- `sn` - is for SN-to-SN communication such as blink quorum and uptime proof obligation votes.
- `blink` - is for public blink commands (i.e. blink submission) and is only provided by nodes
running as service nodes.
- `blockchain` - is for remote blockchain access such as retrieving blocks and transactions as well
as subscribing to updates for new blocks, transactions, and service node states.
## Command invocation
The application registers categories and registers commands within these categories with callbacks.
The callbacks are passed a LokiMQ::Message object from which the message (plus various connection
information) can be obtained. There is no structure imposed at all on the data passed in subsequent
message parts: it is up to the command itself to deserialize however it wishes (e.g. JSON,
bt-encoded, or any other encoding).
The Message object also provides methods for replying to the caller. Simple replies queue a reply
if the client is still connected. Replies to service nodes can also be "strong" replies: when
replying to a SN that has closed connection with a strong reply we will attempt to reestablish a
connection to deliver the message. In order for this to work the LokiMQ caller must provide a
lookup function to retrieve the remote address given a SN x25519 pubkey.
### Callbacks
Invoked command functions are always invoked with exactly one arguments: a non-const LokiMQ::Message
reference from which the connection info, LokiMQ object, and message data can be obtained. If you
need some extra state data (for example, a reference to some high level object) the LokiMQ object
has an opaque public `void* data` member intended for exactly this purpose.
## Authentication
Each category has access control consisting of three values:
- Auth level, one of:
- None - no authentication required at all, any remote client may invoke this command
- Basic - this requires a basic authentication level (None access is implied)
- Admin - this requires administrative access (Basic access is implied)
- ServiceNode (bool) - if true this requires that the remote connection has proven its identity as
an active service node (via its x25519 key).
- LocalServiceNode (bool) - if true this requires that the local node is running in service node
mode (note that it is *not* required that the local SN be *active*).
Authentication level components are cumulative: for example, a category with Basic auth +
ServiceNode=true + LocalServiceNode=true would only be access if all three conditions are met.
The authentication mechanism works in two ways: defaults based on configuration, and explicit
logins.
Configuration defaults allows controlling the default access for an incoming connection based on its
remote address. Typically this is used to allow connections from localhost (or a unix domain
socket) to automatically be an Admin connection without requiring explicit authentication. This
also allows configuration of how public connections should be treated: for example, a lokid running
as a public RPC server would do so by granting Basic access to all incoming connections.
Explicit logins allow the daemon to specify username/passwords with mapping to Basic or Admin
authentication levels.
Thus, for example, a daemon could be configured to be allow Basic remote access with authentication
(i.e. requiring a username/password login given out to people who should be able to access).
For example, in lokid the categories described above have authentication levels of:
- `system` - Admin
- `sn` - ServiceNode
- `blink` - LocalServiceNode
- `blockchain` - Basic
### Service Node authentication
In order to handle ServiceNode authentication, LokiMQ uses an Allow callback invoked during
connection to determine both whether to allow the connection, and to determine whether the incoming
connection is an active service node.
Note that this status persists for the life of the connection (i.e. it is not rechecked on each
command invocation). If you require stronger protection against being called by
decommissioned/deregistered service nodes from a connection established when the SN was active then
the callback itself will need to verify when invoked.
## Command aliases
Command aliases can be specified. For example, an alias `a.b` -> `c.d` will rewrite any incoming
`a.b` command to `c.d` before handling it. These are applied *before* any authentication is
performed.
The main purpose here is for backwards compatibility either for renaming category or commands, or
for changing command access levels by moving it from one category to another. It's recommended that
such aliases be used only temporarily for version transitions.
## Threads
LokiMQ operates a pool of worker threads to handle jobs. The simplest use just allocates new jobs
to a free worker thread, and we have a "general threads" value to configure how many such threads
are available.
You may, however, also reserve a minimum number of workers per command category. For example, you
could reserve 1 thread for the `sys` category and 2 for the `qnet` category plus 8 general threads.
The general threads will be used most of the time for any categories (including `sys` and `qnet`),
but there will always be at least 1/2 worker threads either currently working on or available for
incoming `system`/`sn` commands. General thread gets used first; only if all general threads are
currently busy *and* a category has unused reserved threads will an additional thread be used.
Note that these actual reserved threads are not exclusive: reserving M of N total threads for a
category simply ensures that no more than (N-M) threads are being used for other categories at any
given time, but the actual jobs may run on any worker thread.
As mentioned above, LokiMQ tries to avoid exceeding the configured general threads value (G)
whenever possible: the only time we will dispatch a job to a worker thread when we have >= G threads
already running is when a new command arrives, the category reserves M threads, and the thread pool
is currently processing fewer than M jobs for that category.
Some examples: assume A and B are commands that take sufficiently long to run that we receive all
commands before the first job is finished. Suppose that A's category reserves 2 threads, B's
category has no reserved threads, and we have 4 general threads configured.
Example 1: commands arrive in order AABBBB. This will not exceed 4 threads: when the third B
arrives there are already 4 jobs running so it gets queued.
Example 2: commands arrive in order AABBAA. This also won't exceed 4 threads: when the third A
arrives there are already 4 jobs running and two of them are already A jobs, so there are no
remaining slots for A jobs.
Example 3: BBBAAA. This won't exceed 5 threads: the first four get started normally. When the
second A arrives there are 4 threads running, but only 1 of them is an A thus there is still a free
slot for A jobs so we start the second A on a fifth thread. The third A, however, has no A jobs
available so gets queued.
Exmaple 4: BBBBBBAAA. At most 6 jobs. The 5th and 6th B's get queued (all general workers are
busy). The first and second get started (there are two unused reserved A slots), the third one gets
queued. The 5th and 6th B's are already interesting on their own: they won't be started until there
are only three active jobs; the third A won't be started until *either* there are only three active
jobs, or one of the other A's finish.
Thus the general thread count should be regarded as the "normal" thread limit and reserved threads
allow an extra burst of thread activity *only if* all general threads are busy with other categories
when a command with reserve threads arrived.
## Internal job queuing
This library supports queuing internal jobs (internally these are in the "" (empty string) category,
which is not externally accessible). These jobs are quite different from ordinary jobs: they have
no authentication and can only be submitted by the program itself to its own worker threads. They
have either no second message part, or else one single message part consists of an opaque void
pointer value. This pointer is passed by value to the registered function, which must take exactly
one `void *` argument.
It is entirely the responsibility of the caller and callee to deal with the `void *` argument,
including construction/destruction/etc. This is very low level but allow the most flexibility. For
example, a caller might do something like:
```C++
// Set up a function that takes ownership:
void hello1(void *data) {
auto* str = static_cast<std::string*>(data);
std::cout << "Hello1 " << *str << "\n";
delete str;
}
LokiMQ::register_task("hello1", &hello1);
// Another function that doesn't take ownership (and handles nullptr):
void hello2(void *data) { //
std::cout << "Hello2 " <<
(data ? *static_cast<std::string*>(data) : "anonymous") <<
"\n";
}
LokiMQ::register_task("hello2", &hello2);
// Later, in the calling code:
const static std::string there{"there"};
void myfunc() {
// ...
lmq.job(&hello1, new std::string{"world"}); // Give up ownership of the pointer
lmq.job(&hello2, &there); // Passing an externally valid pointer
// But don't do this:
//std::string world{"world"};
//lmq.job(&hello2, &world); // Bad: `world` will probably be destroyed
// before the callback gets invoked
}
```
### Dealing with synchronization of jobs
A common pattern is one where a single thread suddenly has some work that can be be parallelized.
We can easily queue all the jobs into the worker thread pool (see above), but the issue then is how
to continue when the work is done. You could (but shouldn't) employ some blocking, locking, mutex +
condition variable monstrosity, but you shouldn't.
Instead LokiMQ provides a mechanism for this by allowing you to submit a batch of jobs with a
completion callback. All jobs will be queued and, when the last one finishes, the finalization
callback will be queued to continue with the task.
From the caller point of view this requires splitting the logic into two parts, a "Before" that sets
up the batch, a "Job" that does the work (multiple times), and an "After" that continues once all
jobs are finished.
For example, the following example shows how you might use it to convert from input values (0 to 49)
to some other output value:
```C++
struct task_data { int input; double result; };
// Called for each job.
void do_my_task(void* in) {
auto& x = *static_cast<task_data*>(in);
x.result = 42.0 * x.input; // Job
}
void start_big_task() {
// ... Before code ...
auto* results = new std::vector<task_data>{50};
lokimq::Batch batch;
for (size_t i = 0; i < results->size(); i++) {
auto* r = (*result)[i];
r->input = i;
batch.add_job(&do_my_task, r);
}
lmq.job(batch, &continue_big_task, results);
// ... to be continued in `continue_big_task` after all the jobs finish
}
// This will be called once all the `do_my_task` calls have completed. (Note that we could be in
// a different thread from the one `start_big_task()` was running in).
void continue_big_task(void* rptr) {
// Put into a unique_ptr to deal with ownership
std::unique_ptr<std::vector<task_data>> results{static_cast<std::vector<int>*>(rptr)};
double sum = 0;
for (auto &r : results) sum += r;
std::cout << "All done, sum = " << sum << "\n";
}
```
This code deliberately does not support blocking to wait for the tasks to finish: if you want such a
bad design you can implement it yourself; LokiMQ isn't going to help you hurt yourself.

1
cppzmq

@ -0,0 +1 @@
Subproject commit 8d5c9a88988dcbebb72939ca0939d432230ffde1

220
lokimq/bt_serialize.cpp

@ -0,0 +1,220 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "bt_serialize.h"
namespace lokimq {
namespace detail {
/// Reads digits into an unsigned 64-bit int.
uint64_t extract_unsigned(string_view& s) {
if (s.empty())
throw bt_deserialize_invalid{"Expected 0-9 but found end of string"};
if (s[0] < '0' || s[0] > '9')
throw bt_deserialize_invalid("Expected 0-9 but found '"s + s[0]);
uint64_t uval = 0;
while (!s.empty() && (s[0] >= '0' && s[0] <= '9')) {
uint64_t bigger = uval * 10 + (s[0] - '0');
s.remove_prefix(1);
if (bigger < uval) // overflow
throw bt_deserialize_invalid("Integer deserialization failed: value is too large for a 64-bit int");
uval = bigger;
}
return uval;
}
void bt_deserialize<string_view>::operator()(string_view& s, string_view& val) {
if (s.size() < 2) throw bt_deserialize_invalid{"Deserialize failed: given data is not an bt-encoded string"};
if (s[0] < '0' || s[0] > '9')
throw bt_deserialize_invalid_type{"Expected 0-9 but found '"s + s[0] + "'"};
uint64_t len = extract_unsigned(s);
if (s.empty() || s[0] != ':')
throw bt_deserialize_invalid{"Did not find expected ':' during string deserialization"};
s.remove_prefix(1);
if (len > s.size())
throw bt_deserialize_invalid{"String deserialization failed: encoded string length is longer than the serialized data"};
val = {s.data(), len};
s.remove_prefix(len);
}
// Check that we are on a 2's complement architecture. It's highly unlikely that this code ever
// runs on a non-2s-complement architecture (especially since C++20 requires a two's complement
// signed value behaviour), but check at compile time anyway because we rely on these relations
// below.
static_assert(std::numeric_limits<int64_t>::min() + std::numeric_limits<int64_t>::max() == -1 &&
static_cast<uint64_t>(std::numeric_limits<int64_t>::max()) + uint64_t{1} == (uint64_t{1} << 63),
"Non 2s-complement architecture not supported!");
std::pair<maybe_signed_int64_t, bool> bt_deserialize_integer(string_view& s) {
// Smallest possible encoded integer is 3 chars: "i0e"
if (s.size() < 3) throw bt_deserialize_invalid("Deserialization failed: end of string found where integer expected");
if (s[0] != 'i') throw bt_deserialize_invalid_type("Deserialization failed: expected 'i', found '"s + s[0] + '\'');
s.remove_prefix(1);
std::pair<maybe_signed_int64_t, bool> result;
if (s[0] == '-') {
result.second = true;
s.remove_prefix(1);
}
uint64_t uval = extract_unsigned(s);
if (s.empty())
throw bt_deserialize_invalid("Integer deserialization failed: encountered end of string before integer was finished");
if (s[0] != 'e')
throw bt_deserialize_invalid("Integer deserialization failed: expected digit or 'e', found '"s + s[0] + '\'');
s.remove_prefix(1);
if (result.second) { // negative
if (uval > (uint64_t{1} << 63))
throw bt_deserialize_invalid("Deserialization of integer failed: negative integer value is too large for a 64-bit signed int");
result.first.i64 = -uval;
} else {
result.first.u64 = uval;
}
return result;
}
template struct bt_deserialize<int64_t>;
template struct bt_deserialize<uint64_t>;
void bt_deserialize<bt_value, void>::operator()(string_view& s, bt_value& val) {
if (s.size() < 2) throw bt_deserialize_invalid("Deserialization failed: end of string found where bt-encoded value expected");
switch (s[0]) {
case 'd': {
bt_dict dict;
bt_deserialize<bt_dict>{}(s, dict);
val = std::move(dict);
break;
}
case 'l': {
bt_list list;
bt_deserialize<bt_list>{}(s, list);
val = std::move(list);
break;
}
case 'i': {
auto read = bt_deserialize_integer(s);
val = read.first.i64; // We only store an i64, but can get a u64 out of it via get<uint64_t>(val)
break;
}
case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': {
std::string str;
bt_deserialize<std::string>{}(s, str);
val = std::move(str);
break;
}
default:
throw bt_deserialize_invalid("Deserialize failed: encountered invalid value '"s + s[0] + "'; expected one of [0-9idl]");
}
}
} // namespace detail
bt_list_consumer::bt_list_consumer(string_view data_) : data{std::move(data_)} {
if (data.empty()) throw std::runtime_error{"Cannot create a bt_list_consumer with an empty string_view"};
if (data[0] != 'l') throw std::runtime_error{"Cannot create a bt_list_consumer with non-list data"};
data.remove_prefix(1);
}
/// Attempt to parse the next value as a string (and advance just past it). Throws if the next
/// value is not a string.
string_view bt_list_consumer::consume_string() {
if (data.empty())
throw bt_deserialize_invalid{"expected a string, but reached end of data"};
else if (!is_string())
throw bt_deserialize_invalid_type{"expected a string, but found "s + data.front()};
string_view next{data}, result;
detail::bt_deserialize<string_view>{}(next, result);
data = next;
return result;
}
/// Consumes a value without returning it.
void bt_list_consumer::skip_value() {
if (is_string())
consume_string();
else if (is_integer())
detail::bt_deserialize_integer(data);
else if (is_list())
consume_list_data();
else if (is_dict())
consume_dict_data();
else
throw bt_deserialize_invalid_type{"next bt value has unknown type"};
}
string_view bt_list_consumer::consume_list_data() {
auto start = data.begin();
if (data.size() < 2 || !is_list()) throw bt_deserialize_invalid_type{"next bt value is not a list"};
data.remove_prefix(1); // Descend into the sublist, consume the "l"
while (!is_finished()) {
skip_value();
if (data.empty())
throw bt_deserialize_invalid{"bt list consumption failed: hit the end of string before the list was done"};
}
data.remove_prefix(1); // Back out from the sublist, consume the "e"
return {start, static_cast<size_t>(std::distance(start, data.begin()))};
}
string_view bt_list_consumer::consume_dict_data() {
auto start = data.begin();
if (data.size() < 2 || !is_dict()) throw bt_deserialize_invalid_type{"next bt value is not a dict"};
data.remove_prefix(1); // Descent into the dict, consumer the "d"
while (!is_finished()) {
consume_string(); // Key is always a string
if (!data.empty())
skip_value();
if (data.empty())
throw bt_deserialize_invalid{"bt dict consumption failed: hit the end of string before the dict was done"};
}
data.remove_prefix(1); // Back out of the dict, consume the "e"
return {start, static_cast<size_t>(std::distance(start, data.begin()))};
}
bt_dict_consumer::bt_dict_consumer(string_view data_) {
data = std::move(data_);
if (data.empty()) throw std::runtime_error{"Cannot create a bt_dict_consumer with an empty string_view"};
if (data.size() < 2 || data[0] != 'd') throw std::runtime_error{"Cannot create a bt_dict_consumer with non-dict data"};
data.remove_prefix(1);
}
bool bt_dict_consumer::consume_key() {
if (key_.data())
return true;
if (data.empty()) throw bt_deserialize_invalid_type{"expected a key or dict end, found end of string"};
if (data[0] == 'e') return false;
key_ = bt_list_consumer::consume_string();
if (data.empty() || data[0] == 'e')
throw bt_deserialize_invalid{"dict key isn't followed by a value"};
return true;
}
} // namespace lokimq

788
lokimq/bt_serialize.h

@ -0,0 +1,788 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <iostream>
#pragma once
#include <vector>
#include <list>
#include <unordered_map>
#include <algorithm>
#include <functional>
#include <cstring>
#include <ostream>
#include <sstream>
#include "string_view.h"
#include "mapbox/variant.hpp"
namespace lokimq {
using namespace std::literals;
/** \file
* LokiMQ serialization for internal commands is very simple: we support two primitive types,
* strings and integers, and two container types, lists and dicts with string keys. On the wire
* these go in BitTorrent byte encoding as described in BEP-0003
* (https://www.bittorrent.org/beps/bep_0003.html#bencoding).
*
* On the C++ side, on input we allow strings, integral types, STL-like containers of these types,
* and STL-like containers of pairs with a string first value and any of these types as second
* value. We also accept std::variants (if compiled with std::variant support, i.e. in C++17 mode)
* that contain any of these, and mapbox::util::variants (the internal type used for its recursive
* support).
*
* One minor deviation from BEP-0003 is that we don't support serializing values that don't fit in a
* 64-bit integer (BEP-0003 specifies arbitrary precision integers).
*
* On deserialization we can either deserialize into a mapbox::util::variant that supports everything, or
* we can fill a container of your given type (though this fails if the container isn't compatible
* with the deserialized data).
*/
/// Exception throw if deserialization fails
class bt_deserialize_invalid : public std::invalid_argument {
using std::invalid_argument::invalid_argument;
};
/// A more specific subclass that is thown if the serialization type is an initial mismatch: for
/// example, trying deserializing an int but the next thing in input is a list. This is not,
/// however, thrown if the type initially looks fine but, say, a nested serialization fails. This
/// error will only be thrown when the input stream has not been advanced (and so can be tried for a
/// different type).
class bt_deserialize_invalid_type : public bt_deserialize_invalid {
using bt_deserialize_invalid::bt_deserialize_invalid;
};
class bt_list;
class bt_dict;
/// Recursive generic type that can fully represent everything valid for a BT serialization.
using bt_value = mapbox::util::variant<
std::string,
int64_t,
mapbox::util::recursive_wrapper<bt_list>,
mapbox::util::recursive_wrapper<bt_dict>
>;
/// Very thin wrapper around a std::list<bt_value> that holds a list of generic values (though *any*
/// compatible data type can be used).
class bt_list : public std::list<bt_value> {
using std::list<bt_value>::list;
};
/// Very thin wrapper around a std::unordered_map<bt_value> that holds a list of string -> generic
/// value pairs (though *any* compatible data type can be used).
class bt_dict : public std::unordered_map<std::string, bt_value> {
using std::unordered_map<std::string, bt_value>::unordered_map;
};
#ifdef __cpp_lib_void_t
using std::void_t;
#else
/// C++17 void_t backport
template <typename... Ts> struct void_t_impl { using type = void; };
template <typename... Ts> using void_t = typename void_t_impl<Ts...>::type;
#endif
namespace detail {
/// Reads digits into an unsigned 64-bit int.
uint64_t extract_unsigned(string_view& s);
inline uint64_t extract_unsigned(string_view&& s) { return extract_unsigned(s); }
// Fallback base case; we only get here if none of the partial specializations below work
template <typename T, typename SFINAE = void>
struct bt_serialize { static_assert(!std::is_same<T, T>::value, "Cannot serialize T: unsupported type for bt serialization"); };
template <typename T, typename SFINAE = void>
struct bt_deserialize { static_assert(!std::is_same<T, T>::value, "Cannot deserialize T: unsupported type for bt deserialization"); };
/// Checks that we aren't at the end of a string view and throws if we are.
inline void bt_need_more(const string_view &s) {
if (s.empty())
throw bt_deserialize_invalid{"Unexpected end of string while deserializing"};
}
union maybe_signed_int64_t { int64_t i64; uint64_t u64; };
/// Deserializes a signed or unsigned 64-bit integer from an input stream. Sets the second bool to
/// true iff the value is int64_t because a negative value was read. Throws an exception if the
/// read value doesn't fit in a int64_t (if negative) or a uint64_t (if positive). Removes consumed
/// characters from the string_view.
std::pair<maybe_signed_int64_t, bool> bt_deserialize_integer(string_view& s);
/// Integer specializations
template <typename T>
struct bt_serialize<T, std::enable_if_t<std::is_integral<T>::value>> {
static_assert(sizeof(T) <= sizeof(uint64_t), "Serialization of integers larger than uint64_t is not supported");
void operator()(std::ostream &os, const T &val) {
// Cast 1-byte types to a larger type to avoid iostream interpreting them as single characters
using output_type = std::conditional_t<(sizeof(T) > 1), T, std::conditional_t<std::is_signed<T>::value, int, unsigned>>;
os << 'i' << static_cast<output_type>(val) << 'e';
}
};
template <typename T>
struct bt_deserialize<T, std::enable_if_t<std::is_integral<T>::value>> {
void operator()(string_view& s, T &val) {
constexpr uint64_t umax = static_cast<uint64_t>(std::numeric_limits<T>::max());
constexpr int64_t smin = static_cast<int64_t>(std::numeric_limits<T>::min()),
smax = static_cast<int64_t>(std::numeric_limits<T>::max());
auto read = bt_deserialize_integer(s);
if (std::is_signed<T>::value) {
if (!read.second) { // read a positive value
if (read.first.u64 > umax)
throw bt_deserialize_invalid("Integer deserialization failed: found too-large value " + std::to_string(read.first.u64) + " > " + std::to_string(umax));
val = static_cast<T>(read.first.u64);
} else {
bool oob = read.first.i64 < smin || read.first.i64 > smax;
if (sizeof(T) < sizeof(int64_t) && oob)
throw bt_deserialize_invalid("Integer deserialization failed: found out-of-range value " + std::to_string(read.first.i64) + " not in [" + std::to_string(smin) + "," + std::to_string(smax) + "]");
val = static_cast<T>(read.first.i64);
}
} else {
if (read.second)
throw bt_deserialize_invalid("Integer deserialization failed: found negative value " + std::to_string(read.first.i64) + " but type is unsigned");
if (sizeof(T) < sizeof(uint64_t) && read.first.u64 > umax)
throw bt_deserialize_invalid("Integer deserialization failed: found too-large value " + std::to_string(read.first.u64) + " > " + std::to_string(umax));
val = static_cast<T>(read.first.u64);
}
}
};
extern template struct bt_deserialize<int64_t>;
extern template struct bt_deserialize<uint64_t>;
template <>
struct bt_serialize<string_view> {
void operator()(std::ostream &os, const string_view &val) { os << val.size(); os.put(':'); os.write(val.data(), val.size()); }
};
template <>
struct bt_deserialize<string_view> {
void operator()(string_view& s, string_view& val);
};
/// String specialization
template <>
struct bt_serialize<std::string> {
void operator()(std::ostream &os, const std::string &val) { bt_serialize<string_view>{}(os, val); }
};
template <>
struct bt_deserialize<std::string> {
void operator()(string_view& s, std::string& val) { string_view view; bt_deserialize<string_view>{}(s, view); val = {view.data(), view.size()}; }
};
/// char * and string literals -- we allow serialization for convenience, but not deserialization
template <>
struct bt_serialize<char *> {
void operator()(std::ostream &os, const char *str) { bt_serialize<string_view>{}(os, {str, std::strlen(str)}); }
};
template <size_t N>
struct bt_serialize<char[N]> {
void operator()(std::ostream &os, const char *str) { bt_serialize<string_view>{}(os, {str, N-1}); }
};
/// Partial dict validity; we don't check the second type for serializability, that will be handled
/// via the base case static_assert if invalid.
template <typename T, typename = void> struct is_bt_input_dict_container : std::false_type {};
template <typename T>
struct is_bt_input_dict_container<T, std::enable_if_t<
std::is_same<std::string, std::remove_cv_t<typename T::value_type::first_type>>::value,
void_t<typename T::const_iterator /* is const iterable */,
typename T::value_type::second_type /* has a second type */>>>
: std::true_type {};
/// Determines whether the type looks like something we can insert into (using `v.insert(v.end(), x)`)
template <typename T, typename = void> struct is_bt_insertable : std::false_type {};
template <typename T>
struct is_bt_insertable<T,
void_t<decltype(std::declval<T>().insert(std::declval<T>().end(), std::declval<typename T::value_type>()))>>
: std::true_type {};
/// Determines whether the given type looks like a compatible map (i.e. has std::string keys) that
/// we can insert into.
template <typename T, typename = void> struct is_bt_output_dict_container : std::false_type {};
template <typename T>
struct is_bt_output_dict_container<T, std::enable_if_t<
std::is_same<std::string, std::remove_cv_t<typename T::value_type::first_type>>::value &&
is_bt_insertable<T>::value,
void_t<typename T::value_type::second_type /* has a second type */>>>
: std::true_type {};
/// Specialization for a dict-like container (such as an unordered_map). We accept anything for a
/// dict that is const iterable over something that looks like a pair with std::string for first
/// value type. The value (i.e. second element of the pair) also must be serializable.
template <typename T>
struct bt_serialize<T, std::enable_if_t<is_bt_input_dict_container<T>::value>> {
using second_type = typename T::value_type::second_type;
using ref_pair = std::reference_wrapper<const typename T::value_type>;
void operator()(std::ostream &os, const T &dict) {
os << 'd';
std::vector<ref_pair> pairs;
pairs.reserve(dict.size());
for (const auto &pair : dict)
pairs.emplace(pairs.end(), pair);
std::sort(pairs.begin(), pairs.end(), [](ref_pair a, ref_pair b) { return a.get().first < b.get().first; });
for (auto &ref : pairs) {
bt_serialize<std::string>{}(os, ref.get().first);
bt_serialize<second_type>{}(os, ref.get().second);
}
os << 'e';
}
};
template <typename T>
struct bt_deserialize<T, std::enable_if_t<is_bt_output_dict_container<T>::value>> {
using second_type = typename T::value_type::second_type;
void operator()(string_view& s, T& dict) {
// Smallest dict is 2 bytes "de", for an empty dict.
if (s.size() < 2) throw bt_deserialize_invalid("Deserialization failed: end of string found where dict expected");
if (s[0] != 'd') throw bt_deserialize_invalid_type("Deserialization failed: expected 'd', found '"s + s[0] + "'"s);
s.remove_prefix(1);
dict.clear();
bt_deserialize<std::string> key_deserializer;
bt_deserialize<second_type> val_deserializer;
while (!s.empty() && s[0] != 'e') {
std::string key;
second_type val;
key_deserializer(s, key);
val_deserializer(s, val);
dict.insert(dict.end(), typename T::value_type{std::move(key), std::move(val)});
}
if (s.empty())
throw bt_deserialize_invalid("Deserialization failed: encountered end of string before dict was finished");
s.remove_prefix(1); // Consume the 'e'
}
};
/// Accept anything that looks iterable; value serialization validity isn't checked here (it fails
/// via the base case static assert).
template <typename T, typename = void> struct is_bt_input_list_container : std::false_type {};
template <typename T>
struct is_bt_input_list_container<T, std::enable_if_t<
!std::is_same<T, std::string>::value &&
!is_bt_input_dict_container<T>::value,
void_t<typename T::const_iterator, typename T::value_type>>>
: std::true_type {};
template <typename T, typename = void> struct is_bt_output_list_container : std::false_type {};
template <typename T>
struct is_bt_output_list_container<T, std::enable_if_t<
!std::is_same<T, std::string>::value &&
!is_bt_output_dict_container<T>::value &&
is_bt_insertable<T>::value>>
: std::true_type {};
/// List specialization
template <typename T>
struct bt_serialize<T, std::enable_if_t<is_bt_input_list_container<T>::value>> {
void operator()(std::ostream& os, const T& list) {
os << 'l';
for (const auto &v : list)
bt_serialize<std::remove_cv_t<typename T::value_type>>{}(os, v);
os << 'e';
}
};
template <typename T>
struct bt_deserialize<T, std::enable_if_t<is_bt_output_list_container<T>::value>> {
using value_type = typename T::value_type;
void operator()(string_view& s, T& list) {
// Smallest list is 2 bytes "le", for an empty list.
if (s.size() < 2) throw bt_deserialize_invalid("Deserialization failed: end of string found where list expected");
if (s[0] != 'l') throw bt_deserialize_invalid_type("Deserialization failed: expected 'l', found '"s + s[0] + "'"s);
s.remove_prefix(1);
list.clear();
bt_deserialize<value_type> deserializer;
while (!s.empty() && s[0] != 'e') {
value_type v;
deserializer(s, v);
list.insert(list.end(), std::move(v));
}
if (s.empty())
throw bt_deserialize_invalid("Deserialization failed: encountered end of string before list was finished");
s.remove_prefix(1); // Consume the 'e'
}
};
/// variant visitor; serializes whatever is contained
class bt_serialize_visitor {
std::ostream &os;
public:
using result_type = void;
bt_serialize_visitor(std::ostream &os) : os{os} {}
template <typename T> void operator()(const T &val) const {
bt_serialize<T>{}(os, val);
}
};
template <typename T>
using is_bt_deserializable = std::integral_constant<bool,
std::is_same<T, std::string>::value || std::is_integral<T>::value ||
is_bt_output_dict_container<T>::value || is_bt_output_list_container<T>::value>;
// General template and base case; this base will only actually be invoked when Ts... is empty,
// which means we reached the end without finding any variant type capable of holding the value.
template <typename SFINAE, typename Variant, typename... Ts>
struct bt_deserialize_try_variant_impl {
void operator()(string_view&, Variant&) {
throw bt_deserialize_invalid("Deserialization failed: could not deserialize value into any variant type");
}
};
template <typename... Ts, typename Variant>
void bt_deserialize_try_variant(string_view& s, Variant& variant) {
bt_deserialize_try_variant_impl<void, Variant, Ts...>{}(s, variant);
}
template <typename Variant, typename T, typename... Ts>
struct bt_deserialize_try_variant_impl<std::enable_if_t<is_bt_deserializable<T>::value>, Variant, T, Ts...> {
void operator()(string_view& s, Variant& variant) {
if ( is_bt_output_list_container<T>::value ? s[0] == 'l' :
is_bt_output_dict_container<T>::value ? s[0] == 'd' :
std::is_integral<T>::value ? s[0] == 'i' :
std::is_same<T, std::string>::value ? s[0] >= '0' && s[0] <= '9' :
false) {
T val;
bt_deserialize<T>{}(s, val);
variant = std::move(val);
} else {
bt_deserialize_try_variant<Ts...>(s, variant);
}
}
};
template <typename Variant, typename T, typename... Ts>
struct bt_deserialize_try_variant_impl<std::enable_if_t<!is_bt_deserializable<T>::value>, Variant, T, Ts...> {
void operator()(string_view& s, Variant& variant) {
// Unsupported deserialization type, skip it
bt_deserialize_try_variant<Ts...>(s, variant);
}
};
template <>
struct bt_deserialize<bt_value, void> {
void operator()(string_view& s, bt_value& val);
};
template <typename... Ts>
struct bt_serialize<mapbox::util::variant<Ts...>> {
void operator()(std::ostream& os, const mapbox::util::variant<Ts...>& val) {
mapbox::util::apply_visitor(bt_serialize_visitor{os}, val);
}
};
template <typename... Ts>
struct bt_deserialize<mapbox::util::variant<Ts...>> {
void operator()(string_view& s, mapbox::util::variant<Ts...>& val) {
bt_deserialize_try_variant<Ts...>(s, val);
}
};
#ifdef __cpp_lib_variant
/// C++17 std::variant support
template <typename... Ts>
struct bt_serialize<std::variant<Ts...>> {
void operator()(std::ostream &os, const std::variant<Ts...>& val) {
mapbox::util::apply_visitor(bt_serialize_visitor{os}, val);
}
};
template <typename... Ts>
struct bt_deserialize<std::variant<Ts...>> {
void operator()(string_view& s, std::variant<Ts...>& val) {
bt_deserialize_try_variant<Ts...>(s, val);
}
};
#endif
template <typename T>
struct bt_stream_serializer {
const T &val;
explicit bt_stream_serializer(const T &val) : val{val} {}
operator std::string() const {
std::ostringstream oss;
oss << *this;
return oss.str();
}
};
template <typename T>
std::ostream &operator<<(std::ostream &os, const bt_stream_serializer<T> &s) {
bt_serialize<T>{}(os, s.val);
return os;
}
} // namespace detail
/// Returns a wrapper around a value reference that can serialize the value directly to an output
/// stream. This class is intended to be used inline (i.e. without being stored) as in:
///
/// std::list<int> my_list{{1,2,3}};
/// std::cout << bt_serializer(my_list);
///
/// While it is possible to store the returned object and use it, such as:
///
/// auto encoded = bt_serializer(42);
/// std::cout << encoded;
///
/// this approach is not generally recommended: the returned object stores a reference to the
/// passed-in type, which may not survive. If doing this note that it is the caller's
/// responsibility to ensure the serializer is not used past the end of the lifetime of the value
/// being serialized.
///
/// Also note that serializing directly to an output stream is more efficient as no intermediate
/// string containing the entire serialization has to be constructed.
///
template <typename T>
detail::bt_stream_serializer<T> bt_serializer(const T &val) { return detail::bt_stream_serializer<T>{val}; }
/// Serializes the given value into a std::string.
///
/// int number = 42;
/// std::string encoded = bt_serialize(number);
/// // Equivalent:
/// //auto encoded = (std::string) bt_serialize(number);
///
/// This takes any serializable type: integral types, strings, lists of serializable types, and
/// string->value maps of serializable types.
template <typename T>
std::string bt_serialize(const T &val) { return bt_serializer(val); }
/// Deserializes the given string view directly into `val`. Usage:
///
/// std::string encoded = "i42e";
/// int value;
/// bt_deserialize(encoded, value); // Sets value to 42
///
template <typename T, std::enable_if_t<!std::is_const<T>::value, int> = 0>
void bt_deserialize(string_view s, T& val) {
return detail::bt_deserialize<T>{}(s, val);
}
/// Deserializes the given string_view into a `T`, which is returned.
///
/// std::string encoded = "li1ei2ei3ee"; // bt-encoded list of ints: [1,2,3]
/// auto mylist = bt_deserialize<std::list<int>>(encoded);
///
template <typename T>
T bt_deserialize(string_view s) {
T val;
bt_deserialize(s, val);
return val;
}
/// Deserializes the given value into a generic `bt_value` type (mapbox::util::variant) which is capable
/// of holding all possible BT-encoded values (including recursion).
///
/// Example:
///
/// std::string encoded = "i42e";
/// auto val = bt_get(encoded);
/// int v = get_int<int>(val); // fails unless the encoded value was actually an integer that
/// // fits into an `int`
///
inline bt_value bt_get(string_view s) {
return bt_deserialize<bt_value>(s);
}
/// Helper functions to extract a value of some integral type from a bt_value which contains an
/// integer. Does range checking, throwing std::overflow_error if the stored value is outside the
/// range of the target type.
///
/// Example:
///
/// std::string encoded = "i123456789e";
/// auto val = bt_get(encoded);
/// auto v = get_int<uint32_t>(val); // throws if the decoded value doesn't fit in a uint32_t
template <typename IntType, std::enable_if_t<std::is_integral<IntType>::value, int> = 0>
IntType get_int(const bt_value &v) {
// It's highly unlikely that this code ever runs on a non-2s-complement architecture, but check
// at compile time if converting to a uint64_t (because while int64_t -> uint64_t is
// well-defined, uint64_t -> int64_t only does the right thing under 2's complement).
static_assert(!std::is_unsigned<IntType>::value || sizeof(IntType) != sizeof(int64_t) || -1 == ~0,
"Non 2s-complement architecture not supported!");
int64_t value = mapbox::util::get<int64_t>(v);
if (sizeof(IntType) < sizeof(int64_t)) {
if (value > static_cast<int64_t>(std::numeric_limits<IntType>::max())
|| value < static_cast<int64_t>(std::numeric_limits<IntType>::min()))
throw std::overflow_error("Unable to extract integer value: stored value is outside the range of the requested type");
}
return static_cast<IntType>(value);
}
/// Class that allows you to walk through a bt-encoded list in memory without copying or allocating
/// memory. It accesses existing memory directly and so the caller must ensure that the referenced
/// memory stays valid for the lifetime of the bt_list_consumer object.
class bt_list_consumer {
protected:
string_view data;
bt_list_consumer() = default;
public:
bt_list_consumer(string_view data_);
/// Copy constructor. Making a copy copies the current position so can be used for multipass
/// iteration through a list.
bt_list_consumer(const bt_list_consumer&) = default;
bt_list_consumer& operator=(const bt_list_consumer&) = default;
/// Returns true if the next value indicates the end of the list
bool is_finished() const { return data.front() == 'e'; }
/// Returns true if the next element looks like an encoded string
bool is_string() const { return data.front() >= '0' && data.front() <= '9'; }
/// Returns true if the next element looks like an encoded integer
bool is_integer() const { return data.front() == 'i'; }
/// Returns true if the next element looks like an encoded list
bool is_list() const { return data.front() == 'l'; }
/// Returns true if the next element looks like an encoded dict
bool is_dict() const { return data.front() == 'd'; }
/// Attempt to parse the next value as a string (and advance just past it). Throws if the next
/// value is not a string.
string_view consume_string();
/// Attempts to parse the next value as an integer (and advance just past it). Throws if the
/// next value is not an integer.
template <typename IntType>
IntType consume_integer() {
if (!is_integer()) throw bt_deserialize_invalid_type{"next value is not an integer"};
string_view next{data};
IntType ret;
detail::bt_deserialize<IntType>{}(next, ret);
data = next;
return ret;
}
/// Consumes a list, return it as a list-like type. This typically requires dynamic allocation,
/// but only has to parse the data once. Compare with consume_list_data() which allows
/// alloc-free traversal, but requires parsing twice (if the contents are to be used).
template <typename T = bt_list>
T consume_list() {
T list;
consume_list(list);
return list;
}
/// Same as above, but takes a pre-existing list-like data type.
template <typename T>
void consume_list(T& list) {
if (!is_list()) throw bt_deserialize_invalid_type{"next bt value is not a list"};
string_view n{data};
detail::bt_deserialize<T>{}(n, list);
data = n;
}
/// Consumes a dict, return it as a dict-like type. This typically requires dynamic allocation,
/// but only has to parse the data once. Compare with consume_dict_data() which allows
/// alloc-free traversal, but requires parsing twice (if the contents are to be used).
template <typename T = bt_dict>
T consume_dict() {
T dict;
consume_dict(dict);
return dict;
}
/// Same as above, but takes a pre-existing dict-like data type.
template <typename T>
void consume_dict(T& dict) {
if (!is_dict()) throw bt_deserialize_invalid_type{"next bt value is not a dict"};
string_view n{data};
detail::bt_deserialize<T>{}(n, dict);
data = n;
}
/// Consumes a value without returning it.
void skip_value();
/// Attempts to parse the next value as a list and returns the string_view that contains the
/// entire thing. This is recursive into both lists and dicts and likely to be quite
/// inefficient for large, nested structures (unless the values only need to be skipped but
/// aren't separately needed). This, however, does not require dynamic memory allocation.
string_view consume_list_data();
/// Attempts to parse the next value as a dict and returns the string_view that contains the
/// entire thing. This is recursive into both lists and dicts and likely to be quite
/// inefficient for large, nested structures (unless the values only need to be skipped but
/// aren't separately needed). This, however, does not require dynamic memory allocation.
string_view consume_dict_data();
};
/// Class that allows you to walk through key-value pairs of a bt-encoded dict in memory without
/// copying or allocating memory. It accesses existing memory directly and so the caller must
/// ensure that the referenced memory stays valid for the lifetime of the bt_dict_consumer object.
class bt_dict_consumer : private bt_list_consumer {
string_view key_;
/// Consume the key if not already consumed and there is a key present (rather than 'e').
/// Throws exception if what should be a key isn't a string, or if the key consumes the entire
/// data (i.e. requires that it be followed by something). Returns true if the key was consumed
/// (either now or previously and cached).
bool consume_key();
/// Clears the cached key and returns it. Must have already called consume_key directly or
/// indirectly via one of the `is_{...}` methods.
string_view flush_key() {
string_view k;
k.swap(key_);
return k;
}
public:
bt_dict_consumer(string_view data_);
/// Copy constructor. Making a copy copies the current position so can be used for multipass
/// iteration through a list.
bt_dict_consumer(const bt_dict_consumer&) = default;
bt_dict_consumer& operator=(const bt_dict_consumer&) = default;
/// Returns true if the next value indicates the end of the dict
bool is_finished() { return !consume_key() && data.front() == 'e'; }
/// Operator bool is an alias for `!is_finished()`
operator bool() { return !is_finished(); }
/// Returns true if the next value looks like an encoded string
bool is_string() { return consume_key() && data.front() >= '0' && data.front() <= '9'; }
/// Returns true if the next element looks like an encoded integer
bool is_integer() { return consume_key() && data.front() == 'i'; }
/// Returns true if the next element looks like an encoded list
bool is_list() { return consume_key() && data.front() == 'l'; }
/// Returns true if the next element looks like an encoded dict
bool is_dict() { return consume_key() && data.front() == 'd'; }
/// Returns the key of the next pair. This does not have to be called; it is also returned by
/// all of the other consume_* methods. The value is cached whether called here or by some
/// other method; accessing it multiple times simple accesses the cache until the next value is
/// consumed.
string_view key() {
if (!consume_key())
throw bt_deserialize_invalid{"Cannot access next key: at the end of the dict"};
return key_;
}
/// Attempt to parse the next value as a string->string pair (and advance just past it). Throws
/// if the next value is not a string.
std::pair<string_view, string_view> consume_string();
/// Attempts to parse the next value as an string->integer pair (and advance just past it).
/// Throws if the next value is not an integer.
template <typename IntType>
std::pair<string_view, IntType> consume_integer() {
if (!is_integer()) throw bt_deserialize_invalid_type{"next bt dict value is not an integer"};
std::pair<string_view, IntType> ret;
ret.second = bt_list_consumer::consume_integer<IntType>();
ret.first = flush_key();
return ret;
}
/// Consumes a string->list pair, return it as a list-like type. This typically requires
/// dynamic allocation, but only has to parse the data once. Compare with consume_list_data()
/// which allows alloc-free traversal, but requires parsing twice (if the contents are to be
/// used).
template <typename T = bt_list>
std::pair<string_view, T> consume_list() {
std::pair<string_view, T> pair;
pair.first = consume_list(pair.second);
return pair;
}
/// Same as above, but takes a pre-existing list-like data type. Returns the key.
template <typename T>
string_view consume_list(T& list) {
if (!is_list()) throw bt_deserialize_invalid_type{"next bt value is not a list"};
bt_list_consumer::consume_list(list);
return flush_key();
}
/// Consumes a string->dict pair, return it as a dict-like type. This typically requires
/// dynamic allocation, but only has to parse the data once. Compare with consume_dict_data()
/// which allows alloc-free traversal, but requires parsing twice (if the contents are to be
/// used).
template <typename T = bt_dict>
std::pair<string_view, T> consume_dict() {
std::pair<string_view, T> pair;
pair.first = consume_dict(pair.second);
return pair;
}
/// Same as above, but takes a pre-existing dict-like data type. Returns the key.
template <typename T>
string_view consume_dict(T& dict) {
if (!is_dict()) throw bt_deserialize_invalid_type{"next bt value is not a dict"};
bt_list_consumer::consume_dict(dict);
return flush_key();
}
/// Attempts to parse the next value as a string->list pair and returns the string_view that
/// contains the entire thing. This is recursive into both lists and dicts and likely to be
/// quite inefficient for large, nested structures (unless the values only need to be skipped
/// but aren't separately needed). This, however, does not require dynamic memory allocation.
std::pair<string_view, string_view> consume_list_data() {
if (data.size() < 2 || !is_list()) throw bt_deserialize_invalid_type{"next bt dict value is not a list"};
return {flush_key(), bt_list_consumer::consume_list_data()};
}
/// Attempts to parse the next value as a string->dict pair and returns the string_view that
/// contains the entire thing. This is recursive into both lists and dicts and likely to be
/// quite inefficient for large, nested structures (unless the values only need to be skipped
/// but aren't separately needed). This, however, does not require dynamic memory allocation.
std::pair<string_view, string_view> consume_dict_data() {
if (data.size() < 2 || !is_dict()) throw bt_deserialize_invalid_type{"next bt dict value is not a dict"};
return {flush_key(), bt_list_consumer::consume_dict_data()};
}
/// Skips ahead until we find the first key >= the given key or reach the end of the dict.
/// Returns true if we found an exact match, false if we reached some greater value or the end.
/// If we didn't hit the end, the next `consumer_*()` call will return the key-value pair we
/// found (either the exact match or the first key greater than the requested key).
///
/// Two important notes:
///
/// - properly encoded bt dicts must have lexicographically sorted keys, and this method assumes
/// that the input is correctly sorted (and thus if we find a greater value then your key does
/// not exist).
/// - this is irreversible; you cannot returned to skipped values without reparsing. (You *can*
/// however, make a copy of the bt_dict_consumer before calling and use the copy to return to
/// the pre-skipped position).
bool skip_until(string_view find) {
while (consume_key() && key_ < find) {
flush_key();
skip_value();
}
return key_ == find;
}
};
} // namespace lokimq

125
lokimq/hex.h

@ -0,0 +1,125 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#pragma once
#include <string>
#include <array>
#include <iterator>
#include <cassert>
namespace lokimq {
namespace detail {
/// Compile-time generated lookup tables hex conversion
struct hex_table {
char from_hex_lut[256];
char to_hex_lut[16];
constexpr hex_table() noexcept : from_hex_lut{}, to_hex_lut{} {
for (unsigned char c = 0; c < 10; c++) {
from_hex_lut[(unsigned char)('0' + c)] = 0 + c;
to_hex_lut[ (unsigned char)( 0 + c)] = '0' + c;
}
for (unsigned char c = 0; c < 6; c++) {
from_hex_lut[(unsigned char)('a' + c)] = 10 + c;
from_hex_lut[(unsigned char)('A' + c)] = 10 + c;
to_hex_lut[ (unsigned char)(10 + c)] = 'a' + c;
}
}
constexpr char from_hex(unsigned char c) const noexcept { return from_hex_lut[c]; }
constexpr char to_hex(unsigned char b) const noexcept { return to_hex_lut[b]; }
} constexpr hex_lut;
} // namespace detail
/// Creates hex digits from a character sequence.
template <typename InputIt, typename OutputIt>
void to_hex(InputIt begin, InputIt end, OutputIt out) {
for (; begin != end; ++begin) {
auto c = *begin;
*out++ = detail::hex_lut.to_hex((c & 0xf0) >> 4);
*out++ = detail::hex_lut.to_hex(c & 0x0f);
}
}
/// Creates a hex string from an iterable, std::string-like object
template <typename String>
std::string to_hex(const String& s) {
std::string hex;
hex.reserve(s.size() * 2);
to_hex(s.begin(), s.end(), std::back_inserter(hex));
return hex;
}
/// Returns true if all elements in the range are hex characters
template <typename It>
constexpr bool is_hex(It begin, It end) {
for (; begin != end; ++begin) {
if (detail::hex_lut.from_hex(*begin) == 0 && *begin != '0')
return false;
}
return true;
}
/// Returns true if all elements in the string-like value are hex characters
template <typename String>
constexpr bool is_hex(const String& s) { return is_hex(s.begin(), s.end()); }
/// Convert a hex digit into its numeric (0-15) value
constexpr char from_hex_digit(unsigned char x) noexcept {
return detail::hex_lut.from_hex(x);
}
/// Constructs a byte value from a pair of hex digits
constexpr char from_hex_pair(unsigned char a, unsigned char b) noexcept { return (from_hex_digit(a) << 4) | from_hex_digit(b); }
/// Converts a sequence of hex digits to bytes. Undefined behaviour if any characters are not in
/// [0-9a-fA-F] or if the input sequence length is not even. It is permitted for the input and
/// output ranges to overlap as long as out is no earlier than begin.
template <typename InputIt, typename OutputIt>
void from_hex(InputIt begin, InputIt end, OutputIt out) {
using std::distance;
assert(distance(begin, end) % 2 == 0);
while (begin != end) {
auto a = *begin++;
auto b = *begin++;
*out++ = from_hex_pair(a, b);
}
}
/// Converts hex digits from a std::string-like object into a std::string of bytes. Undefined
/// behaviour if any characters are not in [0-9a-fA-F] or if the input sequence length is not even.
template <typename String>
std::string from_hex(const String& s) {
std::string bytes;
bytes.reserve(s.size() / 2);
from_hex(s.begin(), s.end(), std::back_inserter(bytes));
return bytes;
}
}

1261
lokimq/lokimq.cpp

File diff suppressed because it is too large

771
lokimq/lokimq.h

@ -0,0 +1,771 @@
// Copyright (c) 2019-2020, The Loki Project
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list
// of conditions and the following disclaimer in the documentation and/or other
// materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors may be
// used to endorse or promote products derived from this software without specific
// prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#pragma once
#include "zmq.hpp"
#include <string>
#include <list>
#include <queue>
#include <unordered_map>
#include <memory>
#include <functional>
#include <thread>
#include <mutex>
#include <iostream>
#include <chrono>
#include <atomic>
#include "bt_serialize.h"
#include "string_view.h"
namespace lokimq {
using namespace std::literals;
/// Logging levels passed into LogFunc
enum class LogLevel { trace, debug, info, warn, error, fatal };
/// Authentication levels for command categories and connections
enum class AuthLevel {
denied, ///< Not actually an auth level, but can be returned by the AllowFunc to deny an incoming connection.
none, ///< No authentication at all; any random incoming ZMQ connection can invoke this command.
basic, ///< Basic authentication commands require a login, or a node that is specifically configured to be a public node (e.g. for public RPC).
admin, ///< Advanced authentication commands require an admin user, either via explicit login or by implicit login from localhost. This typically protects administrative commands like shutting down, starting mining, or access sensitive data.
};
/// The access level for a command category
struct Access {
/// Minimum access level required
AuthLevel auth = AuthLevel::none;
/// If true only remote SNs may call the category commands
bool remote_sn = false;
/// If true the category requires that the local node is a SN
bool local_sn = false;
};
/// Return type of the AllowFunc: this determines whether we allow the connection at all, and if
/// so, sets the initial authentication level and tells LokiMQ whether the other hand is an
/// active SN.
struct Allow {
AuthLevel auth = AuthLevel::none;
bool remote_sn = false;
};
class LokiMQ;
/// Encapsulates an incoming message from a remote connection with message details plus extra
/// info need to send a reply back through the proxy thread via the `reply()` method. Note that
/// this object gets reused: callbacks should use but not store any reference beyond the callback.
class Message {
public:
LokiMQ& lokimq; ///< The owning LokiMQ object
std::vector<string_view> data; ///< The provided command data parts, if any.
string_view pubkey; ///< The originator pubkey (32 bytes)
bool service_node; ///< True if the pubkey is an active SN (note that this is only checked on initial connection, not every received message)
/// Constructor
Message(LokiMQ& lmq) : lokimq{lmq} {}
// Non-copyable
Message(const Message&) = delete;
Message& operator=(const Message&) = delete;
/// Sends a reply. Arguments are forwarded to send() but with send_option::optional{} added
/// if the originator is not a SN. For SN messages (i.e. where `sn` is true) this is a
/// "strong" reply by default in that the proxy will attempt to establish a new connection
/// to the SN if no longer connected. For non-SN messages the reply will be attempted using
/// the available routing information, but if the connection has already been closed the
/// reply will be dropped.
///
/// If you want to send a non-strong reply even when the remote is a service node then add
/// an explicit `send_option::optional()` argument.
template <typename... Args>
void reply(const std::string& command, Args&&... args);
};
/** The keep-alive time for a send() that results in a establishing a new outbound connection. To
* use a longer keep-alive to a host call `connect()` first with the desired keep-alive time or pass
* the send_option::keep_alive.
*/
static constexpr auto DEFAULT_SEND_KEEP_ALIVE = 30s;
/// Maximum length of a category
static constexpr size_t MAX_CATEGORY_LENGTH = 50;
/// Maximum length of a command
static constexpr size_t MAX_COMMAND_LENGTH = 200;
/**
* Class that handles LokiMQ listeners, connections, proxying, and workers. An application
* typically has just one instance of this class.
*/
class LokiMQ {
private:
/// The global context
zmq::context_t context;
/// A unique id for this LokiMQ instance, assigned in a thread-safe manner during construction.
const int object_id;
/// The x25519 keypair of this connection. For service nodes these are the long-run x25519 keys
/// provided at construction, for non-service-node connections these are generated during
/// construction.
std::string pubkey, privkey;
/// True if *this* node is running in service node mode (whether or not actually active)
bool local_service_node = false;
/// Addresses on which to listen, or empty if we only establish outgoing connections but aren't
/// listening.
std::vector<std::string> bind;