55 commits

Author SHA1 Message Date
Jason Rhinelander a3cb7bc980 Improve logging output
- Show [filename:line] rather than __func__ because __func__ is useless
  when called from a lambda (it just shows `operator()`).
- Add time since startup (like lokinet does)
- Fix up log formatter so that it doesn't have to double-format and
  doesn't break when given a value that isn't a string literal for the
  format string.
- Calling via ->log instead of ->debug, etc. requires changing
  `OXEN_LOG(error, ...)` to `OXEN_LOG(err, ...)` to match the actual
  spdlog log level (which is different from the logging method for some
  reason).
2021-06-16 18:59:06 -03:00
Jason Rhinelander e644959f83 Logger: use file/line instead of func
__func__ is pretty useless as all lambdas are reported as `operator()`
so you have no indication at all where the log msg is coming from.
2021-06-16 18:59:06 -03:00
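The `__func__` problem described in the two logging commits above can be reproduced in a few lines. This is a minimal sketch, not the repository's logging code: `location_tag`, `LOCATION_TAG`, and the two wrapper functions are hypothetical names introduced only for illustration.

```cpp
#include <string>

// Hypothetical helper (not from the repository): builds a "[filename:line]" tag.
inline std::string location_tag(const char* file, int line) {
    std::string path{file};
    // Keep only the filename component so the tag stays short.
    auto slash = path.find_last_of("/\\");
    if (slash != std::string::npos)
        path.erase(0, slash + 1);
    return "[" + path + ":" + std::to_string(line) + "]";
}

// Hypothetical macro: __FILE__ and __LINE__ expand at the call site,
// so the tag stays meaningful even inside a lambda body.
#define LOCATION_TAG() location_tag(__FILE__, __LINE__)

// Returns what __func__ evaluates to inside a lambda body; the enclosing
// function's name is not what comes back.
inline std::string func_name_inside_lambda() {
    auto lambda = [] { return std::string{__func__}; };
    return lambda();
}

// Returns the call-site tag as seen from inside a lambda body.
inline std::string location_inside_lambda() {
    auto lambda = [] { return LOCATION_TAG(); };
    return lambda();
}
```

Calling `func_name_inside_lambda()` yields the lambda's call operator name rather than the enclosing function, while `location_inside_lambda()` yields a tag that points at the actual source line.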
Jason Rhinelander c488e86141 Move all vendors cmake code to vendors/CMakeLists.txt
Also modernizes it a bit more to use cmake interfaces, plus fixes an
oxenmq linking issue.
2021-06-16 18:58:53 -03:00
Jason Rhinelander 60017e7e4d user pubkey tighter restrictions
- Enforce hex rather than accepting any random 66- or 64-character
  string as a pubkey
- Clean up pubkey -> integer code
- The cleanup fixes a bug where pubkey -> integer conversion was
  skipping the first two bytes on testnet (and ended up in UB by reading
  the null + one byte beyond the end of the string for testnet
  addresses).  THIS WILL BREAK EXISTING TESTNET PUBKEY->SWARM VALUES!
  (but it's only testnet, so that's okay).
2021-04-19 16:14:55 -03:00
Jason Rhinelander 16ec9ac0f2 Misc cleanups and optimizations 2021-04-18 14:50:40 -03:00
Jason Rhinelander 8e3683df2f Force logging colours on stdout
Typically storage server output goes to systemd, which passes it off to
the journal; journalctl knows how to show (-a) ansi colors, and also
knows how to strip them when you don't provide -a.
2021-04-18 14:50:40 -03:00
Jason Rhinelander 54c05ca159 Fix cmake library dependencies 2021-04-18 14:50:40 -03:00
Jason Rhinelander d5876d9647 Simplification
No need to bifurcate rvalue and const-lvalue versions here: a plain
value will work exactly the same (copying or moving based on what the
caller provides).
2021-04-18 14:50:40 -03:00
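The by-value simplification in commit d5876d9647 follows a common C++ idiom, sketched here with a hypothetical `Message` class (the commit does not name the function that was changed).

```cpp
#include <string>
#include <utility>

// Hypothetical class for illustration; not taken from the repository.
class Message {
public:
    // One by-value parameter replaces the pair of overloads
    //   void set_body(const std::string&);
    //   void set_body(std::string&&);
    // The argument is copy-constructed from an lvalue or move-constructed
    // from an rvalue, and is then moved into the member.
    void set_body(std::string body) { body_ = std::move(body); }

    const std::string& body() const { return body_; }

private:
    std::string body_;
};
```

A caller passing a named string keeps its own copy intact, while a caller passing a temporary or `std::move(...)` pays only for moves.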
Jason Rhinelander 8d34f76002 Storage server refactoring & ping reporting redesign
Refactoring:

- Overhaul how pubkeys are stored: rather than storing them as strings in multiple forms (hex, binary,
  base32) we now store each pubkey type (legacy, ed25519, x25519) in a type-safe container which
  extends an std::array, thus giving us allocation-free storage for them.
  - Do conversion into these types early on so that *most* of the code takes type-safe pubkeys
    rather than strings in random formats; thus making the public API (e.g. HTTP request parser)
    deal with encoding/decoding, rather than making it happen deep down.  (For example,
    ChannelEncryption shouldn't need to worry about decoding a hex string).  This was pretty messy:
    some code wanted hex, some base32z, some base32z with ".snode" appended, some base64.
- When printing pubkeys in logs, print them as hex, not base32z.  Base32z was really hard to
  reconcile with pubkeys because everywhere *else* we show them as hex.
- Replace sn_record_t with a much lighter one: it is now a very simple struct with just the members
  it needs (ip, ports, pubkeys), and is no longer hashable (instead we can hash on the
  .legacy_pubkey member).
  - Simplify some interfaces taking multiple values from sn_record_t by just passing the sn_record_t
- Moved a bunch of things that were in the global namespace into the oxen namespace.
- Simplify swarm storage and lookup: we previously had various methods that did a linear scan on all
  active nodes, and could do string-based searching for different representations of the pubkey
  (hex, base32z, etc.).  Replace it all with a simple structure of:
    unordered_map<legacy_pubkey, sn_record_t>
    unordered_map<ed25519_pubkey, legacy_pubkey>
    unordered_map<x25519_pubkey, legacy_pubkey>
  where the first map holds the entries and the latter two point us to the key in the first one.
- De-templatize ChannelEncryption, and make it take pubkey types rather than strings.  (The template
  was doing nothing as it was only ever used with T=std::string).
- Fix a leak in ChannelEncryption CBC decryption if it throws (the context would leak) by storing
  the context in a unique_ptr with a deleter that frees the context.
- Optimized ChannelEncryption encryption code somewhat by reducing allocations via more use of
  string_views and tweaking how we build the encrypted strings.
- Fix legacy (i.e. Monero) signature generation: the random byte value being generated was only
  setting the first 11 bytes of 32.
- Miscellaneous code cleanups (much of which are C++14/17 syntax).
- Moved std::literals namespace import to a small number of top-level headers so they are available
  everywhere.  (std::literals is guaranteed to never conflict with user-space literals precisely so
  that doing this everywhere is perfectly safe without polluting -- i.e. "foo"sv, 44s can *never* be
  anything other than string_view and seconds literals).
- Made pubkey parsing (e.g. in HTTP headers) accept any of hex/base32z/base64 so that we can, at
  some point in the future, just stop using base32z representation (since it conflicts badly with
  lokinet .snode address which is based on the ed25519 pubkey, not the legacy pubkey).
- RateLimiter - simplify it to take the IP as a uint32_t (and thus avoid allocations).  (Similarly it
  avoids allocations in the SN case via using the new pubkey type).
- Move some repeated tasks into the simpler oxenmq add_timer().
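
The type-safe pubkey containers and the three-map lookup described in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: all three key types are assumed to be 32 bytes, and `pubkey`, `pubkey_hash`, `swarm_index`, and the `sn_record_t` member names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>

// Each key type extends std::array, so storage is allocation-free; the Tag
// parameter makes the three kinds distinct, non-interchangeable types.
template <typename Tag>
struct pubkey : std::array<unsigned char, 32> {};

struct legacy_tag {};
struct ed25519_tag {};
struct x25519_tag {};
using legacy_pubkey = pubkey<legacy_tag>;
using ed25519_pubkey = pubkey<ed25519_tag>;
using x25519_pubkey = pubkey<x25519_tag>;

// Simple hasher for the sketch: reuse the leading bytes of the key.
struct pubkey_hash {
    template <typename Tag>
    std::size_t operator()(const pubkey<Tag>& pk) const noexcept {
        std::size_t h;
        std::memcpy(&h, pk.data(), sizeof(h));
        return h;
    }
};

// Lightweight record: ip, ports, pubkeys (member names are assumptions).
struct sn_record_t {
    std::string ip;
    std::uint16_t port = 0;
    std::uint16_t omq_port = 0;
    legacy_pubkey pubkey_legacy{};
    ed25519_pubkey pubkey_ed25519{};
    x25519_pubkey pubkey_x25519{};
};

class swarm_index {
public:
    void add(const sn_record_t& sn) {
        ed_index_[sn.pubkey_ed25519] = sn.pubkey_legacy;
        x_index_[sn.pubkey_x25519] = sn.pubkey_legacy;
        records_[sn.pubkey_legacy] = sn;
    }
    // The first map holds the entries.
    const sn_record_t* find(const legacy_pubkey& pk) const {
        auto it = records_.find(pk);
        return it == records_.end() ? nullptr : &it->second;
    }
    // The other two maps only point at the key in the first map.
    const sn_record_t* find(const ed25519_pubkey& pk) const {
        auto it = ed_index_.find(pk);
        return it == ed_index_.end() ? nullptr : find(it->second);
    }
    const sn_record_t* find(const x25519_pubkey& pk) const {
        auto it = x_index_.find(pk);
        return it == x_index_.end() ? nullptr : find(it->second);
    }

private:
    std::unordered_map<legacy_pubkey, sn_record_t, pubkey_hash> records_;
    std::unordered_map<ed25519_pubkey, legacy_pubkey, pubkey_hash> ed_index_;
    std::unordered_map<x25519_pubkey, legacy_pubkey, pubkey_hash> x_index_;
};
```

Because the key kinds are separate types, passing an ed25519 key where a legacy key is expected fails to compile instead of silently looking up the wrong thing.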

Ping reporting:

This completely rewrites how ping results are handled.

Current situation: SS does its own testing of other SS.  Every 10s it picks some random node and
sends an HTTP ping and OMQ ping to it.  If either fails, it tracks the failure and tries them
again every 10s.  If they are still failing after 2h, then it finally tells oxend about the failure
one time, remembers that it told oxend (in memory), and then never tells it again until the remote
SS starts responding to pings again; once that happens it tells oxend that it is responding again
and then never tells it anything else unless the remote starts failing again for 2h.

This has some major shortcomings:
- The 10s repeat will hammer a node pretty hard with pings, because if it is down for a while, most
  of the network will notice and be pinging it every 10s.  That means 1600x2 incoming requests every
  10s, which is pretty excessive.
- Oxend reporting edge case 1: If a node is bad and then storage server restarts, SS won't have it
  in its "bad list" anymore, so it isn't testing it at all (until it gets selected randomly, but
  with 1600 nodes that is going to be an average of more than 2 hours and could be way longer).
  Thus oxend never gets the "all good" signal and will continue to think the node is bad for much,
  much longer than it actually is.  (In fact, it may *never* get a good signal again if it's working
  the next time SS randomly pings it).
- Restarts the other way are also a problem: when oxend restarts it doesn't know of any of the bad
  nodes anymore, but since SS only tells it once, it never learns about it and thinks it's good.
- `oxend print_sn <PUBKEY>` is much, much less useful than it could be: usually storage servers are
  in "awaiting first result" status because SS won't tell oxend anything until it has decided there
  is some failure.  When it tells you "last ping was .... ago" that's also completely useless
  because SS never reports any pings except for the first >2h bad result, and the first good result.

I suspect the reporting above was out of a concern that talking to oxend rpc too much would overload
it; that isn't the case anymore (and since SS is now using a persistent oxenmq connection the
requests are *extremely* fast since it doesn't even have to establish a connection).

So this PR overhauls it completely as follows:
- SS gets much dumber (i.e. simpler) w.r.t. pings: all it does is randomly pick probably-good nodes
  to test every 10s, and then pings known-failing nodes to re-test them.
- Retested nodes don't get pounded every 10s, instead they get the first retry after 10s, the second
  retry 20s after that, then 30s after that, and so on up to 5 minute intervals between re-tests.
- SS tells oxend *every* ping result that it gets (and doesn't track them, except to keep them or
  remove them from the "bad" list)
- Oxend then becomes responsible for deciding when a SS is bad enough to fail proofs.  On the oxend
  side the rule is:
    - if we have been receiving bad ping reports about a node from SS for more than 1h5min without
      any good ping in that time *and* we received a bad ping in the past 10 minutes then we
      consider it bad.  (the first condition is so that it has to have been bad for more than an
      hour, and the second condition is to ensure that SS is still sending us bad test results).
    - otherwise we consider it good (i.e. because either we aren't getting test results or because
      we're getting good test results).
    - Thus oxend can usefully and accurately report the last time some storage server was tested,
      which allows much better diagnostics of remote SN status.
- Thus if oxend restarts it'll start getting the bad results right away, and if SS restarts oxend
  will stop getting them (and then fall back to "no info means good").
2021-04-18 14:50:40 -03:00
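The retry schedule and the oxend-side rule from the ping reporting redesign in commit 8d34f76002 can be sketched as below. The numbers (10s steps, 5 minute cap, 1h5min, 10 minutes) come from the commit message; the names `retry_interval` and `ping_history` and the choice of clock are assumptions made for this illustration only.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

using namespace std::chrono_literals;

// Interval before the Nth retry of a failing node (attempt is 1-based):
// 10s, then 20s, then 30s, ... capped at 5 minutes.
constexpr std::chrono::seconds retry_interval(int attempt) {
    constexpr std::chrono::seconds max_interval = 5min;
    std::chrono::seconds interval = 10s * attempt;
    return std::min(interval, max_interval);
}

using time_point = std::chrono::steady_clock::time_point;

// Hypothetical oxend-side bookkeeping for one remote storage server.
struct ping_history {
    // Start of the current unbroken run of bad reports, if any.
    std::optional<time_point> first_bad;
    // Most recent bad report, if any.
    std::optional<time_point> last_bad;

    void report(bool good, time_point now) {
        if (good) {
            // Any good ping clears the run of failures.
            first_bad.reset();
            last_bad.reset();
        } else {
            if (!first_bad)
                first_bad = now;
            last_bad = now;
        }
    }

    // Bad only if failures have been reported for more than 1h5min with no
    // good ping in between AND a bad report arrived in the past 10 minutes.
    // Everything else, including "no information", counts as good.
    bool considered_bad(time_point now) const {
        if (!first_bad || !last_bad)
            return false;
        return now - *first_bad > 65min && now - *last_bad <= 10min;
    }
};
```

The second condition is what makes a storage server restart harmless: once bad reports stop arriving, the node falls back to being considered good after 10 minutes.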
Maxim Shishmarev d913217fad No longer require POW for message storage 2021-03-26 12:01:47 +11:00
Jason Rhinelander 0578787228 LokiMQ -> OxenMQ rename (and update to 1.2.3) 2021-01-18 15:58:16 -04:00
Sean Darcy d23c8417aa initial rebrand 2021-01-07 15:12:15 +11:00
Jason Rhinelander 708e7a1d18 Remove cmake cruft
- loki_add_subdirectory was an unnecessary wrapper around
add_subdirectory: it was an attempt to make an idempotent version of
add_subdirectory, but that isn't needed at all and just adds cruft: the
top-level CMakeLists.txt already includes all the subdirectories so we
can just trust it.

- removed incorrect subdirectory "project()" definitions.

- crypto/CMakeLists.txt pointlessly listed all headers in the source
list.

- set c++ standard in the top-level makefile instead of on each target
since we intentionally want it everywhere.

- Linking directly to pthread/dl with conditional OS checks was wrong;
fix it to be proper cmake (linking to Threads::Threads and
${CMAKE_DL_LIBS}).

- Various cmake files erroneously listed their src directories in their
include paths.

- Made various library linkages PRIVATE instead of PUBLIC where a
transitive dependency to dependent targets does not make sense.
2021-01-05 18:27:55 -04:00
Jason Rhinelander b8bc84d1e1 Remove boost::{filesystem,chrono,thread}
Replace them with their std:: version.
2021-01-05 17:32:01 -04:00
Jason Rhinelander cd5e4252a8 Modernize boost cmake dependencies 2021-01-05 17:30:13 -04:00
Maxim Shishmarev 4cdd074dc6 Limit http get_stats to return version number only 2020-09-23 17:42:45 +10:00
Maxim Shishmarev 29b2658ca5 run clang-format 2020-09-22 12:47:03 +10:00
Maxim Shishmarev 30f2b7493b Add semi-binary protocol for onion requests 2020-09-21 17:23:06 +10:00
Jason Rhinelander 28f09402e6 Switch to C++17
- updates lokimq to dev branch
- changes compilation mode to C++17 (which is now required by lokimq,
  and already widely applied in lokinet and lokid dev branches)
- replace lokimq::string_view with std::string_view
- replace boost::optional with std::optional, except for:
  - boost::optional<std::function<...>> doesn't need optional at all
    because a std::function<...> is already nullable.
  - boost::optional<T&> isn't supported by std::optional because it
    makes little sense (it is just a `T*`) so just switch to T* instead.
2020-08-08 02:21:33 -03:00
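The three `boost::optional` replacements listed in commit 28f09402e6 can be illustrated with standard C++17 only. The function names below are hypothetical and exist just to show each case; they are not taken from the repository.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Case 1: boost::optional<T> becomes std::optional<T>.
// Parses a decimal port number; returns std::nullopt on invalid input.
inline std::optional<int> parse_port(const std::string& s) {
    if (s.empty() || s.size() > 5)
        return std::nullopt;
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9')
            return std::nullopt;
        value = value * 10 + (c - '0');
    }
    if (value > 65535)
        return std::nullopt;
    return value;
}

// Case 2: a std::function is already nullable, so wrapping it in an
// optional adds nothing; test it directly in a boolean context.
inline int run_callback(const std::function<int()>& cb, int fallback) {
    return cb ? cb() : fallback;
}

// Case 3: std::optional<T&> does not exist; a plain T* expresses
// "maybe a reference" with nullptr as the empty state.
inline int* find_value(std::vector<int>& values, int wanted) {
    for (auto& v : values)
        if (v == wanted)
            return &v;
    return nullptr;
}
```

In each case the empty state is checked the same way at the call site (`if (result)`), which keeps the migration mostly mechanical.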
Maxim Shishmarev 1b0532211c
Merge pull request #364 from jagerman/cmake-header-cleanup
Remove headers from cmake add_library
2020-04-14 16:02:27 +10:00
Maxim Shishmarev b9ff7f0800 Notify operator when their node seems unreachable 2020-04-14 15:59:32 +10:00
Jason Rhinelander f01db5bb2f Remove headers from cmake add_library
Headers aren't supposed to be listed in `add_library` calls and are an
unfortunately common cmake anti-pattern.  (cmake does *not* need headers
listed to know how to check that things need rebuilding when a header
changes, which seems to be the reason people think they have to include
them).

Apparently this antipattern emerged partly because of buggy behaviour in
MSVC pre-2017 that didn't understand how to find headers when loading a
CMake project.
2020-04-13 16:19:52 -03:00
Maxim Shishmarev 964ce644a1 Revert temp changes 2020-04-09 10:55:58 +10:00
Maxim Shishmarev f5eeb3b7dc flush on trace 2020-04-08 17:13:46 +10:00
Maxim Shishmarev bfc4f25998 Improve error handling in proxy requests 2020-04-06 16:13:41 +10:00
Maxim Shishmarev 4da932cc44 Send lokimq port in pings to lokid; obtain lokimq ports for other nodes from lokid 2020-03-11 12:26:22 +11:00
Maxim Shishmarev 330581b0d4 Initial lokimq integration. Onion requests to SS. 2020-03-08 21:38:51 +11:00
Jason Rhinelander 923c40e834 Fix spdlog compatibility
Upstream spdlog moves the instance and the absolute namespace qualifier
here breaks the code; this fixes it by going through the base class
which will work for both the bundled and newer spdlog versions.

Also changes the `fmt::memory_buffer` to `spdlog::memory_buf_t` because
the former doesn't work with newer libfmt's (which upstream spdlog and
the devendored debian sid build now use).
2020-02-26 15:41:39 -04:00
Maxim Shishmarev 04cb08b728 Bug fixes 2019-12-20 16:48:11 +11:00
Maxim Shishmarev d28a09b28a Incorporate new keys 2019-12-12 13:48:49 +11:00
Jason Rhinelander 0a4ad82f96 Don't terminate two-arg LOKI_LOG with ;
The semicolon made the macro not a single statement (unlike the 3+
argument version), so code such as

    if (asdf)
        LOKI_LOG(critical, "whatever");
    else
        LOKI_LOG(critical, "something else");

was a syntax error because of the expanded double-semicolon which made
the "else" not follow a single-statement if and thus invalid.
2019-12-03 13:03:49 -04:00
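The macro problem fixed in commit 0a4ad82f96 can be shown with a stand-in logger. `log_sink`, `log_message`, `GOOD_LOG`, and `report` are hypothetical names for this sketch; the real `LOKI_LOG` macro is not reproduced here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a logger: records messages so the effect is visible.
inline std::vector<std::string>& log_sink() {
    static std::vector<std::string> sink;
    return sink;
}

inline void log_message(const char* level, const char* msg) {
    log_sink().push_back(std::string{level} + ": " + msg);
}

// Broken shape, kept as a comment because using it in an if/else fails to
// compile: the macro's own ';' plus the caller's ';' produce two statements,
// leaving the `else` without a matching `if`.
//   #define BAD_LOG(level, msg) log_message(#level, msg);

// Fixed shape: no trailing semicolon, so the expansion plus the caller's
// ';' form exactly one statement.
#define GOOD_LOG(level, msg) log_message(#level, msg)

inline void report(bool ok) {
    if (ok)
        GOOD_LOG(info, "all good");
    else
        GOOD_LOG(critical, "something else");
}
```

For macros that expand to more than one statement, wrapping the body in `do { ... } while (0)` is the usual way to get the same single-statement behaviour.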
Maxim Shishmarev 61cefaebc6 Use singleton 2019-09-09 15:24:08 +10:00
Maxim Shishmarev f8404d31aa Resolve conflicts and clean up 2019-09-09 12:52:11 +10:00
Beaudan c34b79c6ef Add testnet flag and set it on service_node
Use different data dir and seed nodes for testnet

Use testnet flag to change the size of valid pubkey length

Move to a global flag for testnet instead of a service node property. Fix compile issues and some other stuff; there will still be issues to fix
2019-09-09 12:52:11 +10:00
Maxim Shishmarev d6633a23c8 Address review comments 2019-09-03 12:58:40 +10:00
Maxim Shishmarev 69aa778a70 Make storage server aware of decommissioned nodes; test them 2019-09-02 17:40:17 +10:00
Maxim Shishmarev a61195c192 Bookkeeping for reachability of nodes; reporting them to Lokid 2019-09-02 11:32:31 +10:00
Maxim Shishmarev d05e65e1c2 Periodically ping other service nodes 2019-08-26 17:29:38 +10:00
Maxim Shishmarev 5d09fbe617 Separate type for user pubkey 2019-08-23 11:58:16 +10:00
Maxim Shishmarev 2e763fb1a1 address review suggestions 2019-08-09 10:29:11 +10:00
Maxim Shishmarev 1860248061 Expose error logs via get_logs endpoint 2019-08-07 17:57:07 +10:00
Maxim Shishmarev 23b6b0e087 Clean-up log messages 2019-08-07 17:57:07 +10:00
Maxim Shishmarev 923150ccc1 add function name to every log message 2019-07-26 15:45:20 +10:00
Beaudan Campbell-Brown 966c60f0bd Bootstrap ips (#213)
* Initial swarm bootstrapping from seed nodes

* Bootstrap the IPs when we have finished syncing, plus don't overwrite valid IPs with defaults

* Review fixes plus lint
2019-07-05 16:15:47 +10:00
Maxim Shishmarev 374a14039e Deactivate nodes when they get decommissioned; plus some refactoring 2019-07-03 14:29:28 +10:00
sachaaaaa f3b33c30bb newline at EOF 2019-07-03 12:55:15 +10:00
sachaaaaa 78ce030c49 remove custom_formatter.h and provide << overload for sn_record_t 2019-07-03 11:17:35 +10:00
sachaaaaa e6c9876902 Actually use log level 2019-07-02 17:05:49 +10:00
sachaaaaa f66d20e2c0 clang format and add /common/* to make format 2019-07-02 17:05:49 +10:00
sachaaaaa 0eac8bbefb declutter main: extract command line parser logic and logging stuff 2019-07-02 16:43:54 +10:00