Commit graph

547 commits

Author SHA1 Message Date
Jason Rhinelander 0dabccfa80 Avoid whining about no pings immediately after startup 2021-04-18 14:50:40 -03:00
Jason Rhinelander 89d881a1dc Fewer unnecessary shared_ptrs
Some API improvements:

- When we need to copy a shared_ptr in a function, taking it by
  `const shared_ptr&` forces an extra atomic increment (and then
  decrement at the call site).  Change these to plain `shared_ptr` so
  that the caller can decide whether they want to move (avoiding any
  increment/decrement) or copy.
- When we *don't* need the shared_ptr container at all, pass by reference
  to the contained value instead, because this pushes the responsibility
  for ensuring that the pointer is not null back to the caller.
2021-04-18 14:50:40 -03:00
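The two conventions described in 89d881a1dc can be sketched as follows. This is a minimal illustration only: `Connection`, `Session` and `name_length` are hypothetical names and are not taken from the storage server code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type standing in for whatever the shared_ptr manages.
struct Connection {
    std::string name;
};

class Session {
  public:
    // The constructor keeps its own reference, so it takes the shared_ptr by
    // value: the caller decides whether to copy (one refcount increment) or
    // to std::move (no refcount change at all).
    explicit Session(std::shared_ptr<Connection> conn) : conn_{std::move(conn)} {
        assert(conn_);
    }

    const Connection& connection() const { return *conn_; }

  private:
    std::shared_ptr<Connection> conn_;
};

// This function only reads the object, so it takes a plain reference; the
// caller is responsible for dereferencing a non-null pointer.
inline std::size_t name_length(const Connection& conn) {
    return conn.name.size();
}
```

A caller that no longer needs its own handle would write `Session s{std::move(conn)};`, while a caller that keeps using `conn` would write `Session s{conn};` and pay for one increment.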
Jason Rhinelander fa3a09949e Use lokinet .snode address for Host header
Rather than sending "Host: service node" or "Host: service-node" we now
send "Host: whatever.snode" or "Host: service-node.snode" if we don't
have the ed25519 key yet.
2021-04-18 14:50:40 -03:00
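A rough sketch of how the Host value from fa3a09949e could be built, assuming (as 68fef509bd states) that the lokinet address is the ed25519 key in base32z. The function names `to_base32z` and `host_header_value` are hypothetical, and the encoder is a plain z-base-32 implementation written for this example rather than the project's own code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// z-base-32 alphabet: 32 symbols, 5 bits per output character.
constexpr char BASE32Z[] = "ybndrfg8ejkmcpqxot1uwisza345h769";

// Encodes a 32-byte key; 256 bits need 52 characters (the last one is
// zero-padded on the right).
inline std::string to_base32z(const std::array<std::uint8_t, 32>& key) {
    std::string out;
    std::uint32_t buffer = 0;  // holds bits not yet emitted (in the low bits)
    int bits = 0;              // how many of those low bits are pending
    for (std::uint8_t byte : key) {
        buffer = (buffer << 8) | byte;
        bits += 8;
        while (bits >= 5) {
            bits -= 5;
            out += BASE32Z[(buffer >> bits) & 0x1f];
        }
    }
    if (bits > 0)
        out += BASE32Z[(buffer << (5 - bits)) & 0x1f];
    return out;
}

// Returns the value for the Host header: "<base32z key>.snode", or the
// placeholder "service-node.snode" when the ed25519 key is not yet known.
inline std::string host_header_value(
        const std::optional<std::array<std::uint8_t, 32>>& ed25519_pubkey) {
    if (!ed25519_pubkey)
        return "service-node.snode";
    return to_base32z(*ed25519_pubkey) + ".snode";
}
```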
Jason Rhinelander 26e9413149 Cleanups
- move stuff into oxen namespace
- remove unused constexpr header values
- de-templatize << sn_response_t and move to .cpp
2021-04-18 14:50:40 -03:00
Jason Rhinelander f43c2c77a9 Remove dead code (FailedRequestHandler) 2021-04-18 14:50:40 -03:00
Jason Rhinelander 575a09fc0b Fix oxend ping arguments 2021-04-18 14:50:40 -03:00
Jason Rhinelander 018be6c1b0 Fix logic inversion on is_ip_public 2021-04-18 14:50:40 -03:00
Jason Rhinelander ab609ef938 Properly rate limit random testing
It wasn't updating the next test timestamp.
2021-04-18 14:50:40 -03:00
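The kind of bug described in ab609ef938 can be illustrated with a small limiter. This is a sketch under assumptions: `RandomTestLimiter` and `try_acquire` are hypothetical names, and the actual fix may look quite different.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical limiter: allows one random test per interval.
class RandomTestLimiter {
  public:
    explicit RandomTestLimiter(std::chrono::seconds interval) : interval_{interval} {
        assert(interval_.count() > 0);
    }

    // Returns true if a test may run at `now`.  Without the assignment to
    // next_test_ below, every call after the first allowed time would
    // succeed and the limiter would not limit anything.
    bool try_acquire(Clock::time_point now) {
        if (now < next_test_)
            return false;
        next_test_ = now + interval_;
        return true;
    }

  private:
    std::chrono::seconds interval_;
    Clock::time_point next_test_{};
};
```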
Jason Rhinelander a02fea6542 Fix report json parsing 2021-04-18 14:50:40 -03:00
Jason Rhinelander 9faa9e4b2d Make the default oxend rpc understand /var/lib/oxen
If our home dir is /var/lib/oxen then the default sock should be
/var/lib/oxen/oxend.sock (or /var/lib/oxen/testnet/oxend.sock).
2021-04-18 14:50:40 -03:00
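The path selection in 9faa9e4b2d might look roughly like this. The function name `default_oxend_socket` is hypothetical, and the fallback to a `.oxen` directory under other home directories is an assumption that the commit message does not state.

```cpp
#include <cassert>
#include <filesystem>

// Picks the default oxend socket path for a given home directory.
// ASSUMPTION: for any home other than /var/lib/oxen the data directory is
// "<home>/.oxen"; only the /var/lib/oxen case comes from the commit message.
inline std::filesystem::path default_oxend_socket(
        const std::filesystem::path& home, bool testnet) {
    assert(!home.empty());
    std::filesystem::path dir = (home == "/var/lib/oxen") ? home : home / ".oxen";
    if (testnet)
        dir /= "testnet";
    return dir / "oxend.sock";
}
```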
Jason Rhinelander 31a6c6dafa Use ephemeral routing id to talk to oxend
Using a non-ephemeral address caused problems if multiple connections
were trying to talk to oxend using its own pubkey which can happen if
something else (oxend, rpc, another SS) tries to talk to the same oxend
instance.
2021-04-18 14:50:40 -03:00
Jason Rhinelander 68fef509bd Make key logging on startup better
- Don't hide the legacy key just because it's the same as ed25519
- Show lokinet addr as its own entry (it is the ed25519 key in base32z,
but since storage server uses base32z in other places, better to be
clear what we're showing here).
2021-04-18 14:50:40 -03:00
Jason Rhinelander 0aea2563fe Misc logging fixes 2021-04-18 14:50:40 -03:00
Jason Rhinelander 09742a64e3 Make integration test mode more obvious
Print a warning during startup, and handle the missing key options
gracefully (rather than dumping an exception to the overall main()
exception handler).

Print a critical error if the integration test options are missing or
invalid; without this an exception is thrown but it isn't obvious where
it comes from (especially if you forgot you compiled with integration
tests enabled).
2021-04-18 14:50:40 -03:00
Jason Rhinelander 5ac1206d31 Expose OMQ logger wrapper to temp OMQ instance 2021-04-18 14:50:40 -03:00
Jason Rhinelander 5be297ad55 Integration test compilation fix 2021-04-18 14:50:40 -03:00
Jason Rhinelander b2dfb20b1f Add fix and tests for ports not getting updated 2021-04-18 14:50:40 -03:00
Jason Rhinelander 8d34f76002 Storage server refactoring & ping reporting redesign
Refactoring:

- Overhaul how pubkeys are stored: rather than storing them as strings in multiple forms (hex,
  binary, base32) we now store each pubkey type (legacy, ed25519, x25519) in a type-safe container
  that extends a std::array, thus giving us allocation-free storage for them.
  - Do conversion into these types early on so that *most* of the code takes type-safe pubkeys
    rather than strings in random formats; thus making the public API (e.g. HTTP request parser)
    deal with encoding/decoding, rather than making it happen deep down.  (For example,
    ChannelEncryption shouldn't need to worry about decoding a hex string).  This was pretty messy:
    some code wanted hex, some base32z, some base32z with ".snode" appended, some base64.
- When printing pubkeys in logs, print them as hex, not base32z.  Base32z was really hard to
  reconcile with pubkeys because everywhere *else* we show them as hex.
- Overhaul sn_record_t with a much lighter one: it is now a very simple struct with just the members
  it needs (ip, ports, pubkeys), and is no longer hashable (instead we can hash on the
  .legacy_pubkey member).
  - Simplify some interfaces taking multiple values from sn_record_t by just passing the sn_record_t
- Moved a bunch of things that were in the global namespace into the oxen namespace.
- Simplify swarm storage and lookup: we previously had various methods that did a linear scan on all
  active nodes, and could do string-based searching for different representations of the pubkey
  (hex, base32z, etc.).  Replace it all with a simple structure of:
    unordered_map<legacy_pubkey, sn_record_t>
    unordered_map<ed25519_pubkey, legacy_pubkey>
    unordered_map<x25519_pubkey, legacy_pubkey>
  where the first map holds the entries and the latter two point us to the key in the first one.
- De-templatize ChannelEncryption, and make it take pubkey types rather than strings.  (The template
  was doing nothing as it was only ever used with T=std::string).
- Fix a leak in ChannelEncryption CBC decryption if it throws (the context would leak) by storing
  the context in a unique_ptr with a deleter that frees the context.
- Optimized ChannelEncryption encryption code somewhat by reducing allocations via more use of
  string_views and tweaking how we build the encrypted strings.
- Fix legacy (i.e. Monero) signature generation: the random value being generated was only
  setting the first 11 of its 32 bytes.
- Miscellaneous code cleanups (many of which use C++14/17 syntax).
- Moved std::literals namespace import to a small number of top-level headers so they are available
  everywhere.  (std::literals is guaranteed to never conflict with user-space literals precisely so
  that doing this everywhere is perfectly safe without polluting -- i.e. "foo"sv, 44s can *never* be
  anything other than string_view and seconds literals).
- Made pubkey parsing (e.g. in HTTP headers) accept any of hex/base32z/base64 so that we can, at
  some point in the future, just stop using base32z representation (since it conflicts badly with
  lokinet .snode address which is based on the ed25519 pubkey, not the legacy pubkey).
- RateLimiter - simplify it to take the IP as a uint32_t (and thus avoid allocations).  (Similarly it
  avoids allocations in the SN case via using the new pubkey type).
- Move some repeated tasks into the simpler oxenmq add_timer().
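
The type-safe pubkey containers and the three-map lookup structure from the list above can be sketched like this. It is a minimal illustration: the member names of `sn_record_t`, the `pubkey_hash` functor and the `node_index` class are hypothetical and only follow the shape the commit message describes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Each key type is its own std::array subclass: fixed size, no heap
// allocation, and mixing up key types is a compile error.
struct legacy_pubkey : std::array<unsigned char, 32> {};
struct ed25519_pubkey : std::array<unsigned char, 32> {};
struct x25519_pubkey : std::array<unsigned char, 32> {};

// Hypothetical hash: folds the leading bytes of the key into a size_t.
template <typename Key>
struct pubkey_hash {
    std::size_t operator()(const Key& k) const {
        std::size_t h = 0;
        for (std::size_t i = 0; i < sizeof(h); i++)
            h = (h << 8) | k[i];
        return h;
    }
};

// Lightweight record; member names are assumptions for this sketch.
struct sn_record_t {
    std::string ip;
    std::uint16_t port;
    std::uint16_t omq_port;
    legacy_pubkey pubkey_legacy;
    ed25519_pubkey pubkey_ed25519;
    x25519_pubkey pubkey_x25519;
};

class node_index {
  public:
    void add(const sn_record_t& sn) {
        nodes_.insert_or_assign(sn.pubkey_legacy, sn);
        by_ed25519_.insert_or_assign(sn.pubkey_ed25519, sn.pubkey_legacy);
        by_x25519_.insert_or_assign(sn.pubkey_x25519, sn.pubkey_legacy);
    }

    // The first map holds the records...
    const sn_record_t* find(const legacy_pubkey& pk) const {
        auto it = nodes_.find(pk);
        return it == nodes_.end() ? nullptr : &it->second;
    }
    // ...and the other two only point at the key in the first map.
    const sn_record_t* find(const ed25519_pubkey& pk) const {
        auto it = by_ed25519_.find(pk);
        return it == by_ed25519_.end() ? nullptr : find(it->second);
    }
    const sn_record_t* find(const x25519_pubkey& pk) const {
        auto it = by_x25519_.find(pk);
        return it == by_x25519_.end() ? nullptr : find(it->second);
    }

  private:
    std::unordered_map<legacy_pubkey, sn_record_t, pubkey_hash<legacy_pubkey>> nodes_;
    std::unordered_map<ed25519_pubkey, legacy_pubkey, pubkey_hash<ed25519_pubkey>> by_ed25519_;
    std::unordered_map<x25519_pubkey, legacy_pubkey, pubkey_hash<x25519_pubkey>> by_x25519_;
};
```

Every lookup is a constant-time hash lookup on a fixed-size key, in contrast to the linear scans and string comparisons the commit message says it replaces.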

Ping reporting:

This completely rewrites how ping results are handled.

Current situation: SS does its own testing of other SS.  Every 10s it picks some random node and
sends an HTTP ping and OMQ ping to it.  If either fails, it tracks the failure and tries them
again every 10s.  If they are still failing after 2h, then it finally tells oxend about the failure
one time, remembers that it told oxend (in memory), and then never tells it again until the remote
SS starts responding to pings again; once that happens it tells oxend that it is responding again
and then never tells it anything else unless the remote starts failing again for 2h.

This has some major shortcomings:
- The 10s repeat will hammer a node pretty hard with pings, because if it is down for a while, most
  of the network will notice and be pinging it every 10s.  That means 1600x2 incoming requests every
  10s, which is pretty excessive.
- Oxend reporting edge case 1: If a node is bad and then storage server restarts, SS won't have it
  in its "bad list" anymore, so it isn't testing it at all (until it gets selected randomly, but
  with 1600 nodes that is going to be an average of more than 2 hours and could be way longer).
  Thus oxend never gets the "all good" signal and will continue to think the node is bad for much,
  much longer than it actually is.  (In fact, it may *never* get a good signal again if it's working
  the next time SS randomly pings it).
- Restarts the other way are also a problem: when oxend restarts it doesn't know of any of the bad
  nodes anymore, but since SS only tells it once, it never learns about it and thinks it's good.
- `oxend print_sn <PUBKEY>` is much, much less useful than it could be: usually storage servers are
  in "awaiting first result" status because SS won't tell oxend anything until it has decided there
  is some failure.  When it tells you "last ping was .... ago" that's also completely useless
  because SS never reports any pings except for the first >2h bad result, and the first good result.

I suspect the reporting above was out of a concern that talking to oxend rpc too much would overload
it; that isn't the case anymore (and since SS is now using a persistent oxenmq connection the
requests are *extremely* fast since it doesn't even have to establish a connection).

So this PR overhauls it completely as follows:
- SS gets much dumber (i.e. simpler) w.r.t. pings: all it does is randomly pick probably-good nodes
  to test every 10s, and then pings known-failing nodes to re-test them.
- Retested nodes don't get pounded every 10s, instead they get the first retry after 10s, the second
  retry 20s after that, then 30s after that, and so on up to 5 minute intervals between re-tests.
- SS tells oxend *every* ping result that it gets (and doesn't track them, except to keep them or
  remove them from the "bad" list)
- Oxend then becomes responsible for deciding when a SS is bad enough to fail proofs.  On the oxend
  side the rule is:
    - if we have been receiving bad ping reports about a node from SS for more than 1h5min without
      any good ping in that time *and* we received a bad ping in the past 10 minutes then we
      consider it bad.  (the first condition is so that it has to have been bad for more than an
      hour, and the second condition is to ensure that SS is still sending us bad test results).
    - otherwise we consider it good (i.e. because either we aren't getting test results or because
      we're getting good test results).
    - Thus oxend can usefully and accurately report the last time some storage server was tested,
      which allows much better diagnostics of remote SN status.
- Thus if oxend restarts it'll start getting the bad results right away, and if SS restarts oxend
  will stop getting them (and then fall back to "no info means good").
2021-04-18 14:50:40 -03:00
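The retest schedule and the oxend-side rule described in the ping reporting part of 8d34f76002 can be written down as two small functions. This is a sketch under assumptions: `retest_delay`, `ping_state`, `report` and `considered_bad` are hypothetical names, and the real implementations live in storage server and oxend respectively.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

using namespace std::chrono_literals;
using Clock = std::chrono::steady_clock;

// Delay before the n-th retest of a failing node (n = 1 is the first retry):
// 10s, then 20s, then 30s, ... capped at 5 minutes.
constexpr std::chrono::seconds retest_delay(int attempt) {
    const std::chrono::seconds delay = attempt * 10s;
    const std::chrono::seconds cap = 5min;
    return std::min(delay, cap);
}

// What oxend would need to remember about one storage server.
struct ping_state {
    std::optional<Clock::time_point> first_bad;  // start of the current run of bad reports
    std::optional<Clock::time_point> last_bad;   // most recent bad report
};

// Records one ping result reported by storage server.
inline void report(ping_state& st, bool ok, Clock::time_point now) {
    if (ok) {
        st.first_bad.reset();
        st.last_bad.reset();
    } else {
        if (!st.first_bad)
            st.first_bad = now;
        st.last_bad = now;
    }
}

// Bad only if bad reports have been arriving for more than 1h5min without a
// good one *and* the latest bad report is at most 10 minutes old; anything
// else (good results, or no results at all) counts as good.
inline bool considered_bad(const ping_state& st, Clock::time_point now) {
    return st.first_bad && st.last_bad
        && now - *st.first_bad > 65min
        && now - *st.last_bad <= 10min;
}
```

With this rule a storage server restart (which empties its bad list and stops the reports) makes the node fall back to good after 10 minutes, and an oxend restart simply starts a new run from the next bad report.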
Jason Rhinelander 18a3906c47 Use oxenmq 1.2.4's send_later() mechanism
send_later() was added exactly for this case because app code shouldn't
have to know about the internal omq protocol layer messages.
2021-04-13 01:19:50 -03:00
Jason Rhinelander 66f39d0590 Remove dead forward declarations 2021-04-13 01:19:50 -03:00
Maxim Shishmarev 414245193d Remove jsonrpc fluff 2021-04-13 11:54:06 +10:00
Maxim Shishmarev fa5f451fa3 Allow proxying certain requests to oxend 2021-04-13 11:18:29 +10:00
Maxim Shishmarev c299ff7d44 Fix build 2021-04-09 11:16:51 +10:00
Jason Rhinelander 96d5843826 Removed unused worker_ioc_ 2021-04-08 21:26:11 -03:00
Jason Rhinelander 96bb02aa75 Replace Oxen HTTP RPC with OMQ RPC
This replaces the HTTP RPC interface for communicating with the local
oxend with OxenMQ-based RPC requests.  This has advantages:

- Persistent, single connection rather than opening new connections
every time we need anything.
- We don't have to poll for new blocks every 1s: rather we can use
`sub.block` to have oxend push notifications to us when new blocks are
added; previously we had excessive (1s) polling which we needed to be
sure we noticed new blocks; now we avoid all of that *and* learn about
new blocks faster (since we get pushed notifications).
- Allows better host security (by protecting access to the unix socket
rather than needing a localhost port)
- Matches lokinet, which moved to use OMQ RPC in the last hard fork
(0.8).

Currently we still need HTTP JSON RPC requests to contact the bootstrap
nodes but that will eventually change as well.
2021-04-08 21:26:10 -03:00
Jason Rhinelander 60b18518b0 Remove storage-server blockchain testing
This is a mis-feature in the wrong place that doesn't protect against
anything malicious.

There is one comment (in oxen-core) that suggests the purpose here was
to be complex enough that the result couldn't be proxied to some other
node.  This does not accomplish that because:

- RPC requests are fast.  You *could* retrieve 1000 blocks quickly
enough from a high performance public RPC node to calculate the
checksum.
- Even if you couldn't, you could *easily* hack up oxend to proxy the
entire "test" request to some other node, thereby allowing you to run
multiple SNs without actually needing to store the blockchain.
- The testee gets I/O-trashed by having to go look up random blocks to
calculate the checksum for a test that really isn't useful.
- It is possible to abuse the blockchain test result feature by spamming
other storage servers to make other nodes waste significant resources to
compute these blockchain tests for random heights.
- The tests aren't actually *used* for anything: if you fail a test,
nothing happens.

Most importantly of all, this should never have been in storage server:
it has nothing at all to do with storing files, and is entirely outside
storage-server's purview to perform any such blockchain test: rather
that belongs in oxend (if it were to be performed at all).
2021-04-08 21:25:14 -03:00
Jason Rhinelander 71e41ca034 Remove long-obsolete DNS version check
This feature has proven itself broken: it hasn't been updated since the
first public version of storage server and, as a process requiring
manual updating, would continue to fail even if we updated it now.  We
have better ways to let people know of new versions now (by oxend
enforcing it and public announcements).
2021-04-08 21:25:14 -03:00
Maxim Shishmarev 41b434d6b2 Require loki or oxen prefix in server urls 2021-04-08 14:47:44 +10:00
Maxim Shishmarev 305d73d62f Merge branch 'dev' into onions-to-ip 2021-04-08 14:46:47 +10:00
Maxim Shishmarev ca21cb288e
Merge pull request #413 from msgmaxim/remove_pow
No longer require POW for message storage
2021-04-08 12:30:54 +10:00
Maxim Shishmarev 665e943f5b Address review comments 2021-04-08 12:26:00 +10:00
Jason Rhinelander 466babe353 Remove unwanted endian conversion
address_v4::to_ulong() already does the desired conversion to host
order before returning the value.
2021-04-07 20:21:53 -03:00
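Why the extra conversion removed in 466babe353 was harmful can be shown without any networking library. The helpers `host_order` and `byte_swap` below are hypothetical; `host_order` mimics a function that already returns the address in host order, and `byte_swap` is what a network-to-host conversion amounts to on a little-endian machine.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-order value of a.b.c.d: the first octet is the most significant byte
// of the integer, regardless of how the machine stores it in memory.
constexpr std::uint32_t host_order(std::array<std::uint8_t, 4> o) {
    return (std::uint32_t{o[0]} << 24) | (std::uint32_t{o[1]} << 16)
         | (std::uint32_t{o[2]} << 8) | std::uint32_t{o[3]};
}

// Reverses the four bytes of a 32-bit value.
constexpr std::uint32_t byte_swap(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

For 127.0.0.1 the host-order value is `0x7f000001`; swapping it a second time yields `0x0100007f`, which corresponds to 1.0.0.127.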
Maxim Shishmarev d9cb5cc473 Address review comments 2021-03-30 17:06:56 +11:00
Maxim Shishmarev f36bf751dd Allow http in onion requests to an external server 2021-03-29 17:28:34 +11:00
Maxim Shishmarev 92de289086 Increase limit for the number of messages retrieved at once 2021-03-26 15:00:32 +11:00
Maxim Shishmarev d913217fad No longer require POW for message storage 2021-03-26 12:01:47 +11:00
Jason Rhinelander c4efcf36b5 Reduce oxend ping interval to match lokinet/oxend
Currently lokinet pings with a 30s interval, and oxend in the next
release will use a similar interval for internal checks for when it
should send uptime proofs; this unifies storage server to use the same
ping frequency.
2021-03-24 20:17:39 -03:00
Maxim Shishmarev 04254433fb
Merge pull request #410 from msgmaxim/https_request
Set SNI correctly in outgoing https requests
2021-03-23 11:58:03 +11:00
Maxim Shishmarev 07f005d928 Use a valid host name for service nodes 2021-03-23 09:38:30 +11:00
Maxim Shishmarev 5b32dbecb7 Correctly count updated ips 2021-03-23 09:35:43 +11:00
Maxim Shishmarev b67e9c134b Fix not updating ip addresses of snodes 2021-03-22 14:09:38 +11:00
Maxim Shishmarev 4a97312af6 Set SNI correctly in outgoing https requests 2021-03-22 10:20:09 +11:00
Jason Rhinelander 0578787228 LokiMQ -> OxenMQ rename (and update to 1.2.3) 2021-01-18 15:58:16 -04:00
Jason Rhinelander 165c78e49c Fix version info string 2021-01-18 15:34:55 -04:00
Jason Rhinelander 15270cc35a
Merge pull request #403 from jagerman/drone-ci
Static build system + drone CI integration
2021-01-18 13:36:38 -04:00
Jason Rhinelander 42087df696
Merge pull request #404 from jagerman/legacy-lokid-options
Make `lokid-rpc-ip` and `lokid-rpc-port` options still work
2021-01-18 13:36:09 -04:00
Jason Rhinelander abb08444d5 Revert X-Oxen-* header changes
Changing these breaks SS communications.
2021-01-17 00:15:47 -04:00
Jason Rhinelander 6a94bc3f2a Make oxend-rpc-ip actually work
It was wired up to the port value, which meant there was no way to
override the IP.
2021-01-16 17:12:29 -04:00
Jason Rhinelander ef78d3a7af Make lokid-rpc-ip and lokid-rpc-port options still work 2021-01-16 17:12:19 -04:00
Jason Rhinelander 22fab68954 Move version into CMakeLists.txt and generate version.cpp
Having the version available in CMakeLists.txt lets cmake use the
version (e.g. in the following commit which generates a tar file).

Moreover it improves ccache hits drastically as this no longer needs to
pass version/tags via defines and instead only the version.cpp file has
to be rebuilt if a git tag or version changes.

This also simplifies how the version gets parsed to use std::from_chars
and a std::array<uint16_t,3> rather than a struct, which lets `<` work
for version comparison.
2021-01-13 15:19:22 -04:00
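The version handling described in 22fab68954 could look roughly like the following. The names `version_t` and `parse_version` are hypothetical; the sketch only shows how std::from_chars and a std::array<uint16_t,3> fit together so that the array's built-in `<` gives a correct version comparison.

```cpp
#include <array>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

using version_t = std::array<std::uint16_t, 3>;

// Parses "MAJOR.MINOR.PATCH"; returns nullopt on any malformed input.
inline std::optional<version_t> parse_version(std::string_view s) {
    version_t v{};
    const char* p = s.data();
    const char* const end = s.data() + s.size();
    for (std::size_t i = 0; i < v.size(); i++) {
        if (i > 0) {
            // Components after the first must be preceded by a dot.
            if (p == end || *p != '.')
                return std::nullopt;
            ++p;
        }
        auto [next, ec] = std::from_chars(p, end, v[i]);
        if (ec != std::errc{})
            return std::nullopt;
        p = next;
    }
    if (p != end)  // trailing garbage such as "2.1.0.4" or "2.1.0-rc"
        return std::nullopt;
    return v;
}
```

Because std::array compares lexicographically element by element, `*parse_version("2.9.0") < *parse_version("2.10.0")` holds, which a plain string comparison would get wrong.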