80 commits

Author SHA1 Message Date
Jason Rhinelander 314894d4b1 Database redesign, refactor & migration
Redesigns the database to be a more appropriate, less duplicative design
using "owners" and "messages" with a foreign key between them.

Rewrites all the database code using SQLiteCpp which substantially
reduces the amount of boilerplate, duplicate code for query handling.

Makes the statement handlers thread_local for better thread safety; this
also allows the actual query to be where it is executed, rather than
having all the prepared queries in one place nowhere close to where they
are actually used.
2021-06-16 19:37:33 -03:00
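The thread_local statement handling described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: `statement` is a stand-in struct rather than SQLiteCpp's real statement class, and `prepared` and the query text are hypothetical names, not taken from the storage server code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for a prepared statement (hypothetical; a real one would wrap a
// database handle and a compiled query).
struct statement {
    std::string sql;
    int executions = 0;
};

// Returns the calling thread's statement for `sql`, preparing it on first
// use.  Each thread has its own cache, so no locking is needed, and the
// query text lives at the call site instead of in a central list.
inline statement& prepared(const std::string& sql) {
    thread_local std::unordered_map<std::string, statement> cache;
    auto it = cache.find(sql);
    if (it == cache.end())
        it = cache.emplace(sql, statement{sql}).first;
    return it->second;
}

// Example call site: the query is written where it is executed.
inline int run_count_query() {
    auto& st = prepared("SELECT COUNT(*) FROM messages WHERE owner = ?");
    assert(!st.sql.empty());
    return ++st.executions;  // stands in for binding and stepping the query
}
```

Repeated calls from one thread reuse the same cached object, while a second thread would lazily build its own copy.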
Jason Rhinelander b7b0d75d4f SS refactor/cleanups
- Eliminated storage::Item because it just duplicates message_t for no
useful reason.

- Always pass user pubkeys as user_pubkey_t, never as std::strings.
Pubkeys are not strings, and the same pubkey has multiple string
representations.

- Properly split up the pubkey as [type][pubkey] where type = byte (5
for Session) and pubkey is a type-dependent key (currently only
supporting 32-byte Ed25519 keys).

- Allow pubkeys to be loaded from either hex (66) or binary (33) values.

- Allow pubkey prefixes on testnet.  A 32-byte key will be interpreted
as if the key was prefixed with a 0.  Thus there is now just one
"proper" pubkey size of 33 bytes or 66 bytes, and we silently accept the
missing-prefix for testnet (but can do away with the proper sizes being
different).

- Expose pubkey stringification in different ways (prefixed/unprefixed,
hex/raw).

- Simplify time_point-to-integer conversion with new time.hpp functions
from_epoch_ms and to_epoch_ms.

- Restore old message hash format so that we can keep using it until the
mandatory transition point.

- Added a much more network efficient serialization format, using
standard b-encoding rather than NIH custom encoding.  After the
mandatory upgrade we can remove the old format.

- Refactor swarm bootstrap code to be more efficient
2021-06-16 19:37:33 -03:00
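As a rough illustration of the [type][pubkey] layout and the accepted input sizes listed in the commit above, here is a minimal sketch. The names `user_pubkey`, `from_hex` and `parse_user_pubkey` are hypothetical and do not reflect the real user_pubkey_t interface; only the sizes (66 hex / 33 binary, or 64 / 32 without prefix on testnet) come from the commit message.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for user_pubkey_t: one type byte plus a 32-byte key.
struct user_pubkey {
    std::uint8_t type = 0;
    std::array<std::uint8_t, 32> key{};
};

// Returns the value of one hex digit, or -1 if the character is not hex.
inline int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes a hex string into raw bytes; nullopt on odd length or bad digits.
inline std::optional<std::string> from_hex(std::string_view hex) {
    if (hex.size() % 2 != 0) return std::nullopt;
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = hex_digit(hex[i]), lo = hex_digit(hex[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out.push_back(static_cast<char>((hi << 4) | lo));
    }
    return out;
}

// Accepts 66 hex characters or 33 raw bytes.  When `testnet` is true it also
// accepts 64 hex characters or 32 raw bytes and treats the missing prefix as
// type 0.
inline std::optional<user_pubkey> parse_user_pubkey(std::string_view in, bool testnet) {
    std::string raw;
    if (in.size() == 66 || (testnet && in.size() == 64)) {
        auto decoded = from_hex(in);
        if (!decoded) return std::nullopt;
        raw = std::move(*decoded);
    } else if (in.size() == 33 || (testnet && in.size() == 32)) {
        raw = std::string{in};
    } else {
        return std::nullopt;
    }
    assert(raw.size() == 32 || raw.size() == 33);

    user_pubkey pk;
    std::size_t offset = 0;
    if (raw.size() == 33) {  // prefixed form: first byte is the type
        pk.type = static_cast<std::uint8_t>(raw[0]);
        offset = 1;
    }
    for (std::size_t i = 0; i < pk.key.size(); ++i)
        pk.key[i] = static_cast<std::uint8_t>(raw[offset + i]);
    return pk;
}
```

Doing this conversion once at the API boundary means the rest of the code only ever sees the typed value, whichever string form the client sent.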
Jason Rhinelander 4a514a107e Add SQLiteCpp
This is a nice C++ wrapper that cleans up the data interface
considerably over using the C sqlite3 API.

I also evaluated (and started implementing) sqlite_orm for this, but ran
into considerable obstacles: the orm components get in the way without
being good enough to really solve anything (and essentially just making
you write queries in C++ code that is much less elegant than straight
queries), plus it fundamentally doesn't support threaded operation,
which sucks.
2021-06-16 18:59:17 -03:00
Jason Rhinelander 8772aeaaeb Wire up remaining delete/expiry endpoints
- Further documents each endpoint

- Changes the expiry update endpoints to just return the hashes of
messages rather than hashes + timestamps as it's quite a bit simpler
(especially for signing).

- Implements delete_msgs/delete_before/expire_all/expire_msgs endpoint
processing logic.
2021-06-16 18:59:17 -03:00
Jason Rhinelander 339ad4a3c8 Make get_results return # rows, -1 on error
This lets it be used for queries that might (legitimately) return 0
rows.
2021-06-16 18:59:07 -03:00
Jason Rhinelander c28ed7f2b8 WIP: Wire up delete/expire rpc endpoints, part 1
Exposes the rpc functions and adds a preliminary distribution between
SNs.

- Adds bt-to-json conversion and exposes the json-to-bt converter for use
in converting incoming json requests to bt-encoding (for inter-swarm
relaying), and back (for responding).
- Wire up database functions <-> service node calls <-> rpc endpoints.
- Add swarm command distribution.

(This is not yet working).
2021-06-16 18:59:07 -03:00
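For readers unfamiliar with the bt-encoding mentioned in the commit above, the following is a minimal sketch of standard bencoding for integers, byte strings, and flat dictionaries. The names `bt_value`, `bt_dict` and `bt_encode` are chosen for illustration only; this is not the serializer the storage server or oxenmq actually uses, and lists and nested dictionaries are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <variant>

// A value in this sketch is either an integer or a byte string.
using bt_value = std::variant<std::int64_t, std::string>;
// std::map iterates keys in sorted order, which bencoded dicts require.
using bt_dict = std::map<std::string, bt_value>;

// Integers are encoded as i<decimal>e.
inline std::string bt_encode(std::int64_t i) {
    return "i" + std::to_string(i) + "e";
}

// Byte strings are encoded as <length>:<bytes>, with no escaping needed.
inline std::string bt_encode(std::string_view s) {
    return std::to_string(s.size()) + ":" + std::string{s};
}

// Dictionaries are encoded as d<key><value>...e with keys in sorted order.
inline std::string bt_encode(const bt_dict& d) {
    std::string out = "d";
    for (const auto& [key, value] : d) {
        out += bt_encode(std::string_view{key});
        if (const auto* i = std::get_if<std::int64_t>(&value))
            out += bt_encode(*i);
        else
            out += bt_encode(std::string_view{std::get<std::string>(value)});
    }
    out += "e";
    return out;
}
```

Because strings are length-prefixed, binary payloads pass through unchanged, which is one reason this format is more compact than JSON with base64 for relaying between nodes.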
Jason Rhinelander 783612bc10 Database: implement delete/expiry-update calls 2021-06-16 18:59:07 -03:00
Jason Rhinelander 81f0638c16 Update required sqlite3 to 3.35.0
This lets us use RETURNING for updates/deletes which makes things much
nicer.
2021-06-16 18:59:07 -03:00
Jason Rhinelander 4e7a27f709 DRY out database & switch to smart pointers
Removes a bunch of repetition; switches raw pointers to smart pointers
for database and prepared statements.
2021-06-16 18:59:07 -03:00
Jason Rhinelander b136241659 Add delete/expiry and swarm propagation stubs 2021-06-16 18:59:07 -03:00
Jason Rhinelander e4c98a528a RPC: refactor and wire up to OMQ
Refactors storage rpc requests to abstract parsing and then wires them
up so that all of direct, onion-requested, and omq can now take the same
codepaths for requests.

The new OMQ requests, in particular, are publicly accessible at
`storage.whatever` endpoints (e.g. storage.store).

This also relaxes/changes some of the argument parsing:
- allow `pubkey` to be passed as that (currently `pubKey` is required,
and continues to work)
- `last_hash` (AKA `lastHash`) is no longer required; omitting it is now
the same as providing an empty one.

Data is now stored as binary in the database; previously we were storing
the base64-encoded value received from the client.  (This will, however,
break any client that expected to be able to send random data).

Also adds an `info` storage endpoint that returns the current version &
timestamp.

TODO: need a database migration to convert existing (base64) data.  (We
need a db migration for other reasons, as well).
2021-06-16 18:59:07 -03:00
Jason Rhinelander 481bf72977 Add FIXME comments around some broken database queries 2021-06-16 18:59:06 -03:00
Jason Rhinelander 586dbd5a28 Replace unsafe static local vars with atomics 2021-06-16 18:59:06 -03:00
Jason Rhinelander 8993fa093f Refactor how TTL & timestamps are stored
Currently we have an awkward storage of timestamp + ttl + expiry in the
database, and timestamp + ttl passed in from the client (as strings!).
This is awkward because we want to be able to shorten the expiry, but
that would mess up TTL.  Additionally everything is stored as
`uint64_t`s, which is messy and not type safe.

This commit makes these changes:

- ttl and timestamp can now be sent by the client as integers (in
addition to the current string value).  NB: this is not reliable until
the entire SN network is on the next SS release.

- ttl, timestamp, and expiry are now type-safe std::chrono types instead
of raw integers (milliseconds for ttl and system_clock::time_points for
the other two).

- ttl is no longer stored: instead we just store timestamp + expiry.
(This will let us update expiry later without worrying about TTL).

- ttl/timestamp value validation is moved out of `common/oxen_common.h`
(which was a very odd place for it) and into request_handler.

- serialization no longer supports message_t; rather message_t is *only*
for holding the value the client gives us.  SS now uses storage::Item
everywhere other than incoming client data.

- Removed TTL/Nonce storage and retrieval

- Fully specify the query columns instead of using `SELECT *`

- Don't use "`" for identifier quoting inside sqlite.  It's non-standard
MySQL garbage that sqlite3 supported only for MySQL compatibility.
2021-06-16 18:59:06 -03:00
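The timestamp + expiry storage described in the commit above can be sketched with std::chrono as follows. The helper names `from_epoch_ms` and `to_epoch_ms` appear in a later commit message, but their signatures here are assumptions, and `stored_times`, `make_stored_times` and `shorten_expiry` are hypothetical names for illustration. The rule that a new expiry may not precede the timestamp is also an assumption of this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using time_point = std::chrono::system_clock::time_point;

// Converts integer milliseconds since the Unix epoch into a time_point.
// (system_clock's epoch is the Unix epoch by standard since C++20, and in
// practice on common platforms before that.)
inline time_point from_epoch_ms(std::int64_t ms) {
    return time_point{std::chrono::duration_cast<time_point::duration>(
            std::chrono::milliseconds{ms})};
}

// Converts a time_point back to integer milliseconds since the epoch.
inline std::int64_t to_epoch_ms(time_point t) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
                   t.time_since_epoch())
            .count();
}

// Hypothetical stored record: only timestamp and expiry are kept, not the TTL.
struct stored_times {
    time_point timestamp;
    time_point expiry;
};

// Derives the stored pair from the client's timestamp and TTL.
inline stored_times make_stored_times(std::int64_t timestamp_ms,
                                      std::chrono::milliseconds ttl) {
    assert(ttl.count() >= 0);
    auto ts = from_epoch_ms(timestamp_ms);
    return {ts, ts + ttl};
}

// Shortens the expiry.  It never extends it, and (an assumption of this
// sketch) never moves it before the message timestamp.
inline void shorten_expiry(stored_times& s, time_point new_expiry) {
    if (new_expiry < s.expiry && new_expiry >= s.timestamp)
        s.expiry = new_expiry;
}
```

Since the TTL is not stored, shortening the expiry touches a single field and leaves nothing else to keep consistent.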
Jason Rhinelander 6b70b3fe71 Optimize random message retrieval
The approach being used here with an offset is painfully inefficient,
and has a race condition; this switches it to something better.

This also allows elimination of the ridiculous
"util::uniform_distribution_portable" call which didn't produce anything
portable at all.
2021-06-16 18:59:06 -03:00
Jason Rhinelander 7fab7e3767 Database: enable WAL
Because it's better in just about every way.
2021-06-16 18:59:06 -03:00
Jason Rhinelander c881a83fb3 Database: use logger
Raw printing to stderr is gross and racy.
2021-06-16 18:59:06 -03:00
Jason Rhinelander 92367efe5b Add used page count stats 2021-06-16 18:59:06 -03:00
Jason Rhinelander a3cb7bc980 Improve logging output
- Show [filename:line] rather than __func__ because __func__ is useless
  when called from a lambda (it just shows `operator()`).
- Add time since startup (like lokinet does)
- Fix up log formatter so that it doesn't have to double-format and
  doesn't break when given a value that isn't a string literal for the
  format string.
- Calling via ->log instead of ->debug, etc. requires changing
  `OXEN_LOG(error, ...)` to `OXEN_LOG(err, ...)` to match the actual
  spdlog log level (which is different from the logging method for some
  reason).
2021-06-16 18:59:06 -03:00
Jason Rhinelander 9206e640a9 Separate Database from boost::asio
Database has a dependency on boost::asio so that it can set up a timer,
but this is awkward as it couples the Database class with an
implementation detail of the Database user.

Fix this by removing it, making the cleanup timer callback the
responsibility of the caller.

This also fixes some spurious failures due to race conditions between
the threads in the storage test code.
2021-04-25 20:51:59 -03:00
Jason Rhinelander 8d34f76002 Storage server refactoring & ping reporting redesign
Refactoring:

- Overhaul how pubkeys are stored: rather than storing them as strings in multiple forms (hex, binary,
  base32) we now store each pubkey type (legacy, ed25519, x25519) in a type-safe container which
  extends an std::array, thus giving us allocation-free storage for them.
  - Do conversion into these types early on so that *most* of the code takes type-safe pubkeys
    rather than strings in random formats; thus making the public API (e.g. HTTP request parser)
    deal with encoding/decoding, rather than making it happen deep down.  (For example,
    ChannelEncryption shouldn't need to worry about decoding a hex string).  This was pretty messy:
    some code wanted hex, some base32z, some base32z with ".snode" appended, some base64.
- When printing pubkeys in logs, print them as hex, not base32z.  Base32z was really hard to
  reconcile with pubkeys because everywhere *else* we show them as hex.
- Overhaul sn_record_t with a much lighter one: it is now a very simple struct with just the members
  it needs (ip, ports, pubkeys), and is no longer hashable (instead we can hash on the
  .legacy_pubkey member).
  - Simplify some interfaces taking multiple values from sn_record_t by just passing the sn_record_t
- Moved a bunch of things that were in the global namespace into the oxen namespace.
- Simplify swarm storage and lookup: we previously had various methods that did a linear scan on all
  active nodes, and could do string-based searching for different representations of the pubkey
  (hex, base32z, etc.).  Replace it all with a simple structure of:
    unordered_map<legacy_pubkey, sn_record_t>
    unordered_map<ed25519_pubkey, legacy_pubkey>
    unordered_map<x25519_pubkey, legacy_pubkey>
  where the first map holds the entries and the latter two point us to the key in the first one.
- De-templatize ChannelEncryption, and make it take pubkey types rather than strings.  (The template
  was doing nothing as it was only ever used with T=std::string).
- Fix a leak in ChannelEncryption CBC decryption if it throws (the context would leak) by storing
  the context in a unique_ptr with a deleter that frees the context.
- Optimized ChannelEncryption encryption code somewhat by reducing allocations via more use of
  string_views and tweaking how we build the encrypted strings.
- Fix legacy (i.e. Monero) signature generation: the random byte value being generated was only
  setting the first 11 bytes of 32.
- Miscellaneous code cleanups (much of which are C++14/17 syntax).
- Moved std::literals namespace import to a small number of top-level headers so they are available
  everywhere.  (std::literals is guaranteed to never conflict with user-space literals precisely so
  that doing this everywhere is perfectly safe without polluting -- i.e. "foo"sv, 44s can *never* be
  anything other than string_view and seconds literals).
- Made pubkey parsing (e.g. in HTTP headers) accept any of hex/base32z/base64 so that we can, at
  some point in the future, just stop using base32z representation (since it conflicts badly with
  lokinet .snode address which is based on the ed25519 pubkey, not the legacy pubkey).
- RateLimiter - simplify it to take the IP as a uint32_t (and thus avoid allocations).  (Similarly it
  avoids allocations in the SN case via using the new pubkey type).
- Move some repeated tasks into the simpler oxenmq add_timer().
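
The three-map lookup structure from the refactoring list above could look roughly like the sketch below. Everything here is illustrative: the commit describes pubkey types that extend std::array, whereas this sketch wraps one in a small tagged struct for simplicity, and `pubkey_hash`, `sn_record` and `sn_index` are hypothetical names with simplified fields.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

struct legacy_tag;
struct ed25519_tag;
struct x25519_tag;

// Tagged 32-byte key: the tag makes each kind of pubkey a distinct type.
template <typename Tag>
struct pubkey {
    std::array<std::uint8_t, 32> bytes{};
    friend bool operator==(const pubkey& a, const pubkey& b) {
        return a.bytes == b.bytes;
    }
};

using legacy_pubkey = pubkey<legacy_tag>;
using ed25519_pubkey = pubkey<ed25519_tag>;
using x25519_pubkey = pubkey<x25519_tag>;

// Hashes the raw key bytes so pubkeys can be used as unordered_map keys.
template <typename Tag>
struct pubkey_hash {
    std::size_t operator()(const pubkey<Tag>& pk) const {
        std::string_view sv{reinterpret_cast<const char*>(pk.bytes.data()),
                            pk.bytes.size()};
        return std::hash<std::string_view>{}(sv);
    }
};

// Simplified record: just an address and the three pubkeys.
struct sn_record {
    std::string ip;
    std::uint16_t port = 0;
    legacy_pubkey pubkey_legacy;
    ed25519_pubkey pubkey_ed25519;
    x25519_pubkey pubkey_x25519;
};

class sn_index {
    // The first map owns the records; the other two map to its keys.
    std::unordered_map<legacy_pubkey, sn_record, pubkey_hash<legacy_tag>> records_;
    std::unordered_map<ed25519_pubkey, legacy_pubkey, pubkey_hash<ed25519_tag>> by_ed25519_;
    std::unordered_map<x25519_pubkey, legacy_pubkey, pubkey_hash<x25519_tag>> by_x25519_;

  public:
    void insert(const sn_record& r) {
        records_[r.pubkey_legacy] = r;
        by_ed25519_[r.pubkey_ed25519] = r.pubkey_legacy;
        by_x25519_[r.pubkey_x25519] = r.pubkey_legacy;
    }
    // Each lookup is a hash lookup; no linear scan, no string comparisons.
    const sn_record* find(const legacy_pubkey& pk) const {
        auto it = records_.find(pk);
        return it == records_.end() ? nullptr : &it->second;
    }
    const sn_record* find(const ed25519_pubkey& pk) const {
        auto it = by_ed25519_.find(pk);
        return it == by_ed25519_.end() ? nullptr : find(it->second);
    }
    const sn_record* find(const x25519_pubkey& pk) const {
        auto it = by_x25519_.find(pk);
        return it == by_x25519_.end() ? nullptr : find(it->second);
    }
};
```

Because the key kinds are distinct types, passing an x25519 key where a legacy key is expected fails at compile time rather than silently missing at run time.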

Ping reporting:

This completely rewrites how ping results are handled.

Current situation: SS does its own testing of other SS.  Every 10s it picks some random node and
sends an HTTP ping and OMQ ping to it.  If either fails, it tracks the failure and tries them
again every 10s.  If they are still failing after 2h, then it finally tells oxend about the failure
one time, remembers that it told oxend (in memory), and then never tells it again until the remote
SS starts responding to pings again; once that happens it tells oxend that it is responding again
and then never tells it anything else unless the remote starts failing again for 2h.

This has some major shortcomings:
- The 10s repeat will hammer a node pretty hard with pings, because if it is down for a while, most
  of the network will notice and be pinging it every 10s.  That means 1600x2 incoming requests every
  10s, which is pretty excessive.
- Oxend reporting edge case 1: If a node is bad and then storage server restarts, SS won't have it
  in its "bad list" anymore, so it isn't testing it at all (until it gets selected randomly, but
  with 1600 nodes that is going to be an average of more than 2 hours and could be way longer).
  Thus oxend never gets the "all good" signal and will continue to think the node is bad for much,
  much longer than it actually is.  (In fact, it may *never* get a good signal again if it's working
  the next time SS randomly pings it).
- Restarts the other way are also a problem: when oxend restarts it doesn't know of any of the bad
  nodes anymore, but since SS only tells it once, it never learns about it and thinks it's good.
- `oxend print_sn <PUBKEY>` is much, much less useful than it could be: usually storage servers are
  in "awaiting first result" status because SS won't tell oxend anything until it has decided there
  is some failure.  When it tells you "last ping was .... ago" that's also completely useless
  because SS never reports any pings except for the first >2h bad result, and the first good result.

I suspect the reporting above was out of a concern that talking to oxend rpc too much would overload
it; that isn't the case anymore (and since SS is now using a persistent oxenmq connection the
requests are *extremely* fast since it doesn't even have to establish a connection).

So this PR overhauls it completely as follows:
- SS gets much dumber (i.e. simpler) w.r.t. pings: all it does is randomly pick probably-good nodes
  to test every 10s, and then pings known-failing nodes to re-test them.
- Retested nodes don't get pounded every 10s, instead they get the first retry after 10s, the second
  retry 20s after that, then 30s after that, and so on up to 5 minute intervals between re-tests.
- SS tells oxend *every* ping result that it gets (and doesn't track them, except to keep them or
  remove them from the "bad" list)
- Oxend then becomes responsible for deciding when a SS is bad enough to fail proofs.  On the oxend
  side the rule is:
    - if we have been receiving bad ping reports about a node from SS for more than 1h5min without
      any good ping in that time *and* we received a bad ping in the past 10 minutes then we
      consider it bad.  (the first condition is so that it has to have been bad for more than an
      hour, and the second condition is to ensure that SS is still sending us bad test results).
    - otherwise we consider it good (i.e. because either we aren't getting test results or because
      we're getting good test results).
    - Thus oxend can usefully and accurately report the last time some storage server was tested,
      which allows much better diagnostics of remote SN status.
- Thus if oxend restarts it'll start getting the bad results right away, and if SS restarts oxend
  will stop getting them (and then fall back to "no info means good").
2021-04-18 14:50:40 -03:00
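The retry schedule and the oxend-side rule described in the ping reporting section above can be sketched as follows. The thresholds (10s steps up to 5 minutes, more than 1h5min of bad reports, a bad report within the last 10 minutes) come from the commit message; `retry_delay`, `ping_history` and their members are hypothetical names, and the real oxend bookkeeping may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

using ping_clock = std::chrono::steady_clock;

// Delay before retry number `n` (1-based) of a failing node:
// 10s, 20s, 30s, ... capped at 5 minutes.
inline std::chrono::seconds retry_delay(int n) {
    assert(n >= 1);
    return std::min(std::chrono::seconds{10} * n, std::chrono::seconds{300});
}

// Tracks the reports received about one remote storage server.
struct ping_history {
    // Start of the current unbroken run of bad reports, if any.
    std::optional<ping_clock::time_point> first_bad;
    // Most recent bad report in that run, if any.
    std::optional<ping_clock::time_point> last_bad;

    void report(bool good, ping_clock::time_point now) {
        if (good) {  // any good ping clears the run of failures
            first_bad.reset();
            last_bad.reset();
        } else {
            if (!first_bad) first_bad = now;
            last_bad = now;
        }
    }

    // Bad only if failures have lasted more than 1h5min with no good ping
    // in between AND a bad report arrived within the past 10 minutes.
    // Otherwise (good results, or no recent results at all) it counts as good.
    bool considered_bad(ping_clock::time_point now) const {
        return first_bad && last_bad &&
               now - *first_bad > std::chrono::minutes{65} &&
               now - *last_bad <= std::chrono::minutes{10};
    }
};
```

The second condition is what makes a storage server restart harmless: once reports stop arriving, the node falls back to being treated as good after 10 minutes.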
Maxim Shishmarev b58ef69647 Speed up unit tests for storage 2021-04-09 16:32:13 +10:00
Jason Rhinelander 28e6fa66e9 Add static build capability to cmake
This is adapted from the versions in loki-core and lokinet.
2021-01-13 15:19:22 -04:00
Sean Darcy d23c8417aa initial rebrand 2021-01-07 15:12:15 +11:00
Jason Rhinelander 708e7a1d18 Remove cmake cruft
- loki_add_subdirectory was an unnecessary wrapper around
add_subdirectory: it was an attempt to make an idempotent version of
add_subdirectory, but that isn't needed at all and just adds cruft: the
top-level CMakeLists.txt already includes all the subdirectories so we
can just trust it.

- removed incorrect subdirectory "project()" definitions.

- crypto/CMakeLists.txt pointlessly listed all headers in the source
list.

- set c++ standard in the top-level makefile instead of on each target
since we intentionally want it everywhere.

- Linking directly to pthread/dl with conditional OS checks was wrong;
fix it to be proper cmake (linking to Threads::Threads and
${CMAKE_DL_LIBS}).

- Various cmake files erroneously listed their src directories in their
include paths.

- Made various library linkages PRIVATE instead of PUBLIC where a
transitive dependency to dependent targets does not make sense.
2021-01-05 18:27:55 -04:00
Jason Rhinelander b8bc84d1e1 Remove boost::{filesystem,chrono,thread}
Replace them with their std:: version.
2021-01-05 17:32:01 -04:00
Jason Rhinelander cd5e4252a8 Modernize boost cmake dependencies 2021-01-05 17:30:13 -04:00
Maxim Shishmarev 1f5f85ebcf Limit db size to 3.5 GB 2020-09-14 15:24:49 +10:00
Jason Rhinelander 28f09402e6 Switch to C++17
- updates lokimq to dev branch
- changes compilation mode to C++17 (which is now required by lokimq,
  and already widely applied in lokinet and lokid dev branches)
- replace lokimq::string_view with std::string_view
- replace boost::optional with std::optional, except for:
  - boost::optional<std::function<...>> doesn't need optional at all
    because a std::function<...> is already nullable.
  - boost::optional<T&> isn't supported by std::optional because it
    makes little sense (it is just a `T*`) so just switch to T* instead.
2020-08-08 02:21:33 -03:00
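The three optional replacements listed in the commit above can be illustrated with a short C++17 sketch. The names `config`, `find_config`, `run_if_set` and `port_of` are hypothetical and are not taken from the storage server code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>

struct config {
    int port = 0;
};

// Was boost::optional<config&>: a nullable reference is just a pointer,
// so the std:: version returns config* (nullptr when absent).
inline config* find_config(std::map<std::string, config>& all,
                           const std::string& name) {
    auto it = all.find(name);
    return it == all.end() ? nullptr : &it->second;
}

// Was boost::optional<std::function<void()>>: std::function is already
// nullable, so no optional wrapper is needed.
inline bool run_if_set(const std::function<void()>& callback) {
    if (!callback) return false;
    callback();
    return true;
}

// Plain value case: boost::optional<int> becomes std::optional<int>.
inline std::optional<int> port_of(std::map<std::string, config>& all,
                                  const std::string& name) {
    if (auto* c = find_config(all, name)) {
        assert(c->port >= 0);
        return c->port;
    }
    return std::nullopt;
}
```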
Jason Rhinelander f01db5bb2f Remove headers from cmake add_library
Headers aren't supposed to be listed in `add_library` calls and are an
unfortunately common cmake anti-pattern.  (cmake does *not* need headers
listed to know how to check that things need rebuilding when a header
changes, which seems to be the reason people think they have to include
them).

Apparently this antipattern emerged partly because of buggy behaviour in
MSVC pre-2017 that didn't understand how to find headers when loading a
CMake project.
2020-04-13 16:19:52 -03:00
Maxim Shishmarev 23b6b0e087 Clean-up log messages 2019-08-07 17:57:07 +10:00
Beaudan 1d2a71435f This should not be logged as an error because it is just a duplicate message 2019-07-15 14:13:18 +10:00
Maxim Shishmarev 9671403cee better error logging 2019-07-11 16:11:16 +10:00
sachaaaaa 0eac8bbefb declutter main: extract command line parser logic and logging stuff 2019-07-02 16:43:54 +10:00
Beaudan 16ee7cfdbf Remove boost log includes and remove last accidental logs 2019-06-27 15:21:28 +10:00
Maxim Shishmarev c2d393abc6 Final (minor) API changes before the release (#181)
* remove namespace service_node

* API changes

* fix unit tests not compiling

* clang format
2019-06-26 14:42:45 +10:00
Maxim Shishmarev 89cb8b6623 clean-up 2019-06-25 16:29:12 +10:00
Maxim Shishmarev 3bf4bbf2d9 Boost log -> spdlog 2019-06-25 15:39:49 +10:00
Beaudan Campbell-Brown 425b864912 Refactor logs (#174)
* Add LOG macro and function name to logging

* Move common.h to common folder

* Rename all BOOST_LOG_TRIVIAL to LOG

* LOG -> LOKI_LOG and use boost::filesystem

* Don't log from worker thread

* Do filename in line
2019-06-25 12:57:15 +10:00
sachaaaaa f121c735ed Limit client retrieve request to 10 messages at a time 2019-05-15 11:20:53 +10:00
Maxim Shishmarev eb10292a28 Merge pull request #115 from msgmaxim/peer-testing
Peer testing, part II (final)
2019-05-13 15:15:11 +10:00
Maxim Shishmarev 1a4b635033 db cleanup on startup 2019-05-13 14:36:40 +10:00
Maxim Shishmarev 744702cfb5 Use a single io_context 2019-05-10 16:21:27 +10:00
Maxim Shishmarev 6314864335 add responding to message tests, retrying if necessary 2019-05-10 15:47:26 +10:00
Maxim Shishmarev 2b16effe28 Add a database method to retrieve a message by hash 2019-05-10 11:58:47 +10:00
Maxim Shishmarev 63da523b27 Add db methods necessary for retrieving a random row; better error handling in db code 2019-05-07 13:25:37 +10:00
sachaaaaa f4a148da53 Proper use of sqlite error message 2019-04-26 16:05:14 +10:00
Maxim Shishmarev 9d73e10d28 use bulk_store api when process batches 2019-04-17 10:42:04 +10:00
sachaaaaa e36d23e6b4 Add ORDER BY in sql SELECT statement to prevent undefined order 2019-04-15 12:33:10 +10:00
sachaaaaa bd86b6ebca Moved all vendor code under vendors and add make format 2019-04-11 16:40:50 +10:00