Implemented tree rebalancing during rbyd compaction

This isn't actually for performance reasons, but to reduce storage
overhead of the rbyd metadata tree, which was showing signs of being
problematic for small block sizes.

Originally, the plan for compaction was to rely on the self-balancing
rbyd append algorithm and simply append each tag to a new tree.
Unfortunately, since each append requires a rewrite of the trunk
(the current search path), this introduces ~n*log(n) alts while the
final tree only needs ~n alts. This really starts to put pressure on
small blocks, where the log factor isn't yet negligible and overhead
limits are already tight.

Measuring lfsr_mdir_commit code size shows a ~556 byte cost on
thumb: 16416 -> 16972 (+3.4%). That said, there are still some
optimizations on the table, and this implementation needs a cleanup
pass.

               alt overhead  code cost
  rebalance:        <= 28*n      16972
  append:    <= 24*n*log(n)      16416

Note these all assume worst case alt overhead, but we _need_ to assume
worst case for our rbyd estimations, or else the filesystem can get
stuck in unrecoverable compaction states.
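
For a sense of scale, a rough worked example (assuming the logs above
are base 2): with n = 64 tags,

  append:    <= 24*64*log2(64) = 24*64*6 = 9216 bytes of alts
  rebalance: <= 28*64                    = 1792 bytes of alts

On a 4 KiB block, the worst-case append overhead alone wouldn't even
fit, while the rebalanced overhead leaves plenty of room.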

Because of the code cost, I'm not sure yet if rebalancing will stay,
become optional, or replace append-compaction completely.

Some implementation notes:

- Most tree-balancing algorithms rely on true recursion. I suspect
  recursion may be a hard requirement in general; bounded-RAM
  algorithms are hard to find.

  This solution gets around the RAM requirement by leveraging the fact
  that our tags exist in a log to build up each layer of the tree
  tail-recursively (see the sketch after these notes). It's interesting
  to note that this is a special case of having little RAM but lots of
  storage.

- Humorously, this shouldn't result in a performance improvement. Rbyd
  trees result in a worst case 2*log(n) height, and rebalancing gives us
  a perfect worst case log(n) height, but since we need an additional
  alt pointer for each node in our tree, things bump back up to 2*log(n).

- Originally the plan was to terminate each node with an alt-always tag,
  but during implementation I realized there was no easy way to get the
  key that splits the children without awkward tree lookups. As a
  workaround, each node is terminated with an altle tag that contains
  the key, followed by an unreachable null tag (sketched roughly after
  these notes). This is redundant information, but makes the algorithm
  easier to implement.

  Fortunately null tags use the smallest tag encoding, which isn't that
  small, but it means this wastes at most 4*n bytes.

- Note this preserves the first-tag-always-ends-up-at-off=0x4 rule, which
  is necessary for the littlefs magic to end up in a consistent place.

- I've dropped the dropping of vestigial names for now, which means
  vestigial names can remain in btrees indefinitely. Need to revisit
  this.
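
To make the tail-recursive layer building concrete, here is a minimal,
self-contained sketch of the idea. Note this is not the littlefs
implementation: an in-memory array stands in for the log, plain
integers stand in for tags, and nodes are reduced to their max keys.
The interesting part is the control flow, where each layer is built in
a single pass over the previous layer, so the only state is a few
cursors rather than a recursion stack:

  #include <stdio.h>

  #define MAX_LOG 64

  int main(void) {
      // layer 0: the sorted tags we just compacted, represented by
      // their keys; upper layers get appended to the same "log"
      unsigned log[MAX_LOG] = {1, 3, 5, 7, 9, 11, 13, 15};
      unsigned off = 0;    // offset of the current layer in the log
      unsigned count = 8;  // number of nodes in the current layer
      unsigned end = 8;    // append offset, i.e. end of the log

      // collapse layers until a single root remains; all state lives
      // in off/count/end, no recursion stack required
      while (count > 1) {
          unsigned noff = end;
          unsigned ncount = 0;

          // pair up children; the split key an altle needs is just
          // the left child's max key, read straight from the log
          unsigned i = 0;
          for (; i+1 < count; i += 2) {
              printf("node: altle %u, children at %u and %u\n",
                      log[off+i], off+i, off+i+1);
              log[end++] = log[off+i+1];  // parent max = right's max
              ncount += 1;
          }
          if (i < count) {
              // odd node out, carry it up to the next layer as-is
              log[end++] = log[off+i];
              ncount += 1;
          }

          off = noff;
          count = ncount;
      }
      return 0;
  }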
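
And for reference, a rough sketch of how a rebalanced node ends up
encoded, assuming the minimal tag encoding (a 2-byte tag followed by
leb128 weight and size fields, so 4 bytes at minimum):

  rebalanced node:
    alt   ...     <- alts routing searches to this node's children
    alt   ...
    altle key     <- redundant split key, terminates the node
    null          <- unreachable, but 4 bytes at minimum, giving the
                     4*n worst-case waste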