Implemented tree rebalancing during rbyd compaction
This isn't actually for performance reasons, but to reduce storage
overhead of the rbyd metadata tree, which was showing signs of being
problematic for small block sizes.
Originally, the plan for compaction was to rely on the self-balancing
rbyd append algorithm and simply append each tag to a new tree.
Unfortunately, since each append requires a rewrite of the trunk (the
current search path), this writes ~n*log(n) alts to storage even though
the final tree only needs ~n of them. This really starts to put
pressure on small blocks, where n is small enough that the log factor
isn't negligible and overhead limits are already tight.
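To see the n*log(n) shape concretely, here's a back-of-the-envelope
Python sketch (my numbers, not a measurement): if the i-th append
rewrites a trunk of ~log2(i) alts, the total written dwarfs the ~n alts
the final tree actually needs:

    import math

    # total alts written by n appends, each rewriting a ~log2(i)-alt
    # trunk, vs the ~n alts reachable in the final tree
    n = 256
    appended = sum(math.ceil(math.log2(i)) for i in range(2, n+1))
    print(appended, n)  # ~1793 alts written vs the ~256 needed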
Measuring lfsr_mdir_commit code size shows a ~556 byte cost on
thumb: 16416 -> 16972 (+3.4%). There are still some optimizations on
the table, though, and this implementation needs a cleanup pass.
               alt overhead       code cost
    rebalance: <= 28*n            16972
    append:    <= 24*n*log(n)     16416
Note these all assume worst case alt overhead, but we _need_ to assume
worst case for our rbyd estimations, or else the filesystem can get
stuck in unrecoverable compaction states.
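For a rough sense of where these bounds cross (my arithmetic, not part
of the measurements above): 24*n*log2(n) exceeds 28*n as soon as
log2(n) > 28/24, i.e. for any n >= 3, so the rebalance bound is smaller
for essentially any nontrivial tree. A minimal Python sketch:

    import math

    # worst-case alt overhead per the table above
    for n in [2, 4, 16, 64, 256]:
        print('n=%3d: rebalance <= %5d, append <= %7.0f'
                % (n, 28*n, 24*n*math.log2(n)))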
Because of the code cost I'm not sure yet if rebalancing will stay, be
made optional, or replace append-compaction completely.
Some implementation notes:
- Most tree balancing algorithms rely on true recursion. I suspect
recursion may be a hard requirement in general; bounded-ram algorithms
are hard to find.
This solution gets around the ram requirement by leveraging the fact
that our tags exist in a log, building up each layer of the tree
tail-recursively (rough sketch below). It's interesting to note that
this is a special case of having little ram but lots of storage.
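A hypothetical toy version of the layer-by-layer idea in Python, NOT
the actual rbyd code (lists stand in for the on-disk log; the real
implementation works on tags and alts, not tuples):

    def build_layers(leaves):
        # layer 0 is the sorted log of leaf keys; each pass reads the
        # previous layer back from "storage" and appends parents to a
        # new layer, so no call stack is needed
        layers = [list(leaves)]
        while len(layers[-1]) > 1:
            prev, layer = layers[-1], []
            for i in range(0, len(prev)-1, 2):
                # each parent pairs two children from the layer below
                layer.append((prev[i], prev[i+1]))
            if len(prev) % 2:
                # odd node out is carried up as-is
                layer.append(prev[-1])
            layers.append(layer)
        return layers

    print(build_layers([1, 2, 3, 4, 5]))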
- Humorously, this shouldn't result in a performance improvement. Rbyd
trees have a worst case 2*log(n) height, and rebalancing gives us a
perfect worst case log(n) height, but, since we need an additional alt
pointer for each node in our tree, the alts traversed per lookup bump
back up to 2*log(n) (worked out below).
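Concretely, with a made-up n = 1024, just to make the wash visible:

    import math

    n = 1024
    rbyd = 2*math.log2(n)        # self-balancing rbyd worst case
    balanced = 2*math.log2(n)    # log2(n) nodes, but 2 alts per node
    print(rbyd, balanced)        # 20.0 20.0 -> a wash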
- Originally the plan was to terminate each node with an alt-always tag,
but during implementation I realized there was no easy way to get the
key that splits the children without awkward tree lookups. As a
workaround, each node is terminated with an altle tag that contains the
key, followed by an unreachable null tag. This is redundant information,
but makes the algorithm easier to implement.
Fortunately null tags use the smallest tag encoding. It isn't that
small, but it means this wastes at most 4*n bytes (see the sketch
below).
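Where the 4 bytes come from, assuming the encoding implied by the debug
script's fromtag below (a 2-byte tag followed by leb128 weight and
size, 1 byte each at minimum; the 0x0000 tag value here is just a
placeholder):

    def toleb128(word):
        # minimal leb128 encoder, the inverse of the script's fromleb128
        out = bytearray()
        while word >> 7:
            out.append(0x80 | (word & 0x7f))
            word >>= 7
        out.append(word)
        return bytes(out)

    # a null tag: 2-byte tag + 1-byte weight + 1-byte size
    null_tag = b'\x00\x00' + toleb128(0) + toleb128(0)
    print(len(null_tag))  # 4 -> at most 4*n bytes over n nodes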
- Note this preserves the first-tag-always-ends-up-at-off=0x4 rule, which
is necessary for the littlefs magic to end up in a consistent place.
- I've dropped the dropping of vestigial names for now, which means
vestigial names can remain in btrees indefinitely. Need to revisit this.
@@ -87,6 +87,7 @@ def fromleb128(data):
        return word, len(data)

    def fromtag(data):
        data = data.ljust(4, b'\0')
        tag = (data[0] << 8) | data[1]
        weight, d = fromleb128(data[2:])
        size, d_ = fromleb128(data[2+d:])