Unfortunately, waiting to evict shrubs until mdir compaction does not
work because we only have a single pcache. When we evict a bshrub we
need a pcache for writing the new btree root, but if we do this during
mdir compaction, our pcache is already busy handling the mdir
compaction. We can't do a separate pass for bshrub eviction, since this
would require tracking an unbounded number of new btree roots.
In the previous shrub design, we meticulously tracked the compacted
shrub estimate in RAM, determining exactly how the estimate would change
as a part of shrub carve operations.
This worked, but was fragile. It was easy for the shrub estimate to
diverge from the actual value, and it required quite a bit of extra code
to maintain. Since the use cases for bshrubs are growing a bit, I didn't
want to return to this design.
So here's a new approach based on emulating btree compacts/splits inside
the shrubs:
1. When a bshrub is fetched, scan the bshrub and calculate a compaction
estimate. Store this.
2. On every commit, find an upper bound of the new data being progged, and
   keep track of estimate + progged. We can get this relatively cheaply
   from the commit attr lists. What we can't get is the amount deleted,
   which is why estimate + progged can only ever overestimate.
3. When estimate + progged exceeds shrub_size, scan the bshrub again and
recalculate the estimate.
4. If the recalculated estimate still exceeds shrub_size/2, evict the
   bshrub, converting it into a btree.
As you may note, this is very close to how our btree compacts/splits
work, but emulated. In particular, evictions/splits occur at
(shrub_size/block_size)/2 in order to avoid runaway costs when the
bshrub/btree gets close to full.
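
To make the heuristic concrete, here's a rough sketch of the commit-time
bookkeeping in C. This is not the actual littlefs implementation; the types
and helpers (bshrub_t, attr_t, bshrub_rescan, bshrub_evict) are hypothetical
stand-ins:

```c
#include <stddef.h>
#include <stdint.h>

// hypothetical stand-ins for littlefs internals
typedef struct attr {
    uint32_t size;      // size of the data this attr progs
} attr_t;

typedef struct bshrub {
    uint32_t estimate;  // compaction estimate from the last full scan
    uint32_t progged;   // upper bound on data progged since that scan
} bshrub_t;

uint32_t bshrub_rescan(const bshrub_t *b);  // full scan -> new estimate
int bshrub_evict(bshrub_t *b);              // convert bshrub -> btree

// called on every commit to the bshrub
int bshrub_commit(bshrub_t *b, const attr_t *attrs, size_t attr_count,
        uint32_t shrub_size) {
    // 2. accumulate an upper bound of new data being progged, cheap to
    // get from the attr list, but blind to deletions
    for (size_t i = 0; i < attr_count; i++) {
        b->progged += attrs[i].size;
    }

    // 3. estimate + progged exceeded shrub_size? rescan the bshrub and
    // recalculate the estimate, fixing any divergence
    if (b->estimate + b->progged > shrub_size) {
        b->estimate = bshrub_rescan(b);
        b->progged = 0;

        // 4. still above shrub_size/2 after recalculating? evict the
        // bshrub into a btree; the /2, like btree splits, avoids
        // runaway compaction costs as the bshrub approaches full
        if (b->estimate > shrub_size/2) {
            return bshrub_evict(b);
        }
    }

    return 0;
}
```

The key property is that the per-commit work is just summing the attr list;
the expensive full rescan only happens when estimate + progged crosses
shrub_size, which is what keeps the amortized cost low.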
Benefits:
- This eviction heuristic is very robust. Calculating the amount progged
from the attr list is relatively cheap and easy, and any divergence
should be fixed when we recalculate the estimate.
- The runtime cost is relatively small, amortized O(log n), which is the
  existing runtime of committing to rbyds.
Downsides:
- Just like btree splits, evictions force our bshrubs to be ~1/2 full on
  average. Combined with the 2x cost of mdir pairs, the 2x cost of mdirs
  being ~1/2 full on average, and the need for both a synced and an
  unsynced copy of file bshrubs, this brings our file bshrub overhead up
  to ~16x, which is getting quite high...
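  Roughly, those factors multiply: 2x (evicted bshrubs ~1/2 full) * 2x
  (mdir pairs) * 2x (mdirs ~1/2 full) * 2x (synced + unsynced copies)
  = ~16x.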
Anyways, bshrubs now work, and the new file topology is passing testing.
An unfortunate surprise is the jump in stack cost. This seems to come from
moving the lfsr_btree_flush logic into the hot-path that includes bshrub
commit + mdir commit + all the mtree logic. Previously, the separation of
btree/shrub commits meant that the more complex block/btree/crystal logic
was on a separate path from the mdir commit logic:
                 code           stack          lfsr_file_t
before bshrubs:  31840          2072           120
after bshrubs:   30756 (-3.5%)  2448 (+15.4%)  104 (-15.4%)
I _think_ the reality is not actually as bad as measured: most of these
flush/carve/commit functions calculate some work and then commit it in
separate steps. In theory GCC's shrink-wrapping optimizations should
limit the stack to only what we need as we finish each calculation, but
our current stack measurement scripts just add up whole frames, so any
per-call stack optimizations get missed...