Commit Graph

1232 Commits

Christopher Haster
f29a4982c4 Added block-level erased-state checksums
Much like the erased-state checksums in our rbyds (ecksums), these
block-level erased-state checksums (becksums) allow us to detect failed
progs to erased parts of a block and are key to achieving efficient
incremental write performance with large blocks and frequent power
cycles/open-close cycles.

These are also key to achieving _reasonable_ write performance for
simple writes (linear, non-overwriting), since littlefs now relies
solely on becksums to efficiently append to blocks.
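
As a rough sketch of the idea (illustrative helpers, not the actual lfs.c
implementation; crc32c stands in for whatever checksum littlefs is
configured with), checking a becksum before appending might look like:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  // hypothetical checksum helper, standing in for littlefs's real one
  extern uint32_t crc32c(uint32_t seed, const void *buffer, size_t size);

  // returns true if the cksize bytes at off still match the checksum
  // recorded when the block was last progged, i.e. no failed prog has
  // touched the erased region and it is safe to append in place
  static inline bool becksum_check(const uint8_t *block, uint32_t off,
          uint32_t cksize, uint32_t becksum) {
      return crc32c(0, &block[off], cksize) == becksum;
  }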

Though I suppose the previous block staging logic used with the CTZ
skip-list could be brought back to make becksums optional and avoid
btree lookups during simple writes (we do a _lot_ of btree
lookups)... I'll leave this open as a future optimization...

Unlike in-rbyd ecksums, becksums need to be stored out-of-band so our
data blocks only contain raw data. Since they are optional, an
additional tag in the file's btree makes sense.

Becksums are relatively simple, but they bring some challenges:

1. Adding becksums to file btrees is the first case we have for multiple
   struct tags per btree id.

   This isn't too complicated a problem, but requires some new internal
   btree APIs.

   Looking forward (which I probably shouldn't be doing this often),
   multiple struct tags will also be useful for parity and content ids
   as a part of data redundancy and data deduplication, though I think
   it's uncontroversial to consider both of these heavier-weight features...

2. Becksums only work if unfilled blocks are aligned to the prog_size.

   This is the whole point of crystal_size -- to provide temporary
   storage for unaligned writes -- but actually aligning the block
   during writes turns out to be a bit tricky without a bunch of
   unnecessary btree lookups (we already do too many btree lookups!).

   The current implementation here discards the pcache to force
   alignment, taking advantage of the requirement that
   cache_size >= prog_size, but this is corrupting our block checksums.
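
   As a small aside, the alignment itself is just rounding down to a prog
   boundary; an illustrative helper, not lfs.c's actual code:

  #include <stdint.h>

  // round an offset down to the nearest prog_size boundary; this works
  // for any prog_size, not just powers of two
  static inline uint32_t prog_align_down(uint32_t off, uint32_t prog_size) {
      return off - (off % prog_size);
  }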

Code cost:

           code          stack
  before: 31248           2792
  after:  32060 (+2.5%)   2864 (+2.5%)

Also lfsr_ftree_flush needs work. I'm usually open to gotos in C when
they improve internal logic, but even for me, the multiple goto jumps
from every left-neighbor lookup into the block writing loop are a bit
much...
2023-12-14 01:05:34 -06:00
Christopher Haster
26afd8b118 Reworked lfsr_ftree_flush to try to minimize btree lookups
Mainly by not looking up left neighbors after the first of many
block/fragment/crystal writes.

The right neighbors should already be avoiding redundant lookups since
redundant lookups imply a full fragment/block on the not-last write.

Additionally, once we fail our crystallization check, we can assume all
future fragment writes will fail, so we only need to do that lookup
once.

There's probably still more to optimize, but the way these heuristics
interact is tricky...
2023-12-12 12:10:06 -06:00
Christopher Haster
fcddef6f1a Dropped lfsr_data_t hole representation
We don't need this, it's not easily gc-able, and encourages redundant
lookups. It's better to just handle holes explicitly where needed.
2023-12-12 12:10:02 -06:00
Christopher Haster
4534d095e9 Reframed data slice operations in terms of lfsr_data_slice
This combines the previous lfsr_data_truncate/lfsr_data_fruncate
behavior into a single flexible function, and makes truncate/fruncate
small aliases (drop in the future?).

The combined behavior lets us adopt lfsr_data_slice in more places.
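
A minimal sketch of the combined operation and the aliases, with
illustrative types and the assumption that negative off/size mean
"unbounded" (this may not match lfs.c's exact conventions):

  #include <stdint.h>

  typedef struct data {
      const uint8_t *buf;
      uint32_t size;
  } data_t;

  // slice out [off, off+size) of d, clamping to d's bounds; off < 0 means
  // "from the start", size < 0 means "to the end"
  static data_t data_slice(data_t d, int32_t off, int32_t size) {
      uint32_t off_ = (off < 0) ? 0 : (uint32_t)off;
      if (off_ > d.size) {
          off_ = d.size;
      }
      uint32_t size_ = (size < 0) ? d.size - off_ : (uint32_t)size;
      if (size_ > d.size - off_) {
          size_ = d.size - off_;
      }
      return (data_t){.buf = d.buf + off_, .size = size_};
  }

  // truncate: keep only the first size bytes
  static data_t data_truncate(data_t d, uint32_t size) {
      return data_slice(d, -1, (int32_t)size);
  }

  // fruncate: keep only the last size bytes
  static data_t data_fruncate(data_t d, uint32_t size) {
      uint32_t keep = (size < d.size) ? size : d.size;
      return data_slice(d, (int32_t)(d.size - keep), -1);
  }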
2023-12-12 12:07:58 -06:00
Christopher Haster
c4d75efa40 Added bptr checksums
Looking forward, bptr checksums provide an easy mechanism to validate
data residing in blocks. This extends the merkle-tree-like nature of the
filesystem all the way down to the data level, and is common in other
COW filesystems.

Two interesting things to note:

1. We don't actually check data-level checksums yet, but we do calculate
   data-level checksums unconditionally.

   Writing checksums is easy, but validating checksums is a bit more
   tricky. This is made a bit harder for littlefs, since we can't hold
   an entire block of data in RAM, so we have to choose between separate
   bus transactions for checksum + data reads, or extremely expensive
   overreads on every read.

   Note this already exists at the metadata level: the separate bus
   transactions for rbyd fetch + rbyd lookup mean we _are_ susceptible
   to a very small window where bit errors can get through.

   But anyways, writing checksums is easy. And has basically no cost
   since we are already processing the data for our write. So we might
   as well write the data-level checksums at all times, even if we
   aren't validating at the data-level.

2. To make bptr checksums work cheaply we need an additional cksize
   field to indicate how much data is checksummed.

   This field seems redundant when we already have the bptr's data size,
   but if we didn't have this field, we would be forced to recalculate
   the checksum every time a block is sliced. This would be
   unreasonable.

   The immutable cksize field does mean we may be checksumming more data
   than we need to when validating, but we should be avoiding small
   block slices anyways for storage cost reasons.
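
Roughly, a bptr now carries something like the following fields
(illustrative names/widths, not the actual tag encoding):

  #include <stdint.h>

  // illustrative field names/widths, not the actual on-disk encoding
  typedef struct bptr {
      uint32_t block;   // address of the data block
      uint32_t off;     // offset of this slice's data in the block
      uint32_t size;    // size of this slice's data
      uint32_t cksize;  // how much data the checksum covers; immutable
                        // across slicing, so it may exceed size
      uint32_t cksum;   // checksum over those cksize bytes
  } bptr_t;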

This does add some stack cost because our bptr struct is larger now:

            code          stack
  before:  31200           2768
  after:   31272 (+0.2%)   2800 (+1.1%)
2023-12-12 12:07:55 -06:00
Christopher Haster
16fa88aac3 Rearranged lfsr_ftree_flush a bit and dropped lfsr_ftree_readnext
This avoids redundant lookups when holes are involved.

And we don't really leverage data holes as an abstraction well. We use
data holes in two places, but they do two different things, so they may
as well be specialized operations.

            code          stack
  before:  31092           2752
  after:   31200 (+0.3%)   2768 (+0.6%)
2023-12-10 14:10:52 -06:00
Christopher Haster
9f02cbb26b Tweaked mount/format/dbg littlefs info print
The info should now be ordered more-or-less by decreasing importance:

  littlefs v2.0 4096x256 0x{0,1}.36d w12.256
         ^  ^ ^    ^   ^   '-.-' ^     ^   ^
         '--|-|----|---|-----|---|-----|---|-- littlefs
            '-|----|---|-----|---|-----|---|-- on-disk major version
              '----|---|-----|---|-----|---|-- on-disk minor version
                   '---|-----|---|-----|---|-- block size
                       '-----|---|-----|---|-- block count
                             '---|-----|---|-- mroot blocks
                                 '-----|---|-- mroot trunk
                                       '---|-- mtree weight
                                           '-- mweight
2023-12-08 14:23:53 -06:00
Christopher Haster
6ccd9eb598 Adopted different strategy for hypothetical future configs
Instead of writing every possible config that has the potential to be
useful in the future, stick to just writing the configs that we know are
useful, and error if we see any configs we don't understand.

This prevents unnecessary config bloat, while still allowing configs to
be introduced in a backwards compatible way in the future.

Currently unknown configs are treated as a mount error, but in theory
you could still try to read the filesystem, just with potentially
corrupted data. Maybe this could be behind some sort of "FORCE" mount
flag. littlefs must never write to the filesystem if it finds unknown
configs.
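
In effect, the mount-time scan becomes something like the following
sketch, assuming hypothetical helpers for iterating/recognizing config
tags (not the actual lfs.c code):

  #include <stdbool.h>
  #include <stdint.h>

  // error code that does exist in littlefs
  #define LFS_ERR_INVAL (-22)

  // hypothetical iterator over on-disk config tags (0 when exhausted) and
  // predicate for the configs this version of littlefs understands
  extern uint16_t cfg_lookupnext(uint16_t tag);
  extern bool cfg_isknown(uint16_t tag);

  static int cfg_scan(void) {
      for (uint16_t tag = cfg_lookupnext(0);
              tag;
              tag = cfg_lookupnext(tag)) {
          if (!cfg_isknown(tag)) {
              // refuse to mount -- we must never write to a filesystem
              // whose configs we don't understand
              return LFS_ERR_INVAL;
          }
      }
      return 0;
  }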

---

This also creates a curious case for the hole in our tag encoding
previously taken up by the OCOMPATFLAGS config. We can query for any
config > SIZELIMIT with lookupnext, but the OCOMPATFLAGS flag would need
an extra lookup which just isn't worth it.

Instead I'm just adding OCOMPATFLAGS back in. To support OCOMPATFLAGS
littlefs has to do literally nothing, so this is really more of a
documentation change. And who knows, maybe OCOMPATFLAGS will have some
weird use case in the future...
2023-12-08 14:03:56 -06:00
Christopher Haster
337bdf61ae Rearranged tag encodings to make space for BECKSUM, ORPHAN, etc
Also:

- Renamed GSTATE -> GDELTA for gdelta tags. GSTATE tags added as
  separate in-device flags. The GSTATE tags were already serving
  this dual purpose.

- Renamed BSHRUB* -> SHRUB when the tag is not necessarily operating
  on a file bshrub.

- Renamed TRUNK -> BSHRUB

The tag encoding space now has a couple funky holes:

- 0x0005 - Hole for aligning config tags.

  I guess this could be used for OCOMPATFLAGS in the future?

- 0x0203 - Hole so that ORPHAN can be a 1-bit difference from REG. This
  could be after BOOKMARK, but having a bit to differentiate littlefs
  specific file types (BOOKMARK, ORPHAN) from normal file types (REG,
  DIR) is nice.

  I guess this could be used for SYMLINK if we ever want symlinks in the
  future?

- 0x0314-0x0318 - Hole so that the mdir related tags (MROOT, MDIR,
  MTREE) are nicely aligned.

  This is probably a good place for file-related tags to go in the
  future (BECKSUM, CID, COMPR), but we only have two slots, so we'll
  probably run out pretty quickly.

- 0x3028 - Hole so that all btree related tags (BTREE, BRANCH, MTREE)
  share a common lower bit-pattern.

  I guess this could be used for MSHRUB if we ever want mshrubs in the
  future?
2023-12-08 13:28:47 -06:00
Christopher Haster
04c6b5a067 Added grm rcompat flag, dropped ocompat, tweaked compat flags a bit
I'm just not seeing a use case for optional compat flags (ocompat), so
dropping for now. It seems their *nix equivalent, feature_compat, is
used to inform fsck of things, but this doesn't really make sense in
littlefs since there is no fsck. Or from a different perspective,
littlefs is always running fsck.

Ocompat flags can always be added later (since they do nothing).

Unfortunately this really ruins the alignment of the tag encoding. For
whatever reason config limits tend to come in pairs. For now the best
solution is to just leave tag 0x0006 unused. I guess you can consider it
reserved for hypothetical ocompat flags in the future.

---

This adds an rcompat flag for the grm, since in theory a filesystem
doesn't need to support grms if it never renames files (or creates
directories?). But if a filesystem doesn't support grms and a grm gets
written into the filesystem, this can lead to corruption.

I think every piece of gstate will end up with its own compat flag for
this reason.

---

Also renamed r/w/oflags -> r/w/ocompatflags to make their purpose
clearer.

---

The code impact of adding the grm rcompat flag is minimal, and will
probably be less for additional rcompat flags:

            code          stack
  before:  31528           2752
  after:   31584 (+0.2%)   2752 (+0.0%)
2023-12-07 15:05:51 -06:00
Christopher Haster
c76ff08f67 Added lfsr_rbyd_compact, lfsr_rbyd_appendshrub
Also renamed lfsr_rbyd_compact -> lfsr_rbyd_appendcompaction to make
room for a more high-level lfsr_rbyd_compact and emphasize that this is
an append operation.

lfsr_rbyd_appendshrub was a common pattern emerging in mdir commit
related functions for moving shrubs around. It's just a useful function
to have.

lfsr_rbyd_compact, on the other hand, is only useful for evicting
bshrubs. Most other rbyd compaction code involves weird corner cases and
doesn't seem to generalize well (or at least I can't see it). But adding
lfsr_rbyd_compact is nice for consistency with the mdir commit/compact
functions. And it can be used by lfsr_bshrub_commit at least...

Code changes minimal:

            code          stack
  before:  31596           2752
  after:   31528 (-0.2%)   2752 (+0.0%)
2023-12-06 23:58:46 -06:00
Christopher Haster
7e9c0fbd88 Changed lfsr_rbyd/btree/bshrub_commit to _not_ be atomic, adopted more
Now that error recovery is well defined (at least in theory), and
high-level mdir/file functions create on-stack copies, the on-stack
copies for the low-level rbyd/btree/bshrub commit functions are
redundant and not useful.

Dropping the redundant on-stack copies in low-level functions saves a
bit for stack usage.

Additionally, we can adopt lfsr_rbyd_commit in more places where it was
previously avoided, dropping even more redundant on-stack copies.

            code          stack
  before:  31676           2776
  after:   31596 (-0.3%)   2752 (-0.9%)
2023-12-06 22:49:06 -06:00
Christopher Haster
3a6afaf1c5 Renamed lfs_alloc_ack -> lfs_alloc_ckpoint
This name describes this operation ever so slightly better; I've already
been referring to this as "checkpointing the allocator" in places.
2023-12-06 22:24:18 -06:00
Christopher Haster
43270ed50f Moved post-compaction mdir commits into lfsr_mdir_compact__
Since we need access to the pending attr-list in lfsr_mdir_compact__
now, we might as well just do the pending commit. Worst case, a call to
lfsr_mdir_compact__ can provide a NULL attr-list for the previous
behavior (though we always follow up compaction with a commit).

The only downside is that this behavior is now a bit different from
lfsr_rbyd_compact. Need to revisit lfsr_rbyd_compact and see if it
should do the same.

This had a tiny improvement to code size, which I think is just the cost
of two function calls:

            code          stack
  before:  31716           2776
  after:   31676 (-0.1%)   2776 (+0.0%)
2023-12-06 22:24:12 -06:00
Christopher Haster
0026121bc3 Merged lfsr_ftree_bufferedreadnext/readnext
Turns out we don't need lfsr_ftree_(unbuffered)readnext. Even if we did,
the unbuffered behavior can be obtained by passing a NULL buffer to
lfsr_ftree_readnext.

Code changes minimal:

            code          stack
  before:  31760           2776
  after:   31716 (-0.1%)   2776 (+0.0%)
2023-12-06 22:24:10 -06:00
Christopher Haster
2793ae2e03 Renamed lfsr_flags_is* -> lfsr_o_is*, dropped lfsr_file_is*
There is a point where all of these small function redeclarations become
more noise than helpful, though I'm not sure where that line is.
2023-12-06 22:24:08 -06:00
Christopher Haster
166845f43f Adopt lfsr_ftree_t to help with staging files
lfsr_ftree_t acts as a sort of proto-file type, holding enough
information for file reads/writes if the relevant mdir is known.

This lets low-level file write operations operate on a copy of the
proto-file without needing to copy the relevant mdir, file stuff, etc.

To make this work, lfsr_mdir_commit also needs to stage any bshrubs in
the attr-list, since these may not be in our opened file list, but this
is a good thing to handle implicitly anyways. We should only ever have
one untracked bshrub being operated on (multithreaded support would be a
whole other can of worms).

Unfortunately the extra machinery in lfsr_mdir_commit, and the fact that
passing two pointers around instead of one adds quite a bit of code,
means this comes with a code cost. But the tradeoff for stack cost and
no risk of stack pointers in our opened file list makes this probably
worth it:

            code          stack
  before:  31584           2824
  after:   31760 (+0.6%)   2776 (-1.7%)
2023-12-06 22:24:05 -06:00
Christopher Haster
6bfbbae341 Implemented file-level error recovery in theory
Not yet tested, thus the "in theory". Testing this is going to be a bit
tricky. Fortunately on-stack copies are a pretty resilient way to
recover from errors.
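
For reference, the general shape of the on-stack-copy recovery pattern,
sketched with illustrative names rather than the actual lfs.c functions:

  // illustrative proto-file type and low-level operation, not lfs.c's
  typedef struct file {
      int pos;  // ...stand-in for the file's on-disk state...
  } file_t;

  extern int file_flush_(void *lfs, file_t *file);

  static int file_flush(void *lfs, file_t *file) {
      // stage all changes on an on-stack copy
      file_t file_ = *file;
      int err = file_flush_(lfs, &file_);
      if (err) {
          // on error the original *file is untouched, leaving the file in
          // its last consistent state
          return err;
      }
      // only adopt the staged copy once everything succeeded
      *file = file_;
      return 0;
  }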

This comes with an unfortunate, but necessary, code/stack increase:

            code          stack
  before:  31280           2736
  after:   31584 (+1.0%)   2824 (+3.2%)
2023-12-06 22:24:01 -06:00
Christopher Haster
5636895eee Significantly improved lfsr_btree_carve
The main optimization here is to try to minimize the number of
individual btree/bshrub commits. We can't always perform
lfsr_btree_carve in a single commit unfortunately, due to unbounded
crystal fragments and needing to commit to different leaf rbyds, but we
can combine attrs into single commits more than we were previously.

This is especially important for bshrubs, where commits can trigger full
mdir compactions.

Additionally, the method we use for breaking up small blocks into
fragments has been changed to never "lose" the underlying block pointer
until the fragmentation is complete. This prevents the block pointer
misallocation bug found earlier.

In theory this won't be necessary once file writes create on-stack
bshrub copies for error recovery, but it's at least nice to prove that
this is possible in case we ever want to not maintain on-stack bshrub
copies (code savings?).

            code          stack
  before:  31512           2648
  after:   31280 (-0.7%)   2736 (+3.3%)
2023-12-06 22:23:59 -06:00
Christopher Haster
effdb1e8c6 Enabled bypassing the file's write buffer during file writes
This is a nuanced optimization that is relatively unique to littlefs's
use case.

Because we're in a RAM constrained environment, it's not unreasonable
for whatever temporary buffer is used to write to a file to exceed the
write buffer allocated for the file. Since the temporary buffer is
temporary, it may be orders of magnitude larger than the file's write
buffer. If this happens, breaking up the write into write-buffer-sized
chunks so we can copy the data through the file's write buffer just
wastes IO.

In theory, littlefs should work just fine with a zero-sized file write
buffer, though this is not yet tested.

One interesting subtlety with the current implementation is that we still
flush the file's write buffer when we bypass it. In theory you can avoid
flushing, but this risks strange write orders that could make low-level
write heuristics (such as the crystallization threshold) behave really
poorly.
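
A condensed sketch of the bypass decision, with hypothetical helper names
rather than the actual lfs.c write path:

  #include <stddef.h>

  // hypothetical helpers standing in for the real buffered/low-level writes
  extern int file_flushbuffer(void *lfs, void *file);
  extern int file_progdirect(void *lfs, void *file,
          const void *buffer, size_t size);
  extern int file_progbuffered(void *lfs, void *file,
          const void *buffer, size_t size);

  static int file_write(void *lfs, void *file, size_t file_buffer_size,
          const void *buffer, size_t size) {
      if (size >= file_buffer_size) {
          // large write: flush the file's write buffer first to keep write
          // ordering sane, then write the caller's buffer directly,
          // skipping the copy through our (much smaller) write buffer
          int err = file_flushbuffer(lfs, file);
          if (err) {
              return err;
          }
          return file_progdirect(lfs, file, buffer, size);
      }
      // small write: go through the file's write buffer as usual
      return file_progbuffered(lfs, file, buffer, size);
  }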

---

Unfortunately the tests are now failing due to an interesting but
unrelated bug. It turns out bypassing our file buffer allows in-btree
block pointers to interact with heavily fragmented btrees for the first
time in our tests. This leads to an incorrect allocation of a block that
is in the previous copy of a file's btree during lfsr_btree_carve.

This isn't an issue for btree inner nodes. Btree commits happen
atomically, with the new btree being allocated on the stack and
protected by allocator checkpoints until completion.

But this is an issue for the block pointers, because lfsr_btree_carve is
not atomic and results in intermediary states where the block pointers
are lost.

What's a bit funny is that this commit is actually a part of some
preparation to introduce temporary file copies during lfsr_file_write
for error recovery. This would mean the previous state of our file would
remain viewable by the block allocator, fixing this bug.

I need to think a bit more on if this is the correct solution (debugging
these multi-layer bugs turns my brain into mush), but I think it is, in
which case this bug was fixed before it was even discovered.
2023-12-06 22:23:56 -06:00
Christopher Haster
6261bafed2 Added more file tests with multiple files, fixed bugs
Fortunately these operations are heavily tested in test_dirs. The only
difference with files is the possibility for shrubs to need to be
copied.

Bugs fixed:

- It's counterintuitive, but lfsr_rbyd_appendcompactattr _can_ error
  with LFS_ERR_RANGE when we are copying a shrub. This can happen if the
  underlying mdir needs compaction itself.

- It's possible for null-trunk bshrubs to appear in our filesystem
  traversal. Null-trunk bshrubs don't usually appear in any stable
  state, but they are created by lfsr_bshrub_alloc and lfsr_btree_commit
  to represent new, yet-uncommitted shrubs.

  This gets a bit tricky because we also use null-trunks to indicate if
  lfsr_btree_traversal has traversed the root. We can't rely on
  bid >= weight for this because zero-weight btrees are allowed.

  The solution here, though maybe temporary (famous last words), is to
  treat null-trunk btrees as not having a root. Which isn't really true,
  but null-trunk btree roots only exist between allocator checkpoints,
  so they are allowed to be unreachable.

  We really need more asserts that this is the case though... At least
  added an assert that we never commit/read null trunks on disk.
2023-12-06 22:23:53 -06:00
Christopher Haster
939dd2145a Added some corner-case tests, fixed related bugs/POSIX nuances
POSIX is notoriously full of subtle and confusing nuances. Not through
any fault of POSIX, but as a result of trying to describe a complex
system with simple and easy to use operations.

Corner cases fixed here:

- rename("dir", "file") => ENOTDIR

  This is the main surprise to me, and a mistake on my part. I thought
  EISDIR would be appropriate for any renames with mismatched types,
  since both involve a directory. It would be simpler code-wise, and
  avoid ambiguity around whether "file" is not a dir, or some other file
  exists in the file's path. But I guess ENOTDIR makes more sense if you
  think of the destination as the target being operated on.

- remove("/") => EINVAL
- rename("/", "x") => EINVAL
- rename("x", "/") => ENOTEMPTY
- open("/") => EISDIR

  It's a bit difficult to look up what error codes around root operations
  should be, since they mostly end up as EPERM on modern systems, but
  this doesn't really make sense for littlefs.

  The solution chosen here is to prefer directory-related errors (EISDIR,
  ENOTEMPTY) when possible, and fall back to EINVAL when the only issue
  is that the target is the root directory.

Also I tweaked lfsr_mtree_pathlookup a bit so mid=0 indicates the target
is the root and mid=-1 indicates the target can't be created (because of
a missing directory). I think using mid=0 for the latter is a leftover
from when mid=-1 was a bit of a mess...
2023-12-06 22:23:51 -06:00
Christopher Haster
abbd2d6c3f Made it possible to actually rename shrubbed files
This needed a bit of extra handling to copy the shrub, since it exists
outside of the mdir's main tree.

Also added relevant tests.
2023-12-06 22:23:47 -06:00
Christopher Haster
b1ce27f733 Reorganized test suites a bit
- Renamed test_dtree -> test_dirs
- Renamed test_dseek -> test_dread
- Split test_files -> test_files, test_fwrite
2023-12-06 22:23:45 -06:00
Christopher Haster
4da7c88eb0 Added shortcut encoding for linear/log powerlosses in test-runner
The previous encoding was a bit problematic with our linear and log
heuristics, which can grow thousands of powerlosses deep. You know you
have a problem when you're copying a test id that spans a dozen lines.

It also meant we were spending O(n^2) time just encoding powerloss ids:

  before: 942.68s
  after:  921.94s (-2.2%)

This new encoding takes advantage of the unused characters in our leb16
encoding, with an 'x' prefix indicating linear-heuristic powerlosses
and a 'y' prefix indicating log-heuristic powerlosses ('w' is used for
negative leb16s).

Before:

  - explicit: 42q2q2
  - linear:   123456789abcdefg1h1i1j1k1l1m1n1o1p1q1r1s1t1u1v1
  - log:      1248g1g2g4g8gg1gg2gg4gg8

After:

  - explicit: 42q2q2
  - linear:   xg2
  - log:      yc
2023-12-06 22:23:43 -06:00
Christopher Haster
d485795336 Removed concept of geometries from test/bench runners
This turned out to not be all that useful.

Tests already take quite a bit to run, which is a good thing! We have a
lot of tests! 942.68s or ~15 minutes of tests at the time of writing to
be exact. But simply multiplying the number of tests by some number of
geometries is heavy handed and not a great use of testing time.

Instead, tests where different geometries are relevant can parameterize
READ_SIZE/PROG_SIZE/BLOCK_SIZE at the suite level where needed. The
geometry system was just another define parameterization layer anyways.

Testing different geometries can still be done in CI by overriding the
relevant defines anyways, and it _might_ be interesting there.
2023-12-06 22:23:41 -06:00
Christopher Haster
b8d3a5ef46 Fixed inotify race conditions and fd leak in scripts
Since we were only registering our inotify reader after the previous
operation completed, it was easy to miss modifications that happened
faster than our scripts. Since our scripts are in Python, this happened
quite often and made it hard to trust the current state of scripts
with --keep-open, sort of defeating the purpose of --keep-open...

I think previously this race condition wasn't avoided because of the
potential to loop indefinitely if --keep-open referenced a file that the
script itself modified, but it's up to the user to avoid this if it is
an issue.

---

Also while fixing this, I noticed our use of the inotify_simple library
was leaking file descriptors everywhere! I just wasn't closing any
inotify objects at all. A bit concerning since scripts with --keep-open
can be quite long lived...
2023-12-06 22:23:38 -06:00
Christopher Haster
a9772d785a Removed removal of root bookmark in test_mtree
It turns out the permanent root bookmark creates some rather interesting
constraints on our mtree:

1. We can never delete all mids, since at least one mid needs to exist
   to represent the root's bookmark.

2. We can never revert to an inlined mdir after uninlining, since our
   root bookmark always exists to stop this. This is an unfortunate
   downside as it would be nice to be able to reinline mdirs, but not
   the end of the world.

This restricts what operations are possible, and transitively, what we
can test.

This commit drops the removal of root bookmarks in test_mtree, which was
a workaround to keep tests from early implementation running. This was
preventing some minor optimizations. This required dropping some tests,
but these tests tested operations that aren't really possible in
practice.

Dropping the removal of root bookmarks allowed for a minor optimization
in lfsr_mdir_drop, and may lead to more in the future (or maybe just
stricter asserts):

            code          stack
  before:  31280           2648
  after:   31208 (-0.2%)   2648 (+0.0%)
2023-12-06 22:23:36 -06:00
Christopher Haster
74c4bb0792 Reverted moving lfsr_mdir_drop out of lfsr_mdir_commit
The logic is simpler with lfsr_mdir_drop being an implicit side-effect
of lfsr_mdir_commit, and right now a simpler system is preferred over a
complex one.

I think it will be more useful to focus on reducing the stack overhead
of lfsr_mdir_drop in general.

Still, it may be worth reverting this revert in the future.
2023-12-06 22:23:34 -06:00
Christopher Haster
608d3bd5a5 Moved lfsr_mdir_drop out of lfsr_mdir_commit, made explicit
This directly trades code cost for stack cost, by moving the stack
overhead of lfsr_mdir_drop out of the hot-path. Currently, the stack
hot-path is the mdir commit needed to flush file bshrubs, and this can
never drop an mdir:

            code          stack
  before:  31280           2648
  after:   31312 (+0.1%)   2520 (-4.8%)
2023-12-06 22:23:31 -06:00
Christopher Haster
eb6c361dfa Adopted lazy orphaned mdir drops
This ended up being much less of a simplification than I hoped it would.

It's still easier/more efficient to revert to a relocation in most cases
when dropping in an mdir split, and the small gain from simplifying how
drops/commits interact is overshadowed by the code duplication necessary
to separate lfsr_mdir_drop out from lfsr_mdir_commit:

            code          stack
  before:  30952           2528
  after:   31280 (+1.1%)   2648 (+4.7%)

Still, this does at least simplify the logical corner cases (we don't
need to abort commits when droppable anymore), and lfsr_mdir_drop is
ultimately necessary for supporting lazy file creation.

Also having a fix-orphans step during mount allows other littlefs
implementations the option to create orphaned mdirs without compat
issues. So this ends up the more flexible approach.

It _might_ be worth having both eager mdir drops and an explicit
lfsr_mdir_drop for lazy file creation in the future, but I doubt this
will end up worth the code duplication...

---

Oh right, I forgot to actually describe this change.

This trades eager mdir drops:

1. Drop mdirs from the mtree immediately as soon as their weight goes
   to zero.

For lazy mdir drops:

1. Drop mdirs from the mtree in a second commit.
2. Scan and drop orphaned mdirs on the first write after mount.
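
The orphan scan itself is conceptually simple; here's a sketch assuming
hypothetical mtree iteration/drop helpers (not the actual lfs.c
functions):

  #include <stdbool.h>
  #include <stdint.h>

  // hypothetical helpers for walking the mtree and dropping mdirs
  extern int mtree_next(void *lfs, void *mdir, bool *done);
  extern uint32_t mdir_weight(const void *mdir);
  extern int mdir_drop(void *lfs, void *mdir);

  // run on the first write after mount (or skipped entirely if mount saw
  // no possible orphans): drop any mdir whose weight has gone to zero
  static int fix_orphaned_mdirs(void *lfs, void *mdir) {
      for (;;) {
          bool done;
          int err = mtree_next(lfs, mdir, &done);
          if (err) {
              return err;
          }
          if (done) {
              return 0;
          }
          if (mdir_weight(mdir) == 0) {
              err = mdir_drop(lfs, mdir);
              if (err) {
                  return err;
              }
          }
      }
  }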

This sounds very similar to the previous "deorphan" scan, which risked
an extreme performance cost during mount, but it should be noted this
orphan scan only needs to touch every mdir once. This makes it no worse
than the overhead of actually mounting the filesystem.

We can also keep an eye out for orphaned mdirs when we mount, so no
extra scan is needed unless there was an unlucky powerloss.

Eager mdir dropping sounds simpler, but thanks to deferred commits
introduces some subtle complexity around aborting commits that would
drop an mdir to zero. Remember commits are viewable on-disk as soon as a
commit completes.

In _theory_, lazy mdir drops simplify the logic around committing to
mdirs.

Though the real kicker is that lazy mdir drops are required for lazy file
creation.

The current idea for lazy file creation involves tracking mid-less
opened-but-not-yet-created files. These files can have bshrubs, so they
need space on an mdir somewhere. But they aren't actually created yet,
so they don't have an mid.

This is fine (though it's probably going to be tricky) as long as we
allocate an mid on file sync, but there is always a risk of losing power
with mdirs that contain only RAM-backed files. Fortunately, no-mids
means no orphaned files, but it does mean orphaned mdirs with no synced
contents.

Long story short, lazy mdir drops are currently a necessary evil, and
logical simplification, that unfortunately comes with some cost.
2023-12-06 22:23:28 -06:00
Christopher Haster
aa79c274a6 Changed lfsr_mdir_commit__ to only copy this rbyd
This saves a bit of stack space in theory, but after the redund block
restructure this amounts to a single word.

But lfsr_mdir_commit__ isn't on the hot path anyways, so this doesn't
even matter...

            code          stack
  before:  30948           2528
  after:   30952 (+0.0%)   2528 (+0.0%)
2023-12-06 22:23:25 -06:00
Christopher Haster
c1c51a316a Some more cleanup around code organization
I would merge this with the previous cleanup items, but there's no way
I'm going to try to rebase this through the redund block restructure...
2023-12-06 22:23:18 -06:00
Christopher Haster
51e39747c0 Reverting alternate redund block layout in lfsr_mdir_t
See the previous commit for the reason. The alternate redund block
layout is just inferior in terms of both code and RAM.
2023-12-06 22:23:16 -06:00
Christopher Haster
9d182c2055 Attempted alternate redund block layout in lfsr_mdir_t
The idea here is to revert moving redund blocks into lfsr_rbyd_t, and
instead just keep a redundant copy of the rbyd blocks in the redund
blocks in lfsr_mdir_t.

Surprisingly, extra overhead in lfsr_mdir_t ended up with worse stack
usage than extra overhead in lfsr_rbyd_t. I guess we end up allocating
more mdirs than rbyds, which makes a bit of sense given how complicated
lfsr_mdir_commit is:

                    code          stack          structs
  redund union:    30976           2496             1072
  redund in rbyd:  30948 (-0.1%)   2528 (+1.3%)     1100 (+2.6%)
  redund in mdir:  31000 (+0.1%)   2536 (+1.6%)     1092 (+1.8%)

The mdir option does seem to improve struct overhead, but this hasn't
been a reliable measurement since it doesn't take into account how many
of each struct is allocated.

Given that the mdir option is inferior in both code and stack cost, and
requires more care to keep the rbyd/redund blocks in sync, I think I'm
going to revert this for now but keep the commit in the commit history
since it's an interesting comparison.
2023-12-06 22:23:13 -06:00
Christopher Haster
becbc0c2ad Moved redundant blocks into the lfsr_rbyd_t struct
This simplifies dependent structs with redundancy, mainly lfsr_mdir_t,
at a significant RAM cost:

            code          stack          structs
  before:  30976           2496             1072
  after:   30948 (-0.1%)   2528 (+1.3%)     1100 (+2.6%)

Which, to be honest, is not as bad as I thought it would be. Though it
is still pretty bad for no new features.

The motivation for this change:

1. The organization of the previous lfsr_mdir_t struct was a bit hacky
   and relied on exact padding so the redund block array and rbyd block
   lined up at the right offset.

2. The previous organization prevented theoretical "read-only rbyd
   structs" that could omit write-related fields, e.g. eoff and cksum.

   This idea is currently unused.

3. The current mdir=level-1, btree/data=level-0 redund design makes this
   RAM tradeoff pretty bad, but in theory higher btree redund levels
   would need the extra redund blocks in the rbyd struct anyways.

Still, the RAM impact to the current default configuration means this
should probably be reverted...
2023-12-06 22:23:11 -06:00
Christopher Haster
019044e4c6 Adopted better struct field names, cast to lfsr_openedmdir_t
- Renamed mdir->u.m to mdir->u.mdir.
- Prefer mdir->u.rbyd.* where possible.
- Changed file/dir mdirs to be stored directly, requiring a cast to
  lfsr_openedmdir_t to enroll in the opened mdir list.
2023-12-06 22:23:08 -06:00
Christopher Haster
a89b3e42ba Some cleanup items
- Adopted *_IS* naming convention for sign-bit macros.
- Made all struct initializing macros function-like, including the
  *_NULL() macros.
- Renamed ggrm/dgrm -> grm_g/grm_d.
- Renamed lfsr_mroot_commit_ -> lfsr_mroot_commit.
- Renamed LFSR_FILE_BSPROUT -> LFSR_FILE_ISDIRECT.
- Renamed LFSR_BSPROUT_NULL -> LFSR_FILE_BNULL().
- Dropped *_unerase functions for explicitly setting eoff=-1.
2023-12-06 22:23:06 -06:00
Christopher Haster
f4af2b407e More mid-related function cleanup
Reverted to one set of signed lfsr_mid_rid/bid functions, and tried to
make their usage more consistent.

We have two ways to compare mdirs now, lfsr_mdir_cmp (compares block
addresses) and lfsr_mdir_bid (compares mids), and it's not very clear
when to use which one. lfsr_mdir_cmp is a bit more robust in weird mid
cases (mainly inlined mdirs when mroot mid=-1), so currently preferring
that.

Also did some bit twiddling to preserve mid=-1 => bid=-1 and rid=-1;
this saves a bit of code:

            code          stack
  before:  31056           2488
  after:   30972 (-0.3%)   2496 (+0.3%)
2023-12-06 22:23:02 -06:00
Christopher Haster
41b9caf25d Renamed mid related functions and tried to make them less cumbersome
- lfs->mleaf_bits -> lfs->mbits
- lfsr_mleafweight -> lfsr_mweight
- lfsr_midbmask -> lfsr_mid_bid
- lfsr_midrmask -> lfsr_mid_rid
- added lfsr_mid_cbid
- added lfsr_mid_crid
- added lfsr_mdir_* variants
2023-12-06 22:23:00 -06:00
Christopher Haster
928108da0a Removed the mtree param from lfsr_mtree_* functions
There's only one mtree in a given filesystem. With the recent
lfsr_mdir_commit restructure, it makes more sense for the mtree to be
implicit.

            code           stack
  before:  31096            2480
  after:   31016 (-0.3%)    2480 (+0.0%)
2023-12-06 22:22:57 -06:00
Christopher Haster
30a9a62620 Heavily reworked lfsr_mdir_commit, split into more mid-level functions
Originally, the intention of this rework was to make it possible to
shrub the mtree, i.e. allow an mshrub, i.e. allow the root rbyd of the
mtree to be inlined in the mroot.

This would allow small mtrees (2, 3, etc. mdirs) to save a block that
would be needed for the mtree's root.

But as the mshrub was progressing, minor problems kept unfolding, and
ultimately I've decided to shelve the idea of mshrubs for now. They add
quite a bit of complexity for relatively little gain:

- bshrubs are just complicated to update. They require a call to
  lfsr_mdir_commit to update the inlined-root, which is a bit of a
  problem when your mshrub needs to be updated inside lfsr_mdir_commit,
  and your system disallows recursion...

  Recursion _can_ be avoided by separate bshrub commit variants that go
  through either lfsr_mdir_commit or lfsr_mdir_commit_, but this
  complicates things and requires some code duplication, weakening the
  value of reusing the bshrub data-structure.

- It's not always possible to compact the mshrub's backing mroot when
  we need to modify the mshrub.

  If an mroot becomes full and needs to split, for example, we need to
  allocate the new mdirs, update the (new) mshrub, and then commit
  everything into the mroot when we compact. But the "update the (new)
  mshrub" step can't be done until after we compact, because the mroot
  is by definition full.

  This _can_ also be worked around, by building an attr list containing
  all of the mshrub changes, and committing the mshrub/mroot changes in
  the same transaction, but this complicates things and increases the
  stack cost for the current hot-path.

- Every shrub needs a configurable shrub size, and the mshrub is no
  exception. This adds another config option and complicates shared
  shrub eviction code.

- The value for mshrubs is not actually that great.

  Unlike file bshrubs, there's only one mshrub in the filesystem, and
  I'm not sure there's a situation where a filesystem has >1 mdirs and
  the exact number of allocated blocks is critical.

And this complexity is reflected in code cost and robustness, not to
mention developer time. I think for littlefs this is just not worth
doing. At least not now.

We can always introduce mshrubs in a backwards compatible manner if
needed.

---

But this rework did lead to better code organization around mdir commits
and how they update the mtree/mroot, so I'm keeping those changes.

In general lfsr_mdir_commit has been broken up into mtree/mroot specific
functions that _do_ propagate in-device changes. Any commit to the mroot
changes the on-disk state of the filesystem anyways, so the mroot commit
_must_ be the last thing lfsr_mdir_commit does.

This leads to some duplicated updates, but that's not really a problem.

Here's the new call graph inside lfsr_mdir_commit:

                lfsr_mdir_commit
         .---------' | | | '-----------------.
         v           | | '-----------------. |
  lfsr_mtree_commit  | '--------.          | |
         '---------. |          |          | |
                   v v          |          | |
             lfsr_mroot_commit  |          | |
                   | '--------. |          | |
                   |          v v          | |
                   |    lfsr_mdir_commit_  | |
                   | .--------' '--------. | |
                   | | .-----------------|-' |
                   v v v                 v   v
              lfsr_mdir_commit__    lfsr_mdir_compact__

This rework didn't really impact code/stack that much. It added a bit of
code, but saved a bit of RAM. The real value is that the narrower-scoped
functions contain more focused logic:

            code          stack
  before:  30780           2504
  after:   31096 (+1.0%)   2480 (-1.0%)
2023-12-06 22:22:52 -06:00
Christopher Haster
a8f54fb1e0 Brought back the lfsr_mptr_t
This is just a useful type to have to make the code a bit more
readable.

This doesn't affect the code that much, except we are making more
on-stack copies of mptrs since the mdir doesn't technically contain
a mutable mptr. Maybe this should change?

            code          stack
  before:  30768           2496
  after:   30776 (+0.0%)   2504 (+0.3%)
2023-11-21 14:16:09 -06:00
Christopher Haster
bc8d54f9e0 Cleaned up bshrub code a bit
Mostly moved things around, removed a vestigial but harmless eviction
check in lfsr_mdir_compact__, added lfsr_bshrub_alloc/fetch, etc.
2023-11-21 14:10:33 -06:00
Christopher Haster
2a4aadca0e Fixed a number of bshrub-related alloc/clobber test failures
- There was a lingering strict pcache assert in lfs_bd_erase. Very
  unlikely to hit, but it is possible and shouldn't be an assert now
  that pcache can be left in an arbitrary state. That being said, it
  was asserting on an actual bug in this case.

- Our btree traversal was not traversing the roots of zero-weight
  btrees. Zero-weight btrees can happen as an intermediary step during
  btree/bshrub carving. If the stars align with the block allocator and
  intermediary carving states this can cause incorrect block
  allocations.

- Staged updates to bsprouts/bshrubs need to be played out before
  updates to opened mdirs in lfsr_mdir_commit. This is just because
  lfsr_file_isbsprout/isbshrub depend on mdir.block, and updating the
  mdirs first corrupts this.

  Maybe a different organization of this code would be useful; it is
  already full of TODOs.
2023-11-21 02:37:23 -06:00
Christopher Haster
4793d2f144 Fixed new bshrub roots and related bug fixing
It turned out that by implicitly handling root allocation in
lfsr_btree_commit_, we were never allowing lfsr_bshrub_commit to
intercept new roots as new bshrubs. Fixing this required moving the
root allocation logic up into lfsr_btree_commit.

This resulted in quite a bit of small bug fixing because it turns out if
you can never create non-inlined bshrubs you never test non-inlined
bshrubs:

- Our previous rbyd.weight == btree.weight check for if we've reached
  the root no longer works, changed to an explicit check that the blocks
  match. Fortunately, now that new roots set trunk=0 new roots are no
  longer a problematic case.

- We need to only evict when we calculate an accurate estimate; the
  previous code had a bug where eviction occurred early based only on the
  progged-since-last-estimate.

- We need to manually set bshrub.block=mdir.block on new bshrubs,
  otherwise the lfsr_bshrub_isbshrub check fails in mdir commit staging.

Also updated btree/bshrub following code in the dbg scripts, which
mostly meant making them accept both BRANCH and SHRUBBRANCH tags as
btree/bshrub branches. Conveniently very little code needs to change
to extend btree read operations to support bshrubs.
2023-11-21 00:06:08 -06:00
Christopher Haster
6bd00caf93 Reimplemented eager shrub eviction, now with a more reliable heuristic
Unfortunately, waiting to evict shrubs until mdir compaction does not
work because we only have a single pcache. When we evict a bshrub we
need a pcache for writing the new btree root, but if we do this during
mdir compaction, our pcache is already busy handling the mdir
compaction. We can't do a separate pass for bshrub eviction, since this
would require tracking an unbounded number of new btree roots.

In the previous shrub design, we meticulously tracked the compacted
shrub estimate in RAM, determining exactly how the estimate would change
as a part of shrub carve operations.

This worked, but was fragile. It was easy for the shrub estimate to
diverge from the actual value, and required quite a bit of extra code to
maintain. Since the use cases for bshrubs are growing a bit, I didn't
want to return to this design.

So here's a new approach based on emulating btree compacts/splits inside
the shrubs:

1. When a bshrub is fetched, scan the bshrub and calculate a compaction
   estimate. Store this.

2. On every commit, find the upper bound of new data being progged, and
   keep track of estimate + progged. We can at least get this relatively
   easily from commit attr lists. We can't get the amount deleted, which
   is the problem.

3. When estimate + progged exceeds shrub_size, scan the bshrub again and
   recalculate the estimate.

4. If estimate exceeds the shrub_size/2, evict the bshrub, converting it
   into a btree.

As you may note, this is very close to how our btree compacts/splits
work, but emulated. In particular, evictions/splits occur at
(shrub_size/block_size)/2 in order to avoid runaway costs when the
bshrub/btree gets close to full.

Benefits:

- This eviction heuristic is very robust. Calculating the amount progged
  from the attr list is relatively cheap and easy, and any divergence
  should be fixed when we recalculate the estimate.

- The runtime cost is relatively small, amortized O(log n) which is
  the existing runtime to commit to rbyds.

Downsides:

- Just like btree splits, evictions force our bshrub to be ~1/2 full on
  average. This combined with the 2x cost for mdir pairs, the 2x cost
  for mdirs being ~1/2 full on average, and the need for both a synced
  and unsynced copy of file bshrubs brings our file bshrub's overhead up
  to ~16x, which is getting quite high...

Anyways, bshrubs now work, and the new file topology is passing testing.

An unfortunate surprise is the jump in stack cost. This seems to come from
moving the lfsr_btree_flush logic into the hot-path that includes bshrub
commit + mdir commit + all the mtree logic. Previously the separation of
btree/shrub commits meant that the more complex block/btree/crystal logic
was on a separate path from the mdir commit logic:

                    code           stack           lfsr_file_t
  before bshrubs:  31840            2072                   120
  after bshrubs:   30756  (-3.5%)   2448 (+15.4%)          104 (-15.4%)

I _think_ the reality is not actually as bad as measured; most of these
flush/carve/commit functions calculate some work and then commit it in
separate steps. In theory GCC's shrinkwrapping optimizations should
limit the stack to only what we need as we finish different
calculations, but our current stack measurement scripts just add
together the whole frames, so any per-call stack optimizations get
missed...
2023-11-21 00:04:30 -06:00
Christopher Haster
6b82e9fb25 Fixed dbg scripts to allow explicit trunks without checksums
Note this is intentionally different from how lfsr_rbyd_fetch behaves
in lfs.c. We only call lfsr_rbyd_fetch when we need validated checksums,
otherwise we just don't fetch.

The dbg scripts, on the other hand, always go through fetch, but it is
useful to be able to inspect the state of incomplete trunks when
debugging.

This used to be how the dbg scripts behaved, but they broke because of
some recent script work.
2023-11-20 23:28:27 -06:00
Christopher Haster
c94b5f4767 Redesigned the inlined topology of files, now using geoxylic btrees
As a part of the general redesign of files, all files, not just small
files, can inline some data directly in the metadata log. Originally,
this was a single piece of inlined data or an inlined tree (shrub) that
effectively acted as an overlay over the block/btree data.

This is now changed so that when we have a block/btree, the root of the
btree is inlined. In effect making a full btree a sort of extended
shrub.

I'm currently calling this a "geoxylic btree", since that seems to be a
somewhat related botanical term. Geoxylic btrees have, at least on
paper, a number of benefits:

- There is a single lookup path instead of two, which simplifies code a
  bit and decreases lookup costs.

- One data structure instead of two also means lfsr_file_t requires
  less RAM, since all of the on-disk variants can go into one big union.
  Though I'm not sure this is very significant vs stack/buffer costs.

- The write path is much simpler and has less duplication (it was
  difficult to deduplicate the shrub/btree code because of how the
  shrub goes through the mdir).

  In this redesign, lfsr_btree_commit_ leaves root attrs uncommitted,
  allowing lfsr_bshrub_commit to finish the job via lfsr_mdir_commit.

- We don't need to maintain a shrub estimate, we just lazily evict trees
  during mdir compaction. This has a side-effect of allowing shrubs to
  temporarily grow larger than shrub_size before eviction.

  NOTE THIS (fundamentally?) DOESN'T WORK

- There is no awkwardly high overhead for small btrees. The btree root
  for two-block files should be able to comfortably fit in the shrub
  portion of the btree, for example.

- It may be possible to also make the mtree geoxylic, which should
  reduce storage overhead of small mtrees and make better use of the
  mroot.

All of this being said, things aren't working yet. Shrub eviction during
compaction runs into a problem with a single pcache -- how do we write
the new btrees without dropping the compaction pcache? We can't evict
btrees in a separate pass because their number is unbounded...
2023-11-20 23:23:58 -06:00
Christopher Haster
7243c0f371 Fixed some confusion in tracebd.py around buffered lines with headers
Also limited block_size/block_count updates to only happen when the
configured value is None. This matches dbgbmap.py.

Basically just a cleanup of some bugs after the rework related to
matching dbgbmap.py. Unfortunately these scripts have too much surface
area and no tests...
2023-11-13 13:42:11 -06:00