Commit Graph

2369 Commits

Author SHA1 Message Date
Christopher Haster
0828fd9bf3 Reverted LFS3_CKDATACKSUMREADS -> LFS3_CKDATACKSUMS
LFS3_CKDATACKSUMREADS is just too much.

The downside is it may not be clear how LFS3_CKDATACKSUMREADS interacts
with the future planned LFS3_CKREADS (LFS3_CKREADS implies
LFS3_CKDATACKSUMS + LFS3_CKMETAREDUND), but on the flip side you may
actually be able to type LFS3_CKDATACKSUMS on the first try.
2025-07-16 14:25:20 -05:00
Christopher Haster
17cefcdd42 Dropped LFS3_FORCEINLINE from lfs3_data_slice
This used to save code/stack, but apparently not anymore:

                            code          stack          ctx
  before:                  36960           2392          652
  after:                   36936 (-0.1%)   2384 (-0.3%)  652 (+0.0%)

                            code          stack          ctx
  ckdatacksumreads before: 38368           2720          660
  ckdatacksumreads after:  38024 (-0.9%)   2624 (-3.5%)  660 (+0.0%)

The stack hot-path has changed significantly since then, with many
functions adopting LFS3_NOINLINE to get off the stack hot-path. Not sure
if that's related.

I'm also starting to think LFS3_FORCEINLINE is a symptom of
over-optimization. We shouldn't be doing the compiler's job; if it can't
figure out the best inlining strategy, so be it.
2025-07-16 14:20:26 -05:00
Christopher Haster
dbad3e6863 Prefer lfs3_data_slice over LFS3_DATA_SLICE macro
Maybe it's because they are relatively new, but compound literals seem
to do more harm than good.

I'm still keeping the LFS3_DATA_SLICE macro around in case it's useful
(for tests?), but now preferring lfs3_data_slice where possible.
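
For illustration, a minimal standalone sketch of the tradeoff -- these
are not the actual lfs3 definitions, just the general pattern of a
compound-literal macro vs an equivalent function:

  #include <stdint.h>
  #include <stddef.h>
  #include <stdio.h>

  typedef struct data {
      const uint8_t *buffer;
      size_t size;
  } data_t;

  // macro version: builds the struct via a C99 compound literal at
  // every call site
  #define DATA_SLICE(d, off, len) \
      ((data_t){.buffer=(d).buffer+(off), .size=(len)})

  // function version: same result, but gives the compiler (and the
  // reader) a single definition to work with
  static inline data_t data_slice(data_t d, size_t off, size_t len) {
      return (data_t){.buffer=d.buffer+off, .size=len};
  }

  int main(void) {
      uint8_t buf[16] = {0};
      data_t d = {.buffer=buf, .size=sizeof(buf)};
      data_t a = DATA_SLICE(d, 4, 8);
      data_t b = data_slice(d, 4, 8);
      printf("%zu %zu\n", a.size, b.size);
  }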

---

This doesn't really impact the default build, but it saves a big chunk
of code/stack when compiling with LFS3_CKDATACKSUMREADS:

                            code          stack          ctx
  before:                  36956           2392          652
  after:                   36960 (+0.0%)   2392 (+0.0%)  652 (+0.0%)

                            code          stack          ctx
  ckdatacksumreads before: 38576           2744          660
  ckdatacksumreads after:  38368 (-0.5%)   2720 (-0.9%)  660 (+0.0%)

LFS3_CKDATACKSUMREADS adds cksize/cksum fields to lfs3_data_t, so it's
very sensitive to lfs3_data_t function changes.

Though to be fair, at 5 words, lfs3_data_t really shouldn't be a
pass-by-value struct. We only keep lfs3_data_t a pass-by-value struct
because LFS3_CKDATACKSUMREADS is low-priority/best-effort and changing
that would make the codebase a mess.
2025-07-16 14:16:07 -05:00
Christopher Haster
bf3078b7bd Dropped LFS3_DATA_TRUNCATE/FRUNCATE
These can be accomplished with LFS3_DATA_SLICE, and I think the
TRUNCATE/FRUNCATE variants just muddy things and make the math harder to
read.

LFS3_DATA_TRUNCATE is already basically a noop. The only non-trivial
transformation is LFS3_DATA_FRUNCATE, and LFS3_DATA_FRUNCATE is the
confusing one.
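
Roughly, assuming a slice(d, off, len) primitive, and that truncate
keeps the front of the data while fruncate keeps the back, the mapping
would be something like:

  truncate(d, len) -> slice(d, 0, len)              (basically a noop)
  fruncate(d, len) -> slice(d, d.size - len, len)   (the non-trivial one)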

---

I have no idea why _removing_ code is adding so much stack. This needs
investigation:

           code          stack          ctx
  before: 36944           2384          652
  after:  36956 (+0.0%)   2392 (+0.3%)  652 (+0.0%)
2025-07-16 14:01:43 -05:00
Christopher Haster
7b7dbae1df Simplified fragment coalescing bounds logic
I think these were copied from the initial fragment slice calculation,
but we're already checking for <=fragment_size, so the extra lfs3_min is
unnecessary.

Saves a bit of code:

           code          stack          ctx
  before: 36952           2376          652
  after:  36944 (-0.0%)   2384 (+0.3%)  652 (+0.0%)

Not sure why this added stack, compiler noise?
2025-07-16 13:59:34 -05:00
Christopher Haster
2d10a61732 Reverted bptr -> bptr_ in mtree traverse/gc functions
This was missed when reverting the trailing underscores_ in other
unconditional out-pointers.

The trailing underscore now just hints at the parameter being an
out-pointer, optionality is no longer implied.
2025-07-16 12:53:53 -05:00
Christopher Haster
55cc661283 Tweaked LFS3_DBGRBYDBALANCE, adopted lfs3_rheight_t
This tweaks LFS3_DBGRBYDBALANCE to be a bit less intrusive by putting
the relevant heights in a single lfs3_rheight_t struct.

Also added ifdefs to lfs3_rbyd_lookupnext_ just to make it clear this
code is opt-in.

No code changes.
2025-07-16 12:50:09 -05:00
Christopher Haster
7c1fe0f199 btree: Tried to better deduplicate split commit building logic
This may have changed during some refactor, but we can reuse the entire
right branch logic, and at least deduplicate the lfs3_data_frombranch
call on the left branch.

Saves a nice bit of code:

           code          stack          ctx
  before: 37020           2392          652
  after:  36952 (-0.2%)   2376 (-0.7%)  652 (+0.0%)

Also deduplicating the lfs3_data_t allocations saved stack, though that
is more concerning than anything else...

Also adopted l/r_buf names in lfs3_bcommit_t. This better matches names
in lfs3_file_graft_ and elsewhere.
2025-07-15 21:44:27 -05:00
Christopher Haster
3e47304e9b btree: Adopted LFS3_ERR_EXIST for terminating at shrubs
A bit of an abuse of this error code, but this is more explicit than the
previous rattr_count > 0 condition.

Forgetting to set rattr_count=0 on a normal exit has introduced bugs
before.
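
As a standalone illustration of the pattern (not the actual
lfs3_btree_commit_ code), signaling early termination with an explicit
error code instead of an out-parameter count looks something like:

  #include <stdio.h>

  #define ERR_EXIST (-17)  // stand-in for LFS3_ERR_EXIST

  static int commit(int is_shrub, int *rattr_count) {
      *rattr_count = 2;        // pending attrs for the caller
      if (is_shrub) {
          return ERR_EXIST;    // explicit: stop here, caller takes over
      }
      *rattr_count = 0;        // easy to forget on a normal exit
      return 0;
  }

  int main(void) {
      int count;
      if (commit(1, &count) == ERR_EXIST) {
          printf("terminated at shrub, %d attrs pending\n", count);
      }
  }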

---

Though I'm not sure why this adds code. Somehow, _removing_ the
rattr_count=0 statements when lfs3_btree_commit_ collapses the root
added code?

           code          stack          ctx
  before: 36996           2392          652
  after:  37020 (+0.1%)   2392 (+0.0%)  652 (+0.0%)

Seriously, add bcommit->rattr_count = 0 to lfs3_btree_commit_ and
lfs3_btree_commit_'s code cost shrinks by 8 bytes. Is the compiler
hiding stuff in bcommit?

I'm just going to chalk this up to compiler noise for now...
2025-07-15 20:53:26 -05:00
Christopher Haster
6d003543d8 btree: Moved internal commit state into new lfs3_bcommit_t struct
This somewhat replaces lfs3_bctx_t. Really lfs3_bctx_t consumed the
previously separate bid, rattr, and rattr_count out-pointers and
underwent a slight name change. The previous contents of lfs3_bctx_t are
all available under bcommit.ctx, with some minor tweaks.

The main motivation for this was to get rid of the mess that was the
bid/rattr out-pointers. They represent a side-channel of internal btree
state that is probably better implemented as a single struct.

Hopefully this makes the logic of lfs3_btree_commit_ callers -- and
expected action on non-zero rattr_count -- more obvious.
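
A rough sketch of the shape this might take -- the field names come from
this commit message, but the types and sizes here are guesses for
illustration, not the actual lfs3_bcommit_t definition:

  typedef struct bcommit {
      long bid;              // btree id being committed (was an out-pointer)
      const void *rattr;     // pending rattrs for the caller (was an out-pointer)
      unsigned rattr_count;  // non-zero => caller must continue the commit
      struct {
          unsigned char branch_l_buf[16];  // scratch for the left branch
          unsigned char branch_r_buf[16];  // scratch for the right branch
      } ctx;                 // internal commit state
  } bcommit_t;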

---

Some other tweaks:

- Separated ctx.buf into bcommit.ctx.branch_l_buf/branch_r_buf.

  I realized this informs the compiler that the lfs3_data_frombranch
  calls should not overflow.

  This may need to be reverted if we ever commit different data types in
  lfs3_btree_commit_, but that's not the end of the world. Right now
  this is bound to whatever split needs (2 branches + name).

- Added rattr_count <= rattrs assert after each btree commit builder.

  These asserts were just adopted after the btree code was written. The
  extra safeguards are good to have in case of future refactor.

Shaves off a bit more code/stack while also (hopefully) improving code
readability:

           code          stack          ctx
  before: 37048           2416          652
  after:  36996 (-0.1%)   2392 (-1.0%)  652 (+0.0%)
2025-07-15 20:52:55 -05:00
Christopher Haster
794bd3df61 btree: Slightly tweaked lfs3_btree_commit_'s internal gotos
This moves the default recurse logic (previously the commit label) back
up before the compact/relocate/split/merge branches.

I know the general rule is to try to limit gotos to forward jumps, but in
this case, placing the default recurse logic at the end of
lfs3_btree_commit_ disrupts the default "happy" path and makes
refactoring more difficult than it needs to be.

Contextually, the default recurse logic is a part of the default commit
logic, and split, merge, etc, are exceptional branches that just happen
to sometimes converge.
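
As a control-flow skeleton only (not the actual lfs3_btree_commit_
code), the layout this is going for looks roughly like:

  #include <stdio.h>

  static int commit(int height) {
  recurse:;
      // default commit logic up top, on the "happy" path: step up one
      // level until we reach the root
      if (height == 0) {
          return 0;
      }
      if (height % 2) {
          goto split;
      }
      height -= 1;
      goto recurse;

  split:;
      // exceptional branch: do the extra work, then converge back on
      // the default recurse logic
      printf("split at height %d\n", height);
      height -= 1;
      goto recurse;
  }

  int main(void) {
      return commit(4);
  }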

---

I think the real problem is that all of the gotos in lfs3_btree_commit_
are modeling mutually recursive functions, but in a context where we
can't actually recurse.

_Technically_, it is possible to transform any tail-recursive function
into loops and if statements (structured program theorem), but doing so
risks significant code duplication. We could duplicate this recurse
logic everywhere it's needed for example. But this is also something we
want to avoid in littlefs.

So goto soup it is.

---

Some code changes, but probably just compiler noise:

           code          stack          ctx
  before: 37052           2416          652
  after:  37048 (-0.0%)   2416 (+0.0%)  652 (+0.0%)

I also added some more informative-only labels now that we've adopted
-Wno-unused-label. These are useful for documenting independent chunks
of logic in a large function like this, and as debugging targets.
2025-07-15 20:52:44 -05:00
Christopher Haster
0364ed5011 attr: Fixed custom attrs overflowing rattr.count
Not sure how this was missed. The whole tradeoff of shrinking
rattr.count was that by default lfs3_rattr_t would take up less space,
but user-provided buffers would need an indirect lfs3_data_t to support
arbitrary buffer sizes.

This managed to scrape by with a 16-bit count (15-bit really), but
fortunately failed test_attrs_fattr_resync_receive with an 8-bit count.
And only barely! 256 is the smallest possible custom attr that
overflows.
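
The arithmetic is simple enough to check in a few lines -- 256 is the
first length that wraps an 8-bit count back around to zero:

  #include <stdio.h>
  #include <stdint.h>

  int main(void) {
      uint8_t count = (uint8_t)256;  // max representable value is 255
      printf("%d\n", count);         // prints 0, the overflowed count
  }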

I guess a point towards making internal limitations as tight as possible
to catch mistakes like these earlier.

---

Added test_attrs_setattr_big and test_attrs_fattr_big to catch this in
the future.

Note that while this added some code, stack is unaffected. This is
because custom attribute handling is off the hot-path, which is why the
lfs3_rattr_t -> lfs3_rattr_t+lfs3_data_t split is worth it:

           code          stack          ctx
  before: 37016           2416          652
  after:  37052 (+0.1%)   2416 (+0.0%)  652 (+0.0%)
2025-07-15 16:50:11 -05:00
Christopher Haster
5b0ec8090a Adopted rattr.from for simpler appendrattr_ lazy encoding
This breaks down the previously 16-bit rattr.count field into two 8-bit
rattr.from and rattr.count fields. Now, instead of using a mixture of
rattr.tag and sign(rattr.count) to determine rattr encoding, we just
jump based on rattr.from:

  lfs3_rattr_t:
  .---+---+---+---.
  |  tag  |frm|cnt| -+-> 16-bit tag   - on-disk encoding + rbyd flags
  +---+---+---+---+  +->  8-bit from  - in-RAM encoding
  |     weight    |  '->  8-bit count - from-specific count
  +---+---+---+---+
  |      ptr      |
  '---+---+---+---'

The internal appendrattr_ ctx also saw a bit of rework, and now uses a
big union with multiple buffers instead of stacking a ridiculous number
of LFS_MAX calls. Expanding the LFS_MAX stack grows O(n^2), so this is
probably good for compile times.
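
The union-of-buffers pattern, sketched standalone with made-up encodings
and sizes (not the actual appendrattr_ ctx):

  #include <stdio.h>

  union rattr_scratch {
      unsigned char leb128_buf[5];   // hypothetical per-encoding buffers
      unsigned char branch_buf[12];
      unsigned char name_buf[16];
  };

  int main(void) {
      // the union is automatically sized to its largest member, the
      // same result as the old stack of LFS_MAX calls without the
      // macro soup
      printf("%zu\n", sizeof(union rattr_scratch));
  }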

And all rattr.from branches now generate an lfs3_data_t*. This was
already a side-effect of all the internal lfs3_data_from* functions, and
it simplifies the tail end of appendrattr_. No more relying on
data_count's sign bit.

Also rearranged rattr.from encoders to match source code order.

---

Unfortunately, while this did simplify the source code, it didn't really
lead to much improvement in code size:

           code          stack          ctx
  before: 37024           2416          652
  after:  37016 (-0.0%)   2416 (+0.0%)  652 (+0.0%)

I guess jump tables are more a performance optimization than a code size
one. That and the benefit of cheaper appendrattr_ logic is likely
overshadowed by the extra constants needed to populate rattr.from in
every LFS3_RATTR_* macro.

Also test_attrs_fattr_resync_receive is now failing, but I think that's
just because of an unrelated bug exposed by the shrinking count field.
In theory rattr.count should be limited to internal fixed-size buffers.
2025-07-15 16:50:11 -05:00
Christopher Haster
0bed3867d8 Adopted more single-char field names
Limited to nested struct fields where the names don't really matter:

- bptr.data -> bptr.d
- mdir.rbyd -> mdir.r

Ok, it actually just ended up being those two.

This is on the tail end of some optimization work that ended up
abandoned because of maintainability concerns. But it did highlight that
struct nesting gets a bit out-of-control when trying to both optimize
stack allocations and respect C99's strict aliasing.

Consider further fragmenting lfs3_rbyd_t for fine-grained stack
allocations:

  typedef struct lfs3_rbyd {
      struct lfs3_rtrunkcksum {
          struct lfs3_rtrunk {
              lfs3_rid_t weight;
              struct lfs3_rtrunktrunk {
                  lfs3_block_t blocks[2];
                  lfs3_size_t trunk;
              } rtrunktrunk;
          } rtrunk;
          uint32_t cksum;
      } rtrunkcksum;
      lfs3_size_t eoff;
  } lfs3_rbyd_t;

Accessing fields just starts to get silly:

  rbyd.rtrunkcksum.rtrunk.rtrunktrunk.trunk

At least single-char field names keep a little bit of readability:

  rbyd.ck.t.t.trunk

Or for some real examples:

- file->b.o.mdir.rbyd.weight -> file->b.o.mdir.r.weight
- bptr->data.u.disk.block -> bptr->d.u.disk.block
2025-07-15 16:50:06 -05:00
Christopher Haster
29e1701964 scripts: gdb: Globbed all dbg scripts into dbg.gdb.py
This goes ahead and makes all dbg scripts available in dbg.gdb.py, via
the magic of globbing relative to __file__ and dynamic Python class
generation.

Probably one of the more evil scripts I've written, but this means we
don't need to worry about dbg.gdb.py falling out-of-date when adding new
dbg scripts.

Not all of the dbg scripts are useful inside gdb, but most of them are.
After all, what's cooler than this!

  (gdb) dbgrbyd -b4096 "disk" -t \
          file->b.shrub.blocks[0] \
          --trunk lfs3_rbyd_trunk(&file->b.shrub)
  rbyd 0x46.23a w2048, rev 00000000, size 629, cksum 8f5169e1
  00000004:           .->     0-334 data w335 0
  00000009:         .-+->       335 data w1 1                  71
  0000000e:         | .->       336 data w1 1                  67
  00000013:       .-+-+->       337 data w1 1                  66
  ...
  00000144: | | | |   .->       350 data w1 1                  74
  0000019a: | | | | .-+->       351 data w1 1                  78
  000001f5: | | | | | .->   352-739 data w388 1                76
  00000258: +-+-+-+-+-+->  740-2047 data w1308 1               6c

Note some tricks to help interact with bash and gdb:

- Flags are passed as is (-b4096, -t, --trunk)
- All non-flags are parsed as expressions (file->b.shrub.blocks[0])
- String expressions may be useful for paths and stuff ("./disk")
2025-07-04 18:55:46 -05:00
Christopher Haster
090611af14 scripts: dbgflags.py: Tweaked internals for readability
Mainly just using 'P_NAME' instead of 'P', 'NAME' in the FLAGS table;
every bit of horizontal spacing helps with these definitions.
2025-07-04 18:08:11 -05:00
Christopher Haster
19747f691e scripts: dbgflags.py: Reimplemented filters as flags
So instead of:

  $ ./scripts/dbgflags.py o 0x10000003

The filter is now specified as a normal(ish) argparse flag:

  $ ./scripts/dbgflags.py --o 0x10000003

This is a bit easier to interop with in dbg.gdb.py, and I think a bit
more readable.

Though -a and --a now do _very_ different things. I'm sure that won't
confuse anyone...
2025-07-04 18:08:11 -05:00
Christopher Haster
0c19a68536 scripts: test.py/bench.py: Added support for multiple header files
Like test.py --gdb-script, being able to specify multiple header files
seems useful and is easy enough to add.

---

Note that the default is only used if no other header files are
specified, so this _replaces_ the default header file:

  $ ./scripts/test.py --include=my_header.h

If you don't want to replace the default header file, you currently need
to specify it explicitly:

  $ ./scripts/test.py \
        --include=runners/test_runner.h \
        --include=my_header.h
2025-07-04 18:08:11 -05:00
Christopher Haster
0b804c092b scripts: gdb: Added some useful GDB scripts to test.py --gdb
These just invoke the existing dbg*.py python scripts, but allow quick
references to variables in the process being debugged:

  (gdb) dbgflags o file->b.o.flags
  LFS3_O_RDWR    0x00000002  Open a file as read and write
  LFS3_o_REG     0x10000000  Type = regular-file
  LFS3_o_UNSYNC  0x01000000  File's metadata does not match disk

Quite neat and useful!

This works by injecting dbg.gdb.py via gdb -x, which includes the
necessary python hooks to add these commands to gdb. This can be
overridden/extended with test.py/bench.py's --gdb-script flag.

Currently limited to scripts that seem the most useful for process
internals:

- dbgerr - Decode littlefs error codes
- dbgflags - Decode littlefs flags
- dbgtag - Decode littlefs tags
2025-07-04 18:08:04 -05:00
Christopher Haster
b700c8c819 Dropped fragmenting blocks > 1 fragment
So we now keep blocks around until they can be replaced with a single
fragment. This is simpler, cheaper, and reduces the number of commits
needed to graft (though note arbitrary range removals still keep this
unbounded).

---

So, this is a delicate tradeoff.

On one hand, not fully fragmenting blocks risks keeping around bptrs
containing very little data, depending on fragment_size.

On the other hand:

- It's expensive, and disk utilization during random _deletes_ is not
  the biggest of concerns.

  Note our crystallization algorithm should still clean up partial
  blocks _eventually_, so this doesn't really impact random writes.
  The main concerns are lfs3_file_truncate/fruncate, and in the future
  collapserange/punchhole.

- Fragmenting bptrs introduces more commits, which have their own
  prog/erase cost, and it's unclear how this impacts logging operations.

  There's no point in fragmenting blocks at the head of a log if we're
  going to fruncate them eventually.

I figure let's err on the side of minimizing complexity/code size for
now, and if this turns out to be a mistake, we can always revert or
introduce fragmenting >1 fragment blocks as an optional feature in the
future.

---

Saves a big chunk of code, stack, and even some ctx (no more
fragment_thresh):

           code          stack          ctx
  before: 37504           2448          656
  after:  37024 (-1.3%)   2416 (-1.3%)  652 (-0.6%)
2025-07-03 19:46:18 -05:00
Christopher Haster
3f2e8b53c5 Manually inlined lfs3_file_crystallize into lfs3_file_flush_
This was the main culprit behind our stack increase. Inlining
lfs3_file_crystallize into lfs3_file_flush_ adds a bit of code, but as a
tradeoff:

- Keeps all lfs3_file_crystallize_ calls at the same abstraction
  level, which is generally easier to reason about and avoids issues
  with things like lfs3_alloc_ckpoints.

- Makes some low-level interactions, such as LFS3_o_UNCRYST masking,
  more obvious.

- Reduces the stack hot-path by the cost of lfs3_file_flush_

Saves some stack at a code cost:

               code          stack          ctx
  before:     37492           2464          656
  after:      37504 (+0.0%)   2448 (-0.6%)  656 (+0.0%)

Now that the dust has settled a bit, we can also compare the lazy
grafting vs lazy crystallization builds:

               code          stack          ctx
  lazy-graft: 38020           2456          656
  lazycryst:  37504 (-1.4%)   2448 (-0.3%)  656 (+0.0%)
2025-07-03 18:55:28 -05:00
Christopher Haster
35e407372c Adopted similar mark-if-truncate-to-zero logic for file caches
It worked well for file leaves, so we might as well adopt the same
post-truncate/fruncate logic for caches.

This moves checks for cache.size==0 from lfs3_file_write into
lfs3_file_truncate/fruncate.

Note that lfs3_file_truncate/fruncate are the only functions (for now)
that can reduce the size of a file.

Adds a bit of code, which is probably why this wasn't adopted earlier,
but it reduces the state we need to worry about and makes things easier
to understand:

           code          stack          ctx
  before: 37468           2464          656
  after:  37492 (+0.1%)   2464 (+0.0%)  656 (+0.0%)
2025-07-03 18:55:20 -05:00
Christopher Haster
8365b27dea Reworked lfs3_file_truncate/fruncate to simplify crystallize
Now that we don't need to worry about losing data due to ungrafted
state, we can decide whether or not to discard leaves after
truncate/fruncate.

This simplifies lfs3_file_truncate/fruncate (and makes them much more
readable as a plus), but also lets us simplify lfs3_file_crystallize
since we no longer need to worry about implicit flushes.

lfs3_file_crystallize's call sites:

- lfs3_file_flush_ - We've already committed to flushing, so
  opportunistically clearing LFS3_o_UNFLUSH has no effect.

  lfs3_file_flush_'s logic should already take advantage of possible
  flushes anyways.

- lfs3_file_flush - We only call lfs3_file_crystallize _after_
  lfs3_file_flush_, so this has no effect.

This saves a bit more code and stack:

           code          stack          ctx
  before: 37588           2472          656
  after:  37468 (-0.3%)   2464 (-0.3%)  656 (+0.0%)
2025-07-03 18:46:32 -05:00
Christopher Haster
e443af800b Adopted compiler friendly generalized lfs3_file_crystallize_ API
Seeing as the generalized lfs3_file_crystallize_ API had a much lower
cost than I thought, we might as well keep it around a bit longer.

Though I at least tweaked it to hopefully be easier for compilers to
optimize: By accepting crystal_min=-1 as an alias for
crystal_min=crystal_max, compilers should always be able to const
propagate this.
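
A minimal sketch of the -1-alias trick (not the actual
lfs3_file_crystallize_ signature):

  #include <stdio.h>

  static int crystallize(int crystal_min, int crystal_max) {
      if (crystal_min == -1) {
          crystal_min = crystal_max;  // -1 means "same as crystal_max"
      }
      return crystal_min + crystal_max;
  }

  int main(void) {
      // the constant -1 at the call site is trivial for the compiler
      // to const propagate through the alias branch above
      printf("%d\n", crystallize(-1, 8));
  }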

---

Not sure why this still adds 8 bytes of code; it just looks like
compiler noise in lfs3_file_crystallize__? Is the LFS3_NOINLINE
attribute messing with compiler optimizations?

           code          stack          ctx
  before: 37580           2472          656
  after:  37588 (+0.0%)   2472 (+0.0%)  656 (+0.0%)
2025-07-03 18:39:43 -05:00
Christopher Haster
d6f332fa9f Dropped the generalized lfs3_file_crystallize_ API
Eventually the generalized crystallize API may be useful again for the
"eager crystallization" write strategy, but the codebase has drifted
apart enough already that this will require some reimplementation
anyways (review the commit history!).

Might as well clean up API weirdness we're not using.

Saves surprisingly little code. I guess the compiler was able to
optimize out the duplicated args once the logic was a bit simpler?

           code          stack          ctx
  before: 37588           2472          656
  after:  37580 (-0.0%)   2472 (+0.0%)  656 (+0.0%)
2025-07-03 18:07:32 -05:00
Christopher Haster
a85f08cfe3 Dropped lazy grafting, but kept lazy crystallization
This merges LFS3_o_GRAFT into LFS3_o_UNCRYST, simplifying the file write
path and avoiding the mess that is ungrafted leaves.

---

This goes for a different lazy crystallization/grafting strategy that
was overlooked before. Instead of requiring all leaves to be both
crystallized and grafted, we allow leaves to be uncrystallized, but they
_must_ be grafted (in-tree) at all times.

This gets us most of the rewrite performance of lazy-crystallization,
without needing to worry about out-of-date file leaves.

Out-of-date file leaves were a headache for both code cost and concerns
around confusing filesystem states and related bugs.

Note LFS3_o_UNCRYST gets some extra behavior here:

- LFS3_o_UNCRYST indicates when crystallization is _necessary_, and no
  longer when crystallization is _possible_.

  We already keep track of when crystallization is _possible_ via bptr's
  erased-state, and this lets us control recrystallization in
  lfs3_file_flush_ without erased-state-clearing hacks (which probably
  wouldn't work with the future ddtree).

- We opportunistically clear the UNCRYST flag if it's not possible for
  future lfs3_file_crystallize_ calls to make progress:
  - When we crystallize a full block
  - When we hit the end of the file
  - When we hit a hole
  - When we hit an unaligned block

---

Note this does impact performance!

Unlike true lazy grafting, eager grafting means we're always committing
to the bshrub/btree more often than is strictly necessary, and this
translates to more frequent btree node erases/compactions.

Current simulated benchmarks show a ~3x increase (~20us -> ~60us) in
write times for linear file writes on NOR flash.

However:

- The moment you need unaligned progs, this performance optimization
  goes out the window, as we need to graft bptrs before any padding
  fragments.

- This only kicks in once we start crystallizing. So any writes <
  crystal_thresh (both in new files and in between blocks) are forced
  to commit to the bshrub/btree every flush.

  This risks a difficult to predict performance characteristic.

- If you sync frequently (logging), we're forced to crystallize/graft
  anyways.

- The performance hit can be alleviated with either larger writes or
  larger caches, though I realize this goes against littlefs's
  "RAM-not-required" mantra.

Worst case, we can always bring back "lazy grafting" as a
high-performance option in the future.

Though note the above concerns around in-between/pre crystallization
performance. This may only make sense when cache_size >= both prog_size
and crystal_thresh.

And of course, there's a significant code tradeoff!

           code          stack          ctx
  before: 38020           2456          656
  after:  37588 (-1.1%)   2472 (+0.7%)  656 (+0.0%)

Uh, ignore that stack cost. The simplified logic leads to more functions
being inlined, which makes a mess of our stack measurements because we
don't take shrinkwrapping into account.
2025-07-03 18:04:18 -05:00
Christopher Haster
eb884011ec Reworked the read path to use a single flush
The motivation for this comes from the observation that lfs3_file_flush
already implies lfs3_file_crystallize, so most of the time the isuncryst
check in lfs3_file_readnext is useless.

We _do_ hit the isuncryst check when bypassing the cache, but the
situation where we bypass the cache, on a read-write file, _and_ can
avoid crystallization, seems too niche to care about.

So this reworks lfs3_file_read to prevent cache bypassing until pending
data is at least crystallized. This mirrors how we force flushing in
lfs3_file_write.

lfs3_file_read:

        |<------------------------------------------------------------.
        v                                                             |
  data in cache? --> read from cache -------------------------------->|
        | n       y                                                   |
        v                                                             |
  data in btree? --> crystallized? --> bypass? --> read from disk --->|
        | n       y        | n      y     | n   y                     |
        |                  |              v                           |
        |                  |           flushed? --> read into cache ->|
        |                  |              | n    y                    |
        |                  |              v                           |
        |                  '-------> flush cache -------------------->|
        v                                                             |
  fill with zeros ----------------------------------------------------'

lfs3_file_write:

     |<------------------------------------.
     v                                     |
  flushed? --> bypass? --> write to disk ->|
     | n    y     | n   y                  |
     |            v                        |
     |         move cache                  |
     v            v                        |
  aligned? --> write into cache ---------->|
     | n    y                              |
     v                                     |
  flush cache -----------------------------'

---

As a part of the rework, I also manually inlined lfs3_file_readnext into
lfs3_file_readget_. This duplicates some logic (not code cost!), but
helps clean up some of the ifdef soup in lfs3_file_readnext.

I also tried to refactor lfs3_file_readnext to better match
lfs3_file_read and lfs3_file_write's logic, but I'm not sure it actually
gained us anything.

lfs3_file_readnext:

       |<----------------------------.
       v                             |
  data in leaf? --> read from leaf   |
       | n       y        |          |
       v                  v          |
  data in hole? --> fill with zeros  |
       | n       y        |          |
       v                  |          |
  fetch leaf -------------|----------'
                          v
                        done!

Saves a bit of code:

           code          stack          ctx
  before: 38060           2456          656
  after:  38020 (-0.1%)   2456 (+0.0%)  656 (+0.0%)

This also likely prevents lfs3_file_readnext from ever becoming the
stack hot-path again.
2025-07-03 15:54:05 -05:00
Christopher Haster
b6a36e75cf Limited graft traversal scope to lfs3_alloc
This drops the LFS3_TSTATE_GRAFT state in favor of just explicitly
iterating over graft state in lfs3_alloc. This is cheaper as long as
lfs3_alloc is the only traversal we trigger while grafting.

We already rely on the lfs3_alloc-specific behavior of never touching
cksize/cksum fields anyways.

Note both lfs3_alloc_markinuse and lfs3_alloc_markinuse_ already have
multiple call sites and can't be inlined due to lookahead population in
lfs3_mtree_gc. We also don't need to worry about graft state there as
incremental traversals only make progress when bshrubs are at rest.

Saves a bit of code:

                 code          stack          ctx
  before:       38092           2456          656
  after:        38060 (-0.1%)   2456 (+0.0%)  656 (+0.0%)

  before graft: 37936           2456          636
  after graft:  38060 (+0.3%)   2456 (+0.0%)  656 (+3.1%)

Actually, surprisingly little code, but anything that simplifies
lfs3_mtree_traverse_ is welcome.
2025-07-01 14:19:48 -05:00
Christopher Haster
1bf2a4b520 Fixed grafting allocator checkpoint hole
This was quite a deep bug.

We don't track the original bshrub when grafting, so it was possible to
realloc those blocks even when we need their contents to finish the
graft operation.

This was found while experimenting with eager leaf grafting, but can
also occur when grafting data fragments.

---

In theory, the block allocator's checkpoint mechanism protects against
this.

Before we alloc, we set a checkpoint with lfs3_alloc_ckpoint. This marks
the position of the block allocator before allocation, so if we loop
around the entire block device we don't double alloc any in-flight
blocks:

                     ckpoint      lookahead
                        v         .---'---.
  [mm---ddd-d---d-------|dd--d-ddd|--------d-----d-]
                         '---.---'
                    in-flight allocations

But this only protects _new_ blocks; _old_ blocks can be anywhere on
disk and are unprotected.

In theory again, old blocks are always tracked via copy-on-write
snapshots, but this is not the case for bshrubs while grafting!

Grafting is unfortunately a multi-commit operation (we may remove
multiple fragments that span different btree nodes), and each bshrub
commit discards the old snapshot. This creates a window where old blocks
can be double alloced _while grafting_, leading to corrupted data.

You may wonder why we're discarding the old snapshot. Why not keep
track of it until the grafting completes?

The problem there is that we need the intermediate snapshot in order for
shrubs to survive compactions. We really have 3 states:

  old -> mid-graft -> new

And the only one we don't need to fallback to is the old state.

---

A couple solutions:

1. Track all three states

   This would add complexity and increase the cost of every lfs3_file_t.

2. Open a temporary file to track the old state

   This would add complexity and a big chunk of stack to what is already
   one of the critical functions on our stack hot-path.

3. Carefully make sure graft commits don't lose track of in-flight data
   until an atomic commit

   This doesn't work when you're trying to coalesce two data fragments
   in two different btree nodes. At least not without completely
   restructuring the btree commit logic.

4. Just explicitly track in-flight graft state out-of-band

This goes with option 4, adding lfs3->graft and lfs3->graft_count to
track in-flight graft state when we're grafting. lfs3_mtree_traverse_
can include the relevant blocks during traversals, effectively masking
out graft state from the lookahead buffer.
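
A standalone sketch of the idea, with made-up types (not the actual lfs3
structs or traversal code): the in-flight graft blocks live out-of-band
and get treated as in-use until the graft completes:

  #include <stdio.h>
  #include <stdbool.h>
  #include <stdint.h>

  typedef struct fs {
      const uint32_t *graft;  // in-flight graft blocks, NULL when at rest
      unsigned graft_count;
  } fs_t;

  static bool block_inuse(const fs_t *fs, uint32_t block) {
      // in addition to the normal copy-on-write tracking, mask out any
      // in-flight graft blocks so the allocator can't reuse them
      for (unsigned i = 0; i < fs->graft_count; i++) {
          if (fs->graft[i] == block) {
              return true;
          }
      }
      return false;
  }

  int main(void) {
      uint32_t graft[] = {42, 43};
      fs_t fs = {.graft = graft, .graft_count = 2};
      printf("%d\n", block_inuse(&fs, 42));
  }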

This adds a bit of code/ctx, but is probably the cheapest option:

           code          stack          ctx
  before: 37936           2456          636
  after:  38092 (+0.4%)   2456 (+0.0%)  656 (+3.1%)
2025-07-01 14:02:45 -05:00
Christopher Haster
13fbd2f006 Slightly reworked btree staging in lfs3_btree_commit_
This applies the same pattern of taking both the old + staging btree as
arguments to try to avoid redundant stack allocations.

Extra appealing is being able to reuse the staging shrubs in bshrubs for
btree commits.

However, it doesn't work out so well for the btree logic:

           code          stack          ctx
  before: 37936           2424          636
  after:  37936 (+0.0%)   2456 (+1.3%)  636 (+0.0%)

A couple reasons:

- Passing staging references limits what the compiler can optimize;
  compilers aren't great at cross-function optimization

- These staging references push struct allocation upwards, which risks
  pushing them onto the stack hot-path.

  Gah, again this is likely not a real issue, just a failure of our
  tooling to take stack shrinkwrapping into account.

- The extra arguments add stack overhead to the call frame. It's just
  one word, but this can add up.

I should probably revert this, but I'm going to keep it around for a
bit:

- It's only 32 bytes (1 rbyd + 1 pointer + compiler noise). Is 32 bytes
  enough to really care about?

- I'm not sure how much weight to put into our stack measurements at the
  moment. They don't take shrinkwrapping into account, which creates a
  weird bias.

- This internal API better conveys how it behaves w.r.t. atomic updates
  and errors.

- The API may also lead to better stack usage in the future.
2025-07-01 14:02:40 -05:00
Christopher Haster
8ee08a5b89 Slightly reworked mdir staging in lfs3_mdir_commit_
I've noticed a common pattern where we tend to create copies in multiple
function frames in order to allow fallback in case of errors. This risks
redundant stack allocations across layers.

To avoid this, this commit adopts old + staging arguments for most of
the internal mdir commit functions:

  static int lfs3_mdir_commit_(lfs3_t *lfs3,
          lfs3_mdir_t *mdir_, lfs3_mdir_t *mdir,
          ...);

We already needed this for lfs3_mdir_compact__, so hey, points for
consistency.

Saves a tiny bit of code:

           code          stack          ctx
  before: 37964           2424          636
  after:  37936 (-0.1%)   2424 (+0.0%)  636 (+0.0%)
2025-07-01 13:59:06 -05:00
Christopher Haster
4747477057 Tweaked lfs3_btree/bshrub_traverse to include weight
Not sure why we weren't already doing this; it doesn't really make sense
to return bid without weight, and this matches
lfs3_btree/bshrub_lookupnext.

Sure we don't need weight currently, but this is useful to include in
case we need it in the future (lfs3_bptr_fetch during traversal?).

And while we're not using it, the compiler is happy to optimize it out,
so no code changes:

           code          stack          ctx
  before: 37964           2424          636
  after:  37964 (+0.0%)   2424 (+0.0%)  636 (+0.0%)
2025-06-28 19:08:42 -05:00
Christopher Haster
10c0a60ced Tried to dedup bptr/data fetching
Like the bshrub/btree dedup, this adds lfs3_bptr_fetch to help dedup
bptr/data fetching.

The original plan was to eliminate bptrs from lfs3_file_lookupnext and
lfs3_file_traverse, and just return tagged data like the other
lookup/traverse functions. But this didn't work out very well. We return
arbitrary attrs from lfs3_file_traverse, so all this would've
accomplished is making every lfs3_file_lookupnext call messier.

But I think I'm still going to keep lfs3_bptr_fetch around as it
provides a nice place to deduplicate some other bits of logic:

- It makes sense to limit bptrs to compressed weights here, as opposed
  to the somewhat arbitrary lfs3_file_lookupnext function.

- And it would be a bit silly to not put the bptr's LFS3_CKFETCHES logic
  in lfs3_bptr_fetch.

  This may fetch more than previously (during crystallization pokes?),
  but better safe than sorry. LFS3_CKFETCHES will likely be a relatively
  niche feature anyways.

As for lfs3_file_traverse, I got rid of it completely.

We already have special logic in lfs3_mtree_traverse_ and lfs3_file_ck
for bptrs anyways, since bptrs, unlike data fragments, reference actual
blocks. And this disentangles lfs3_mtree_traverse_ from the file APIs,
which was a bit of an awkward design.

---

This adds a bit of code to the default build, but I think it's worth it
for the better code organization:

                     code          stack          ctx
  before:           37896           2424          636
  after:            37964 (+0.2%)   2424 (+0.0%)  636 (+0.0%)

It also saves some code in LFS3_CKFETCHES mode, thanks to deduping all
the ckfetches fetch checks:

                     code          stack          ctx
  ckfetches before: 38144           2464          636
  ckfetches after:  38072 (-0.2%)   2472 (+0.3%)  636 (+0.0%)
2025-06-28 18:50:57 -05:00
Christopher Haster
d2847f5f0e Deduped bshrub/btree fetching
This adds lfs3_bshrub_fetch to better deduplicate the common pattern of
fetching either a bshrub or btree based on tag.

The API ends up a bit funny because of how mdirs are attached to
specific mids. All we need is the relevant mdir object, and we can do a
single masked mdir lookup to find any bshrubs/btrees.

Saves a little bit of code:

           code          stack          ctx
  before: 37920           2424          636
  after:  37896 (-0.1%)   2424 (+0.0%)  636 (+0.0%)

Also flipped around some lfs3_data_read* parameters to better match
common tag+weight+data ordering in lfs3_*_lookup functions.
2025-06-28 18:50:29 -05:00
Christopher Haster
f39f2812af Renamed lfs3_file_readonce/flushonce_ -> readget_/flushset_
This just makes the purpose of these functions a bit more clear, and
matches LFS3_o_WRSET.
2025-06-27 14:14:36 -05:00
Christopher Haster
2ebb8a301b Attempted better allocator checkpoints
This tries to call lfs3_alloc_ckpoint in more correct positions, and
fixes a bug where we _never_ called lfs3_alloc_ckpoint before
finishing crystallization in lfs3_file_readnext and
lfs3_file_truncate/fruncate:

- lfs3_file_crystallize now implicitly calls lfs3_alloc_ckpoint before
  both finishing crystallization and grafting.

- lfs3_file_flush_ and lfs3_file_flushonce_ now call lfs3_alloc_ckpoint
  at the beginning of each loop iteration.

  This may be redundant on some iterations but that's ok.

- lfs3_file_write does _not_ call lfs3_alloc_ckpoint, this is all
  handled in lfs3_file_flush_ now.

- lfs3_file_truncate/fruncate still call lfs3_alloc_ckpoint, but just
  before lfs3_file_graft.

  This matches the lfs3_alloc_ckpoint pattern used for most
  lfs3_mdir_commit calls, i.e. checkpoint just before to make it easier
  to audit the logic.

- Also moved the pre-fragment crystallization out of the fragment loop,
  we should only crystallize once and this makes the code a bit more
  readable.

  I think this is the source of the extra 8 bytes of stack, but that's
  small enough to consider compiler noise.

It's not the biggest problem to not call lfs3_alloc_ckpoint every time
all blocks are at rest, but it does risk a premature ENOSPC error when
it's still possible to make progress.

This gets more complicated with lazy crystallization/grafting, as block
allocations can end up deferred to operations you might not expect
(lfs3_file_read for example).

Adds a bit of code, but is in theory more correct:

           code          stack          ctx
  before: 37888           2416          636
  after:  37920 (+0.1%)   2424 (+0.3%)  636 (+0.0%)
2025-06-27 13:26:45 -05:00
Christopher Haster
8cc81aef7d scripts: Adopt __get__ binding for write/writeln methods
This actually binds our custom write/writeln functions as methods to the
file object:

  def writeln(self, s=''):
      self.write(s)
      self.write('\n')
  f.writeln = writeln.__get__(f)

This doesn't really gain us anything, but is a bit more correct and may
be safer if other code messes with the file's internals.
2025-06-27 12:56:03 -05:00
Christopher Haster
8b6e51d54e Fixed assert with branches in lfs3_file_traverse_
This was modified incorrectly for LFS3_2BONLY. We do actually end up
with non-bptr non-data tags here when we encounter btree inner nodes.

Code changes:

           code          stack          ctx
  before: 37864           2416          636
  after:  37888 (+0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-26 07:26:17 -05:00
Christopher Haster
d183a88c58 Fixed uninit warning, gave up on err < 0 compiler guidance
In lfs3_mdir_namelookup, when compiling with LFS3_2BONLY, there was an
uninitialized variable warning that just wouldn't go away (temporarily
disabled with the x=x hack).

So, giving up on the err < 0 compiler guidance since it apparently
doesn't work. Instead lfs3_rbyd_namelookup and lfs3_btree_namelookupleaf
unconditionally initialize the problematic variables before their main
loops.
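
The general shape of the workaround, as a standalone example rather than
the actual namelookup code: if the compiler can't prove the loop assigns
the variable, just pay for the unconditional store up front:

  #include <stdio.h>

  int main(void) {
      int found = -1;  // unconditional init silences -Wmaybe-uninitialized
      for (int i = 0; i < 4; i++) {
          if (i == 2) {
              found = i;
          }
      }
      printf("%d\n", found);
  }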

This adds a bit of code, but fighting the compiler just isn't worth the
headache:

           code          stack          ctx
  before: 37836           2416          636
  after:  37864 (+0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-26 07:26:17 -05:00
Christopher Haster
ccfc74a547 Added LFS3_2BONLY for a small 2-block configuration
Like LFS3_RDONLY and LFS3_KVONLY, LFS3_2BONLY opts out of all of the
logic necessary for filesystems larger than 2 blocks (the minimum size
of a mutable littlefs image).

This has potential for some pretty big savings:

- No block allocation
- No lookahead buffer
- No btrees (but yes bshrubs)
- No bptrs
- No mtree traversal

Which is I guess ~1/4 of the codebase:

            code           stack           ctx
  default: 37836            2416           636
  2bonly:  27704 (-26.8%)   1872 (-22.5%)  592 (-6.9%)

This can be combined with LFS3_KVONLY for a small key-value store
compatible with the full littlefs driver:

                  code           stack           ctx
  default:       37836            2416           636
  kvonly:        30792 (-18.6%)   2168 (-10.3%)  636 (+0.0%)
  kvonly+2bonly: 22900 (-39.5%)   1736 (-28.1%)  592 (-6.9%)

It may be possible to optimize this further, but, as is the case with
LFS3_KVONLY, balancing config-specific optimization vs maintainability
is tricky.

---

I'm not sure why, but this also reduced the default build's size a bit.
Compiler noise?

           code          stack          ctx
  before: 37860           2416          636
  after:  37836 (-0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-26 07:22:47 -05:00
Christopher Haster
2c27c61f25 kv: Added LFS3_KVONLY to opt-out of advanced file operations
One of the ideas behind the key-value API is that it is potentially much
cheaper than a full file API. With the key-value API, we get the
guarantee that all data must fit in RAM, and avoid headaches like
random reads/writes and needing to broadcast file state.

For an example of just how much complexity is avoided, see the
difference between lfs3_file_flushonce_ and the mess that is
lfs3_file_flush_ + lfs3_file_crystallize + lfs3_file_graft.

However, littlefs is designed around files, and a couple design
decisions hold back how much code saving is possible:

1. littlefs's shrubs are designed around being enrolled in the omdir
   linked-list, so internally we still have most of the file open/close
   code lumbering around.

2. Directories and traversals still exist, so we'd need the omdir
   linked-list anyways, and we still need to broadcast _some_ changes.

3. Despite being intended for small amounts of data, lfs3_set/get can
   still be used to create arbitrarily large files. So we still need all
   of the bshrub/btree logic.

   Which we still need for the mtree anyways, so this isn't really that
   much of a downside.

It also may be possible to save more code by aggressively rewriting the
_entire_ read/write path for lfs3_set/get, to not reuse any of the
existing file logic in LFS3_KVONLY mode. But I decided against this due
to concerns around maintainability.

The duplicate lfs3_file_read + lfs3_file_readonce and lfs3_file_flush_ +
lfs3_file_flushonce_ are already enough of a concern.

Anyways, here's LFS3_KVONLY:

                  code           stack           ctx
  default:       37824            2416           636
  kvonly:        30936 (-18.2%)   2168 (-10.3%)  636 (+0.0%)

LFS3_RDONLY + LFS3_KVONLY is also interesting:

                  code           stack           ctx
  rdonly:        10776             856           508
  rdonly+kvonly:  9904 (-8.1%)     888 (+3.7%)   508 (+0.0%)

---

This also added some noise to the default build's code, mainly due to
tweaks in lfs3_file_readnext to allow better reuse in LFS3_KVONLY:

           code          stack          ctx
  before: 37824           2416          636
  after:  37860 (+0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-24 16:14:02 -05:00
Christopher Haster
213dba6f6d scripts: test.py/bench.py: Added ifndef attribute for tests/benches
As you might expect, this is the inverse of ifdef, and is useful for
supporting opt-out flags.

I don't think ifdef + ifndef is powerful enough to handle _all_
compile-time corner cases, but they at least provide convenient handling
for the most common flags. Worst case, tests/benches can always include
explicit #if/#ifdef/#ifndef statements in the code itself.
2025-06-24 15:17:04 -05:00
Christopher Haster
db1f941e90 Slightly reworked lfs3_file_opencfg's mid reservation path
And tried to more consistently use lfs3_path_namelen.

In a perfect world we would just use lfs3_path_namelen everywhere and
let the compiler figure it out, but unfortunately this leads to poor
code generation in some places, even with __attribute__((pure)) hacks.

Code changes:

           code          stack          ctx
  before: 37832           2416          636
  after:  37824 (-0.0%)   2416 (+0.0%)  636 (+0.0%)
2025-06-24 15:16:55 -05:00
Christopher Haster
1b76bd04ce kv: Some minor file cache_buffer tweaks
- Unconditionally pass buffer as cache_buffer in lfs3_set now that we
  rely on LFS3_o_WRSET

- Swapped true -> 1 for non-null don't-care buffer pointer

Saved one instruction as expected for the conditional assignment, but
added a bit of stack. Weird, but probably just compiler noise:

           code          stack          ctx
  before: 37836           2408          636
  after:  37832 (-0.0%)   2416 (+0.3%)  636 (+0.0%)
2025-06-22 15:55:14 -05:00
Christopher Haster
e7c7a81cfe Revisited zero-length file sync path
This needed a second pass. Changes:

- Small file flushes are no longer limited to LFS3_o_UNFLUSH, which
  should avoid bshrubs/btrees being written for small files with
  complicated seek+writes. Now, any file small enough is converted
  to a small file when we would need to flush.

  This does _not_ flush small unsync files that don't need to be
  flushed, though I'm not exactly sure how that would happen (broadcast
  from file with a different cache size?)

  I think this was a regression from previous logic.

- discardbshrub/discardbleaf moved into lfs3_file_sync_, otherwise
  we risk discarding the bshrub/bleaf without setting UNSYNC.

  This keeps all the state changing logic together.

- We now use lfs3_file_size_ == 0 as the decision for committing bnulls.

  size_ == 0 implies bnull, and this avoids the extra headache of
  checking for pending small file flush.

Note the ultimate decision on whether the file is small is still left up to
lfs3_file_sync. lfs3_file_sync_ just relies on the UNFLUSH + UNCRYST +
UNGRAFT checks to do the last minute small file flush (aside from
asserts).

The UNFLUSH + UNCRYST + UNGRAFT checks look a bit messy, but keep in
mind these optimize to a single bitmask.
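
With made-up flag values (not the actual lfs3 definitions), the three
checks collapse like so:

  #include <stdio.h>
  #include <stdint.h>

  #define O_UNFLUSH 0x01000000
  #define O_UNCRYST 0x02000000
  #define O_UNGRAFT 0x04000000

  int main(void) {
      uint32_t flags = O_UNCRYST;
      // compilers typically fold this into a single
      // (flags & (O_UNFLUSH|O_UNCRYST|O_UNGRAFT)) != 0 test
      if ((flags & O_UNFLUSH) || (flags & O_UNCRYST) || (flags & O_UNGRAFT)) {
          printf("needs a last-minute small file flush\n");
      }
  }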

Saves a tiny bit of code:

           code          stack          ctx
  before: 37856           2416          636
  after:  37836 (-0.1%)   2408 (-0.3%)  636 (+0.0%)
2025-06-22 15:37:53 -05:00
Christopher Haster
7a6aad3cc8 Cleaned up potential lfs3_mdir_commit dedup TODOs
Unfortunately neither of these were actually deduplicatable:

1. We can't easily move dir update logic into lfs3_mdir_commit, because
   lfs3_mdir_commit has no knowledge of the current did.

   Maybe we can add did-related nudge functions, but the logic would
   still need to be external to lfs3_mdir_commit. lfs3_mdir_commit only
   understands mids.

2. lfs3_alloc_ckpoint continues to be enticing, but fortunately a
   previous commit reminded me that we explicitly need to _not_ call
   lfs3_alloc_ckpoint before the lfs3_mdir_commit in
   lfs3_bshrub_commitroot_.

   In theory we could add lfs3_mdir_commit and lfs3_mdir_commit_ to
   make lfs3_alloc_ckpoint opt-out, but lfs3_mdir_commit is already
   a bit of a mess. And maybe keeping the lfs3_alloc_ckpoint calls
   explicit is a good thing. It's better to ENOSPC than double alloc a
   block.
2025-06-22 15:37:47 -05:00
Christopher Haster
2d39a7e9c5 make: Adopted consistent codemap dimensions
Tweaked: 1400x750 -> 1125x525 (1.5x codemapsvg.py's default)

This is now derived (1.5x) from the default dimensions in codemapsvg.py.
This matches the dimensions that ended up used for the preliminary v3
benchmarks, which are a bit more convenient on devices with smaller
screens.

As for where the 750x350 resolution came from, I'm not entirely sure.
Maybe a random Matplotlib example? It approximates a 2:1 aspect ratio
but with 25 pixels carved out for margins.

Note we like wide aspect ratios over pretty aspect ratios like 16:9,
golden ratio, etc, here:

1. We often cram things into the margins (legends, stack usage, etc)

2. English text is much wider than it is tall (this commit message has
   an aspect ratio of ~3:1), so wider aspect ratios help readability
2025-06-22 15:37:40 -05:00
Christopher Haster
d6a713f147 make: ctags: Limited prototype tags to header files
Jumping to prototypes in header files is extremely useful, because
that's usually where all the documentation is. But jumping to prototypes
in C files is a bit much. These are usually just uncomment definitions
to keep the compiler happy, and make navigation a bit of a pain.

Unfortunately it doesn't seem like ctags supports per-file-type tag
kinds (at least I couldn't find it in the documentation), but running
ctags twice with the --append flag seems to work.
2025-06-22 15:37:35 -05:00
Christopher Haster
f967cad907 kv: Adopted LFS3_o_WRSET for better key-value API integration
This adds LFS3_o_WRSET as an internal-only 3rd file open mode (I knew
that missing open mode would come in handy) that has some _very_
interesting behavior:

- Do _not_ clear the configured file cache. The file cache is prefilled
  with the file's data.

- If the file does _not_ exist and is small, create it immediately in
  lfs3_file_open using the provided file cache.

- If the file _does_ exist or is not small, do nothing and open the file
  normally. lfs3_file_close/sync can do the rest of the work in one
  commit.

This makes it possible to implement one-commit lfs3_set on top of the
file APIs with minimal code impact:

- All of the metadata commit logic can be handled by lfs3_file_sync_, we
  just call lfs3_file_sync_ with the found did+name in lfs3_file_opencfg
  when WRSET.

- The invariant that lfs3_file_opencfg always reserves an mid remains
  intact, since we go ahead and write the full file if necessary,
  minimizing the impact on lfs3_file_opencfg's internals.

This claws back most of the code cost of the one-commit key-value API:

              code          stack          ctx
  before:    38232           2400          636
  after:     37856 (-1.0%)   2416 (+0.7%)  636 (+0.0%)

  before kv: 37352           2280          636
  after kv:  37856 (+1.3%)   2416 (+6.0%)  636 (+0.0%)

---

I'm quite happy with how this turned out. I was worried for a bit that the
key-value API was going to end up an ugly wart for the internals, but
with LFS3_o_WRSET this integrates quite nicely.

It also raises a really interesting question, should LFS3_o_WRSET be
exposed to users?

For now I'm going to play it safe and say no. While potentially useful,
it's still a pretty unintuitive API.

Another thing worth mentioning is that this does have a negative impact
on compile-time gc. Duplication adds code cost when viewing the system
as a whole, but tighter integration can backfire if the user never calls
half the APIs.

Oh well, compile-time opt-out is always an option in the future, and
users seem to care more about pre-linked measurements, probably because
it's an easier thing to find. Still, it's funny how measuring code can
have a negative impact on code. Something something Goodhart's law.
2025-06-22 15:37:07 -05:00
Christopher Haster
92844cce3e kv: Added *_set_zero and *_set_null tests
These are high-risk corner cases for the key-value API, so we should
test them.

At one point I was relying on an optional buffer parameter in
lfs3_file_sync_, but that would have broken if lfs3_set's buffer was
NULL.
2025-06-22 15:36:53 -05:00