Commit Graph

2369 Commits

Author SHA1 Message Date
Christopher Haster
0828fd9bf3 Reverted LFS3_CKDATACKSUMREADS -> LFS3_CKDATACKSUMS
LFS3_CKDATACKSUMREADS is just too much.

The downside is it may not be clear how LFS3_CKDATACKSUMREADS interacts
with the future planned LFS3_CKREADS (LFS3_CKREADS implies
LFS3_CKDATACKSUMS + LFS3_CKMETAREDUND), but on the flip side you may
actually be able to type LFS3_CKDATACKSUMS on the first try.
2025-07-16 14:25:20 -05:00
Christopher Haster
17cefcdd42 Dropped LFS3_FORCEINLINE from lfs3_data_slice
This used to save code/stack, but apparently not anymore:

                            code          stack          ctx
  before:                  36960           2392          652
  after:                   36936 (-0.1%)   2384 (-0.3%)  652 (+0.0%)

                            code          stack          ctx
  ckdatacksumreads before: 38368           2720          660
  ckdatacksumreads after:  38024 (-0.9%)   2624 (-3.5%)  660 (+0.0%)

The stack hot-path has changed significantly since then, with many
functions adopting LFS3_NOINLINE to get off the stack hot-path. Not sure
if that's related.

I'm also starting to think LFS3_FORCEINLINE is a symptom of
over-optimization. We shouldn't be doing the compiler's job; if it can't
figure out the best inlining strategy, so be it.
2025-07-16 14:20:26 -05:00
Christopher Haster
dbad3e6863 Prefer lfs3_data_slice over LFS3_DATA_SLICE macro
Maybe it's because they are relatively new, but compound literals seem
to do more harm than good.

I'm still keeping the LFS3_DATA_SLICE macro around in case it's useful
(for tests?), but now preferring lfs3_data_slice where possible.
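
For illustration, a minimal standalone sketch of the tradeoff -- these
are not the actual lfs3 definitions, just the general pattern of a
compound-literal macro vs an equivalent function:

  #include <stdint.h>
  #include <stddef.h>
  #include <stdio.h>

  typedef struct data {
      const uint8_t *buffer;
      size_t size;
  } data_t;

  // macro version: builds the struct via a C99 compound literal at
  // every call site
  #define DATA_SLICE(d, off, len) \
      ((data_t){.buffer=(d).buffer+(off), .size=(len)})

  // function version: same result, but gives the compiler (and the
  // reader) a single definition to work with
  static inline data_t data_slice(data_t d, size_t off, size_t len) {
      return (data_t){.buffer=d.buffer+off, .size=len};
  }

  int main(void) {
      uint8_t buf[16] = {0};
      data_t d = {.buffer=buf, .size=sizeof(buf)};
      data_t a = DATA_SLICE(d, 4, 8);
      data_t b = data_slice(d, 4, 8);
      printf("%zu %zu\n", a.size, b.size);
  }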

---

This doesn't really impact the default build, but it saves a big chunk
of code/stack when compiling with LFS3_CKDATACKSUMREADS:

                            code          stack          ctx
  before:                  36956           2392          652
  after:                   36960 (+0.0%)   2392 (+0.0%)  652 (+0.0%)

                            code          stack          ctx
  ckdatacksumreads before: 38576           2744          660
  ckdatacksumreads after:  38368 (-0.5%)   2720 (-0.9%)  660 (+0.0%)

LFS3_CKDATACKSUMREADS adds cksize/cksum fields to lfs3_data_t, so it's
very sensitive to lfs3_data_t function changes.

Though to be fair, at 5 words, lfs3_data_t really shouldn't be a
pass-by-value struct. We only keep lfs3_data_t a pass-by-value struct
because LFS3_CKDATACKSUMREADS is low-priority/best-effort and changing
that would make the codebase a mess.
2025-07-16 14:16:07 -05:00
Christopher Haster
bf3078b7bd Dropped LFS3_DATA_TRUNCATE/FRUNCATE
These can be accomplished with LFS3_DATA_SLICE, and I think the
TRUNCATE/FRUNCATE variants just muddy things and make the math harder to
read.

LFS3_DATA_TRUNCATE is already basically a noop. The only non-trivial
transformation is LFS3_DATA_FRUNCATE, and LFS3_DATA_FRUNCATE is the
confusing one.
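
Roughly, assuming a slice(d, off, len) primitive, and that truncate
keeps the front of the data while fruncate keeps the back, the mapping
would be something like:

  truncate(d, len) -> slice(d, 0, len)              (basically a noop)
  fruncate(d, len) -> slice(d, d.size - len, len)   (the non-trivial one)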

---

I have no idea why _removing_ code is adding so much stack. This needs
investigation:

           code          stack          ctx
  before: 36944           2384          652
  after:  36956 (+0.0%)   2392 (+0.3%)  652 (+0.0%)
2025-07-16 14:01:43 -05:00
Christopher Haster
7b7dbae1df Simplified fragment coalescing bounds logic
I think these were copied from the initial fragment slice calculation,
but we're already checking for <=fragment_size, so the extra lfs3_min is
unnecessary.

Saves a bit of code:

           code          stack          ctx
  before: 36952           2376          652
  after:  36944 (-0.0%)   2384 (+0.3%)  652 (+0.0%)

Not sure why this added stack, compiler noise?
2025-07-16 13:59:34 -05:00
Christopher Haster
2d10a61732 Reverted bptr -> bptr_ in mtree traverse/gc functions
This was missed when reverting the trailing underscores_ in other
unconditional out-pointers.

The trailing underscore now just hints at the parameter being an
out-pointer, optionality is no longer implied.
2025-07-16 12:53:53 -05:00
Christopher Haster
55cc661283 Tweaked LFS3_DBGRBYDBALANCE, adopted lfs3_rheight_t
This tweaks LFS3_DBGRBYDBALANCE to be a bit less intrusive by putting
the relevant heights in a single lfs3_rheight_t struct.

Also added ifdefs to lfs3_rbyd_lookupnext_ just to make it clear this
code is opt-in.

No code changes.
2025-07-16 12:50:09 -05:00
Christopher Haster
7c1fe0f199 btree: Tried to better deduplicate split commit building logic
This may have changed during some refactor, but we can reuse the entire
right branch logic, and at least deduplicate the lfs3_data_frombranch
call on the left branch.

Saves a nice bit of code:

           code          stack          ctx
  before: 37020           2392          652
  after:  36952 (-0.2%)   2376 (-0.7%)  652 (+0.0%)

Also deduplicating the lfs3_data_t allocations saved stack, though that
is more concerning than anything else...

Also adopted l/r_buf names in lfs3_bcommit_t. This better matches names
in lfs3_file_graft_ and elsewhere.
2025-07-15 21:44:27 -05:00
Christopher Haster
3e47304e9b btree: Adopted LFS3_ERR_EXIST for terminating at shrubs
A bit of an abuse of this error code, but this is more explicit than the
previous rattr_count > 0 condition.

Forgetting to set rattr_count=0 on a normal exit has introduced bugs
before.
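
As a standalone illustration of the pattern (not the actual
lfs3_btree_commit_ code), signaling early termination with an explicit
error code instead of an out-parameter count looks something like:

  #include <stdio.h>

  #define ERR_EXIST (-17)  // stand-in for LFS3_ERR_EXIST

  static int commit(int is_shrub, int *rattr_count) {
      *rattr_count = 2;        // pending attrs for the caller
      if (is_shrub) {
          return ERR_EXIST;    // explicit: stop here, caller takes over
      }
      *rattr_count = 0;        // easy to forget on a normal exit
      return 0;
  }

  int main(void) {
      int count;
      if (commit(1, &count) == ERR_EXIST) {
          printf("terminated at shrub, %d attrs pending\n", count);
      }
  }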

---

Though I'm not sure why this adds code. Somehow, _removing_ the
rattr_count=0 statements when lfs3_btree_commit_ collapses the root
added code?

           code          stack          ctx
  before: 36996           2392          652
  after:  37020 (+0.1%)   2392 (+0.0%)  652 (+0.0%)

Seriously, add bcommit->rattr_count = 0 to lfs3_btree_commit_ and
lfs3_btree_commit_'s code cost shrinks by 8 bytes. Is the compiler
hiding stuff in bcommit?

I'm just going to chalk this up to compiler noise for now...
2025-07-15 20:53:26 -05:00
Christopher Haster
6d003543d8 btree: Moved internal commit state into new lfs3_bcommit_t struct
This somewhat replaces lfs3_bctx_t. Really lfs3_bctx_t consumed the
previously separate bid, rattr, and rattr_count out-pointers and
underwent a slight name change. The previous contents of lfs3_bctx_t are
all available under bcommit.ctx, with some minor tweaks.

The main motivation for this was to get rid of the mess that was the
bid/rattr out-pointers. They represent a side-channel of internal btree
state that is probably better implemented as a single struct.

Hopefully this makes the logic of lfs3_btree_commit_ callers -- and
expected action on non-zero rattr_count -- more obvious.
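
A rough sketch of the shape this might take -- the field names come from
this commit message, but the types and sizes here are guesses for
illustration, not the actual lfs3_bcommit_t definition:

  typedef struct bcommit {
      long bid;              // btree id being committed (was an out-pointer)
      const void *rattr;     // pending rattrs for the caller (was an out-pointer)
      unsigned rattr_count;  // non-zero => caller must continue the commit
      struct {
          unsigned char branch_l_buf[16];  // scratch for the left branch
          unsigned char branch_r_buf[16];  // scratch for the right branch
      } ctx;                 // internal commit state
  } bcommit_t;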

---

Some other tweaks:

- Separated ctx.buf into bcommit.ctx.branch_l_buf/branch_r_buf.

  I realized this informs the compiler that the lfs3_data_frombranch
  calls should not overflow.

  This may need to be reverted if we ever commit different data types in
  lfs3_btree_commit_, but that's not the end of the world. Right now
  this is bound to whatever split needs (2 branches + name).

- Added rattr_count <= rattrs assert after each btree commit builder.

  These asserts were just adopted after the btree code was written. The
  extra safeguards are good to have in case of future refactor.

Shaves off a bit more code/stack while also (hopefully) improving code
readability:

           code          stack          ctx
  before: 37048           2416          652
  after:  36996 (-0.1%)   2392 (-1.0%)  652 (+0.0%)
2025-07-15 20:52:55 -05:00
Christopher Haster
794bd3df61 btree: Slightly tweaked lfs3_btree_commit_'s internal gotos
This moves the default recurse logic (previously the commit label) back
up before the compact/relocate/split/merge branches.

I know the general rule is to try to limit gotos to forward jumps, but in
this case, placing the default recurse logic at the end of
lfs3_btree_commit_ disrupts the default "happy" path and makes
refactoring more difficult than it needs to be.

Contextually, the default recurse logic is a part of the default commit
logic, and split, merge, etc, are exceptional branches that just happen
to sometimes converge.
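
As a control-flow skeleton only (not the actual lfs3_btree_commit_
code), the layout this is going for looks roughly like:

  #include <stdio.h>

  static int commit(int height) {
  recurse:;
      // default commit logic up top, on the "happy" path: step up one
      // level until we reach the root
      if (height == 0) {
          return 0;
      }
      if (height % 2) {
          goto split;
      }
      height -= 1;
      goto recurse;

  split:;
      // exceptional branch: do the extra work, then converge back on
      // the default recurse logic
      printf("split at height %d\n", height);
      height -= 1;
      goto recurse;
  }

  int main(void) {
      return commit(4);
  }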

---

I think the real problem is that all of the gotos in lfs3_btree_commit_
are modeling mutually recursive functions, but in a context where we
can't actually recurse.

_Technically_, it is possible to transform any tail-recursive function
into loops and if statements (structured program theorem), but doing so
risks significant code duplication. We could duplicate this recurse
logic everywhere it's needed for example. But this is also something we
want to avoid in littlefs.

So goto soup it is.

---

Some code changes, but probably just compiler noise:

           code          stack          ctx
  before: 37052           2416          652
  after:  37048 (-0.0%)   2416 (+0.0%)  652 (+0.0%)

I also added some more informative-only labels now that we've adopted
-Wno-unused-label. These are useful for documenting independent chunks
of logic in a large function like this, and as debugging targets.
2025-07-15 20:52:44 -05:00
Christopher Haster
0364ed5011 attr: Fixed custom attrs overflowing rattr.count
Not sure how this was missed. The whole tradeoff of shrinking
rattr.count was that by default lfs3_rattr_t would take up less space,
but user-provided buffers would need an indirect lfs3_data_t to support
arbitrary buffer sizes.

This managed to scrape by with a 16-bit count (15-bit really), but
fortunately failed test_attrs_fattr_resync_receive with an 8-bit count.
And only barely! 256 is the smallest possible custom attr that
overflows.
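
The arithmetic is simple enough to check in a few lines -- 256 is the
first length that wraps an 8-bit count back around to zero:

  #include <stdio.h>
  #include <stdint.h>

  int main(void) {
      uint8_t count = (uint8_t)256;  // max representable value is 255
      printf("%d\n", count);         // prints 0, the overflowed count
  }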

I guess a point towards making internal limitations as tight as possible
to catch mistakes like these earlier.

---

Added test_attrs_setattr_big and test_attrs_fattr_big to catch this in
the future.

Note that while this added some code, stack is unaffected. This is
because custom attribute handling is off the hot-path, which is why the
lfs3_rattr_t -> lfs3_rattr_t+lfs3_data_t split is worth it:

           code          stack          ctx
  before: 37016           2416          652
  after:  37052 (+0.1%)   2416 (+0.0%)  652 (+0.0%)
2025-07-15 16:50:11 -05:00
Christopher Haster
5b0ec8090a Adopted rattr.from for simpler appendrattr_ lazy encoding
This breaks down the previously 16-bit rattr.count field into two 8-bit
rattr.from and rattr.count fields. Now, instead of using a mixture of
rattr.tag and sign(rattr.count) to determine rattr encoding, we just
jump based on rattr.from:

  lfs3_rattr_t:
  .---+---+---+---.
  |  tag  |frm|cnt| -+-> 16-bit tag   - on-disk encoding + rbyd flags
  +---+---+---+---+  +->  8-bit from  - in-RAM encoding
  |     weight    |  '->  8-bit count - from-specific count
  +---+---+---+---+
  |      ptr      |
  '---+---+---+---'

The internal appendrattr_ ctx also saw a bit of rework, and now uses a
big union with multiple buffers instead of stacking a ridiculous number
of LFS_MAX calls. Expanding the LFS_MAX stack grows O(n^2), so this is
probably good for compile times.
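
The union-of-buffers pattern, sketched standalone with made-up encodings
and sizes (not the actual appendrattr_ ctx):

  #include <stdio.h>

  union rattr_scratch {
      unsigned char leb128_buf[5];   // hypothetical per-encoding buffers
      unsigned char branch_buf[12];
      unsigned char name_buf[16];
  };

  int main(void) {
      // the union is automatically sized to its largest member, the
      // same result as the old stack of LFS_MAX calls without the
      // macro soup
      printf("%zu\n", sizeof(union rattr_scratch));
  }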

And all rattr.from branches now generate an lfs3_data_t*. This was
already a side-effect of all the internal lfs3_data_from* functions, and
it simplifies the tail end of appendrattr_. No more relying on
data_count's sign bit.

Also rearranged rattr.from encoders to match source code order.

---

Unfortunately, while this did simplify the source code, it didn't really
lead to much improvement in code size:

           code          stack          ctx
  before: 37024           2416          652
  after:  37016 (-0.0%)   2416 (+0.0%)  652 (+0.0%)

I guess jump tables are more a performance optimization than a code size
one. That and the benefit of cheaper appendrattr_ logic is likely
overshadowed by the extra constants needed to populate rattr.from in
every LFS3_RATTR_* macro.

Also test_attrs_fattr_resync_receive is now failing, but I think that's
just because of an unrelated bug exposed by the shrinking count field.
In theory rattr.count should be limited to internal fixed-size buffers.
2025-07-15 16:50:11 -05:00
Christopher Haster
0bed3867d8 Adopted more single-char field names
Limited to nested struct fields where the names don't really matter:

- bptr.data -> bptr.d
- mdir.rbyd -> mdir.r

Ok, it actually just ended up being those two.

This is on the tail end of some optimization work that ended up
abandoned because of maintainability concerns. But it did highlight that
struct nesting gets a bit out-of-control when trying to both optimize
stack allocations and respect C99's strict aliasing.

Consider further fragmenting lfs3_rbyd_t for fine-grained stack
allocations:

  typedef struct lfs3_rbyd {
      struct lfs3_rtrunkcksum {
          struct lfs3_rtrunk {
              lfs3_rid_t weight;
              struct lfs3_rtrunktrunk {
                  lfs3_block_t blocks[2];
                  lfs3_size_t trunk;
              } rtrunktrunk;
          } rtrunk;
          uint32_t cksum;
      } rtrunkcksum;
      lfs3_size_t eoff;
  } lfs3_rbyd_t;

Accessing fields just starts to get silly:

  rbyd.rtrunkcksum.rtrunk.rtrunktrunk.trunk

At least single-char field names keep a little bit of readability:

  rbyd.ck.t.t.trunk

Or for some real examples:

- file->b.o.mdir.rbyd.weight -> file->b.o.mdir.r.weight
- bptr->data.u.disk.block -> bptr->d.u.disk.block
2025-07-15 16:50:06 -05:00
Christopher Haster
29e1701964 scripts: gdb: Globbed all dbg scripts into dbg.gdb.py
This goes ahead and makes all dbg scripts available in dbg.gdb.py, via
the magic of globbing relative to __file__ and dynamic Python class
generation.

Probably one of the more evil scripts I've written, but this means we
don't need to worry about dbg.gdb.py falling out-of-date when adding new
dbg scripts.

Not all of the dbg scripts are useful inside gdb, but most of them are.
After all, what's cooler than this!

  (gdb) dbgrbyd -b4096 "disk" -t \
          file->b.shrub.blocks[0] \
          --trunk lfs3_rbyd_trunk(&file->b.shrub)
  rbyd 0x46.23a w2048, rev 00000000, size 629, cksum 8f5169e1
  00000004:           .->     0-334 data w335 0
  00000009:         .-+->       335 data w1 1                  71
  0000000e:         | .->       336 data w1 1                  67
  00000013:       .-+-+->       337 data w1 1                  66
  ...
  00000144: | | | |   .->       350 data w1 1                  74
  0000019a: | | | | .-+->       351 data w1 1                  78
  000001f5: | | | | | .->   352-739 data w388 1                76
  00000258: +-+-+-+-+-+->  740-2047 data w1308 1               6c

Note some tricks to help interact with bash and gdb:

- Flags are passed as is (-b4096, -t, --trunk)
- All non-flags are parsed as expressions (file->b.shrub.blocks[0])
- String expressions may be useful for paths and stuff ("./disk")
2025-07-04 18:55:46 -05:00
Christopher Haster
090611af14 scripts: dbgflags.py: Tweaked internals for readability
Mainly just using 'P_NAME' instead of 'P', 'NAME' in the FLAGS table;
every bit of horizontal spacing helps with these definitions.
2025-07-04 18:08:11 -05:00
Christopher Haster
19747f691e scripts: dbgflags.py: Reimplemented filters as flags
So instead of:

  $ ./scripts/dbgflags.py o 0x10000003

The filter is now specified as a normal(ish) argparse flag:

  $ ./scripts/dbgflags.py --o 0x10000003

This is a bit easier to interop with in dbg.gdb.py, and I think a bit
more readable.

Though -a and --a now do _very_ different things. I'm sure that won't
confuse anyone...
2025-07-04 18:08:11 -05:00
Christopher Haster
0c19a68536 scripts: test.py/bench.py: Added support for multiple header files
Like test.py --gdb-script, being able to specify multiple header files
seems useful and is easy enough to add.

---

Note that the default is only used if no other header files are
specified, so this _replaces_ the default header file:

  $ ./scripts/test.py --include=my_header.h

If you don't want to replace the default header file, you currently need
to specify it explicitly:

  $ ./scripts/test.py \
        --include=runners/test_runner.h \
        --include=my_header.h
2025-07-04 18:08:11 -05:00
Christopher Haster
0b804c092b scripts: gdb: Added some useful GDB scripts to test.py --gdb
These just invoke the existing dbg*.py python scripts, but allow quick
references to variables in the process being debugged:

  (gdb) dbgflags o file->b.o.flags
  LFS3_O_RDWR    0x00000002  Open a file as read and write
  LFS3_o_REG     0x10000000  Type = regular-file
  LFS3_o_UNSYNC  0x01000000  File's metadata does not match disk

Quite neat and useful!

This works by injecting dbg.gdb.py via gdb -x, which includes the
necessary python hooks to add these commands to gdb. This can be
overridden/extended with test.py/bench.py's --gdb-script flag.

Currently limited to scripts that seem the most useful for process
internals:

- dbgerr - Decode littlefs error codes
- dbgflags - Decode littlefs flags
- dbgtag - Decode littlefs tags
2025-07-04 18:08:04 -05:00
Christopher Haster
b700c8c819 Dropped fragmenting blocks > 1 fragment
So we now keep blocks around until they can be replaced with a single
fragment. This is simpler, cheaper, and reduces the number of commits
needed to graft (though note arbitrary range removals still keep this
unbounded).

---

So, this is a delicate tradeoff.

On one hand, not fully fragmenting blocks risks keeping around bptrs
containing very little data, depending on fragment_size.

On the other hand:

- It's expensive, and disk utilization during random _deletes_ is not
  the biggest of concerns.

  Note our crystallization algorithm should still clean up partial
  blocks _eventually_, so this doesn't really impact random writes.
  The main concerns are lfs3_file_truncate/fruncate, and in the future
  collapserange/punchhole.

- Fragmenting bptrs introduces more commits, which have their own
  prog/erase cost, and it's unclear how this impacts logging operations.

  There's no point in fragmenting blocks at the head of a log if we're
  going to fruncate them eventually.

I figure let's err on the side of minimizing complexity/code size for
now, and if this turns out to be a mistake, we can always revert or
introduce fragmenting >1 fragment blocks as an optional feature in the
future.

---

Saves a big chunk of code, stack, and even some ctx (no more
fragment_thresh):

           code          stack          ctx
  before: 37504           2448          656
  after:  37024 (-1.3%)   2416 (-1.3%)  652 (-0.6%)
2025-07-03 19:46:18 -05:00
Christopher Haster
3f2e8b53c5 Manually inlined lfs3_file_crystallize into lfs3_file_flush_
This was the main culprit behind our stack increase. Inlining
lfs3_file_crystallize into lfs3_file_flush_ adds a bit of code, but as a
tradeoff:

- Keeps all lfs3_file_crystallize_ calls at the same abstraction
  level, which is generally easier to reason about and avoids issues
  with things like lfs3_alloc_ckpoints.

- Makes some low-level interactions, such as LFS3_o_UNCRYST masking,
  more obvious.

- Reduces the stack hot-path by the cost of lfs3_file_flush_

Saves some stack at a code cost:

               code          stack          ctx
  before:     37492           2464          656
  after:      37504 (+0.0%)   2448 (-0.6%)  656 (+0.0%)

Now that the dust has settled a bit, we can also compare the lazy
grafting vs lazy crystallization builds:

               code          stack          ctx
  lazy-graft: 38020           2456          656
  lazycryst:  37504 (-1.4%)   2448 (-0.3%)  656 (+0.0%)
2025-07-03 18:55:28 -05:00
Christopher Haster
35e407372c Adopted similar mark-if-truncate-to-zero logic for file caches
It worked well for file leaves, so we might as well adopt the same
post-truncate/fruncate logic for caches.

This moves checks for cache.size==0 from lfs3_file_write into
lfs3_file_truncate/fruncate.

Note that lfs3_file_truncate/fruncate are the only functions (for now)
that can reduce the size of a file.

Adds a bit of code, which is probably why this wasn't adopted earlier,
but it reduces the state we need to worry about and makes things easier
to understand:

           code          stack          ctx
  before: 37468           2464          656
  after:  37492 (+0.1%)   2464 (+0.0%)  656 (+0.0%)
2025-07-03 18:55:20 -05:00
Christopher Haster
8365b27dea Reworked lfs3_file_truncate/fruncate to simplify crystallize
Now that we don't need to worry about losing data due to ungrafted
state, we can decide whether or not to discard leaves after
truncate/fruncate.

This simplifies lfs3_file_truncate/fruncate (and makes them much more
readable as a plus), but also lets us simplify lfs3_file_crystallize
since we no longer need to worry about implicit flushes.

lfs3_file_crystallize's call sites:

- lfs3_file_flush_ - We've already committed to flushing, so
  opportunistically clearing LFS3_o_UNFLUSH has no effect.

  lfs3_file_flush_'s logic should already take advantage of possible
  flushes anyways.

- lfs3_file_flush - We only call lfs3_file_crystallize _after_
  lfs3_file_flush_, so this has no effect.

This saves a bit more code and stack:

           code          stack          ctx
  before: 37588           2472          656
  after:  37468 (-0.3%)   2464 (-0.3%)  656 (+0.0%)
2025-07-03 18:46:32 -05:00
Christopher Haster
e443af800b Adopted compiler friendly generalized lfs3_file_crystallize_ API
Seeing as the generalized lfs3_file_crystallize_ API had a much lower
cost than I thought, we might as well keep it around a bit longer.

Though I at least tweaked it to hopefully be easier for compilers to
optimize: By accepting crystal_min=-1 as an alias for
crystal_min=crystal_max, compilers should always be able to const
propagate this.
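
A minimal sketch of the -1-alias trick (not the actual
lfs3_file_crystallize_ signature):

  #include <stdio.h>

  static int crystallize(int crystal_min, int crystal_max) {
      if (crystal_min == -1) {
          crystal_min = crystal_max;  // -1 means "same as crystal_max"
      }
      return crystal_min + crystal_max;
  }

  int main(void) {
      // the constant -1 at the call site is trivial for the compiler
      // to const propagate through the alias branch above
      printf("%d\n", crystallize(-1, 8));
  }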

---

Not sure why this still adds 8 bytes of code; it just looks like
compiler noise in lfs3_file_crystallize__? Is the LFS3_NOINLINE
attribute messing with compiler optimizations?

           code          stack          ctx
  before: 37580           2472          656
  after:  37588 (+0.0%)   2472 (+0.0%)  656 (+0.0%)
2025-07-03 18:39:43 -05:00
Christopher Haster
d6f332fa9f Dropped the generalized lfs3_file_crystallize_ API
Eventually the generalized crystallize API may be useful again for the
"eager crystallization" write strategy, but the codebase has drifted
apart enough already that this will require some reimplementation
anyways (review the commit history!).

Might as well clean up API weirdness we're not using.

Saves surprisingly little code. I guess the compiler was able to
optimize out the duplicated args once the logic was a bit simpler?

           code          stack          ctx
  before: 37588           2472          656
  after:  37580 (-0.0%)   2472 (+0.0%)  656 (+0.0%)
2025-07-03 18:07:32 -05:00
Christopher Haster
a85f08cfe3 Dropped lazy grafting, but kept lazy crystallization
This merges LFS3_o_GRAFT into LFS3_o_UNCRYST, simplifying the file write
path and avoiding the mess that is ungrafted leaves.

---

This goes for a different lazy crystallization/grafting strategy that
was overlooked before. Instead of requiring all leaves to be both
crystallized and grafted, we allow leaves to be uncrystallized, but they
_must_ be grafted (in-tree) at all times.

This gets us most of the rewrite performance of lazy-crystallization,
without needing to worry about out-of-date file leaves.

Out-of-date file leaves were a headache for both code cost and concerns
around confusing filesystem states and related bugs.

Note LFS3_o_UNCRYST gets some extra behavior here:

- LFS3_o_UNCRYST indicates when crystallization is _necessary_, and no
  longer when crystallization is _possible_.

  We already keep track of when crystallization is _possible_ via bptr's
  erased-state, and this lets us control recrystallization in
  lfs3_file_flush_ without erased-state-clearing hacks (which probably
  wouldn't work with the future ddtree).

- We opportunistically clear the UNCRYST flag if it's not possible for
  future lfs3_file_crystallize_ calls to make progress:
  - When we crystallize a full block
  - When we hit the end of the file
  - When we hit a hole
  - When we hit an unaligned block

---

Note this does impact performance!

Unlike true lazy grafting, eager grafting means we're always committing
to the bshrub/btree more often than is strictly necessary, and this
translates to more frequent btree node erases/compactions.

Current simulated benchmarks show a ~3x increase (~20us -> ~60us) in
write times for linear file writes on NOR flash.

However:

- The moment you need unaligned progs, this performance optimization
  goes out the window, as we need to graft bptrs before any padding
  fragments.

- This only kicks in once we start crystallizing. So any writes <
  crystal_thresh (both in new files and in between blocks) are forced
  to commit to the bshrub/btree every flush.

  This risks a difficult to predict performance characteristic.

- If you sync frequently (logging), we're forced to crystallize/graft
  anyways.

- The performance hit can be alleviated with either larger writes or
  larger caches, though I realize this goes against littlefs's
  "RAM-not-required" mantra.

Worst case, we can always bring back "lazy grafting" as a
high-performance option in the future.

Though note the above concerns around in-between/pre crystallization
performance. This may only make sense when cache_size >= both prog_size
and crystal_thresh.

And of course, there's a significant code tradeoff!

           code          stack          ctx
  before: 38020           2456          656
  after:  37588 (-1.1%)   2472 (+0.7%)  656 (+0.0%)

Uh, ignore that stack cost. The simplified logic leads to more functions
being inlined, which makes a mess of our stack measurements because we
don't take shrinkwrapping into account.
2025-07-03 18:04:18 -05:00
Christopher Haster
eb884011ec Reworked the read path to use a single flush
The motivation for this comes from the observation that lfs3_file_flush
already implies lfs3_file_crystallize, so most of the time the isuncryst
check in lfs3_file_readnext is useless.

We _do_ hit the isuncryst check when bypassing the cache, but the
situation where we bypass the cache, on a read-write file, _and_ can
avoid crystallization, seems too niche to care about.

So this reworks lfs3_file_read to prevent cache bypassing until pending
data is at least crystallized. This mirrors how we force flushing in
lfs3_file_write.

lfs3_file_read:

        |<------------------------------------------------------------.
        v                                                             |
  data in cache? --> read from cache -------------------------------->|
        | n       y                                                   |
        v                                                             |
  data in btree? --> crystallized? --> bypass? --> read from disk --->|
        | n       y        | n      y     | n   y                     |
        |                  |              v                           |
        |                  |           flushed? --> read into cache ->|
        |                  |              | n    y                    |
        |                  |              v                           |
        |                  '-------> flush cache -------------------->|
        v                                                             |
  fill with zeros ----------------------------------------------------'

lfs3_file_write:

     |<------------------------------------.
     v                                     |
  flushed? --> bypass? --> write to disk ->|
     | n    y     | n   y                  |
     |            v                        |
     |         move cache                  |
     v            v                        |
  aligned? --> write into cache ---------->|
     | n    y                              |
     v                                     |
  flush cache -----------------------------'

---

As a part of the rework, I also manually inlined lfs3_file_readnext into
lfs3_file_readget_. This duplicates some logic (not code cost!), but
helps clean up some of the ifdef soup in lfs3_file_readnext.

I also tried to refactor lfs3_file_readnext to better match
lfs3_file_read and lfs3_file_write's logic, but I'm not sure it actually
gained us anything.

lfs3_file_readnext:

       |<----------------------------.
       v                             |
  data in leaf? --> read from leaf   |
       | n       y        |          |
       v                  v          |
  data in hole? --> fill with zeros  |
       | n       y        |          |
       v                  |          |
  fetch leaf -------------|----------'
                          v
                        done!

Saves a bit of code:

           code          stack          ctx
  before: 38060           2456          656
  after:  38020 (-0.1%)   2456 (+0.0%)  656 (+0.0%)

This also likely prevents lfs3_file_readnext from ever becoming the
stack hot-path again.
2025-07-03 15:54:05 -05:00
Christopher Haster
b6a36e75cf Limited graft traversal scope to lfs3_alloc
This drops the LFS3_TSTATE_GRAFT state in favor of just explicitly
iterating over graft state in lfs3_alloc. This is cheaper as long as
lfs3_alloc is the only traversal we trigger while grafting.

We already rely on the lfs3_alloc-specific behavior of never touching
cksize/cksum fields anyways.

Note both lfs3_alloc_markinuse and lfs3_alloc_markinuse_ already have
multiple call sites and can't be inlined due to lookahead population in
lfs3_mtree_gc. We also don't need to worry about graft state there as
incremental traversals only make progress when bshrubs are at rest.

Saves a bit of code:

                 code          stack          ctx
  before:       38092           2456          656
  after:        38060 (-0.1%)   2456 (+0.0%)  656 (+0.0%)

  before graft: 37936           2456          636
  after graft:  38060 (+0.3%)   2456 (+0.0%)  656 (+3.1%)

Actually, surprisingly little code, but anything that simplifies
lfs3_mtree_traverse_ is welcome.
2025-07-01 14:19:48 -05:00
Christopher Haster
1bf2a4b520 Fixed grafting allocator checkpoint hole
This was quite a deep bug.

We don't track the original bshrub when grafting, so it was possible to
realloc those blocks even when we need their contents to finish the
graft operation.

This was found while experimenting with eager leaf grafting, but can
also occur when grafting data fragments.

---

In theory, the block allocator's checkpoint mechanism protects against
this.

Before we alloc, we set a checkpoint with lfs3_alloc_ckpoint. This marks
the position of the block allocator before allocation, so if we loop
around the entire block device we don't double alloc any in-flight
blocks:

                     ckpoint      lookahead
                        v         .---'---.
  [mm---ddd-d---d-------|dd--d-ddd|--------d-----d-]
                         '---.---'
                    in-flight allocations

But this only protects _new_ blocks; _old_ blocks can be anywhere on
disk and are unprotected.

In theory again, old blocks are always tracked via copy-on-write
snapshots, but this is not the case for bshrubs while grafting!

Grafting is unfortunately a multi-commit operation (we may remove
multiple fragments that span different btree nodes), and each bshrub
commit discards the old snapshot. This creates a window where old blocks
can be double alloced _while grafting_, leading to corrupted data.

You may wonder why we're discarding the old snapshot. Why not keep
track of it until the grafting completes?

The problem there is that we need the intermediate snapshot in order for
shrubs to survive compactions. We really have 3 states:

  old -> mid-graft -> new

And the only one we don't need to fallback to is the old state.

---

A couple solutions:

1. Track all three states

   This would add complexity and increase the cost of every lfs3_file_t.

2. Open a temporary file to track the old state

   This would add complexity and a big chunk of stack to what is already
   one of the critical functions on our stack hot-path.

3. Carefully make sure graft commits don't lose track of in-flight data
   until an atomic commit

   This doesn't work when you're trying to coalesce two data fragments
   in two different btree nodes. At least not without completely
   restructuring the btree commit logic.

4. Just explicitly track in-flight graft state out-of-band

This goes with option 4, adding lfs3->graft and lfs3->graft_count to
track in-flight graft state when we're grafting. lfs3_mtree_traverse_
can include the relevant blocks during traversals, effectively masking
out graft state from the lookahead buffer.
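
A standalone sketch of the idea, with made-up types (not the actual lfs3
structs or traversal code): the in-flight graft blocks live out-of-band
and get treated as in-use until the graft completes:

  #include <stdio.h>
  #include <stdbool.h>
  #include <stdint.h>

  typedef struct fs {
      const uint32_t *graft;  // in-flight graft blocks, NULL when at rest
      unsigned graft_count;
  } fs_t;

  static bool block_inuse(const fs_t *fs, uint32_t block) {
      // in addition to the normal copy-on-write tracking, mask out any
      // in-flight graft blocks so the allocator can't reuse them
      for (unsigned i = 0; i < fs->graft_count; i++) {
          if (fs->graft[i] == block) {
              return true;
          }
      }
      return false;
  }

  int main(void) {
      uint32_t graft[] = {42, 43};
      fs_t fs = {.graft = graft, .graft_count = 2};
      printf("%d\n", block_inuse(&fs, 42));
  }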

This adds a bit of code/ctx, but is probably the cheapest option:

           code          stack          ctx
  before: 37936           2456          636
  after:  38092 (+0.4%)   2456 (+0.0%)  656 (+3.1%)
2025-07-01 14:02:45 -05:00
Christopher Haster
13fbd2f006 Slightly reworked btree staging in lfs3_btree_commit_
This applies the same pattern of taking both the old + staging btree as
arguments to try to avoid redundant stack allocations.

Extra appealing is being able to reuse the staging shrubs in bshrubs for
btree commits.

However, it doesn't work out so well for the btree logic:

           code          stack          ctx
  before: 37936           2424          636
  after:  37936 (+0.0%)   2456 (+1.3%)  636 (+0.0%)

A couple reasons:

- Passing staging references limits what the compiler can optimize;
  compilers aren't great at cross-function optimization

- These staging references push struct allocation upwards, which risks
  pushing them onto the stack hot-path.

  Gah, again this is likely not a real issue, just a failure of our
  tooling to take stack shrinkwrapping into account.

- The extra arguments add stack overhead to the call frame. It's just
  one word, but this can add up.

I should probably revert this, but I'm going to keep it around for a
bit:

- It's only 32 bytes (1 rbyd + 1 pointer + compiler noise). Is 32 bytes
  enough to really care about?

- I'm not sure how much weight to put into our stack measurements at the
  moment. They don't take shrinkwrapping into account, which creates a
  weird bias.

- This internal API better conveys how it behaves w.r.t. atomic updates
  and errors.

- The API may also lead to better stack usage in the future.
2025-07-01 14:02:40 -05:00
Christopher Haster
8ee08a5b89 Slightly reworked mdir staging in lfs3_mdir_commit_
I've noticed a common pattern where we tend to create copies in multiple
function frames in order to allow fallback in case of errors. This risks
redundant stack allocations across layers.

To avoid this, this commit adopts old + staging arguments for most of
the internal mdir commit functions:

  static int lfs3_mdir_commit_(lfs3_t *lfs3,
          lfs3_mdir_t *mdir_, lfs3_mdir_t *mdir,
          ...);

We already needed this for lfs3_mdir_compact__, so hey, points for
consistency.

Saves a tiny bit of code:

           code          stack          ctx
  before: 37964           2424          636
  after:  37936 (-0.1%)   2424 (+0.0%)  636 (+0.0%)
2025-07-01 13:59:06 -05:00
Christopher Haster
4747477057 Tweaked lfs3_btree/bshrub_traverse to include weight
Not sure why we weren't already doing this; it doesn't really make sense
to return bid without weight, and this matches
lfs3_btree/bshrub_lookupnext.

Sure we don't need weight currently, but this is useful to include in
case we need it in the future (lfs3_bptr_fetch during traversal?).

And while we're not using it, the compiler is happy to optimize it out,
so no code changes:

           code          stack          ctx
  before: 37964           2424          636
  after:  37964 (+0.0%)   2424 (+0.0%)  636 (+0.0%)
2025-06-28 19:08:42 -05:00
Christopher Haster
10c0a60ced Tried to dedup bptr/data fetching
Like the bshrub/btree dedup, this adds lfs3_bptr_fetch to help dedup
bptr/data fetching.

The original plan was to eliminate bptrs from lfs3_file_lookupnext and
lfs3_file_traverse, and just return tagged data like the other
lookup/traverse functions. But this didn't work out very well. We return
arbitrary attrs from lfs3_file_traverse, so all this would've
accomplished is making every lfs3_file_lookupnext call messier.

But I think I'm still going to keep lfs3_bptr_fetch around as it
provides a nice place to deduplicate some other bits of logic:

- It makes sense to limit bptrs to compressed weights here, as opposed
  to the somewhat arbitrary lfs3_file_lookupnext function.

- And it would be a bit silly to not put the bptr's LFS3_CKFETCHES logic
  in lfs3_bptr_fetch.

  This may fetch more than previously (during crystallization pokes?),
  but better safe than sorry. LFS3_CKFETCHES will likely be a relatively
  niche feature anyways.

As for lfs3_file_traverse, I got rid of it completely.

We already have special logic in lfs3_mtree_traverse_ and lfs3_file_ck
for bptrs anyways, since bptrs, unlike data fragments, reference actual
blocks. And this disentangles lfs3_mtree_traverse_ from the file APIs,
which was a bit of an awkward design.

---

This adds a bit of code to the default build, but I think it's worth it
for the better code organization:

                     code          stack          ctx
  before:           37896           2424          636
  after:            37964 (+0.2%)   2424 (+0.0%)  636 (+0.0%)

It also saves some code in LFS3_CKFETCHES mode, thanks to deduping all
the ckfetches fetch checks:

                     code          stack          ctx
  ckfetches before: 38144           2464          636
  ckfetches after:  38072 (-0.2%)   2472 (+0.3%)  636 (+0.0%)
2025-06-28 18:50:57 -05:00
Christopher Haster
d2847f5f0e Deduped bshrub/btree fetching
This adds lfs3_bshrub_fetch to better deduplicate the common pattern of
fetching either a bshrub or btree based on tag.

The API ends up a bit funny because of how mdirs are attached to
specific mids. All we need is the relevant mdir object, and we can do a
single masked mdir lookup to find any bshrubs/btrees.

Saves a little bit of code:

           code          stack          ctx
  before: 37920           2424          636
  after:  37896 (-0.1%)   2424 (+0.0%)  636 (+0.0%)

Also flipped around some lfs3_data_read* parameters to better match
common tag+weight+data ordering in lfs3_*_lookup functions.
2025-06-28 18:50:29 -05:00
Christopher Haster
f39f2812af Renamed lfs3_file_readonce/flushonce_ -> readget_/flushset_
This just makes the purpose of these functions a bit more clear, and
matches LFS3_o_WRSET.
2025-06-27 14:14:36 -05:00
Christopher Haster
2ebb8a301b Attempted better allocator checkpoints
This tries to call lfs3_alloc_ckpoint in more correct positions, and
fixes a bug where we _never_ called lfs3_alloc_ckpoint before
finishing crystallization in lfs3_file_readnext and
lfs3_file_truncate/fruncate:

- lfs3_file_crystallize now implicitly calls lfs3_alloc_ckpoint before
  both finishing crystallization and grafting.

- lfs3_file_flush_ and lfs3_file_flushonce_ now call lfs3_alloc_ckpoint
  at the beginning of each loop iteration.

  This may be redundant on some iterations but that's ok.

- lfs3_file_write does _not_ call lfs3_alloc_ckpoint, this is all
  handled in lfs3_file_flush_ now.

- lfs3_file_truncate/fruncate still call lfs3_alloc_ckpoint, but just
  before lfs3_file_graft.

  This matches the lfs3_alloc_ckpoint pattern used for most
  lfs3_mdir_commit calls, i.e. checkpoint just before to make it easier
  to audit the logic.

- Also moved the pre-fragment crystallization out of the fragment loop,
  we should only crystallize once and this makes the code a bit more
  readable.

  I think this is the source of the extra 8 bytes of stack, but that's
  small enough to consider compiler noise.

It's not the biggest problem to not call lfs3_alloc_ckpoint every time
all blocks are at rest, but it does risk a premature ENOSPC error when
it's still possible to make progress.

This gets more complicated with lazy crystallization/grafting, as block
allocations can end up deferred to operations you might not expect
(lfs3_file_read for example).

Adds a bit of code, but is in theory more correct:

           code          stack          ctx
  before: 37888           2416          636
  after:  37920 (+0.1%)   2424 (+0.3%)  636 (+0.0%)
2025-06-27 13:26:45 -05:00
Christopher Haster
8cc81aef7d scripts: Adopt __get__ binding for write/writeln methods
This actually binds our custom write/writeln functions as methods to the
file object:

  def writeln(self, s=''):
      self.write(s)
      self.write('\n')
  f.writeln = writeln.__get__(f)

This doesn't really gain us anything, but is a bit more correct and may
be safer if other code messes with the file's internals.
2025-06-27 12:56:03 -05:00
Christopher Haster
8b6e51d54e Fixed assert with branches in lfs3_file_traverse_
This was modified incorrectly for LFS3_2BONLY. We do actually end up
with non-bptr non-data tags here when we encounter btree inner nodes.

Code changes:

           code          stack          ctx
  before: 37864           2416          636
  after:  37888 (+0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-26 07:26:17 -05:00
Christopher Haster
d183a88c58 Fixed uninit warning, gave up on err < 0 compiler guidance
In lfs3_mdir_namelookup, when compiling with LFS3_2BONLY, there was an
uninitialized variable warning that just wouldn't go away (temporarily
disabled with the x=x hack).

So, giving up on the err < 0 compiler guidance since it apparently
doesn't work. Instead lfs3_rbyd_namelookup and lfs3_btree_namelookupleaf
unconditionally initialize the problematic variables before their main
loops.
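
The general shape of the workaround, as a standalone example rather than
the actual namelookup code: if the compiler can't prove the loop assigns
the variable, just pay for the unconditional store up front:

  #include <stdio.h>

  int main(void) {
      int found = -1;  // unconditional init silences -Wmaybe-uninitialized
      for (int i = 0; i < 4; i++) {
          if (i == 2) {
              found = i;
          }
      }
      printf("%d\n", found);
  }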

This adds a bit of code, but fighting the compiler just isn't worth the
headache:

           code          stack          ctx
  before: 37836           2416          636
  after:  37864 (+0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-26 07:26:17 -05:00
Christopher Haster
ccfc74a547 Added LFS3_2BONLY for a small 2-block configuration
Like LFS3_RDONLY and LFS3_KVONLY, LFS3_2BONLY opts out of all of the
logic necessary for filesystems larger than 2 blocks (the minimum size
of a mutable littlefs image).

This has potential for some pretty big savings:

- No block allocation
- No lookahead buffer
- No btrees (but yes bshrubs)
- No bptrs
- No mtree traversal

Which is I guess ~1/4 of the codebase:

            code           stack           ctx
  default: 37836            2416           636
  2bonly:  27704 (-26.8%)   1872 (-22.5%)  592 (-6.9%)

This can be combined with LFS3_KVONLY for a small key-value store
compatible with the full littlefs driver:

                  code           stack           ctx
  default:       37836            2416           636
  kvonly:        30792 (-18.6%)   2168 (-10.3%)  636 (+0.0%)
  kvonly+2bonly: 22900 (-39.5%)   1736 (-28.1%)  592 (-6.9%)

It may be possible to optimize this further, but, as is the case with
LFS3_KVONLY, balancing config-specific optimization vs maintainability
is tricky.

---

I'm not sure why, but this also reduced the default build's size a bit.
Compiler noise?

           code          stack          ctx
  before: 37860           2416          636
  after:  37836 (-0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-26 07:22:47 -05:00
Christopher Haster
2c27c61f25 kv: Added LFS3_KVONLY to opt-out of advanced file operations
One of the ideas behind the key-value API is that it is potentially much
cheaper than a full file API. With the key-value API, we get the
guarantee that all data must fit in RAM, and avoid headaches like
random reads/writes and needing to broadcast file state.

For an example of just how much complexity is avoided, see the
difference between lfs3_file_flushonce_ and the mess that is
lfs3_file_flush_ + lfs3_file_crystallize + lfs3_file_graft.

However, littlefs is designed around files, and a couple design
decisions hold back how much code saving is possible:

1. littlefs's shrubs are designed around being enrolled in the omdir
   linked-list, so internally we still have most of the file open/close
   code lumbering around.

2. Directories and traversals still exist, so we'd need the omdir
   linked-list anyways, and we still need to broadcast _some_ changes.

3. Despite being intended for small amounts of data, lfs3_set/get can
   still be used to create arbitrarily large files. So we still need all
   of the bshrub/btree logic.

   Which we still need for the mtree anyways, so this isn't really that
   much of a downside.

It also may be possible to save more code by aggressively rewriting the
_entire_ read/write path for lfs3_set/get, to not reuse any of the
existing file logic in LFS3_KVONLY mode. But I decided against this due
to concerns around maintainability.

The duplicate lfs3_file_read + lfs3_file_readonce and lfs3_file_flush_ +
lfs3_file_flushonce_ are already enough of a concern.

Anyways, here's LFS3_KVONLY:

                  code           stack           ctx
  default:       37824            2416           636
  kvonly:        30936 (-18.2%)   2168 (-10.3%)  636 (+0.0%)

LFS3_RDONLY + LFS3_KVONLY is also interesting:

                  code           stack           ctx
  rdonly:        10776             856           508
  rdonly+kvonly:  9904 (-8.1%)     888 (+3.7%)   508 (+0.0%)

---

This also added some noise to the default build's code, mainly due to
tweaks in lfs3_file_readnext to allow better reuse in LFS3_KVONLY:

           code          stack          ctx
  before: 37824           2416          636
  after:  37860 (+0.1%)   2416 (+0.0%)  636 (+0.0%)
2025-06-24 16:14:02 -05:00
Christopher Haster
213dba6f6d scripts: test.py/bench.py: Added ifndef attribute for tests/benches
As you might expect, this is the inverse of ifdef, and is useful for
supporting opt-out flags.

I don't think ifdef + ifndef is powerful enough to handle _all_
compile-time corner cases, but they at least provide convenient handling
for the most common flags. Worst case, tests/benches can always include
explicit #if/#ifdef/#ifndef statements in the code itself.
2025-06-24 15:17:04 -05:00
Christopher Haster
db1f941e90 Slightly reworked lfs3_file_opencfg's mid reservation path
And tried to more consistently use lfs3_path_namelen.

In a perfect world we would just use lfs3_path_namelen everywhere and
let the compiler figure it out, but unfortunately this leads to poor
code generation in some places, even with __attribute__((pure)) hacks.

Code changes:

           code          stack          ctx
  before: 37832           2416          636
  after:  37824 (-0.0%)   2416 (+0.0%)  636 (+0.0%)
2025-06-24 15:16:55 -05:00
Christopher Haster
1b76bd04ce kv: Some minor file cache_buffer tweaks
- Unconditionally pass buffer as cache_buffer in lfs3_set now that we
  rely on LFS3_o_WRSET

- Swapped true -> 1 for non-null don't-care buffer pointer

Saved one instruction as expected for the conditional assignment, but
added a bit of stack. Weird, but probably just compiler noise:

           code          stack          ctx
  before: 37836           2408          636
  after:  37832 (-0.0%)   2416 (+0.3%)  636 (+0.0%)
2025-06-22 15:55:14 -05:00
Christopher Haster
e7c7a81cfe Revisited zero-length file sync path
This needed a second pass. Changes:

- Small file flushes are no longer limited to LFS3_o_UNFLUSH, which
  should avoid bshrubs/btrees being written for small files with
  complicated seek+writes. Now, any file small enough is converted
  to a small file when we would need to flush.

  This does _not_ flush small unsync files that don't need to be
  flushed, though I'm not exactly sure how that would happen (broadcast
  from file with a different cache size?)

  I think this was a regression from previous logic.

- discardbshrub/discardbleaf moved into lfs3_file_sync_, otherwise
  we risk discarding the bshrub/bleaf without setting UNSYNC.

  This keeps all the state changing logic together.

- We now use lfs3_file_size_ == 0 as the decision for committing bnulls.

  size_ == 0 implies bnull, and this avoids the extra headache of
  checking for pending small file flush.

Note the ultimate decision on whether the file is small is still left up to
lfs3_file_sync. lfs3_file_sync_ just relies on the UNFLUSH + UNCRYST +
UNGRAFT checks to do the last minute small file flush (aside from
asserts).

The UNFLUSH + UNCRYST + UNGRAFT checks look a bit messy, but keep in
mind these optimize to a single bitmask.
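
With made-up flag values (not the actual lfs3 definitions), the three
checks collapse like so:

  #include <stdio.h>
  #include <stdint.h>

  #define O_UNFLUSH 0x01000000
  #define O_UNCRYST 0x02000000
  #define O_UNGRAFT 0x04000000

  int main(void) {
      uint32_t flags = O_UNCRYST;
      // compilers typically fold this into a single
      // (flags & (O_UNFLUSH|O_UNCRYST|O_UNGRAFT)) != 0 test
      if ((flags & O_UNFLUSH) || (flags & O_UNCRYST) || (flags & O_UNGRAFT)) {
          printf("needs a last-minute small file flush\n");
      }
  }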

Saves a tiny bit of code:

           code          stack          ctx
  before: 37856           2416          636
  after:  37836 (-0.1%)   2408 (-0.3%)  636 (+0.0%)
2025-06-22 15:37:53 -05:00
Christopher Haster
7a6aad3cc8 Cleaned up potential lfs3_mdir_commit dedup TODOs
Unfortunately neither of these were actually deduplicatable:

1. We can't easily move dir update logic into lfs3_mdir_commit, because
   lfs3_mdir_commit has no knowledge of the current did.

   Maybe we can add did-related nudge functions, but the logic would
   still need to be external to lfs3_mdir_commit. lfs3_mdir_commit only
   understands mids.

2. lfs3_alloc_ckpoint continues to be enticing, but fortunately a
   previous commit reminded me that we explicitly need to _not_ call
   lfs3_alloc_ckpoint before the lfs3_mdir_commit in
   lfs3_bshrub_commitroot_.

   In theory we could add lfs3_mdir_commit and lfs3_mdir_commit_ to
   make lfs3_alloc_ckpoint opt-out, but lfs3_mdir_commit is already
   a bit of a mess. And maybe keeping the lfs3_alloc_ckpoint calls
   explicit is a good thing. It's better to ENOSPC than double alloc a
   block.
2025-06-22 15:37:47 -05:00
Christopher Haster
2d39a7e9c5 make: Adopted consistent codemap dimensions
Tweaked: 1400x750 -> 1125x525 (1.5x codemapsvg.py's default)

This is now derived (1.5x) from the default dimensions in codemapsvg.py.
This matches the dimensions that ended up used for the preliminary v3
benchmarks, which are a bit more convenient on devices with smaller
screens.

As for where the 750x350 resolution came from, I'm not entirely sure.
Maybe a random Matplotlib example? It approximates a 2:1 aspect ratio
but with 25 pixels carved out for margins.

Note we like wide aspect ratios over pretty aspect ratios like 16:9,
golden ratio, etc, here:

1. We often cram things into the margins (legends, stack usage, etc)

2. English text is much wider than it is tall (this commit message has
   an aspect ratio of ~3:1), so wider aspect ratios help readability
2025-06-22 15:37:40 -05:00
Christopher Haster
d6a713f147 make: ctags: Limited prototype tags to header files
Jumping to prototypes in header files is extremely useful, because
that's usually where all the documentation is. But jumping to prototypes
in C files is a bit much. These are usually just uncomment definitions
to keep the compiler happy, and make navigation a bit of a pain.

Unfortunately it doesn't seem like ctags supports per-file-type tag
kinds (at least I couldn't find it in the documentation), but running
ctags twice with the --append flag seems to work.
2025-06-22 15:37:35 -05:00
Christopher Haster
f967cad907 kv: Adopted LFS3_o_WRSET for better key-value API integration
This adds LFS3_o_WRSET as an internal-only 3rd file open mode (I knew
that missing open mode would come in handy) that has some _very_
interesting behavior:

- Do _not_ clear the configured file cache. The file cache is prefilled
  with the file's data.

- If the file does _not_ exist and is small, create it immediately in
  lfs3_file_open using the provided file cache.

- If the file _does_ exist or is not small, do nothing and open the file
  normally. lfs3_file_close/sync can do the rest of the work in one
  commit.

This makes it possible to implement one-commit lfs3_set on top of the
file APIs with minimal code impact:

- All of the metadata commit logic can be handled by lfs3_file_sync_, we
  just call lfs3_file_sync_ with the found did+name in lfs3_file_opencfg
  when WRSET.

- The invariant that lfs3_file_opencfg always reserves an mid remains
  intact, since we go ahead and write the full file if necessary,
  minimizing the impact on lfs3_file_opencfg's internals.

This claws back most of the code cost of the one-commit key-value API:

              code          stack          ctx
  before:    38232           2400          636
  after:     37856 (-1.0%)   2416 (+0.7%)  636 (+0.0%)

  before kv: 37352           2280          636
  after kv:  37856 (+1.3%)   2416 (+6.0%)  636 (+0.0%)

---

I'm quite happy with how this turned out. I was worried for a bit that the
key-value API was going to end up an ugly wart for the internals, but
with LFS3_o_WRSET this integrates quite nicely.

It also raises a really interesting question, should LFS3_o_WRSET be
exposed to users?

For now I'm going to play it safe and say no. While potentially useful,
it's still a pretty unintuitive API.

Another thing worth mentioning is that this does have a negative impact
on compile-time gc. Duplication adds code cost when viewing the system
as a whole, but tighter integration can backfire if the user never calls
half the APIs.

Oh well, compile-time opt-out is always an option in the future, and
users seem to care more about pre-linked measurements, probably because
it's an easier thing to find. Still, it's funny how measuring code can
have a negative impact on code. Something something Goodhart's law.
2025-06-22 15:37:07 -05:00
Christopher Haster
92844cce3e kv: Added *_set_zero and *_set_null tests
These are high-risk corner cases for the key-value API, so we should
test them.

At one point I was relying on an optional buffer parameter in
lfs3_file_sync_, but that would have broken if lfs3_set's buffer was
NULL.
2025-06-22 15:36:53 -05:00