forked from Imagelibrary/littlefs
fe772e08cde3e28074ebf69dd436d5879ac3dcef
1443 Commits
fe772e08cd
Added dbgcat.py for extracting raw data from block devices
dbgcat.py is basically the same as dbgblock.py except:
- dbgcat.py pipes the block's contents directly to stdout.
The intention is to enable use of external tools via Unix pipes, while
letting dbgcat.py take care of block address decoding:
$ ./scripts/dbgcat.py disk -b4096 0x1.8 -n8 | grep littlefs
littlefs
- Unlike dbgblock.py, dbgcat.py accepts multiple block addresses,
concatenating the data that resides at those blocks.
This is dbgCAT.py after all.
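For illustration, a rough C sketch of the same idea (this is not the
actual dbgcat.py, and it skips the 0x1.8-style address decoding and the
-n flag): read each named block at the given block size and stream it to
stdout so other tools can be chained with pipes:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    // hypothetical usage: dbgcat <disk> <block_size> <block>...
    if (argc < 4) {
        fprintf(stderr, "usage: %s disk block_size block...\n", argv[0]);
        return 1;
    }
    FILE *disk = fopen(argv[1], "rb");
    if (!disk) return 1;
    long bsize = strtol(argv[2], NULL, 0);
    char *buf = malloc(bsize);
    for (int i = 3; i < argc; i++) {
        // accept multiple block addresses and concatenate their contents
        long block = strtol(argv[i], NULL, 0);
        fseek(disk, block*bsize, SEEK_SET);
        size_t n = fread(buf, 1, bsize, disk);
        fwrite(buf, 1, n, stdout);
    }
    free(buf);
    fclose(disk);
    return 0;
}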
9905bd397a
Extended crc32c.py to support hex sequences and strings
So now the following forms are supported:
$ ./scripts/crc32c.py -x 41 42 43 44
fb9f8872
$ ./scripts/crc32c.py -s abcd
fb9f8872
$ echo '00: 41 42 43 44' | xxd -r | ./scripts/crc32c.py
fb9f8872
Hopefully this will make crc32c.py more useful. It hasn't seen very much
use, though that may just be because of the difficulty marshalling data
into a format crc32c.py can operate on. That and dbgblock.py's -x/--cksum
flag covering one of the main use cases.
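For reference, a minimal bitwise CRC-32C in C (a generic sketch of the
checksum crc32c.py computes, not lfs.c's implementation); the standard
check value for "123456789" is 0xe3069283:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// bitwise crc32c (Castagnoli): reflected, poly 0x82f63b78, init/xorout ~0
uint32_t crc32c(uint32_t crc, const void *buffer, size_t size) {
    const uint8_t *data = buffer;
    crc = ~crc;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int j = 0; j < 8; j++) {
            crc = (crc >> 1) ^ ((crc & 1) ? 0x82f63b78 : 0);
        }
    }
    return ~crc;
}

int main(void) {
    const char *s = "123456789";
    // prints e3069283, the standard crc32c check value
    printf("%08x\n", (unsigned)crc32c(0, s, strlen(s)));
    return 0;
}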
54a03cfe3b
Enabled both pruning/non-pruning dbg reprs, -t/--tree and -R/--rbyd
Now that altns/altas are more important structurally, including them in
our dbg script's tree renderers is valuable for debugging. On the other
hand, they do add quite a bit of visual noise when looking at large
multi-rbyd trees topologically.
This commit gives us the best of both worlds by making both tree
renderings available under different options:
-t/--tree, a simplified rbyd tree renderer with altn/alta pruning:
.-> 0 reg w1 4
.-+-> uattr 0x01 2
| .-> uattr 0x02 2
.---+-+-> uattr 0x03 2
| .-> uattr 0x04 2
| .-+-> uattr 0x05 2
| .-+---> uattr 0x06 2
+-+-+-+-+-> 1 reg w1 4
| | '-> 2 reg w1 4
| '---> uattr 0x01 2
'---+-+-+-> uattr 0x02 2
| | '-> uattr 0x03 2
| '-+-> uattr 0x04 2
| '-> uattr 0x05 2
| .-> uattr 0x06 2
| .-+-> uattr 0x07 2
| | .-> uattr 0x08 2
'-+-+-> uattr 0x09 2
-R/--rbyd, a full rbyd tree renderer:
.---> 0 reg w1 4
.---+-+-> uattr 0x01 2
| .---> uattr 0x02 2
.-+-+-+-+-> uattr 0x03 2
| .---> uattr 0x04 2
| .-+-+-> uattr 0x05 2
| .-+---+-> uattr 0x06 2
+---+-+-+-+-+-> 1 reg w1 4
| | '-> 2 reg w1 4
| '-----> uattr 0x01 2
'-+-+-+-+-+-+-> uattr 0x02 2
| | '---> uattr 0x03 2
| '---+-+-> uattr 0x04 2
| '---> uattr 0x05 2
| .---> uattr 0x06 2
| .-+-+-> uattr 0x07 2
| | .-> uattr 0x08 2
'-----+---+-> uattr 0x09 2
And of course -B/--btree, a simplified B-tree renderer (more useful for
multi-rbyds):
+-> 0 reg w1 4
| uattr 0x01 2
| uattr 0x02 2
| uattr 0x03 2
| uattr 0x04 2
| uattr 0x05 2
| uattr 0x06 2
|-> 1 reg w1 4
'-> 2 reg w1 4
uattr 0x01 2
uattr 0x02 2
uattr 0x03 2
uattr 0x04 2
uattr 0x05 2
uattr 0x06 2
uattr 0x07 2
uattr 0x08 2
uattr 0x09 2
abe68c0844
rbyd-rr: Reworking rbyd range removal to try to preserve rby structure
This is the start of (yet another) rework of rbyd range removals, this
time in an effort to preserve the rby structure that maps to a balanced
2-3-4 tree. Specifically, the property that all search paths have the
same number of black edges (2-3-4 nodes).
This is currently incomplete, as you can probably tell from the mess,
but this commit at least gets a working altn/alta encoding in place
necessary for representing empty 2-3-4 nodes. More on that below.
---
First the problem:
My assumption, when implementing the previous range removal algorithms,
was that we only needed to maintain the existing height of the tree.
The existing rbyd operations limit the height to strictly log n. And
while we can't _reduce_ the height to maintain perfect balance, we can
at least avoid _increasing_ the height, which means the resulting tree
should have a height <= log n. Since our rbyds are bounded by the
block_size b, this means worst case our rbyd can never exceed a height
<= log b, right?
Well, not quite.
This is true the instant after the remove operation. But there is an
implicit assumption that future rbyd operations will still be able to
maintain height <= log n after the remove operation. This turns out to
not be true.
The problem is that our rbyd appends only maintain height <= log n if
our rby structure is preserved. If the rby structure is broken, rbyd
append assumes an rby structure that doesn't exist, which can lead to an
increasingly unbalanced tree.
Consider this happily balanced tree:
.-------o-------. .--------o
.---o---. .---o---. .---o---. |
.-o-. .-o-. .-o-. .-o-. .-o-. .-o-. |
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. |
a b c d e f g h i j k l m n o p => a b c d e f g h i
'------+------'
remove
After a range removal it looks pretty bad, but note the height is still
<= log n (old n not the new n). We are still <= log b.
But note what happens if we start to insert attrs into the short half of
the tree:
.--------o
.---o---. |
.-o-. .-o-. |
.o. .o. .o. .o. |
a b c d e f g h i
.-----o
.--------o .-+-r
.---o---. | | | |
.-o-. .-o-. | | | |
.o. .o. .o. .o. | | | |
a b c d e f g h i j'k'l'
.-------------o
.---o .---+-----r
.--------o .-o .-o .-o .-+-r
.---o---. | | | | | | | | | |
.-o-. .-o-. | | | | | | | | | |
.o. .o. .o. .o. | | | | | | | | | |
a b c d e f g h i j'k'l'm'n'o'p'q'r'
Our right side is generating a perfectly balanced tree as expected, but
the left side is suddenly twice as far from the root! height(r')=3,
height(a)=6!
The problem is when we append l', we don't really know how tall the tree
is. We only know l' has one black edge, which assuming rby structure is
preserved, means all other attrs must have one black edge, so creating a
new root is justified.
In reality this just makes the tree grow increasingly unbalanced,
increasing the height of the tree by worst case log n every range
removal.
---
It's interesting to note this was discovered while debugging
test_fwrite_overwrite, specifically:
test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10
It turns out the append fragments -> delete fragments -> append/carve
block + becksum loop contains the perfect sequence of attrs necessary to
turn this tree imbalance into a linked-list!
.-> 0 data w1 1
.-b-> 1 data w1 1
| .-> 2 data w1 1
.-b-b-> 3 data w1 1
| .-> 4 data w1 1
| .-b-> 5 data w1 1
| | .-> 6 data w1 1
.---b-b-b-> 7 data w1 1
| .-> 8 data w1 1
| .-b-> 9 data w1 1
| | .-> 10 data w1 1
| .-b-b-> 11 data w1 1
| .-b-----> 12 data w1 1
.-y-y-------> 13 data w1 1
| .-> 14 data w1 1
.-y---------y-> 15 data w1 1
| .-> 16 data w1 1
.-y-----------y-> 17 data w1 1
| .-> 18 data w1 1
.-y-------------y-> 19 data w1 1
| .-> 20 data w1 1
.-y---------------y-> 21 data w1 1
| .-> 22 data w1 1
.-y-----------------y-> 23 data w1 1
| .-> 24 data w1 1
.-y-------------------y-> 25 data w1 1
| .---> 26 data w1 1
| | .-> 27-2047 block w2021 10
b-------------------r-b-> becksum 5
Note, to reproduce this you need to step through with a breakpoint on
lfsr_bshrub_commit. This only shows up in the file's intermediary btree,
which at the time of writing ends up at block 0xb8:
$ ./scripts/test.py \
test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10 \
-ddisk --gdb -f
$ ./scripts/watch.py -Kdisk -b \
./scripts/dbgrbyd.py -b4096 disk 0xb8 -t
(then b lfsr_bshrub_commit and continue a bunch)
---
So, we need to preserve the rby structure.
Note pruning red/yellow alts is not an issue. These aren't black, so we
aren't changing the number of black edges in the tree. We've just
effectively reduced a 3/4 node into a 2/3 node:
.-> a
.---b-> b .-> a <- 2 black
| .---> c .-b-> b
| | .-> d | .-> c
b-r-b-> e <- rm => b-b-> d <- 2 black
The tricky bit is pruning black alts. Naively this changes the number of
black edges/2-3-4 nodes in the tree, which is bad:
.-> a
.-b-> b .-> a <- 2 black
| .-> c .-b-> b
b-b-> d <- rm => b---> c <- 1 black
It's tempting to just make the alt red at this point, effectively
merging the sibling 2-3-4 node. This maintains balance in the subtree,
but still removes a black edge, causing problems for our parent:
.-> a
.-b-> b .-> a <- 3 black
| .-> c .-b-> b
.-b-b-> d | .-> c
| .-> e .-b-b-> d
| .-b-> f | .---> e
| | .-> g | | .-> f
b-b-b-> h <- rm => b-r-b-> g <- 2 black
In theory you could propagate this all the way up to the root, and this
_would_ probably give you a perfect self-balancing range removal
algorithm... but it's recursive... and littlefs can't be recursive...
.-> s
.-b-> t .-> s
| .-> u .-----b-> t
.-b-b-> v | .-> u
| .-> w | .---b-> v
| .-b-> x | | .---> w
| | | | .-> y | | | | | | | .-> x
b-b- ... b-b-b-> z <- rm => r-b-r-b- ... r-b-r-b-> y
So instead, an alternative solution. What if we allowed black alts that
point nowhere? A sort of noop 2-3-4 node that serves only to maintain
the rby structure?
.-> a
.-b-> b .-> a <- 2 black
| .-> c .-b-> b
b-b-> d <- rm => b-b-> c <- 2 black
I guess that would technically make this a 1-2-3-4 tree.
This does add extra overhead for writing noop alts, which are otherwise
useless, but it seems to solve most of our problems: 1. does not
increase the height of the tree, 2. maintains the rby structure, 3.
tail-recursive.
And, thanks to the preserved rby structure, we can say that in the worst
case our rbyds will never exceed height <= log b again, even with range
removals.
If we apply this strategy to our original example, you can see how the
preserved rby structure sort of "absorbs" new red alts, preventing
further unbalancing:
.-------o-------. .--------o
.---o---. .---o---. .---o---. o
.-o-. .-o-. .-o-. .-o-. .-o-. .-o-. o
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. o
a b c d e f g h i j k l m n o p => a b c d e f g h i
'------+------'
remove
Reinserting:
.--------o
.---o---. o
.-o-. .-o-. o
.o. .o. .o. .o. o
a b c d e f g h i
.----------------o
.---o---. o
.-o-. .-o-. .------o
.o. .o. .o. .o. .o. .-+-r
a b c d e f g h i j'k'l'm'
.----------------------------o
.---o---. .-------------o
.-o-. .-o-. .---o .---+-----r
.o. .o. .o. .o. .-o .-o .-o .-o .-+-r
a b c d e f g h i j'k'l'm'n'o'p'q'r's'
Much better!
---
This commit makes some big steps towards this solution, mainly codifying
a now-special alt-never/alt-always (altn/alta) encoding to represent
these noop 1-nodes.
Technically, since null (0) tags are not allowed, these already exist as
altle 0/altgt 0 and don't need any extra carve-out encoding-wise:
LFSR_TAG_ALT 0x4kkk v1dc kkkk -kkk kkkk
LFSR_TAG_ALTN 0x4000 v10c 0000 -000 0000
LFSR_TAG_ALTA 0x6000 v11c 0000 -000 0000
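As a rough illustration of these encodings (the bit patterns come from
the table above, but the helper names are hypothetical, not necessarily
what lfs.c uses):
#include <stdbool.h>
#include <stdint.h>

#define LFSR_TAG_ALTN 0x4000  // altle 0: an alt that is never taken
#define LFSR_TAG_ALTA 0x6000  // altgt 0: an alt that is always taken

static inline bool lfsr_tag_isalt(uint16_t tag) {
    return tag & 0x4000;  // the "1" in v1dc
}

static inline bool lfsr_tag_isaltn(uint16_t tag) {
    // alt bit set, le direction, zero key; v and the color bit are ignored
    return (tag & 0x6fff) == LFSR_TAG_ALTN;
}

static inline bool lfsr_tag_isalta(uint16_t tag) {
    // alt bit set, gt direction, zero key; v and the color bit are ignored
    return (tag & 0x6fff) == LFSR_TAG_ALTA;
}
Since null (0) tags are not allowed, "le 0" can never match and "gt 0"
always matches, which is what makes these two encodings free.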
We actually already used altas to terminate unreachable tags during
range removals, but this behavior was implicit. Now, altns have very
special treatment as a part of determining bounds during appendattr
(both unreachable gt/le alts are represented as altns). For this reason
I think the new names are warranted.
I've also added these encodings to the dbg*.py scripts for, well,
debuggability, and added a special case to dbgrby.py -j to avoid
unnecessary altn jump noise.
As a part of debugging, I've also extended dbgrbyd.py's tree renderer to
show trivial prunable alts. Unsure about keeping this. On one hand it's
useful to visualize the exact alt structure, on the other hand it likely
adds quite a bit of noise to the more complex dbg scripts.
The current state of things is a mess, but at least tests are passing!
Though we aren't actually reclaiming any altns yet... We're definitely
_not_ preserving the rby structure at the moment, and if you look at the
output from the tests, the resulting tree structure is hilariously bad.
But at least the path forward is clear.
9b4e1b4cb7
Replace assert(!err) with assert(err == 0) in tests
This plays better with prettyasserts.py, which prints the err value on failure. We _could_ extend prettyasserts.py to print the contents of !err patterns, but this risks making the error message more confusing when the target is an actual boolean expression. Keep in mind prettyasserts.py is purely syntactical and doesn't really know the expression's type.
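A tiny sketch of the difference (do_something here is a hypothetical
stand-in for any littlefs call that returns an error code):
#include <assert.h>

static int do_something(void) {
    return 0;  // stand-in for a call that returns 0 or a negative error
}

int main(void) {
    int err = do_something();
    // preferred: the comparison gives prettyasserts.py a value (err) to
    // print if the assert fires
    assert(err == 0);
    // avoided: to a purely syntactic tool this is just another boolean
    // expression, so there's no obvious value to report
    // assert(!err);
    return 0;
}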
16ca642508
Simplified the diverged state machine in lfsr_rbyd_appendattr
Just made d_pruned its own variable. Encoding d_pruned into the state
machine is overkill.
We're not on the hot-path, so stack usage is not at a premium, and even if
we were, this is a single bool that could probably fit in some padding
somewhere. And the compiler is likely going to be better at optimizing
a simpler encoding.
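Roughly, the shape of the simplification (the state names come from the
surrounding commits, the struct itself is just a sketch):
#include <stdbool.h>

enum diverged_state {
    D_NOTDIVERGING,
    D_DIVERGINGLOWER,
    D_DIVERGEDLOWER,
    D_DIVERGINGUPPER,
    D_DIVERGEDUPPER,
};

struct appendattr_state {
    // before: PRUNED* variants doubled the state machine
    // after: pruned-ness lives in its own bool next to a simpler machine
    enum diverged_state d_state;
    bool d_pruned;
};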
Code changes:
code stack
before: 33988 2864
after: 33968 (+0.0%) 2864 (+0.0%)
531c2bcc4c
Quieted test.py/bench.py status when output is aimed at stdout
This special case applies specifically to the -O- pattern. Doing anything fancier would be too much, so anything clever such as -O/dev/stdout will still be clobbered. This was a common enough pattern, and the status updates clobbering stdout were annoying enough, that I figured this warranted a special case.
76593711ab
Added -f/--fail to test.py/bench.py
This just tells test.py/bench.py to pretend the test failed and trigger any conditional utilities. This can be combined with --gdb to easily inspect a test that isn't actually failing. Up until this point I've just been inserting assert(false) when needed, which is clunky.
62de865103
Eliminated null tag reachability in dbg scripts
This was throwing off tree rendering in dbglfs.py: we attempt to look up the null tag because we just want the first tag in the tree to stitch things together. Null tag reachability is tricky! You only notice if the tree happens to create a hole, which isn't that common. I think all lookup implementations should have this max(tag, 1) pattern from now on to avoid this. Note that most dbg scripts wouldn't run into this because we usually use the traversal tag+1 pattern. Still, the inconsistency in impl between the dbg scripts and lfs.c is bad.
3eb4ccdde7
Fixed block/becksum related tree rendering in dbglfs.py
There were a couple of issues mixing high-level and low-level bptr
representations:
1. The high-level vs low-level block representation needed to have
ordering priority over the actual tag in order for inner-node tree
renderings to make sense.
2. We need to flatten all tags to BLOCK/DATA/other data tags when _not_
rendering inner-nodes so interleaved becksums/other util tags don't
mess with the tree rendering.
Now things look like this:
littlefs v0.0 4096x256 0x{0,1}.a0c, rev 15, weight 0.512
{0000,0001}: -1.1 hello reg 1113, btree 0x27.8d8
0000.0a11: + 0-1112 btree w1113 9
0027.08d8: | .-+ 0-1112 block w1113 11
'-+-| > becksum 5
'-> 0-1112 block w1113 0x95.0 1113
Maybe a bit weird looking at first, but correct.
3d61030ccc
Mark the on-disk version as experimental
Just in case...
9cf115685b
Don't duplicate config across all mroots, only magic
So, duplicating the config across multiple mroots allows a better
chance of manual recovery if things go wrong, right?
.--------. .--------.
.|littlefs|->|littlefs|
||bs=4096 | ||bs=4096 |
||bc=256 | ||bc=256 |
||crc32c | ||root dir|
|| | ||crc32c |
|'--------' |'--------'
'--------' '--------'
Well, this was the original thinking. But now I'm starting to think the
duplicated config isn't actually all that useful:
1. The config may be out-of-date, since only the last mroot is mutable.
This is becoming more common as littlefs evolves (on-disk version
bumps, compat flag changes, fs_grow, etc).
2. The most important bit of information is where the mtree is. And this
information is _only_ available on the last mroot because of
mutability requirements.
At the very least, all mroots point to the mtree (not Rome). So
finding the mtree may not be that hard if you find any mroot. But
this also gives you all the config sooo...
3. gstate is going to be hard (impossible?) to reconstruct anyways.
Though for reading this may only be an issue for interrupted grms.
4. Let's be honest, manual recovery is not going to be a common
occurrence for these devices.
Point 1 is the main issue and actually highlights a real risk with
duplicated config: It's easy to pick up out-of-date config.
In other implementations this risks easy mistakes that are hard to notice
until the filesystem reaches a complex state. Assuming the first mroot
contains up-to-date config is the obvious example.
Even in our current implementation, duplicated config already poses some
tricky hard-to-get-right problems. What happens if we run into an
unknown compat flag on a not-last mroot? Hint, our implementation was
broken!
---
So this commit changes mroot extension to _not_ duplicate config, but
instead just rewrite the magic string and mroot chain to the new mroot
anchor:
.--------. .--------.
.|littlefs|->|littlefs|
||crc32c | ||bs=4096 |
|| | ||bc=256 |
|| | ||root dir|
|| | ||crc32c |
|'--------' |'--------'
'--------' '--------'
This leads to a nice simplification in lfsr_mdir_commit, so that's a
plus.
It does make lfsr_mount a bit trickier, since we need to figure out which
mroot is the last mroot before checking for config. But we would need
something trickier in lfsr_mount to handle the above out-of-date issues
anyways.
The current implementation just does a redundant mroot lookup to figure
out if the current mroot is the last mroot. In theory this could be
avoided, but I couldn't figure out how to without making the code
unreasonably complex (lfsr_mount is already intertwined with
lfsr_traversal_read).
The end result is a bit of code and stack savings, thanks to
lfsr_mdir_commit being on the stack-depth hot-path (deep-path?):
code stack
before: 34060 2880
after: 33996 (-0.2%) 2864 (-0.6%)
It's also worth noting that there are plans to add block-level
redundancy at some point. Maybe it's best to leave recovering from
missing blocks to block-level redundancy which is actually designed for
this, and let the mroot chain do what the mroot chain does best:
allowing the mroots to participate in wear-leveling.
c71725d627
Added type info to dsize comments
This is mostly just a lot of leb128s, though we do use be16 for tags and le32 for cksums and revision counts. There are several places we use single-byte leb128s, which really are u8s with the top bit reserved. Still, notating this as leb128 indicates that the top bit really is reserved, even if you don't need full leb128 encoding/decoding in practice.
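For reference, a generic unsigned leb128 encoder (this is the standard
encoding, not copied from lfs.c), which also shows why a 1-byte leb128 is
just a u8 with the top bit reserved:
#include <stdint.h>
#include <stddef.h>

// standard unsigned leb128: 7 payload bits per byte, top bit set on all
// but the last byte
size_t leb128_encode(uint32_t value, uint8_t *buffer) {
    size_t i = 0;
    do {
        uint8_t b = value & 0x7f;
        value >>= 7;
        buffer[i++] = b | (value ? 0x80 : 0);
    } while (value);
    return i;
}

// values <= 127 always encode to a single byte with the top bit clear,
// so a "1 leb128, 1 byte" field really is a u8 with the top bit reserved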
2564100eaa
Tweaked name/sizelimit parsing for consistency, default to 0xff/0x7fffffff
This is mostly just for consistency with changes to parsing other parts
of the fs config attrs.
This changes name/size limits to default to namelimit=0xff and
sizelimit=0x7fffffff. These are reasonable defaults for 32-bit systems,
which was the original use case for littlefs. Though, with the diversity
of embedded device, I suspect these will be overridden more often than
not. For this reason the 0xff/0x7fffffff case is not treated specially
during lfs_format and these limits are always written. Though this may
change in the future.
The intention behind these defaults is to align with other limits that
may be introduced in the future. Any new artificial limits will
necessarily require defaulting to their existing values for backwards
compatibility, so hopefully this allows all limits to be handled
consistently.
If a future use-case-specific implementation of littlefs can benefit
from assuming these defaults, that's a nice plus.
Name/size-limit attr encodings:
.---+---+---+---. tag (0x000c): 1 be16 2 bytes
| x000c | 0 |siz| weight (0): 1 leb128 1 byte
+---+---+---+---+ size: 1 leb128 1 byte
| name_limit | name_limit: 1 leb128 <=4 bytes
'---+- -+- -+- -' total: <=8 bytes
.---+---+---+---. tag (0x000d): 1 be16 2 bytes
| x000d | 0 |siz| weight (0): 1 leb128 1 byte
+---+---+---+---+- -. size: 1 leb128 1 byte
| size_limit | size_limit: 1 leb128 <=5 bytes
'---+- -+- -+- -+- -' total: <=9 bytes
Code changes:
code stack
before: 34040 2880
after: 34060 (+0.1%) 2880 (+0.0%)
9366674416
Replaced separate BLOCKSIZE/BLOCKCOUNT attrs with single GEOMETRY attr
This saves a bit of rbyd overhead, since these almost always come
together.
Perhaps more interesting, it carves out space for storing mroot-anchor
redundancy information. This uses the lowest two bits of the GEOMETRY
tag to indicate how many redundant blocks belong to the mroot-anchor:
LFSR_TAG_GEOMETRY 0x0008 v--- ---- ---- 1-rr
This solves a bit of a hole in our redundancy encoding. The plan is for
this info to be stored in the lowest two bits of every pointer, but the
mroot-anchor doesn't really have a pointer.
Though these are just future plans. Right now the redundancy information
is unused. Current implementations should use the GEOMETRY tag 0x0009,
which you may notice implies redundancy level 1. This matches our
current 2-block per mdir default.
Geometry attr encoding:
.---+---+---+---. tag (0x0008+r): 1 be16 2 bytes
|x0008+r| 0 |siz| weight (0): 1 leb128 1 byte
+---+---+---+---+ size: 1 leb128 1 byte
| block_size | block_size: 1 leb128 <=4 bytes
+---+- -+- -+- -+- -.
| block_count | block_count: 1 leb128 <=5 bytes
'---+- -+- -+- -+- -' total: <=13 bytes
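A small sketch of what reading the reserved rr bits would look like (the
tag values come from the encoding above, the helper itself is
hypothetical):
#include <stdint.h>

#define LFSR_TAG_GEOMETRY 0x0008

// the lowest two bits of the geometry tag are reserved for the
// mroot-anchor's redundancy level
static inline unsigned lfsr_tag_geometryredund(uint16_t tag) {
    return tag & 0x3;
}

// e.g. the current tag 0x0009 decodes to redundancy level 1, matching
// the 2-block mdir default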
Code changes:
code stack
before: 34092 2880
after: 34040 (-0.2%) 2880 (+0.0%)
796be705ac
Simplified how we check on-disk versions a bit
We don't really need to do full leb128 decoding since our version
numbers are unlikely to ever actually exceed v127.127.
Worst case, if they do, the version that exceeds v127.127 can switch to
using leb128 decoding without breaking backwards compatibility.
Version attr encoding:
.---+---+---+---+---+---. tag (0x0004): 1 be16 2 bytes
| x0004 | 0 | 2 |maj|min| weight (0): 1 leb128 1 byte
'---+---+---+---+---+---' size (2): 1 leb128 1 byte
major_version: 1 leb128 1 byte
minor_version: 1 leb128 1 byte
total: 6 bytes
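A rough sketch of the shortcut (not lfs.c itself): since versions
realistically stay at or below v127.127, each 1-byte leb128 is just the
raw byte, and a set continuation bit simply means a version newer than we
understand:
#include <stdbool.h>
#include <stdint.h>

bool read_version(const uint8_t buf[2], uint8_t *major, uint8_t *minor) {
    // a set top bit would mean a multi-byte leb128, i.e. > v127.127;
    // treat it as an unknown (incompatible) version
    if ((buf[0] | buf[1]) & 0x80) {
        return false;
    }
    *major = buf[0];
    *minor = buf[1];
    return true;
}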
Code changes:
code stack
before: 34124 2880
after: 34092 (-0.1%) 2880 (+0.0%)
130281ac05
Reworked compat flags a bit
Now with a bit more granularity for possibly-future-optional on-disk
data structures:
LFSR_RCOMPAT_NONSTANDARD 0x0001 ---- ---- ---- ---1 (reserved)
LFSR_RCOMPAT_MLEAF 0x0002 ---- ---- ---- --1-
LFSR_RCOMPAT_MSHRUB 0x0004 ---- ---- ---- -1-- (reserved)
LFSR_RCOMPAT_MTREE 0x0008 ---- ---- ---- 1---
LFSR_RCOMPAT_BSPROUT 0x0010 ---- ---- ---1 ----
LFSR_RCOMPAT_BLEAF 0x0020 ---- ---- --1- ----
LFSR_RCOMPAT_BSHRUB 0x0040 ---- ---- -1-- ----
LFSR_RCOMPAT_BTREE 0x0080 ---- ---- 1--- ----
LFSR_RCOMPAT_GRM 0x0100 ---- ---1 ---- ----
LFSR_WCOMPAT_NONSTANDARD 0x0001 ---- ---- ---- ---1 (reserved)
LFSR_OCOMPAT_NONSTANDARD 0x0001 ---- ---- ---- ---1 (reserved)
This adds a couple reserved flags:
- LFSR_*COMPAT_NONSTANDARD - This flag will never be set by a standard
version of littlefs. The idea is to allow implementations with
non-standard extensions a way to signal potential compatibility issues
without worrying about future compat flag conflicts.
This is limited to a single bit, but hey, it's not like it's possible
to predict all future extensions.
If a non-standard extension needs more granularity, reservations of
standard compat flags can always be requested, even if they don't end
up implemented in standard littlefs. (Though such reservations will
need a strong motivation, it's not like these flags are free).
- LFSR_RCOMPAT_MSHRUB - In theory littlefs supports a shrubbed mtree,
where the root is inlined into the mroot. But in practice this turned
out to be more complicated than it was worth. Still, a future
implementation may find an mshrub useful, so preserving a compat flag
for such a case makes sense.
That being said, I have no plans to add support for mshrubs even in
the dbg scripts.
I would like the expected feature-set for debug tools to be
well-defined, but also conservative. This gets a bit tricky with
theoretical features like the mshrubs, but until mshrubs are actually
implemented in littlefs, I would like to consider them non-standard.
The implication of this is that, while LFSR_RCOMPAT_MSHRUB is
currently "reserved", it may be repurposed for some other meaning in
the future.
These changes also rename *COMPATFLAGS -> *COMPAT, and reorder the tags
by decreasing importance. This ordering seems more valuable than the
original intention of making rcompat/wcompat a single bit flip.
Implementation-wise, it's interesting to note the internal-only
LFSR_*COMPAT_OVERFLOW flag. This gets set when out-of-range bits are set
on-disk, and allows us to detect unrepresentable compat flags without
too much extra complexity.
The extra encoding/decoding overhead does add a bit of cost though:
code stack
before: 33944 2880
after: 34124 (+0.5%) 2880 (+0.0%)
76721638db
Renamed lfsr_rbyd_p_red -> lfsr_rbyd_p_recolor
Recolor is actually a verb, for one.
85e43d51ba
Found a balance-preserving solution to tail-recursive range recoloring
It feels a bit clumsy, but by using an additional bit of state to keep
track of whether the last alt was pruned, we can cancel recolorings that may
risk recursion.
If we look at how this plays out on the underlying 2-3-4 tree:
.-----. .-------. .-------. .-------.
|.a.h.| |.a.c.h.| |.a.c.h.| |.a.c.h.|
'|-|-|' '|-|-|-|' '|-|-|-|' '|-|-|-|'
| .-' '-. .--' '--. .-' '-.
v v v v v v v
.-------. .---. .---. .---. .-------. .---. .---.
|.b.c.g.| => |.b.| |.g.| => |.b.| |.d.e.f.| => |.b.| |.e.|
'|-|-|-|' '|-|' '|-|' '|-|' '|-|-|-|' '|-|' '|-|'
| x | x .-' '-.
v v v v
.-------. .-------. .---. .---.
|.d.e.f.| |.d.e.f.| |.d.| |.f.|
'|-|-|-|' '|-|-|-|' '|-|' '|-|'
Note the important property that no nodes ended up at a height _worse_
than where they started.
It's interesting to note this is equivalent to splitting the nodes
_before_ pruning:
.-----. .-------. .-------. .-------.
|.a.h.| |.a.c.h.| |.a.c.h.| |.a.c.h.|
'|-|-|' '|-|-|-|' '|-|-|-|' '|-|-|-|'
| .-' '-. .-' '--. .-' '-.
v v v v v v v
.-------. .---. .---. .---. .-----. .---. .---.
|.b.c.g.| => |.b.| |.g.| => |.b.| |.e.g.| => |.b.| |.e.|
'|-|-|-|' '|-|' '|-|' '|-|' '|-|-|' '|-|' '|-|'
| x | x .---' | x .-' '-.
v v v v v v
.-------. .-------. .---. .---. .---. .---.
|.d.e.f.| |.d.e.f.| |.d.| |.f.| |.d.| |.f.|
'|-|-|-|' '|-|-|-|' '|-|' '|-|' '|-|' '|-|'
Which is probably why most of our 2-3-4 tree invariants hold.
In the actual implementation, we encode the current pruned state as a
part of our diverging state machine, since we don't non-trivially prune
outside of diverging trunks.
This ends up with the following, slightly-extended, diverging state
machine:
diverge possible? diverge not possible?
| |
v |
DIVERGINGLOWER-------------------. |
| | |
v v v
DIVERGEDLOWER<->PRUNEDLOWER NOTDIVERGING
| .------------' |
v v |
DIVERGINGUPPER |
| |
v |
DIVERGEDUPPER<->PRUNEDUPPER |
'------------. | .----------'
v v v
leaf stuff
Writing out the state machine like this actually highlights the slightly
annoying transition from PRUNEDUPPER to leaf stuff, which was buggy in
the first impl.
We also encode some common information (lower/upper, pruned, etc) in the
state machine's bit encoding to try to avoid too many if statements.
Though this impl does seem a bit heavy handed.
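A purely illustrative encoding of that idea (the state names come from
the diagram above, the bit values are hypothetical): fold lower/upper and
pruned into the state's bits so the common questions become a single
mask:
enum {
    D_LOWERBIT  = 0x1,
    D_PRUNEDBIT = 0x2,

    D_NOTDIVERGING   = 0x0,
    D_DIVERGINGLOWER = 0x4 | D_LOWERBIT,
    D_DIVERGEDLOWER  = 0x8 | D_LOWERBIT,
    D_PRUNEDLOWER    = 0x8 | D_LOWERBIT | D_PRUNEDBIT,
    D_DIVERGINGUPPER = 0x4,
    D_DIVERGEDUPPER  = 0x8,
    D_PRUNEDUPPER    = 0x8 | D_PRUNEDBIT,
};

#define D_ISLOWER(d)  ((d) & D_LOWERBIT)
#define D_ISPRUNED(d) ((d) & D_PRUNEDBIT)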
The additional complexity of course results in more code cost, but as a
trade-off our range recoloring should be a bit sturdier and provably
preserves the h=2log2(b) worst-case height of our tree:
code stack
broken recoloring: 33880 2880
unbalanced recoloring: 33912 (+0.1%) 2880 (+0.0%)
balanced recoloring: 33944 (+0.2%) 2880 (+0.0%)
0a89d0c254
Fixed recoloring tail-recursion violations during range removals
I spoke too soon and made a mistake when reenabling color preservation
during range removals.
I assumed that, thanks to replacing the diverging alt with a new black
alt for stitching together diverging trunks, we would avoid the issue
where a deleted diverging alt violates our rbyd's tail-recursive
recoloring invariant.
Unfortunately, this is not the case. All the stitching alt did was make
this violation more difficult to reach, but still reachable. Arguably a
worse situation.
Now, for this violation to happen, in addition to all of the other
requirements, we need the lower-diverging trunk to become empty.
This is the only case where we have no stitching alt, because we don't
need to stitch an empty trunk. Which means if the upper-diverging trunk
has yellow nodes both before and after the diverging alt, our
tail-recursive recoloring invariant can break.
Here's an example:
.-------------r-------------.
.-o-. .---+---y----. .-o-.
.o. .o. .o. .o. .o. .-y-+-. .o. .o.
a a a a a a a a c c c e e e e e e e
'--+--'
remove
Again, this doesn't capture the alt-layout, which _is_ important, so
here's the dbgrbyd.py view:
.-> aa .-> aa
.-b-> a .-b-> a
| .-> a | .-> a
.-----------b-b-> a .-------b-b-> a
| .-> a | .-> a
| .---------b-> a | .-b-> a
| | .-> a | | .-> a
| | .-------b-> a | .-b-b-> a
r-b-y-r-b-----b-> cc -. => y-y-r-b-----> ee <- two yellows!
| | '-> c + rm | '-----b-> e different dirs!
| | .-> c -' | '-> e should not happen!
| '-y-r-b-> ee | .-> e
| | '---> e | .-b-> e
| '-----> e | | .-> e
| .-> e '-----b-b-> e
| .-b-> e
| | .-> e
'---------b-b-> e
And the steps in our appendattr algorithm that led to this state, which
is insightful:
read <r => [<r]
read >b => [<r >b]
read <r => [<r >b <r]
read <r => [<r >b <r <r]
^--^------ red + red implies yellow
ysplit => [<r >r <b]
reorder => [<r <r >b]
^--^------------- yellow-same-dir invariant held
read >b => [<r <r >b >b]
diverge => [<r <r >b]
read >r => [<r <r >b >r]
read >r => <r [<r >b >r >r]
^-----------^-- our 4-alt fifo for flips/coloring
ysplit => <r [<r >r >b]
reorder => <r [>r >r <b]
^--^---------- yellow-same-dir invariant held
^---^------------- yellow-same-dir invariant NOT held
though 2 yellows is also a problem
The previous commit fixing this bug for the one-pass algorithm may also
be useful.
This tree is now tested in test_rbyd_delete_range_rydye and
test_rbyd_delete_range_rydye_backwards, though only
test_rbyd_delete_range_rydye_backwards reveals the bug, since the bug
requires _specifically_ the lower-diverging trunk to become empty (both
rydy and rydye now have in-order and backwards tests in case of other
chirality issues).
---
Taking a step back, and looking at this bug from a higher-level, the
core of the issue is that we are somewhat arbitrarily deleting nodes
after splitting nodes. This can break our tail-recursive recoloring
invariant.
What the heck is our tail-recursive recoloring invariant?
This is a property of 2-3-4 and greater B-trees, and transitively
red-black and red-black-yellow trees, that allows for tail-recursive,
self-balancing node insertion.
Basically, if you eagerly split any 4-nodes you encounter as you descend
down the tree, you will always be guaranteed to have an open slot in
your parent, so pushing up split nodes (or recoloring) only ever
propagates up a single level:
.-----. .-------. .-------.
|.a.h.| |.a.c.h.| |.a.c.h.|
'|-|-|' '|-|-|-|' '|-|-|-|'
| .-' '-. .-' '--.
v v v v v
.-------. .---. .---. .---. .-----.
|.b.c.g.| => |.b.| |.g.| => |.b.| |.e.g.|
'|-|-|-|' '|-|' '|-|' '|-|' '|-|-|'
| | .-' '-.
v v v v
.-------. .-------. .---. .---.
|.d.e.f.| |.d.e.f.| |.d.| |.f.|
'|-|-|-|' '|-|-|-|' '|-|' '|-|'
If you lazily split, you aren't guaranteed an open slot in your parent,
so you need recursion to solve splits. This is why 2-3 trees, though
self-balancing, are not tail-recursive:
.-----. .-----.
|.a.h.| |.a.h.|
'|-|-|' '|-|-|'
| |
v v
.-------. .'''''''''.
|.b.c.g.| => >.b.c.e.g.< 5!?
'|-|-|-|' '|.|.|.|.|'
| .-' '-.
v v v
.-------. .---. .---.
|.d.e.f.| |.d.| |.f.|
'|-|-|-|' '|-|' '|-|'
But if you are eagerly splitting while also deleting nodes:
.-----. .-------. .-------. .'''''''''.
|.a.h.| |.a.c.h.| |.a.c.h.| 5!? >.a.c.e.h.<
'|-|-|' '|-|-|-|' '|-|-|-|' '|.|.|.|.|'
| .-' '-. .-' '---. .---' | '---.
v v v v v v v v
.-------. .---. .---. .---. .-------. .---. .---. .---.
|.b.c.g.| => |.b.| |.g.| => |.b.| |.d.e.f.| => |.b.| |.d.| |.g.|
'|-|-|-|' '|-|' '|-|' '|-|' '|-|-|-|' '|-|' '|-|' '|-|'
| x | x
v v
.-------. .-------.
|.d.e.f.| |.d.e.f.|
'|-|-|-|' '|-|-|-|'
Suddenly, recursion. This is a problem.
The workaround implemented here is to check during pruning if our parent
may risk recursion, and if so, recolor the last alt so nothing will
break.
This ends up equivalent to the following transformation:
.-----. .-------. .-----. .-----.
|.a.h.| |.a.c.h.| |.a.c.| |.a.c.|
'|-|-|' '|-|-|-|' '|-|-|' '|-|-|'
| .-' '-. .-' '-. .-' '--.
v v v v v v v
.-------. .---. .---. .---. .---. .---. .-----.
|.b.c.g.| => |.b.| |.g.| => |.b.| |.h.| => |.b.| |.e.h.|
'|-|-|-|' '|-|' '|-|' '|-|' '|-|' '|-|' '|-|-|'
| x | x | .-' '-.
v v v v v
.-------. .-------. .-------. .---. .---.
|.d.e.f.| |.d.e.f.| |.d.e.f.| |.d.| |.f.|
'|-|-|-|' '|-|-|-|' '|-|-|-|' '|-|' '|-|'
You may notice this isn't exactly optimal. The >h branch ends up one
level lower, making the balance of the tree off by one. But it at least
ends up with a functional tree.
I may try to find a better solution...
---
The test_rbyd_delete_range_rydy/rydye tests should cover the cases where
a diverging alt is deleted.
I also tried to write tests for the cases where an alt is pruned; the
closest I got is in test_rbyd_delete_range_dryy_backwards, but I
couldn't actually come up with a sequence that would break our rbyds.
In theory it's possible, but it would need this substructure:
.-------> c y-r-b-------> c
y-r-b-y-r-b-> c or | | '-y-r-b-> c
| | | | | | | |
Which, as far as I can tell, can't actually be created with our current
algorithm...
Note the inverse structure:
.---------> c
| .-y-r-b-> c
y-r- | |
Will be pruned before it has a chance to split. So there are no invariant
concerns there. We only have issues when it's the tail alts that get
pruned, because we decide to split before we know if we are pruning or
not. I don't think this can be avoided without additional read-ahead.
Also, even if we could create the above substructure, because we are on
a diverged trunk, and by definition all alts point the same direction,
we would never end up violating our same-dir yellow invariant/assert...
Code changes:
code stack
before: 33880 2880
after: 33912 (+0.1%) 2880 (+0.0%)
4d90be94f9
Preserve coloring during range removals
This is the real kicker of our new-and-improved range removal algorithm.
We can actually preserve the existing tree coloring, and the underlying
rbyd invariants.
Well, sort of. We preserve the red-follows-yellow and black-follows-red
rules, but we don't (can't?) preserve the same-height for all black
edges property.
But note! The new range removal algorithm never creates _new_ black
edges. It can only delete black edges, and otherwise preserves the
structure of the underlying 2-3-4 tree.
This means that while the resulting tree may not be perfectly balanced
with h=2*log2(n'), where n' is the _new_ number of tags, the resulting tree
_is_ limited to h=2*log2(n) where n is the _old_ number of tags.
When applied to our bounded rbyd, with eventual compaction and
rebalancing, we end up with the guarantee that the rbyd's height will
never exceed h=2*log2(b) where b is the block size, even with arbitrary
range removals.
This is a great result!
---
Note that this algorithm does not suffer from the yellow-diverge-yellow
corner case that was an issue for preserving coloring in the previous
one-pass stitching algorithm. This is because the one-pass algorithm
effectively deleted the diverging alt, breaking the tail-recursive
invariant of the underlying 2-3-4 tree. With the new two-pass algorithm,
we _replace_ the diverging alt with a black stitching alt to stitch
together the diverging trunks, so no tail-recursive invariant breaking.
(Also note even if we could preserve coloring in the one-pass algorithm,
it would still be breaking invariants by introducing new black edges
when it stitches together diverging trunks. Worst case, resulting in ~2x
the height, even when stitching with red alts (The red alt stitching
brings this cost down from ~4x to ~2x worst case due to blanket
recoloring. With yellow alt stitching this could probably be brought
down to ~1.3x, but this would still mean every range removal could be
increasing the height of the tree, which is not great.).)
---
Pruning has to be a bit more complicated now, since we need to be able
to recolor skipped red alts. But other than pruning the cost of
recoloring vs not recoloring is pretty small:
code stack
one-pass, blanket recolor: 33852 2880
two-pass, blanket recolor: 33860 (+0.0%) 2880 (+0.0%)
two-pass, color preserving: 33880 (+0.1%) 2880 (+0.0%)
The non-rigorous random-file-write benchmark I've been using as a litmus
test did not really show any improvements, but in hindsight it might
have been a bit silly to use a uniform distribution of writes to test
for rbyd balancing issues... Building a tree from a uniform distribution
already results in a balanced tree without doing anything!
39413e7d78
Cleaned up reworked range removal/diverging trunk algorithm
Note this is still blanket recoloring alts to black in diverged trunks.
So we don't get the main benefit of this rework yet: preserving the rbyd
color invariants.
But this is a nice checkpoint that shows that our two-pass diverging
trunk algorithm works without breaking right-leaning invariants.
Before, we made a single pass, and stitched together alternating
alts in any diverged trunks in order to remove ranges of tags (ignore
coloring for now, coloring _does_ help here):
.-----------o
.-------o-------. | o-----------.
.---o---. .---o---. | .-----o |
.-o-. .-o-. .-o-. .-o-. .-o-. | o-----. .-o-.
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .--o--. .o. .o. .o.
a b c d e f g h i j k l m n o p => a b c d e f g j k l m n o p
'-+-'
remove
Now, we do two passes: One to write any alts on the lower-diverged
trunk. And one to write the non-diverging part of the trunk, stitch in the
lower-diverged trunk with an alt, and then write any alts on the
upper-diverged trunk.
This results in a somewhat complex diverging state machine:
diverge possible? diverge not possible?
| |
v |
DIVERGINGLOWER-----------. |
| | |
v v v
DIVERGEDLOWER NOTDIVERGING
| |
v |
DIVERGINGUPPER |
| |
v |
DIVERGEDUPPER |
'--------. .--------'
v v
leaf stuff
But the end result is a much more balanced tree, at least on first
inspection:
.-------o-------. .--o--.
.---o---. .---o---. .--------o o--------.
.-o-. .-o-. .-o-. .-o-. .-o-. .--o o--. .-o-.
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. | | .o. .o. .o.
a b c d e f g h i j k l m n o p => a b c d e f g j k l m n o p
'-+-'
remove
Unfortunately, this does mean we need, well, two passes, even if the
resulting tree doesn't actually end up with a diverging trunk. We can
avoid two passes if we know a diverged trunk is impossible, but we only
know this in the single attr append case (no negative deltas, no
submask, no supmask, etc).
At the very least we don't end up _progging_ any more alts than is
necessary. Even though we write two trunks, the number of alts after
pruning end up the same as in our single-pass algorithm. The only
additional cost is a single null tag (4 bytes) to terminate the diverged
trunk.
This two pass algorithm may sound more complicated, and therefore more
costly, but keep in mind this replaces the previous diverged-swapping
state, which I would argue resulted in more mess and a harder to
understand lfsr_rbyd_appendattr.
Now that the dust has settled, we can actually start comparing the
relative code costs, though again note we are still blanket recoloring.
The result is, surprisingly, a basically net-zero code cost:
code stack
one-pass: 33852 2880
two-pass: 33860 (+0.0%) 2880 (+0.0%)
9c2c5b2391
Prevented writing useless null tags for unstitched diverged trunks
This can happen if we end up pruning all alts in a diverged trunk. Note
this is subtly different than finding no diverged trunk, as we still need
to switch to the LFSR_D_DIVERGED* state in order to prune alts on the
non-empty diverged trunk.
We need a special case here, because there's no way to represent an empty
trunk without a reachable null tag. But a reachable null tag would violate
our rbyd's right-leaning property and break lookupnext.
We already had a special case for this situation, which would skip the alt
that would stitch the trunks together, but we were still writing out a
null tag for the diverged trunk even if it was empty. We don't need this
null tag, and it turns out not writing the null tag saves a null tag.
03954a1ef9
Fixed diverged leaf coloring issue
Took some debugging to figure out what was going on, but this was just a
refactoring oversight: I didn't update the diverged state used to decide
how to color the leaves.
I guess this would have actually been caught earlier if I had cleaned up
the code before debugging the test failures. But I wanted to reach
proof-of-concept first...
Anyways, good news, all tests are passing, so the proof-of-concept new
range removal algorithm works.
dd31f610b3
Fixed null tags getting stuck in the tree during range removals
Please excuse the mess. There is a delicate game going on here with where
null tags can appear in rbyd trees. Null tags _can_ appear at the end of
the tree, and as a terminator of unreachable trunks. Null tags can _not_
appear inside the tree in reachable trunks, as this would violate our
right-leaning property and prevent lookupnext from working correctly.
Long story short, we need to be very careful to ensure the lower diverged
trunk's null tag is truly unreachable. Otherwise the null tag necessary to
terminate the trunk (so that fetch works) breaks things.
The solution seems to be to keep track of the last _alt_ on the
lower-diverged trunk, and use this alt to stitch together the two diverged
trunks when writing the non-diverged + upper-diverged trunks. This feels
very similar to how you swap to remove from tree heaps, which is
interesting.
The rbyd tests are passing now, but higher-level tests are failing, which
isn't the greatest sign...
0b6cf7e9a7
Attempting a different algorithm for rbyd range removals
The previous attempt to make range removals more rigorous highlighted a
pretty significant design flaw: Every removal risks making the tree ~2x
taller.
In theory this is offset by the fact that removals, well, remove nodes,
shrinking the height of the tree, but this isn't reflected in the
underlying red-black-yellow structure. Blanket recoloring breaks the
red-black-yellow invariants.
This isn't the end of the world, we still rebalance during rbyd
compaction, but it would be nice if we had stronger guarantees about the
structure of rbyds before compaction. Especially since we rely on tree
balance to defend our O(n log n) traversal overhead.
---
This attempts to reimplement range removals with two separate passes for
diverged trunks. The downside is range removals now need, well, two
separate passes, even if we don't actually end up with a diverged trunk.
The upside is in theory we can preserve the coloring information and
related invariants.
I would describe this commit as "almost working" and "a mess". There
still seem to be some issues with null tags getting stuck in the tree
after diverged trunks.
On the plus side, our tests are certainly working...
d93dce8db2
Deduplicated rbyd append initialization into lfsr_rbyd_prepareappend
Most of the lfsr_rbyd_append* functions have the same necessary prologue
in order to check the rbyd is fetched, erased, is prefixed with a
revision count, etc. Moving this into a common function saves a bit of
code:
code stack
before: 33912 2880
after: 33852 (-0.2%) 2880 (+0.0%)
5ce78799af
Added some extra low-level rbyd append helpers
- lfsr_rbyd_appendtag
- lfsr_rbyd_appenddata
- lfsr_rbyd_appendattr_
The main benefit is readability.
The second benefit is minor code deduplication:
code stack
before: 34024 2880
after: 33912 (-0.3%) 2880 (+0.0%)
3942d643e5
Renamed */other_* -> a_*/b_*
I think this does a better job of indicating that we're operating on two different paths simultaneously. At the very least the prefix other_* was kind of ambiguous...
5c45f07f1b
Added LFSR_TAG_DIVERGEDDONE instead of reusing LFSR_TAG_RM in appendattr
I think this is a bit more readable.
Curiously, the bit flip and bit change resulted in a surprising code
cost, even though it removes a couple statements. I guess because the
sign bit is that much cheaper to predicate on?
code stack
before: 33980 2880
after: 34024 (+0.1%) 2880 (+0.0%)
989a7007aa
Tweaked diverged red stitching to use lfsr_rbyd_p_red
There's a hidden story here where I tried to explore yellow stitching on
top of red stitching, which may or may not bring the worst-case 2-3-4
height down from ~2x to ~1.3x. But this made the system more complex and
harder to reason about balance-wise (we risk destabilizing the tree if
we remove more alts than we stitch), so dropping for now. May revisit.
In theory this saves code, but in practice it does not. Still, I think
it's a bit more readable and moves all the recoloring preconditions into
one place:
code stack
before: 33976 2880
after: 33980 (+0.0%) 2880 (+0.0%)
f62ae0e8fd
Reroute range removal pruning through diverged path swaps
I think this was just an oversight when merging/unmerging pruning
operations. Lazily finding alts (eagerly swapping) seems to result in
better trees based on some napkin sketches.
For example, consider this remove, with lazy alts (eager swaps):
.-------o-------.
.---o---. .---o---. .-----------------o
.-o-. .-o-. .-o-. .-o-. .-o-. .-----------o
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .--------o--------.
a b c d e f g h i j k l m n o p => a b c d e f g p
'-------+-------'
remove h=3
And with eager alts (lazy swaps):
.-------o-------. .-----------------o
.---o---. .---o---. | o--------.
.-o-. .-o-. .-o-. .-o-. .-o-. .--.--------o |
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. | |
a b c d e f g h i j k l m n o p => a b c d e f g p
'-------+-------'
remove h=4
This isn't really rigorous, but without more evidence lazy alts (eager
swaps) seem the best option for now.
Note that we do _not_ eagerly swap when pruning yellow alts. The two
other continue statements in the appendattr loop, one for pruning yellow
alts and one for splitting yellow alts, are bookkeeping operations that
don't map to real alt visits. We should pretend these alts don't exist
when looking at the tree layout.
With diverged recoloring, we can't actually hit the yellow-split case,
but we can hit the yellow pruning case since it only relies on
unreachability.
Code cost, uh, I don't really know why this saved code, it's probably
just compiler noise:
code stack
before: 33992 2880
after: 33976 (+0.0%) 2880 (+0.0%)
47416c1115
Switched to recoloring + red stitching removals due to diverged coloring bug
This was a nasty bug. I was initially concerned that this slipped
through our rbyd tests until I realized how excruciatingly rare it is.
If, during a range remove:
1. There is a pending yellow split immediately after the diverging alt
2. There is a pending yellow split immediately before the diverging alt
3. The diverging alt takes a black alt in the yellow split
4. There is a red node before the pending split before the diverging alt
5. The two alts in the red node point in different directions
We can end up violating our yellow node both-alts-point-same-direction
invariant.
The tree looks like this:
.-------------r-------------.
.-o-. .----y---+---. .-o-.
.o. .o. .-+-y-. .o. .o. .o. .o. .o.
a a a a a a a c e e e e e e e e e e
'+'
remove
Though this diagram doesn't capture the actual alt-layout, which does
matter here, so the dbgrbyd.py rendering may be more useful:
.-> aa .-> aa
.-b-> a .-b-> a
| .-> a | .-> a
.-----------b-b-> a .-----b-b-> a
| .-----> a | .-> a
| | .---> a | .-----b-> a
| .-y-r-b-> a | | .---> a
| | '-> cc <- rm | | |
r-b-y-r-b-----b-> ee => y-y-r-b-r-b-> ee <- two yellows!
| | | '-> e | | '-> e different dirs!
| | '-------b-> e | '-b-b-> e should not happen!
| | '-> e | | '-> e
| '---------b-> e | '-b-> e
| '-> e | '-> e
| .-> e | .-> e
| .-b-> e | .-b-> e
| | .-> e | | .-> e
'---------b-b-> e '-------+-b-> e
If all of these conditions are met, and we are preserving coloring, we
can end up with two yellow splits without an intermediate black alt,
implying recursion. But we're of course not recursive, so things just
break.
If we look at the trunk that is being built during our range removal:
read <r => [<r]
read >b => [<r >b]
read >r => [<r >b >r]
read >r => [<r >b >r >r]
^--^------ red+red implies yellow
ysplit => [<r >r >b]
reorder => [>r >r <b]
^--^------------- yellow-same-dir invariant held
read <b => [>r >r <b <b]
diverge => [>r >r <b]
read <r => [>r >r <b <r]
read <r => >r [>r <b <r <r]
^-----------^-- our 4-alt fifo for flips/coloring
ysplit => >r [>r <r <b]
reorder => >r [<r <r >b]
^--^---------- yellow-same-dir invariant held
^---^------------- yellow-same-dir invariant NOT held
though 2 yellows is also a problem
The important thing to note is that the diverging alt is effectively
deleted in both search paths. If the diverging alt is between two yellow
splits, that's not good.
If you think about the mapping to the underlying 2-3-4 tree, append is
only guaranteed to be tail-recursive because we eagerly split 4-nodes
into 2 2-nodes, ensuring that our parent always has a slot available for
a split (this is why 2-3 trees are not tail-recursive). But if we delete
one of the 2-nodes, and find another 4-node, the parent's slot has
already been taken. This is basically the problem we are running into
here.
A hypothetical 2-3-4-5 tree however...
Probably-isomorphic to a 2-3-4-5 tree, there are a couple of possible
solutions to this:
1. Increase the fifo to 5(?) alts and recursively propagate recolorings
up 2 nodes.
Note this would still be bounded and tail-recursive. Our current
implementation is basically an isomorphism of recursively propagating
recolorings up 1 node after all, if you want to think about it in
about the most complicated way possible...
Downsides: The increased fifo size means more RAM cost. And the
implementation would be complicated as hell. Not to mention error
prone. Imagine ~2x the current 15K lines of rbyd tests. It would be
bad.
2. Discard split recolorings after a diverged alt.
This would be quite a bit simpler, though would still require some
annoying state to know if the previous alt diverged.
If this state isn't perfect, the above checklist of conditions would
just be incremented by 1, making this bug even harder to track down.
I'm starting to think that preserving color during range removals is a
bit too complicated for its own good.
Considering that color-preserving range removals aren't even rigorous
and don't guarantee a balanced tree, I think this all just needs to be
scrapped until a more rigorous solution is found.
---
So this commit drops color-preserving range removals, and moves to a
simpler paint it black + stitch together alternating red alt strategy
when encountering a diverging range removal.
Thanks to the red-stitching, we at least try to keep the resulting search
path as small as possible.
This results in the following, not-broken tree:
.-> aa .-> aa
.-b-> a .-b-> a
| .-> a | .-> a
.-----------b-b-> a .-----b-b-> a
| .-----> a | .-------> a
| | .---> a | | .---> a
| .-y-r-b-> a | | | .-> a
| | '-> cc <- rm | | | |
r-b-y-r-b-----b-> ee => y-r-b-r-b-r-b-> ee
| | | '-> e | | '-----> e
| | '-------b-> e | '-------b-b-> e
| | '-> e | | '-> e
| '---------b-> e | '-b-> e
| '-> e | '-> e
| .-> e | .-> e
| .-b-> e | .-b-> e
| | .-> e | | .-> e
'---------b-b-> e '---------b-b-> e
It's interesting to note that this bug is so rare that it was only
caught by test_dirs_mv_fuzz after 2180 heuristic powerlosses. But it
was caught, so that's a good sign.
But it would have been better if this was caught in the rbyd tests. I've
gone ahead and added a specialized test, test_rbyd_delete_range_rry (and
a few others), to prevent a regression, which is very likely. It's more
likely than not we'll revisit range removals in the future.
On the plus side, since recoloring is simpler than color-preservation,
this means less code:
code stack
before: 34072 2880
after: 33992 (-0.2%) 2880 (+0.0%)
bedb65919c
Opportunistically stitch together range removals with red alts
This is intended to improve some balancing issues with range removals
in our rbyds.
Consider the following range removal, this is our current algorithm:
.-----------o
.-------o-------. | o-----------.
.---o---. .---o---. | .-----o |
.-o-. .-o-. .-o-. .-o-. .-o-. | o-----. .-o-.
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .--o--. .o. .o. .o.
a b c d e f g h i j k l m n o p => a b c d e f g j k l m n o p
'-+-'
remove
Somehow the height of the tree increased! Even though we are only
removing nodes. Not great.
The reason this happens is because we are trying to stitch together the
two search paths that occur when our range diverges. Naively, with
binary nodes, this results in a worst case of ~2x the diverged height.
If only there was a way to represent a ternary node... Wait, isn't this
what our red alts are for?
Recall that in a red-black(-yellow) tree, red edges are a coloring that
represent a 2-3-4 node with 3 branches. If, as we stitch together our
two search paths, we alternate between red and black alts, we can avoid
a height increase in the underlying 2-3-4 tree!
.-------o-------.
.---o---. .---o---. .-----------r-----------.
.-o-. .-o-. .-o-. .-o-. .-o-. .-----r-----. .-o-.
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .--o--. .o. .o. .o.
a b c d e f g h i j k l m n o p => a b c d e f g j k l m n o p
'-+-'
remove
This works great if all our nodes are black. Unfortunately, if we
already have red alts, this doesn't always work. We can't connect red
alts with red alts, or we risk breaking invariants:
.---+-------r
.-------o-------. | | r-------+---.
.---r----. .----r---. | | .-+--r | |
.o. .o. .-r-. .-r-. .o. .o. .o. .o. | | o--+-. .o. .o.
a b c d e f g h i j k l m n => a b c d e f i j k l m n
'-+-'
remove
If we ignore red alts, and pretend they are black during range removals,
we just end up with a slightly permuted tree:
.-----------r-----------.
.-------o-------. | .-------r-------. |
.---r----. .----r---. | | .----r----. | |
.o. .o. .-r-. .-r-. .o. .o. .o. .o. | .--o--. | .o. .o.
a b c d e f g h i j k l m n => a b c d e f i j k l m n
'-+-'
remove
It's tempting to try to stitch red nodes together with yellow alts, but
this breaks the invariant that our parent is never yellow during
append, forcing append to be potentially recursive if we encounter
a naturally occurring yellow alt.
With a hypothetical 5-branch node however...
But at least this delays unbalancing when black alts are present.
And, since we downgrade any red alts when pruning during range removals,
we should end up with more black alts available for opportunistic
stitching than in the original tree.
---
Surprisingly the code cost ended up breaking even, probably because of
some minor code cleanup in the function:
before: 34072 2880
after: 34072 (+0.0%) 2880 (+0.0%)
Measuring performance with a quick file random-write benchmark showed
a noticeable but tiny improvement. Though it may be 1. too close to
the noise floor to be trustworthy, 2. not really rigorous, and 3. file
random-write may not hit degenerate range removals. But hey, at least it
doesn't show a negative impact on performance.
34be5055b4
Fixed mdir drop during compaction breaking fixorphan loop
The core problem is that we weren't updating dropped mdirs with weight=0
if the mdir was compacted at the same time. This is hard to notice,
because most operations that can drop don't care about the mdir
afterwards, but in lfsr_fs_fixorphans this caused the fixorphan loop to
think it might still have orphans it could remove.
The implementation is very subtle here:
- In lfsr_mdir_commit_, if an error occurs during lfsr_mdir_compact__, we
  need to revert to the original mdir state to allow fallback to mdir
  split.
- In lfsr_mdir_commit_, if an error occurs during lfsr_mdir_commit__ (even
  after a compact), we need to update the mdir in case a drop reduced the
  mdir weight to zero. We also need to update the mdir for things like
  erased state, but this doesn't come into play in the compaction route.
Fixed the bug by updating the mdir copy before lfsr_mdir_commit__.
Also added asserts to all insert/delete operations in test_mtree.toml. We
already had drop-during-compaction tests, but these didn't check that the
mdir was updated correctly. The new asserts catch this bug and should
prevent a regression.
a6a5f43027
Preserve coloring on range removals
This rearranges diverged pruning in lfsr_rbyd_appendattr a bit to try to
avoid unnecessary r->b recoloring during range removals. Two tweaks:
1. lfsr_rbyd_appendattr now only alternates diverged paths on a black
   edge. This is equivalent to alternating on the underlying 2-3-4 tree,
   and means we don't have to worry about overlapping red edges from the
   two diverged paths interacting with each other in weird ways.
2. Thanks to alternating on a black edge, we can now prune diverged paths
   before applying our red-black-yellow operations. This means we can
   avoid the somewhat-hack that was r->b recoloring (if you paint it
   black, red-yellow operations are skipped so nothing breaks trivially,
   but you also break your tree balance invariants).
So the two diverged paths should remain strictly 2(log n)+1, at least in
isolation.
A nice side-effect of moving the diverged-prune code is we can deduplicate
pruning with yellow-edge pruning. This saves a branch of code, though it
will make things a bit more confusing if anyone tries to use
lfsr_rbyd_appendattr as a template for an rbyd implementation without
range removals.
The new pruning conditions are more complex, resulting in more code cost,
but this will be worth it if it results in better tree balance after range
removals:
code stack
before: 33912 2880
after: 34064 (+0.4%) 2880 (+0.0%)
Some quick, non-rigorous benchmarks showed a noticeable, but tiny
improvement in random write performance. Though I'm not sure random writes
are likely to hit degenerate range removals...
||
|
|
907d6b7038 |
Reverted the improved (5/2)t+2 rbyd compaction algorithm
Now back to our naive compaction algorithm:
a_0 = 3t + 4
Why? The main reason is simplicity:
- In order to connect the first layer, we need to maintain some state
external to lfsr_rbyd_appendcompactattr, mainly rbyd.trunk which needs
to be zeroed before compaction.
This is a great recipe for bugs.
- lfsr_rbyd_appendcompaction needs a couple hacks to avoid redundantly
connecting the first layer, while also not missing any lingering
attrs/trunks in non-power-of-two trees. Though these hacks at least
fall into the stupid + works = not stupid category.
- Our interaction with the rcache is bad.
Before, our compaction algorithm sort of operated in two modes:
1. Append attrs - Without any rbyd structure, we only need the pcache
to append, allowing the rcache to be used entirely to, you know,
cache whatever is sourcing our attrs.
2. Build the balanced rbyd tree - At this point we need the rcache to
read rbyd trunks to connect everything, but we are also probably
done with the attr source.
But our improved compaction algorithm muddies step 1: now we need
the rcache to connect layer 1, which may thrash our attr source.
To be clear, this only happens because we reread the previous trunk
for the tag and weight. If we cached everything needed to build the
alt-pointer in RAM, we could avoid this, but we currently don't have
enough space in our rbyd struct.
So I'm reverting for now. Though I've left a comment noting some
theoreticals.
I think a better design would be to change lfsr_rbyd_appendcompactattr
to take an optional array of trunks (along with tags+weights), which
could be user configurable to trade RAM cost for metadata density.
It would be extra interesting to see such a configurable array tuned to
the current maximum stack depth, in theory making it effectively free.
But I think this is out-of-scope for now.
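For what it's worth, such an array might look roughly like the sketch
below. This is purely illustrative, not littlefs code; the names and
layout are assumptions, but it shows the RAM-for-density tradeoff: each
tracked layer needs an offset plus the cached tag+weight so the alt can
be built without rereading the disk:
    #include <stdint.h>

    // illustrative sketch only: an optional, caller-provided array of
    // pending trunks that would let compaction build the first k layers
    // perfectly, trading RAM for metadata density
    struct compact_trunks {
        // number of layers to build perfectly, could be user configurable
        // or tuned to the current maximum stack depth
        uint32_t count;
        struct {
            uint32_t off;    // offset of the pending trunk in this layer
            uint32_t weight; // cached weight, avoids rereading the disk
            uint16_t tag;    // cached tag, avoids rereading the disk
        } *layers;
    };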
Now that tag-estimate has gotten complicated, here's a quick matrix to
note how attr-estimate changes with this revert (a_0):
block-size    a_0    a_1    a_inf
512B     =>    31     25     18   (bytes)
16KiB    =>    34     27     20   (bytes)
2MiB     =>    37     30     22   (bytes)
256MiB   =>    40     32     24   (bytes)
Code changes:
code stack
before: 34132 2880
after: 33912 (-0.6%) 2880 (+0.0%)
|
||
|
|
21d9535a05 |
Added size-limit to our runtime dependent attr estimate
This shouldn't impact most systems, but it's not unreasonable to allow
size-limit to be tweaked for the sole purpose of better metadata
density.
As a plus, if we mount a smaller filesystem, say a 16-bit littlefs, we'd
naturally inherit its attr estimate/metadata density. This is probably
the more important side-effect.
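Roughly, the estimate presumably ends up looking something like this
sketch (illustrative only, not the actual code; it assumes the weight
field is bounded by size-limit and the size/jump field by block-size):
    // illustrative sketch: fold size-limit into the runtime tag estimate
    static unsigned leb128_size(unsigned x) {
        unsigned n = 1;
        while (x >>= 7) {
            n += 1;
        }
        return n;
    }

    static unsigned tag_estimate(unsigned block_size, unsigned size_limit) {
        // 2-byte tag + size-limit bounded weight + block-size bounded
        // size/jump field
        return 2 + leb128_size(size_limit) + leb128_size(block_size);
    }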
This math is complicated enough to come with a bit of a code cost:
code stack
before: 34104 2880
after: 34132 (+0.1%) 2880 (+0.0%)
Though if we move size-limit + block-size to compile-time in the future,
this actually becomes free.
|
||
|
|
7d232078f2 |
Limited did generation to 31-bits
Otherwise this risks overflowing 31-bit leb128 limits elsewhere in the system. I think at some point I was considering allowing _some_ types to be full 32-bit leb128s, mainly lfs_block_t and lfs_did_t, but at this point it doesn't seem worth the tradeoffs (especially if you can jump to 64-bits/63-bits in the future). It also never worked, to be clear. |
||
|
|
09ebf70bd9 |
Updated did truncation comment based on the new compaction algorithm
Note no code actually changed. The new compaction algorithm _does_ bring the directory estimate down from ~96 -> ~72 bytes, but because we want a power-of-two for cheap division, we floor both of these to ~64 bytes. It's interesting to note that a perfect compaction algorithm _could_ bring the directory estimate down across the power-of-two boundary: ~72 -> ~48 bytes. But fortunately we're not perfect so we don't have to care about that. |
||
|
|
9fcf8e12d8 |
Adopted a tighter, block-size dependent attr-estimate
The concern right now is small-block filesystems, anything in the 512B
to <4KiB range. With such small blocks, and rbyd's relatively high
per-attr overhead, there's a real risk that littlefs may just not be
able to function without quickly running to metadata limits.
I realize these are pretty rare geometries for flash, but they are still
common for anything that 1. pretends to be a spinny disk: SD cards,
FTLs, eMMCs, etc, and 2. maps into RAM, which is surprisingly common.
It is possible to require this sort of geometry to pretend to be a
larger logical block-size, but since this is a regression from the
previous version of littlefs, it would be nice to avoid this if
possible.
Anyways, what does this commit actually do? Consider our tag encoding:
.---+---+---+- -+- -+- -+- -+---+- -+- -+- -.  tag:    2 bytes
| tag   |       weight      |     size      |  weight: <=5 bytes
'---+---+---+- -+- -+- -+- -+---+- -+- -+- -'  size:   <=4 bytes
                                                total:  <=11 bytes
With our current 32-bit (really 31-bit) version of littlefs, the worst
case tag encoding is 11 bytes.
This doesn't sound that bad, but with our current compaction algorithm we
need ~2.5 tags for each attr:
a_1 = 5t/2 + 2 = 5*11/2 + 2 = 30 bytes
Are there any additional assumptions we can make to push our attr
estimate lower?
- tag - Ignoring a complete redesign of our tag encoding (which has
already been heavily iterated over), this just needs 2 bytes, which is
not that bad.
- weight - This is the real painful one because, for the most part,
weight=0. But weight _can_ store a full size, in the case it is the
root of a file's btree. So this is pretty much stuck at an annoying
5 bytes.
I suppose this could be tied to our size-limit. I hadn't thought about
that until writing this commit message. Maybe that can be a future
improvement, though it won't really have a big effect on most systems.
- size/jump - Now this field is interesting. When expressing both the
size of tag payloads, and the relative jump offset for alt-pointers,
this field should never exceed a single block.
We've already pushed this down to 4 bytes at compile time, by assuming
at most 28-bit block-sizes, but if we know the block-size, we could in
theory push this even lower.
This is extra enticing, because the block-sizes where the size/jump
field can be shrunk are _also_ the block-sizes where the metadata
density is so critical!
So that's what this commit does. For the purpose of compaction estimates
(not stack allocations!) we calculate attr estimate based on our
runtime-determined block_size.
Here are some cutoff points for our new attr estimate:
block-size    tag-estimate    attr-estimate
512B     =>    9 bytes         25 bytes
16KiB    =>   10 bytes         27 bytes
2MiB     =>   11 bytes         30 bytes
256MiB   =>   12 bytes         32 bytes
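As a sanity check, these cutoff points can be reproduced with a few
lines of C. This is only a sketch of the math (2-byte tag, <=5-byte
weight, block-size bounded size/jump, a_1 = 5t/2 + 2 rounded up); the
helper names are made up, not littlefs's actual functions:
    #include <stdio.h>

    // number of bytes needed to leb128 encode x (7 bits per byte)
    static unsigned leb128_size(unsigned x) {
        unsigned n = 1;
        while (x >>= 7) {
            n += 1;
        }
        return n;
    }

    // worst-case tag encoding: 2-byte tag + 5-byte weight + block-size
    // bounded size/jump field
    static unsigned tag_estimate(unsigned block_size) {
        return 2 + 5 + leb128_size(block_size);
    }

    // worst-case per-attr cost after compaction, a_1 = 5t/2 + 2, rounded up
    static unsigned attr_estimate(unsigned block_size) {
        unsigned t = tag_estimate(block_size);
        return (5*t + 1)/2 + 2;
    }

    int main(void) {
        // reproduces the 512B/16KiB/2MiB/256MiB cutoffs above
        unsigned block_sizes[] = {512, 16*1024, 2*1024*1024, 256*1024*1024};
        for (unsigned i = 0; i < 4; i++) {
            printf("%9u => %2u bytes %2u bytes\n", block_sizes[i],
                    tag_estimate(block_sizes[i]),
                    attr_estimate(block_sizes[i]));
        }
        return 0;
    }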
There is a question of when to actually do this calculation. We always
know our block-size, so we could recalculate the attr-estimate every
time we need to estimate a compaction. But for now I'm just
precalculating the attr estimate in lfs_init and storing it in the lfs_t
struct. It's only a byte after all.
If I did my math correctly, we won't exceed a byte until we have a
block-size of 2^1750, at which point we may have other problems.
Code changes:
code stack lfs_t
before: 34068 2880 216
after: 34104 (+0.1%) 2880 (+0.0%) 220 (+1.9%)
The jump in lfs_t cost is probably just from a word alignment boundary.
In the future, if we have compile-time block-sizes, the entire
attr-estimate could even be compile-time.
|
||
|
|
d8d6052d90 |
Dropped -m/--mleaf-weight from dbg scripts
Now that we're assuming a perfect compaction algorithm, and an infinitely compatible mleaf-bits, there really shouldn't be any reason to support non-standard mleaf-bits in our scripts, right? If a configurable mleaf-bits becomes necessary, we can always add this back in the future. |
||
|
|
23aab1a238 |
Increased mleaf-bits to account for better compaction algorithms
As defined previously, mleaf-bits depended on the attr estimate, which
depended on the details of our compaction algorithm:
m = block_size / a_0
Assuming t=4, the _minimum_ tag encoding:
m = block_size / (3*4 + 4) = block_size / 16
However, with our new compaction algorithm, our attr estimate changes:
m = block_size / a_1 = block_size / ((5/2)*4 + 2) = block_size / 12
But tying our mleaf-bits to our attr estimate is a bit fragile. Unlike
attr estimate, the calculated mleaf-bits MUST be the same across all
littlefs implementations, or else the filesystem may not be mountable.
We _could_ store mleaf-bits as an fs attr in the mroot, like we do with
name-limit, size-limit, block-size, etc, but I'd prefer to not add fs
attrs unless strictly required. Each fs attr adds complexity to mounting,
which has a non-zero cost and headache.
Instead, we can assume our compaction algorithm is perfect:
m = block_size / a_inf = block_size / (2*4) = block_size / 8
This isn't actually achievable without unbounded RAM. But just because
our current implementation is limited to bounded RAM does not prevent
some other implementation from pushing things further with unbounded
RAM.
In theory, since this is a perfect compaction algorithm, and builds
perfect rbyd trunks, this should be the maximum possible mleaf-bits
achievable in littlefs's current design, and should be compatible with
any future implementation.
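A small sketch of what the perfect-compaction assumption works out to
(names are illustrative, not littlefs's actual definitions; assumes a
power-of-two block-size and the minimum t=4 encoding):
    // illustrative sketch: with a_inf = 2*4 = 8, each mdir leaf covers
    // block_size/8 ids
    static unsigned mleaf_weight(unsigned block_size) {
        return block_size / 8;
    }

    // the same thing as a bit count: log2(block_size) - 3
    static unsigned mleaf_bits(unsigned block_size) {
        unsigned bits = 0;
        while ((1u << bits) < block_size) {
            bits += 1;
        }
        return bits - 3;
    }

    // e.g. block_size=4096 => mleaf_weight=512, mleaf_bits=9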
---
Worst case, we can always add mleaf-bits as an fs attr retroactively
without breaking backwards compatibility. You would just need to assume
the above block_size-dependent value if the hypothetical mleaf-bits attr
is missing.
This is one nice thing about our fs attr system: it's very flexible.
|
||
|
|
d61c7ca407 |
Improved rbyd compaction algorithm, reduced attr estimate 3t+4 -> (5/2)t+2
The motivation for this is that the rbyd inner node encoding during
compaction is kind-of not that great.
Our alt encoding is great when the trunk terminates in a tag, which is
how it was originally designed to be used:
00000004: data w1 1 61 a <.
00000020: altble 0x300 w1 0x4 -' <-- trunk
00000024: data w1 1 62 b
But when used to create an arbitrary binary-tree inner node, the best
encoding I can think of is 2 alts + a terminating null tag, which is not
that great:
00000004: data w1 1 61 a <.
00000009: data w1 1 62 b <--.
0000000e: altble 0x300 w1 0x4 -' | <-- trunk
00000012: altble 0x300 w1 0x9 ---'
00000016: null
This affects our attr estimate, which is defined as the worst-case
on-disk cost of an attr after compaction. This is an important value,
as it determines when we split rbyds. And it effectively determines how
densely we can store metadata without needing to worry about block
overflow issues.
In our current compaction algorithm, we connect each attr with a 2 alt +
null inner node. Since we are creating a perfectly balanced binary tree,
this works out to ~1 inner node per attr. Including the attr's data tag,
this gives us:
a_0 = 3t + 4
Where t is the tag estimate, currently a 2 byte tag, <=5 byte weight,
<=4 byte size, t = 2+5+4 = 11 bytes:
a_0 = 3*11 + 4 = 37 bytes
Though this may vary across different littlefs configurations,
16-bit, 64-bit, etc.
---
It would be great if our compaction algorithm could build each trunk
perfectly, as each attr is written. In such a case, each attr
theoretically only needs ~1 alt and the attr's data tag:
a_inf = 2t
Or, assuming t = 11 bytes:
a_inf = 2*11 = 22 bytes
Unfortunately, as far as I can tell, this fundamentally requires
unbounded RAM. You need to keep track of log n previous trunks in order
to always build the next trunk perfectly, and log n is > 1.
I suppose in theory you could implement an O(n^2) algorithm that
repeatedly scans for the previous trunks... But that would be a
hilarious regression since the whole point of this work was to reduce
compaction from O(n^2) -> O(log n).
---
However, we can meet halfway. Consider what happens if we build perfect
trunks for only the bottom layer of the rbyd.
This may not seem like it will gain much, but remember that in a binary
tree, the bottom layer contains ~1/2 of the total nodes in the tree:
a_1 = (3t + 4)/2 + 2t/2 = 5t/2 + 2
Or, assuming t = 11 bytes:
a_1 = 5*11/2 + 2 = 30 bytes
Not too shabby for a constant amount of RAM.
In theory this could be extended to n layers, by keeping a
(configurable?) array of previous trunks in RAM during compaction, but
this would have diminishing results as we move up the tree.
With only needing to keep track of one other trunk, we can even store
this in rbyd.trunk, which is currently unused during compaction. So zero
extra RAM.
The resulting compaction looks like the following:
00000004: data w1 1 61 a <.
00000020: altble 0x300 w1 0x4 <--.
00000024: data w1 1 62 b |
00000029: data w1 1 63 c <. |
0000002e: altble 0x300 w1 0x29 <. |
00000032: data w1 1 64 d | |
00000037: altble 0x300 w2 0x20 -' | <-- trunk
0000003b: altble 0x300 w2 0x2e ---'
0000003f: null
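As a quick back-of-the-envelope check of this layout (a sketch of the
cost model only, not the compaction code itself): every attr pays its
data tag, every pair of attrs shares one bottom-layer alt, and roughly
one 2-alt + null inner node sits above each bottom trunk:
    #include <stdio.h>

    // rough cost model for the new layout, t = tag estimate in bytes
    static double attrs_cost(unsigned n, unsigned t) {
        double cost = 0;
        cost += n * t;             // one data tag per attr
        cost += (n/2) * t;         // one alt per pair of attrs (bottom layer)
        cost += (n/2) * (2*t + 4); // ~one 2-alt + null inner node per pair
        return cost;
    }

    int main(void) {
        // with t = 11 this approaches a_1 = 5*11/2 + 2 = 29.5 bytes per attr
        printf("%.1f bytes/attr\n", attrs_cost(1024, 11) / 1024);
        return 0;
    }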
This does make compaction a bit more complicated, which is reflected in
the code size:
code stack
before: 33856 2880
after: 34068 (+0.6%) 2880 (+0.0%)
Some implementation things to note:
- Since lfsr_rbyd_appendcompactattr now has state, we need to make sure
to zero the trunk before compacting, which adds an annoying bit of
bookkeeping everywhere.
- I didn't want to add extra state to manage in
lfsr_rbyd_appendcompactattr calls, so this implementation only tracks
the previous trunk offset, and needs a readtag call to get the
tag+weight necessary to build the actual alt pointer.
This may have some unpleasant interactions with the rcache, and may be
worth revisiting.
|
||
|
|
950285cbd3 |
Renamed LFSR_RBYD_SHRUB -> LFSR_RBYD_ISSHRUB
To match the names of other flags like this, LFSR_DATA_ISIMM, LFSR_MTREE_ISMPTR, LFSR_BSHRUB_ISBNULLORBSPROUTORBPTR, etc. LFSR_RBYD_ISSHRUB was probably just missed in a refactor at some point. The reason for the IS* prefix is to avoid conflicts with related macros. |
||
|
|
7209ce3bd8 |
Dropped unused weight tracking in lfsr_rbyd_appendcompactattr
I think at some point we were calculating the compact weight in the initial tag layer, but we don't actually use this at all. We recalculate the weight on every layer of our compaction algorithm anyways. Code changes: before: 33864 2880 after: 33856 (-0.0%) 2880 (+0.0%) |
||
|
|
ea88a48de2 |
Updated outdated comment on lfsr_data_t's encoding
We no longer have a mode field; this has been replaced by the top 2 bits of data.size. |
||
|
|
692810e18e |
Reverted lfsr_data_t lazily encoded leb128s
- It didn't save code.
- An inlined buffer is potentially more useful, even if only marginally,
and, uh, unproven yet.
- Requiring lfs_toleb128 in a readonly implementation is a hard ask.
|
||
|
|
415e148f62 |
Replaced inlined lfsr_data_t with a lazily encoded leb128
The idea is that we can save on the cost of calling lfs_toleb128
everywhere we commit leb128s, by lazily encoding during progdata.
I originally thought this would have too many small problems, but:
1. We can actually implement slice surprisingly easily by just shifting
the internal word 7 bits. This emulates byte-level slicing in the
encoded leb128.
This enables read/cmp, so we can implement all of the lfsr_data_t
functions, though it does make lfs_toleb128 required for a readonly
implementation, which isn't great. Sufficient creativity with ifdefs
likely makes this a non-problem though (see the sketch after this list).
2. There are really very limited use cases for non-leb128 inlined datas.
We can use it to encode the version and compatflags during
lfs_format, but that's about it. And lfs_format is definitely not on
the stack hot-path, so there's no reason to not use on-stack buffers
for these.
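To make point 1 above concrete, here's a minimal standalone sketch of
the shifting trick (not lfsr_data_t's actual API): for a multi-byte
leb128, encoding word >> 7 produces exactly the original encoding with
its first byte dropped, continuation bits and all:
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    // encode a 32-bit word as leb128, returns the number of bytes written
    static size_t toleb128(uint32_t word, uint8_t *buf) {
        size_t n = 0;
        do {
            uint8_t b = word & 0x7f;
            word >>= 7;
            buf[n++] = b | (word ? 0x80 : 0);
        } while (word);
        return n;
    }

    int main(void) {
        uint8_t full[5], sliced[5];
        uint32_t word = 123456789;

        // slicing one byte off the front of the encoding is just a 7-bit
        // shift of the not-yet-encoded word
        size_t full_len = toleb128(word, full);
        size_t sliced_len = toleb128(word >> 7, sliced);

        assert(sliced_len == full_len - 1);
        assert(memcmp(sliced, full + 1, sliced_len) == 0);
        return 0;
    }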
The original motivation for this change was noticing a surprising amount
of code savings related to lazy leb128 encoding in another lfsr_data_t
refactor. Unfortunately this savings does not seem reproducible:
code stack
before: 33864 2880
after: 33912 (+0.1%) 2888 (+0.3%)
But that's ok, this is closer to what I expected. The lfs_sizeleb128
call we need to predict the leb128 size is close to the same cost as
calling lfs_toleb128 so the savings isn't really that much.
|
||
|
|
5005db2b4e |
Moved erase into lfs_alloc, mostly
This doesn't really help us all that much right now, but will be useful
for the future-planned block map and being able to cache pre-erased
blocks.
Though the lack of erasing when allocating new mdirs raises some
questions... Oh well, future problems.
Code changes:
code stack
before: 33856 2880
after: 33864 (+0.0%) 2880 (+0.0%)
|