This hasn't really proven useful.
At one point showing the cksums in dbgrbyd.py was useful, but this is
now possible and easier with dbgblock.py -x/--cksum.
By definition, altns should never be followed, so it doesn't really
matter where they point. But it's not like they can point literally
nowhere, so where should they point?
A couple options:
1. jump=jump - Wherever the old alt pointed
- Easy, literally a noop
- Unsafe, bugs could reveal outdated parts of the tree
- Encoding size eh
2. jump=0 - Point to offset=0
- Easier, +0 code
- Safer, branching to 0 should assert
- Worst possible encoding size
3. jump=itself - Point to itself
- A bit tricky, +4 code
- Safe, should assert, even without asserts worst case infinite loop
- Optimal encoding size
An infinite loop isn't the best failure state, but we can catch this
with an assert, which we would need for jump=0 anyways. And this is only
a concern if there are other fs bugs. jump=0 is actually slightly worse
if asserts are disabled, since we'd end up reading the revision count as
garbage.
Adopting jump=itself gives us the optimal 4-byte encoding:
altbn w0 = 40 00 00 00
           '-+-' ^  ^
            '----|--|-- tag = altbn
                 '--|-- weight = 0
                    '-- jump = itself (branch - 0)
This requires tweaking the alt encoder a bit, to avoid relative encoding
jump=0s, but this is pretty cheap:
code stack
jump=jump: 34068 2864
jump=0: 34068 (+0.0%) 2864 (+0.0%)
jump=itself: 34072 (+0.0%) 2864 (+0.0%)
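For illustration, the encoder special case boils down to something like
this (a hedged sketch, not the actual lfs.c code; the function name and
arguments are made up here):

    #include <stdint.h>

    // jumps are encoded relative to the alt's own offset, so an altn
    // "pointing to itself" is just a relative jump of 0, the smallest
    // possible encoding
    static uint32_t alt_encodejump(uint32_t off, uint32_t jump) {
        // altns are never followed, so point them at themselves rather
        // than at offset 0
        if (jump == 0) {
            jump = off;
        }
        return off - jump;
    }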
I thought we may need to also tweak the decoder, so later trunk copies
don't accidentally point to the old location, but humorously our pruning
kicks in redundantly to reset altbn's jump=itself on every trunk.
Note lfsr_rbyd_lookupnext was also rearranged a bit to make it easier to
assert on infinite loops and this also added some code. Probably just
due to compiler noise:
code stack
before: 34068 2864
after: 34076 (+0.0%) 2864 (+0.0%)
Also note that we still accept all of the above altbn encoding options.
This only affects encoding and dbg scripts.
This is mainly to avoid mistakes caused by names/encodings disagreeing:
LFSR_TAG_ALT  0x4kkk  v1cd kkkk -kkk kkkk
                      ^ ^^ '------+-----'
                      '-||--------|------- valid bit
                        '|--------|------- color
                         '--------|------- dir
                                  '------- key
Notably, the LFSR_TAG_ALT() macro has already caused issues by being
both 1. ambiguous, and 2. not really type-checkable. It's easy to get
the argument order wrong without anything outright breaking, things just
behave poorly, which is really not great!
To be honest the exact order is a bit arbitrary, the color->dir naming
appeared by accident because I guess it felt more natural. Maybe because
of English's weird implicit adjective ordering? Maybe because of how
often conditions show up as the last part of the name in other
instruction sets?
At least one plus is that this moves the dir-bit next to the key. This
makes it so all of the condition information is encoded in the lowest
13 bits of the tag, which may lead to minor optimization tricks for
implementing flips and such.
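As a rough example of the sort of trick this enables (a sketch; only the
bit positions come from the layout above, the names are made up):

    #include <stdint.h>

    // with v1cd, dir (0x1000) sits directly above the key bits, so the
    // whole condition lives in the low 13 bits of the tag
    #define ALT_DIR_BIT 0x1000

    // flip an alt's direction (le <-> gt) without touching valid/color
    static inline uint16_t alt_flipdir(uint16_t tag) {
        return tag ^ ALT_DIR_BIT;
    }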
Code changes:
code stack
before: 34080 2864
after: 34068 (-0.0%) 2864 (+0.0%)
A bit of a hack, but rather than handling conditional alt branches, our
dbg rbyd tree renderers just represent single-pointer alts as an alt
with both branches pointing to the same place.
Unfortunately, the two branches technically have different colors. This
resulted in a bit of contention when choosing how to color the tree.
Basically Python's dict ordering would determine which color won.
Which was a bit confusing when dbgrbyd.py displayed different tree
colorings for the same rbyd. dbgrbyd.py should be idempotent!
This is solved by adding another hack to check explicitly for
same-destination branches.
Example:
$ ./scripts/dbgtag.py 0x3001
cksum 0x01
dbgtag.py inherits most of crc32c.py's decoding options. The most useful
probably being -x/--hex:
$ ./scripts/dbgtag.py -x e1 00 01 8a 09
altbgt 0x100 w1 -1162
dbgtag.py also supports reading from a block device if either
-b/--block-size or --off is provided. This is mainly for consistency
with the other dbg*.py scripts:
$ ./scripts/dbgtag.py disk -b4096 0x2.1e4
bookmark w1 1
This should help when debugging and finding a raw tag/alt in some
register. Manually decoding is just an unnecessary road bump when this
happens.
This is the start of (yet another) rework of rbyd range removals, this
time in an effort to preserve the rby structure that maps to a balanced
2-3-4 tree. Specifically, the property that all search paths have the
same number of black edges (2-3-4 nodes).
This is currently incomplete, as you can probably tell from the mess,
but this commit at least gets a working altn/alta encoding in place
necessary for representing empty 2-3-4 nodes. More on that below.
---
First the problem:
My assumption, when implementing the previous range removal algorithms,
was that we only needed to maintain the existing height of the tree.
The existing rbyd operations limit the height to strictly log n. And
while we can't _reduce_ the height to maintain perfect balance, we can
at least avoid _increasing_ the height, which means the resulting tree
should have a height <= log n. Since our rbyds are bounded by the
block_size b, this means worst case our rbyd can never exceed a height
<= log b, right?
Well, not quite.
This is true the instance after the remove operation. But there is an
implicit assumption that future rbyd operations will still be able to
maintain height <= log n after the remove operation. This turns out to
not be true.
The problem is that our rbyd appends only maintain height <= log n if
our rby structure is preserved. If the rby structure is broken, rbyd
append assumes an rby structure that doesn't exist, which can lead to an
increasingly unbalanced tree.
Consider this happily balanced tree:
.-------o-------. .--------o
.---o---. .---o---. .---o---. |
.-o-. .-o-. .-o-. .-o-. .-o-. .-o-. |
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. |
a b c d e f g h i j k l m n o p => a b c d e f g h i
'------+------'
remove
After a range removal it looks pretty bad, but note the height is still
<= log n (the old n, not the new n). We are still <= log b.
But note what happens if we start to insert attrs into the short half of
the tree:
.--------o
.---o---. |
.-o-. .-o-. |
.o. .o. .o. .o. |
a b c d e f g h i
.-----o
.--------o .-+-r
.---o---. | | | |
.-o-. .-o-. | | | |
.o. .o. .o. .o. | | | |
a b c d e f g h i j'k'l'
.-------------o
.---o .---+-----r
.--------o .-o .-o .-o .-+-r
.---o---. | | | | | | | | | |
.-o-. .-o-. | | | | | | | | | |
.o. .o. .o. .o. | | | | | | | | | |
a b c d e f g h i j'k'l'm'n'o'p'q'r'
Our right side is generating a perfectly balanced tree as expected, but
the left side is suddenly twice as far from the root! height(r')=3,
height(a)=6!
The problem is when we append l', we don't really know how tall the tree
is. We only know l' has one black edge, which assuming rby structure is
preserved, means all other attrs must have one black edge, so creating a
new root is justified.
In reality this just makes the tree grow increasingly unbalanced,
increasing the height of the tree by worst case log n every range
removal.
---
It's interesting to note this was discovered while debugging
test_fwrite_overwrite, specifically:
test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10
It turns out the append fragments -> delete fragments -> append/carve
block + becksum loop contains the perfect sequence of attrs necessary to
turn this tree imbalance into a linked-list!
.-> 0 data w1 1
.-b-> 1 data w1 1
| .-> 2 data w1 1
.-b-b-> 3 data w1 1
| .-> 4 data w1 1
| .-b-> 5 data w1 1
| | .-> 6 data w1 1
.---b-b-b-> 7 data w1 1
| .-> 8 data w1 1
| .-b-> 9 data w1 1
| | .-> 10 data w1 1
| .-b-b-> 11 data w1 1
| .-b-----> 12 data w1 1
.-y-y-------> 13 data w1 1
| .-> 14 data w1 1
.-y---------y-> 15 data w1 1
| .-> 16 data w1 1
.-y-----------y-> 17 data w1 1
| .-> 18 data w1 1
.-y-------------y-> 19 data w1 1
| .-> 20 data w1 1
.-y---------------y-> 21 data w1 1
| .-> 22 data w1 1
.-y-----------------y-> 23 data w1 1
| .-> 24 data w1 1
.-y-------------------y-> 25 data w1 1
| .---> 26 data w1 1
| | .-> 27-2047 block w2021 10
b-------------------r-b-> becksum 5
Note, to reproduce this you need to step through with a breakpoint on
lfsr_bshrub_commit. This only shows up in the file's intermediary btree,
which at the time of writing ends up at block 0xb8:
$ ./scripts/test.py \
test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10 \
-ddisk --gdb -f
$ ./scripts/watch.py -Kdisk -b \
./scripts/dbgrbyd.py -b4096 disk 0xb8 -t
(then b lfsr_bshrub_commit and continue a bunch)
---
So, we need to preserve the rby structure.
Note pruning red/yellow alts is not an issue. These aren't black, so we
aren't changing the number of black edges in the tree. We've just
effectively reduced a 3/4 node into a 2/3 node:
.-> a
.---b-> b .-> a <- 2 black
| .---> c .-b-> b
| | .-> d | .-> c
b-r-b-> e <- rm => b-b-> d <- 2 black
The tricky bit is pruning black alts. Naively this changes the number of
black edges/2-3-4 nodes in the tree, which is bad:
.-> a
.-b-> b .-> a <- 2 black
| .-> c .-b-> b
b-b-> d <- rm => b---> c <- 1 black
It's tempting to just make the alt red at this point, effectively
merging the sibling 2-3-4 node. This maintains balance in the subtree,
but still removes a black edge, causing problems for our parent:
.-> a
.-b-> b .-> a <- 3 black
| .-> c .-b-> b
.-b-b-> d | .-> c
| .-> e .-b-b-> d
| .-b-> f | .---> e
| | .-> g | | .-> f
b-b-b-> h <- rm => b-r-b-> g <- 2 black
In theory you could propagate this all the way up to the root, and this
_would_ probably give you a perfect self-balancing range removal
algorithm... but it's recursive... and littlefs can't be recursive...
.-> s
.-b-> t .-> s
| .-> u .-----b-> t
.-b-b-> v | .-> u
| .-> w | .---b-> v
| .-b-> x | | .---> w
| | | | .-> y | | | | | | | .-> x
b-b- ... b-b-b-> z <- rm => r-b-r-b- ... r-b-r-b-> y
So instead, an alternative solution. What if we allowed black alts that
point nowhere? A sort of noop 2-3-4 node that serves only to maintain
the rby structure?
.-> a
.-b-> b .-> a <- 2 black
| .-> c .-b-> b
b-b-> d <- rm => b-b-> c <- 2 black
I guess that would technically make this a 1-2-3-4 tree.
This does add extra overhead for writing noop alts, which are otherwise
useless, but it seems to solve most of our problems: 1. does not
increase the height of the tree, 2. maintains the rby structure, 3.
tail-recursive.
And, thanks to the preserved rby structure, we can say that in the worst
case our rbyds will never exceed a height of log b again, even with range
removals.
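In rough pseudo-C, the pruning rule becomes something like this (a
sketch of the idea only, not the actual removal code; the TAG_ALTN
constant is the altn encoding described later in this commit):

    #include <stdint.h>
    #include <stdbool.h>

    #define TAG_ALTN 0x4000  // altle 0, can never match a real tag

    // called when a range removal empties the subtree behind an alt;
    // returns true if the alt can be dropped entirely
    static bool prune_alt(bool black, uint16_t *tag, uint32_t *jump) {
        if (!black) {
            // red/yellow alts carry no black edges, safe to just drop
            return true;
        }
        // black alts must stay so every search path keeps the same
        // number of black edges, but they can become noop altns
        *tag = TAG_ALTN;
        *jump = 0;  // never followed, so where it points doesn't matter
        return false;
    }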
If we apply this strategy to our original example, you can see how the
preserved rby structure sort of "absorbs" new red alts, preventing
further unbalancing:
.-------o-------. .--------o
.---o---. .---o---. .---o---. o
.-o-. .-o-. .-o-. .-o-. .-o-. .-o-. o
.o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. o
a b c d e f g h i j k l m n o p => a b c d e f g h i
'------+------'
remove
Reinserting:
.--------o
.---o---. o
.-o-. .-o-. o
.o. .o. .o. .o. o
a b c d e f g h i
.----------------o
.---o---. o
.-o-. .-o-. .------o
.o. .o. .o. .o. .o. .-+-r
a b c d e f g h i j'k'l'm'
.----------------------------o
.---o---. .-------------o
.-o-. .-o-. .---o .---+-----r
.o. .o. .o. .o. .-o .-o .-o .-o .-+-r
a b c d e f g h i j'k'l'm'n'o'p'q'r's'
Much better!
---
This commit makes some big steps towards this solution, mainly codifying
a now-special alt-never/alt-always (altn/alta) encoding to represent
these noop 1-nodes.
Technically, since null (0) tags are not allowed, these already exist as
altle 0/altgt 0 and don't need any extra carve-out encoding-wise:
LFSR_TAG_ALT 0x4kkk v1dc kkkk -kkk kkkk
LFSR_TAG_ALTN 0x4000 v10c 0000 -000 0000
LFSR_TAG_ALTA 0x6000 v11c 0000 -000 0000
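In terms of these encodings, checking for altns/altas is just a matter
of masking out the valid and color bits, which vary independently (a
sketch, not necessarily the real lfs.c helpers):

    #include <stdint.h>
    #include <stdbool.h>

    // mask off the valid bit (bit 15) and color bit (c), keep the rest
    static inline bool tag_isaltn(uint16_t tag) {
        return (tag & 0x6fff) == 0x4000;
    }

    static inline bool tag_isalta(uint16_t tag) {
        return (tag & 0x6fff) == 0x6000;
    }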
We actually already used altas to terminate unreachable tags during
range removals, but this behavior was implicit. Now, altns have very
special treatment as a part of determining bounds during appendattr
(both unreachable gt/le alts are represented as altns). For this reason
I think the new names are warranted.
I've also added these encodings to the dbg*.py scripts for, well,
debuggability, and added a special case to dbgrbyd.py -j to avoid
unnecessary altn jump noise.
As a part of debugging, I've also extended dbgrbyd.py's tree renderer to
show trivial prunable alts. Unsure about keeping this. On one hand it's
useful to visualize the exact alt structure, on the other hand it likely
adds quite a bit of noise to the more complex dbg scripts.
The current state of things is a mess, but at least tests are passing!
Though we aren't actually reclaiming any altns yet... We're definitely
_not_ preserving the rby structure at the moment, and if you look at the
output from the tests, the resulting tree structure is hilariously bad.
But at least the path forward is clear.
This was throwing off tree rendering in dbglfs.py: we attempt to look up
the null tag because we just want the first tag in the tree to stitch
things together.
Null tag reachability is tricky! You only notice if the tree happens to
create a hole, which isn't that common. I think all lookup
implementations should have this max(tag, 1) pattern from now on to
avoid this.
Note that most dbg scripts wouldn't run into this because we usually use
the traversal tag+1 pattern. Still, the inconsistency in impl between
the dbg scripts and lfs.c is bad.
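The pattern itself is tiny, something like this (a sketch; the helper
name is made up):

    #include <stdint.h>

    typedef uint16_t lfsr_tag_t;

    // never ask a lookup for the reserved null tag (0); clamping to 1
    // keeps "give me the first tag" queries independent of whether the
    // null tag happens to be reachable in this particular tree
    static inline lfsr_tag_t tag_lookup_clamp(lfsr_tag_t tag) {
        return (tag != 0) ? tag : 1;
    }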
There were a couple issues mixing high-level and low-level bptr
representations:
1. The high-level vs low-level block representation needed to have
ordering priority over the actual tag in order for inner-node tree
renderings to make sense.
2. We need to flatten all tags to BLOCK/DATA/other data tags when _not_
rendering inner-nodes so interleaved becksums/other util tags don't
mess with the tree rendering.
Now things look like this:
littlefs v0.0 4096x256 0x{0,1}.a0c, rev 15, weight 0.512
{0000,0001}: -1.1 hello reg 1113, btree 0x27.8d8
0000.0a11: + 0-1112 btree w1113 9
0027.08d8: | .-+ 0-1112 block w1113 11
'-+-| > becksum 5
'-> 0-1112 block w1113 0x95.0 1113
Maybe a bit weird looking at first, but correct.
This saves a bit of rbyd overhead, since these almost always come
together.
Perhaps more interesting, it carves out space for storing mroot-anchor
redundancy information. This uses the lowest two bits of the GEOMETRY
tag to indicate how many redundant blocks belong to the mroot-anchor:
LFSR_TAG_GEOMETRY 0x0008 v--- ---- ---- 1-rr
This solves a bit of a hole in our redundancy encoding. The plan is for
this info to be stored in the lowest two bits of every pointer, but the
mroot-anchor doesn't really have a pointer.
Though these are just future plans. Right now the redundancy information
is unused. Current implementations should use the GEOMETRY tag 0x0009,
which you may notice implies redundancy level 1. This matches our
current 2-block per mdir default.
Geometry attr encoding:
.---+---+---+---.      tag (0x0008+r):  1 be16    2 bytes
|x0008+r| 0 |siz|      weight (0):      1 leb128  1 byte
+---+---+---+---+      size:            1 leb128  1 byte
|   block_size  |      block_size:      1 leb128  <=4 bytes
+---+- -+- -+- -+- -.
|    block_count    |  block_count:     1 leb128  <=5 bytes
'---+- -+- -+- -+- -'  total:                     <=13 bytes
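Decoding the redundancy info is then just the low two bits of the tag
(a sketch; the helper name is made up):

    #include <stdint.h>

    #define LFSR_TAG_GEOMETRY 0x0008

    // how many redundant blocks belong to the mroot-anchor; the current
    // 2-block mdir default writes tag 0x0009, i.e. r=1
    static inline unsigned geometry_redund(uint16_t tag) {
        return tag & 0x3;
    }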
Code changes:
code stack
before: 34092 2880
after: 34040 (-0.2%) 2880 (+0.0%)
Now with a bit more granularity for possibly-future-optional on-disk
data structures:
LFSR_RCOMPAT_NONSTANDARD  0x0001  ---- ---- ---- ---1  (reserved)
LFSR_RCOMPAT_MLEAF        0x0002  ---- ---- ---- --1-
LFSR_RCOMPAT_MSHRUB       0x0004  ---- ---- ---- -1--  (reserved)
LFSR_RCOMPAT_MTREE        0x0008  ---- ---- ---- 1---
LFSR_RCOMPAT_BSPROUT      0x0010  ---- ---- ---1 ----
LFSR_RCOMPAT_BLEAF        0x0020  ---- ---- --1- ----
LFSR_RCOMPAT_BSHRUB       0x0040  ---- ---- -1-- ----
LFSR_RCOMPAT_BTREE        0x0080  ---- ---- 1--- ----
LFSR_RCOMPAT_GRM          0x0100  ---- ---1 ---- ----
LFSR_WCOMPAT_NONSTANDARD  0x0001  ---- ---- ---- ---1  (reserved)
LFSR_OCOMPAT_NONSTANDARD  0x0001  ---- ---- ---- ---1  (reserved)
This adds a couple reserved flags:
- LFSR_*COMPAT_NONSTANDARD - This flag will never be set by a standard
version of littlefs. The idea is to allow implementations with
non-standard extensions a way to signal potential compatibility issues
without worrying about future compat flag conflicts.
This is limited to a single bit, but hey, it's not like it's possible
to predict all future extensions.
If a non-standard extension needs more granularity, reservations of
standard compat flags can always be requested, even if they don't end
up implemented in standard littlefs. (Though such reservations will
need a strong motivation, it's not like these flags are free).
- LFSR_RCOMPAT_MSHRUB - In theory littlefs supports a shrubbed mtree,
where the root is inlined into the mroot. But in practice this turned
out to be more complicated than it was worth. Still, a future
implementation may find an mshrub useful, so preserving a compat flag
for such a case makes sense.
That being said, I have no plans to add support for mshrubs even in
the dbg scripts.
I would like the expected feature-set for debug tools to be
well-defined, but also conservative. This gets a bit tricky with
theoretical features like the mshrubs, but until mshrubs are actually
implemented in littlefs, I would like to consider them non-standard.
The implication of this is that, while LFSR_RCOMPAT_MSHRUB is
currently "reserved", it may be repurposed for some other meaning in
the future.
These changes also rename *COMPATFLAGS -> *COMPAT, and reorder the tags
by decreasing importance. This ordering seems more valuable than the
original intention of making rcompat/wcompat a single bit flip.
Implementation-wise, it's interesting to note the internal-only
LFSR_*COMPAT_OVERFLOW flag. This gets set when out-of-range bits are set
on-disk, and allows us to detect unrepresentable compat flags without
too much extra complexity.
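Roughly, the decode side looks something like this (a sketch of the
idea; the real code folds the result into the internal
LFSR_*COMPAT_OVERFLOW flag rather than returning a separate bool):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    // decode on-disk compat flags of any length into a fixed-width
    // word, remembering if any bits we can't represent were set
    static uint32_t compat_decode(const uint8_t *buf, size_t size,
            bool *overflow) {
        uint32_t flags = 0;
        *overflow = false;
        for (size_t i = 0; i < size; i++) {
            if (i < sizeof(uint32_t)) {
                flags |= (uint32_t)buf[i] << (8*i);
            } else if (buf[i] != 0) {
                // out-of-range bits => some unknown feature we can't
                // even represent
                *overflow = true;
            }
        }
        return flags;
    }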
The extra encoding/decoding overhead does add a bit of cost though:
code stack
before: 33944 2880
after: 34124 (+0.5%) 2880 (+0.0%)
Now that we're assuming a perfect compaction algorithm, and an
infinitely compatible mleaf-bits, there really shouldn't be any reason
to support non-standard mleaf-bits in our scripts, right?
If a configurable mleaf-bits becomes necessary, we can always add this
back in the future.
As defined previously, mleaf-bits depended on the attr estimate, which
depended on the details of our compaction algorithm:
    block_size
m = ----------
       a_0
Assuming t=4, the _minimum_ tag encoding:
    block_size   block_size
m = ---------- = ----------
     3*4 + 4         16
However, with our new compaction algorithm, our attr estimate changes:
    block_size   block_size    block_size
m = ---------- = ----------- = ----------
       a_1       (5/2)*4 + 2       12
But tying our mleaf-bits to our attr estimate is a bit fragile. Unlike
attr estimate, the calculated mleaf-bits MUST be the same across all
littlefs implementations, or else the filesystem may not be mountable.
We _could_ store mleaf-bits as an fs attr in the mroot, like we do with
name-limit, size-limit, block-size, etc, but I'd prefer to not add fs
attrs unless strictly required. Each fs attr adds complexity to mounting,
which has a non-zero cost and headache.
Instead, we can assume our compaction algorithm is perfect:
    block_size   block_size   block_size
m = ---------- = ---------- = ----------
      a_inf         2*4           8
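As a worked example, with the default block_size=4096 this gives
m = 4096/8 = 512 weight per mleaf, up from 4096/16 = 256 under the old
minimum-tag estimate, i.e. one extra mleaf-bit (assuming mleaf-bits is
just log2(m)).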
This isn't actually achievable without unbounded RAM. But our current
implementation being limited to bounded RAM doesn't prevent
some other implementation from pushing things further with unbounded
RAM.
In theory, since this is a perfect compaction algorithm, and builds
perfect rbyd trunks, this should be the maximum possible mleaf-bits
achievable in littlefs's current design, and should be compatible with
any future implementation.
---
Worst case, we can always add mleaf-bits as an fs attr retroactively
without breaking backwards compatibility. You would just need to assume
the above block_size-dependent value if the hypothetical mleaf-bits attr
is missing.
This is one nice thing about our fs attr system, it's very flexible.
Implementing raw-byte name comparisons ended up having more negative
effects on implementation requirements than I thought it would:
1. We would never actually concatenate the did + name, as that would
require dynamic memory. Instead we need to express the concatenated
relationship using our internal lfsr_data_t representation.
I thought this wouldn't be too bad since we already have a
concatenated lfsr_data_t representation, but:
1. It was limited in scope, specifically only lfsr_data_prog was
supported. It's actually not even possible to implement
lfsr_data_read (I think) since we can't mutate the indirect
lfsr_data_ts.
2. It's not actually required. We really only use our concatenated
representation to coalesce file fragments. You could in theory
omit this representation at the cost of not being able to limit
inlined shrub overhead.
Asking all future littlefs implementations to implement a
concatenated data representation (or dynamically allocate D:) for the
basic task of file-name lookup is sort of a big ask.
2. A readonly implementation suddenly needs a toleb128 function.
Which is an unexpected implication of requiring raw-byte leb128
comparisons for file-name lookup.
3. Raw-byte comparisons require that dids are always stored in their
canonical encoding (smallest leb128), though this is probably a good
idea anyways.
And for what? A theoretical future-planned feature (content-tree)?
Let's think about the hypothetical content-tree for a second:
1. It's an advanced, opt-in feature. Which means higher code/storage-cost
should be expected.
2. Basically all littlefs implementations need file-name lookup, so
keeping file-name lookup cheap is a much higher priority than the
opt-in content-tree.
3. Worst case, the content-tree, and any future named trees, can just
set did=0. This will cost one byte per name (and may leave room for
future extensions).
So I'm reverting this for now.
There is still time before stabilization, so if it becomes clear there
is a better way to implement name lookups, we can still change this.
(Optimistically, the content-tree may be implemented before
stabilization, since it currently looks like it's required for data
redundancy).
Code changes:
code stack
before: 34292 2896
after: 34028 (-0.8%) 2896 (+0.0%)
This is a simplification of the rbyd/btree layers, but implies
behavioral changes to the mtree/mdir layers.
Instead of ordering by leb128 did + name:
82 02 61 61 61 < 81 04 62 62 62
(0x102, "aaa") (0x201, "bbb")
We now order by the raw encoding, lexicographically:
82 02 61 61 61 > 81 04 62 62 62
(0x102, "aaa") (0x201, "bbb")
This may be unintuitive, but note:
1. Files _within_ a directory are still ordered, since they share a did
prefix.
2. We don't really care about the relative ordering of dids, just
that they are unique. Changing the ordering at this level does not
interfere with any of our did-related functions.
3. The only thing we may care about is that the root, did=0, is the
first mtree entry. This is still true, since no leb128 encoding sorts
before the single byte 0x00.
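With raw-encoding ordering, comparing two names degenerates into a plain
lexicographic byte compare over the (leb128 did + name) encoding,
something like this (a sketch, ignoring the lfsr_data_t indirection the
real lfsr_data_namecmp has to deal with):

    #include <string.h>
    #include <stddef.h>

    // compare two raw (leb128 did + name) encodings lexicographically,
    // with a shorter name sorting first on a shared prefix
    static int raw_namecmp(
            const void *a, size_t a_size,
            const void *b, size_t b_size) {
        size_t size = (a_size < b_size) ? a_size : b_size;
        int cmp = memcmp(a, b, size);
        if (cmp != 0) {
            return cmp;
        }
        return (a_size > b_size) - (a_size < b_size);
    }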
The motivation for this change is to allow for other named-btrees in the
system that may use non-did-prefixed names. At least one of these makes
sense for a sort of "content-tree" (cksum -> data block mapping).
As a plus, this change makes it possible to compare names and do btree
namelookups without needing to decode the leb128 prefix. Although I'm
struggling a bit to figure out exactly where this is useful...
One downside, this ordering only works if dids are always stored in
their canonical encoding, that is, the smallest leb128 encoding possible
for a given did. I think this is a reasonable requirement for just our
dids.
Another downside is this did add a decent chunk of code.
I did try limiting the changes to lfsr_data_namecmp, but it didn't have
much impact. I guess most of the cost comes from the reworked
lfsr_data_cmp function, which, to be fair, is quite a bit more
complicated now (it now supports limited data<=>data comparisons):
code stack
before: 34148 2896
namecmp: 34324 (+0.5%) 2896 (+0.0%)
after: 34340 (+0.6%) 2896 (+0.0%)
Previously, the intention of upper case -Z was to match -W/--width and
-H/--height, which are uppercase to avoid conflicts with -h/--help.
But -z/--depth isn't _really_ related to -W/-H.
This avoids a conflict with -Z/--lebesgue, but may conflict with
-z/--cat. Fortunately we don't currently have any conflicts with the
latter. Since -z/--depth and -Z/--lebesgue are both disk-layout related,
the risk of conflicts is probably much higher there.
So now these should be invoked like so:
$ ./scripts/dbglfs.py -b4096x256 disk
The motivation for this change is to better match other filesystem
tooling. Some prior art:
- mkfs.btrfs
- -n/--nodesize => node size in bytes, power of 2 >= sector
- -s/--sectorsize => sector size in bytes, power of 2
- zfs create
- -b => block size in bytes
- mkfs.xfs
- -b => block size in bytes, power of 2 >= sector
- -s => sector size in bytes, power of 2 >= 512
- mkfs.ext[234]
- -b => block size in bytes, power of 2 >= 1024
- mkfs.ntfs
- -c/--cluster-size => cluster size in bytes, power of 2 >= sector
- -s/--sector-size => sector size in bytes, power of 2 >= 256
- mkfs.fat
- -s => cluster size in sectors, power of 2
- -S => sector size in bytes, power of 2 >= 512
Why care so much about the flag naming for internal scripts? The
intention is for external tooling to eventually use the same set of
flags. And maybe even create publicly consumable versions of the dbg
scripts. It's important that if/when this happens flags stay consistent.
Everyone familiar with the ssh -p/scp -P situation knows how annoying
this can be.
It's especially important for littlefs's -b/--block-size flag, since
this will likely end up used everywhere. Unlike other filesystems,
littlefs can't mount without knowing the block-size, so any tool that
mounts littlefs is going to need the -b/--block-size flag.
---
The original motivation for -B was to avoid conflicts with the -b/--by
flag that was already in use in all of the measurement scripts. But
these are internal, and not really littlefs-related, so I don't think
that's a good reason any more. Worst case we can just make the --by flag
-B, or just not have a short form (--by is only 4 letters after all).
Somehow we ended up with no scripts needing both -b/--block-size and
-b/--by so far.
Some other conflicts/inconsistencies tweaks were needed, here are all
the flag changes:
- -B/--block-size -> -b/--block-size
- -M/--mleaf-weight -> -m/--mleaf-weight
- -b/--btree -> -B/--btree
- -C/--block-cycles -> -c/--block-cycles (in tracebd.py)
- -c/--coalesce -> -S/--coalesce (in tracebd.py)
- -m/--mdirs -> -M/--mdirs (in dbgbmap.py)
- -b/--btrees -> -B/--btrees (in dbgbmap.py)
- -d/--datas -> -D/--datas (in dbgbmap.py)
Shrubness should have always been a property of lfsr_rbyd_t.
You know you've made a good design decision when things just sort of
fall into place and the code somehow becomes cleaner.
The downside of this change is accessing rbyd trunks requires a mask,
which is annoying, but the upside is we don't need to signal shrubness
via extra booleans in internal functions anymore.
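Concretely this amounts to stealing a bit of the rbyd's trunk field,
something like the following (a sketch with made-up names and widths,
just to show the mask):

    #include <stdint.h>
    #include <stdbool.h>

    #define RBYD_ISSHRUB 0x80000000
    #define RBYD_TRUNK   0x7fffffff

    // shrubness travels with the rbyd itself instead of as extra
    // boolean arguments to every internal function
    static inline uint32_t rbyd_trunk(uint32_t trunk) {
        return trunk & RBYD_TRUNK;
    }

    static inline bool rbyd_isshrub(uint32_t trunk) {
        return trunk & RBYD_ISSHRUB;
    }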
The funny thing is, the actual motivation for this change was just to
free up a bit in our tag encoding. Simplifying some of the internal
functions was just a nice side effect.
code stack
before: 33940 2928
after: 33928 (-0.0%) 2912 (-0.5%)
I was originally avoiding naming these orphans, as they're _technically_
not orphans. They do exist in the mtree. But the name orphan just
describes this type's purpose too well.
This does lead to some confusing terms, such as the fact that orphan
files can be non-orphaned if there are any in-device references. But I
think this makes sense?
- LFSR_TAG_SCRATCH -> LFSR_TAG_ORPHAN
- LFSR_F_UNCREAT -> LFSR_F_ORPHAN
- test_fscratch.toml -> test_forphan.toml
"Scratch files" are a new file type added to solve the zero-sized
file problem. Though they have a few other uses that may be quite
valuable.
The "zero-sized file problem" is a common surprise for users, where what
seems like a simple file create+write operation:
lfs_file_open(&lfs, &file, "hi",
LFS_O_WRONLY | LFS_O_CREAT | LFS_O_EXCL);
lfs_file_write(&lfs, &file, "hello!", strlen("hello!"));
lfs_file_close(&lfs, &file);
Can end up creating a zero-sized file under powerloss, breaking user
assumptions and their code.
The tricky thing is that this is actually correct behavior as defined by
POSIX. `open` with O_CREAT creates a file entry immediately, which is
initially zero-sized. And the fact that power can be lost between `open`
and `close` isn't really avoidable.
But this is a common enough footgun that it's probably worth deviating
from POSIX here.
But how to avoid zero-sized files exactly? First thought: Delay the file
creation until sync/close, tracking uncreated files in-device until
then. This solves the problem and avoids any intermediary state if we
lose power, but came with a number of headaches:
1. Since we delay file creation, we don't immediately write the filename
to disk on open. This implies we need to keep the filename allocated
in RAM until the first sync/close call.
The requirement to keep the filename allocated for new files until
first sync/close could be added to open, and with the option to call
sync immediately to save the filename (and accept the risk of
zero-sized files), I don't think it would be _that_ bad of an API.
But it would still be pretty bad. Extra bad because 1. there's no
way to warn on misuse at compile-time, 2. use-after-free bugs have a
tendency to go unnoticed annoyingly often, 3. it's a regression from
the previous API, and 4. who the heck reads the more-or-less same
`open` documentation for every filesystem they adopt.
2. Without an allocated mid, tracking files internally gets a lot
harder. The best option I could think of was to keep the opened-file
linked-list sorted by mid + (in-device) file name.
This did not feel like a great solution and was going to add more
code cost.
3. Handling mdir splits containing uncreated files adds another
headache. This complicated lfsr_mdir_estimate further, as it needs to
decide in which mdir the uncreated files will end up, and potentially
split on a filename that isn't even created yet.
4. Since the number of uncreated files can be potentially unbounded, you
can't prevent an mdir from filling up with only uncreated files. On
disk this ends up looking like an "empty" mdir, which need specially
handling in littlefs to reclaim after powerloss.
Support for empty mdirs -- the orphaned mdir scan -- was already
added earlier. We already scan each mdir to build gstate, so it
doesn't really add much cost.
Notice that last bullet point? We already scan each mdir during mount.
Why not, instead of scanning for orphaned mdirs, scan for orphaned
files?
So this leads to the idea of "scratch files". Instead of actually
delaying file creation, fake it. Create a scratch file during open, and
on the first sync/close, convert it to a regular file. If we lose power,
scan for scratch files during mount, and remove them on first write.
Some tradeoffs:
1. The orphan scan for scratch files is a bit more expensive than for
mdirs on storage with large block sizes. We need to look at each file
entry vs just each mdir, which pushed the runtime up to O(B log B) vs
O(B).
Though if you also consider large mtrees, the worst case is still
O(n log n).
2. Creating intermediate scratch files adds another commit to file
creation.
This is probably not a big issue for flash, but may be more of a
concern on devices with large prog sizes.
3. Scratch files complicate unrelated mkdir/rename/etc code a bit, since
we need to consider what happens when the dest is a scratch file.
But the end result is simple. And simple is good. Both for
implementation headaches, and code size. Even if the on-disk state is
conceptually more complicated.
You may have noticed these scratch files are basically isomorphic to
just setting an "uncreated" flag on the file, and that's true. There may
have been a simpler route to end up with the design, but hey, as long as
it works.
As a plus, scratch files present a solution for a couple other things:
1. A removed-but-still-open file can become a scratch file until closed.
2. Scratch files can be used as temporary files. Open a file with
O_DESYNC and never call sync and you have yourself a temporary file.
Maybe in the future we should add O_TMPFILE to avoid the need for
unique filenames, but that is low priority.
Much like the erased-state checksums in our rbyds (ecksums), these
block-level erased-state checksums (becksums) allow us to detect failed
progs to erased parts of a block and are key to achieving efficient
incremental write performance with large blocks and frequent power
cycles/open-close cycles.
These are also key to achieving _reasonable_ write performance for
simple writes (linear, non-overwriting), since littlefs now relies
solely on becksums to efficiently append to blocks.
Though I suppose the previous block staging logic used with the CTZ
skip-list could be brought back to make becksums optional and avoid
btree lookups during simple writes (we do a _lot_ of btree
lookups)... I'll leave this open as a future optimization...
Unlike in-rbyd ecksums, becksums need to be stored out-of-band so our
data blocks only contain raw data. Since they are optional, an
additional tag in the file's btree makes sense.
Becksums are relatively simple, but they bring some challenges:
1. Adding becksums to file btrees is the first case we have for multiple
struct tags per btree id.
This isn't too complicated a problem, but requires some new internal
btree APIs.
Looking forward, which I probably shouldn't be doing this often,
multiple struct tags will also be useful for parity and content ids
as a part of data redundancy and data deduplication, though I think
it's uncontroversial to consider these both heavier-weight features...
2. Becksums only work if unfilled blocks are aligned to the prog_size.
This is the whole point of crystal_size -- to provide temporary
storage for unaligned writes -- but actually aligning the block
during writes turns out to be a bit tricky without a bunch of
unnecessary btree lookups (we already do too many btree lookups!).
The current implementation here discards the pcache to force
alignment, taking advantage of the requirement that
cache_size >= prog_size, but this is corrupting our block checksums.
Code cost:
code stack
before: 31248 2792
after: 32060 (+2.5%) 2864 (+2.5%)
Also lfsr_ftree_flush needs work. I'm usually open to gotos in C when
they improve internal logic, but even for me, the multiple goto jumps
from every left-neighbor lookup into the block writing loop are a bit
much...
Looking forward, bptr checksums provide an easy mechanism to validate
data residing in blocks. This extends the merkle-tree-like nature of the
filesystem all the way down to the data level, and is common in other
COW filesystems.
Two interesting things to note:
1. We don't actually check data-level checksums yet, but we do calculate
data-level checksums unconditionally.
Writing checksums is easy, but validating checksums is a bit more
tricky. This is made a bit harder for littlefs, since we can't hold
an entire block of data in RAM, so we have to choose between separate
bus transactions for checksum + data reads, or extremely expensive
overreads every read.
Note this already exists at the metadata-level, the separate bus
transactions for rbyd fetch + rbyd lookup means we _are_ susceptible
to a very small window where bit errors can get through.
But anyways, writing checksums is easy. And has basically no cost
since we are already processing the data for our write. So we might
as well write the data-level checksums at all times, even if we
aren't validating at the data-level.
2. To make bptr checksums work cheaply we need an additional cksize
field to indicate how much data is checksummed.
This field seems redundant when we already have the bptr's data size,
but if we didn't have this field, we would be forced to recalculate
the checksum every time a block is sliced. This would be
unreasonable.
The immutable cksize field does mean we may be checksumming more data
than we need to when validating, but we should be avoiding small
block slices anyways for storage cost reasons.
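Conceptually the bptr now carries both sizes, something like this (a
sketch; field names and widths are illustrative):

    #include <stdint.h>

    typedef struct bptr {
        uint32_t block;   // which block the data lives in
        uint32_t off;     // offset of this slice in the block
        uint32_t size;    // size of this slice, changes when sliced
        uint32_t cksize;  // how much data the cksum covers, immutable
        uint32_t cksum;   // data checksum over the first cksize bytes
    } bptr_t;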
This does add some stack cost because our bptr struct is larger now:
code stack
before: 31200 2768
after: 31272 (+0.2%) 2800 (+1.1%)
Instead of writing every possible config that has the potential to be
useful in the future, stick to just writing the configs that we know are
useful, and error if we see any configs we don't understand.
This prevents unnecessary config bloat, while still allowing configs to
be introduced in a backwards compatible way in the future.
Currently unknown configs are treated as a mount error, but in theory
you could still try to read the filesystem, just with potentially
corrupted data. Maybe this could be behind some sort of "FORCE" mount
flag. littlefs must never write to the filesystem if it finds unknown
configs.
---
This also creates a curious case for the hole in our tag encoding
previously taken up by the OCOMPATFLAGS config. We can query for any
config > SIZELIMIT with lookupnext, but the OCOMPATFLAGS flag would need
an extra lookup which just isn't worth it.
Instead I'm just adding OCOMPATFLAGS back in. To support OCOMPATFLAGS
littlefs has to do literally nothing, so this is really more of a
documentation change. And who knows, maybe OCOMPATFLAGS will have some
weird use case in the future...
Also:
- Renamed GSTATE -> GDELTA for gdelta tags. GSTATE tags added as
separate in-device flags. The GSTATE tags were already serving
this dual purpose.
- Renamed BSHRUB* -> SHRUB when the tag is not necessarily operating
on a file bshrub.
- Renamed TRUNK -> BSHRUB
The tag encoding space now has a couple funky holes:
- 0x0005 - Hole for aligning config tags.
I guess this could be used for OCOMPATFLAGS in the future?
- 0x0203 - Hole so that ORPHAN can be a 1-bit difference from REG. This
could be after BOOKMARK, but having a bit to differentiate littlefs
specific file types (BOOKMARK, ORPHAN) from normal file types (REG,
DIR) is nice.
I guess this could be used for SYMLINK if we ever want symlinks in the
future?
- 0x0314-0x0318 - Hole so that the mdir related tags (MROOT, MDIR,
MTREE) are nicely aligned.
This is probably a good place for file-related tags to go in the
future (BECKSUM, CID, COMPR), but we only have two slots, so will
probably run out pretty quickly.
- 0x3028 - Hole so that all btree related tags (BTREE, BRANCH, MTREE)
share a common lower bit-pattern.
I guess this could be used for MSHRUB if we ever want mshrubs in the
future?
I'm just not seeing a use case for optional compat flags (ocompat), so
dropping for now. It seems their *nix equivalent, feature_compat, is
used to inform fsck of things, but this doesn't really make sense in
littlefs since there is no fsck. Or from a different perspective,
littlefs is always running fsck.
Ocompat flags can always be added later (since they do nothing).
Unfortunately this really ruins the alignment of the tag encoding. For
whatever reason config limits tend to come in pairs. For now the best
solution is to just leave tag 0x0006 unused. I guess you can consider it
reserved for hypothetical ocompat flags in the future.
---
This adds an rcompat flag for the grm, since in theory a filesystem
doesn't need to support grms if it never renames files (or creates
directories?). But if a filesystem doesn't support grms and a grm gets
written into the filesystem, this can lead to corruption.
I think every piece of gstate will end up with its own compat flag for
this reason.
---
Also renamed r/w/oflags -> r/w/ocompatflags to make their purpose
clearer.
---
The code impact of adding the grm rcompat flag is minimal, and will
probably be less for additional rcompat flags:
code stack
before: 31528 2752
after: 31584 (+0.2%) 2752 (+0.0%)
It turned out that by implicitly handling root allocation in
lfsr_btree_commit_, we were never allowing lfsr_bshrub_commit to
intercept new roots as new bshrubs. Fixing this required moving the
root allocation logic up into lfsr_btree_commit.
This resulted in quite a bit of small bug fixing because it turns out if
you can never create non-inlined bshrubs you never test non-inlined
bshrubs:
- Our previous rbyd.weight == btree.weight check for if we've reached
the root no longer works, changed to an explicit check that the blocks
match. Fortunately, now that new roots set trunk=0, new roots are no
longer a problematic case.
- We need to only evict when we calculate an accurate estimate; the
previous code had a bug where eviction occurred early based only on the
progged-since-last-estimate.
- We need to manually set bshrub.block=mdir.block on new bshrubs,
otherwise the lfsr_bshrub_isbshrub check fails in mdir commit staging.
Also updated btree/bshrub following code in the dbg scripts, which
mostly meant making them accept both BRANCH and SHRUBBRANCH tags as
btree/bshrub branches. Conveniently very little code needs to change
to extend btree read operations to support bshrubs.
Unfortunately, waiting to evict shrubs until mdir compaction does not
work because we only have a single pcache. When we evict a bshrub we
need a pcache for writing the new btree root, but if we do this during
mdir compaction, our pcache is already busy handling the mdir
compaction. We can't do a separate pass for bshrub eviction, since this
would require tracking an unbounded number of new btree roots.
In the previous shrub design, we meticulously tracked the compacted
shrub estimate in RAM, determining exactly how the estimate would change
as a part of shrub carve operations.
This worked, but was fragile. It was easy for the shrub estimate to
diverge from the actual value, and required quite a bit of extra code to
maintain. Since the use cases for bshrubs are growing a bit, I didn't
want to return to this design.
So here's a new approach based on emulating btree compacts/splits inside
the shrubs:
1. When a bshrub is fetched, scan the bshrub and calculate a compaction
estimate. Store this.
2. On every commit, find the upper bound of new data being progged, and
keep track of estimate + progged. We can at least get this relatively
easily from commit attr lists. We can't get the amount deleted, which
is the problem.
3. When estimate + progged exceeds shrub_size, scan the bshrub again and
recalculate the estimate.
4. If estimate exceeds the shrub_size/2, evict the bshrub, converting it
into a btree.
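In rough pseudo-C, the heuristic looks something like this (a sketch
only; the struct and function names are made up and the real rescan is
of course more involved):

    #include <stddef.h>
    #include <stdbool.h>

    typedef struct bshrub {
        size_t estimate;  // compaction estimate from the last scan
        size_t progged;   // upper bound of data progged since then
    } bshrub_t;

    // returns true if the bshrub should be evicted into a real btree
    static bool bshrub_shouldevict(bshrub_t *bshrub,
            size_t commit_size, size_t shrub_size,
            size_t (*rescan)(void *ctx), void *ctx) {
        // 2. track an upper bound of new data from the commit's attrs
        bshrub->progged += commit_size;

        // 3. only rescan when we may have exceeded shrub_size
        if (bshrub->estimate + bshrub->progged > shrub_size) {
            bshrub->estimate = rescan(ctx);
            bshrub->progged = 0;

            // 4. evict if the recalculated estimate exceeds
            //    shrub_size/2, mirroring how btree splits keep blocks
            //    ~1/2 full
            if (bshrub->estimate > shrub_size/2) {
                return true;
            }
        }
        return false;
    }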
As you may note, this is very close to how our btree compacts/splits
work, but emulated. In particular, evictions/splits occur at
(shrub_size/block_size)/2 in order to avoid runaway costs when the
bshrub/btree gets close to full.
Benefits:
- This eviction heuristic is very robust. Calculating the amount progged
from the attr list is relatively cheap and easy, and any divergence
should be fixed when we recalculate the estimate.
- The runtime cost is relatively small, amortized O(log n) which is
the existing runtime to commit to rbyds.
Downsides:
- Just like btree splits, evictions force our bshrub to be ~1/2 full on
average. This combined with the 2x cost for mdir pairs, the 2x cost
for mdirs being ~1/2 full on average, and the need for both a synced
and unsynced copy of file bshrubs brings our file bshrub's overhead up
to ~16x, which is getting quite high...
Anyways, bshrubs now work, and the new file topology is passing testing.
An unfortunate surprise is the jump in stack cost. This seems to come from
moving the lfsr_btree_flush logic into the hot-path that includes bshrub
commit + mdir commit + all the mtree logic. Previously the separation of
btree/shrub commits meant that the more complex block/btree/crystal logic
was on a separate path from the mdir commit logic:
code stack lfsr_file_t
before bshrubs: 31840 2072 120
after bshrubs: 30756 (-3.5%) 2448 (+15.4%) 104 (-15.4%)
I _think_ the reality is not actually as bad as measured; most of these
flush/carve/commit functions calculate some work and then commit it in
separate steps. In theory GCC's shrinkwrapping optimizations should
limit the stack to only what we need as we finish different
calculations, but our current stack measurement scripts just add
together the whole frames, so any per-call stack optimizations get
missed...
Note this is intentionally different from how lfsr_rbyd_fetch behaves
in lfs.c. We only call lfsr_rbyd_fetch when we need validated checksums,
otherwise we just don't fetch.
The dbg scripts, on the other hand, always go through fetch, but it is
useful to be able to inspect the state of incomplete trunks when
debugging.
This used to be how the dbg scripts behaved, but they broke because of
some recent script work.
Before:
littlefs v2.0 0x{0,1}.232, rev 99, weight 9.256, bd 4096x256
{00a3,00a4}: 0.1 file0000 reg 32768, trunk 0xa3.a8 32768, btree 0x1a.846 32704
0.2 file0001 reg 32768, trunk 0xa3.16c 32768, btree 0xa2.be1 32704
After:
littlefs v2.0 0x{0,1}.232, rev 99, weight 9.256, bd 4096x256
{00a3,00a4}: 0.1 file0000 reg 32768, trunk 0xa3.a8, btree 0x1a.846
0.2 file0001 reg 32768, trunk 0xa3.16c, btree 0xa2.be1
Most files will have both a shrub and a btree, which makes the previous
output problematically noisy.
Unfortunately, this does lose some information: the size of the
shrub/tree, both of which may be less than the full file. But 1. this
is _technically_ redundant since you only need the block/trunk to fetch an
rbyd (though the weight is useful), and 2. The weight can still be
viewed with -s -i.
dbgbmap.py parses littlefs's mtree/btrees and displays the status of
every block in use:
$ ./scripts/dbgbmap.py disk -B4096x256 -Z -H8 -W64
bd 4096x256, 7.8% mdir, 10.2% btree, 78.1% data
mmddbbddddddmmddddmmdd--bbbbddddddddddddddbbdddd--ddddddmmdddddd
mmddddbbddbbddddddddddddddddbbddddbbddddddmmddbbdddddddddddddddd
bbdddddddddddd--ddddddddddddddddbbddddmmmmddddddddddddmmmmdddddd
ddddddddddbbdddddddddd--ddddddddddddddmmddddddddddddddddddddmmdd
ddddddbbddddddddbb--ddddddddddddddddddddbb--mmmmddbbdddddddddddd
ddddddddddddddddddddbbddbbdddddddddddddddddddddddddddddddddddddd
dddddddddd--ddddbbddddddddmmbbdd--ddddddddddddddbbmmddddbbdddddd
ddmmddddddddddmmddddddddmmddddbbbbdddddddd--ddbbddddddmmdd--ddbb
(ok, it looks a bit better with colors)
dbgbmap.py matches the layout and has the same options as tracebd.py,
allowing the combination of both to provide valuable insight into what
exactly littlefs is doing.
This required a bit of tweaking of tracebd.py to get right, mostly
around conflicting order-based arguments. This also reworks the internal
Bmap class to be more resilient to out-of-window ops, and adds an
optional informative header.
- Tried to do the rescaling a bit better with truncating divisions, so
there shouldn't be weird cross-pixel updates when things aren't well
aligned.
- Adopted optional -B<block_size>x<block_count> flag for explicitly
specifying the block-device geometry in a way that is compatible with
other scripts. Should adopt this more places.
- Adopted optional <block>.<off> argument for start of range. This
should match dbgblock.py.
- Adopted '-' for noop/zero-wear.
- Renamed a few internal things.
- Dropped subscript chars for wear, this didn't really add anything and
can be accomplished by specifying the --wear-chars explicitly.
Also changed dbgblock.py to match, this mostly affects the --off/-n/--size
flags. For example, these are all the same:
./scripts/dbgblock.py disk -B4096 --off=10 --size=5
./scripts/dbgblock.py disk -B4096 --off=10 -n5
./scripts/dbgblock.py disk -B4096 --off=10,15
./scripts/dbgblock.py disk -B4096 -n10,15
./scripts/dbgblock.py disk -B4096 0.10 -n5
Also also adopted block-device geometry argument across scripts, where
the -B flag can optionally be a full <block_size>x<block_count> geometry:
./scripts/tracebd.py disk -B4096x256
Though this is mostly unused outside of tracebd.py right now. It will be
useful for anything that formats littlefs (littlefs-fuse?) and allowing
the format everywhere is a bit of a nice convenience.
The biggest change here is the breaking up of the FLAGS config into
RFLAGS/WFLAGS/OFLAGS. This is directly inspired by, and honestly not
much more than a renaming, of the compat/ro_compat/incompat flags found
in Linux/Unix/POSIX filesystems.
I think these were first introduced in ext2? But I need to do a bit more
research on that.
RFLAGS/WFLAGS/OFLAGS provide a much more flexible, and extensible,
feature flag mechanism than the previous minor version bumps.
The (re)naming of these flags is intended to make their requirements
more clear. In order to do the relevant operation, you must understand
every flag set in the relevant flag:
- RFLAGS / incompat flags - All flags must be understood to read the
filesystem, if not understood the only possible behavior is to fail.
- WFLAGS / ro-compat flags - All flags must be understood to write to the
filesystem, if not understood the filesystem may be mounted read-only.
- OFLAGS / compat flags - Optional flags, if not understood the relevant
flag must be cleared before the filesystem can be written to, but other
than that these flags can mostly be ignored.
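To make the policy concrete, mount-time handling boils down to
something like this (a sketch; the "known" masks are placeholders):

    #include <stdint.h>

    // flags this implementation understands (placeholder values)
    #define KNOWN_RFLAGS 0x000000ff
    #define KNOWN_WFLAGS 0x0000000f

    enum mount_mode {MOUNT_RW, MOUNT_RO, MOUNT_FAIL};

    static enum mount_mode check_flags(uint32_t rflags, uint32_t wflags) {
        if (rflags & ~(uint32_t)KNOWN_RFLAGS) {
            // unknown rflags => we can't even read the filesystem
            return MOUNT_FAIL;
        }
        if (wflags & ~(uint32_t)KNOWN_WFLAGS) {
            // unknown wflags => read-only is the best we can do
            return MOUNT_RO;
        }
        // unknown oflags would just be cleared on the first write
        return MOUNT_RW;
    }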
Some hypothetical littlefs examples:
- RFLAGS / incompat flags - Transparent compression
Is this the same as a major disk-version break? Yes kinda? An
implementation that doesn't understand compression can't read the
filesystem.
On the other hand, it's useful to have a filesystem that can read both
compressed and uncompressed variants.
- WFLAGS / ro-compat flags - Closed block-map
The idea behind a closed block-map (currently planned), is that
littlefs maintains in global space a complete mapping of all blocks in
use by the filesystem.
For such a mapping to remain consistent means that if you write to the
filesystem you must understand the closed block-map. Or in other
words, if you don't understand the closed block-map you must not write
to the filesystem.
Reading, on the other hand, can ignore many such write-related
auxiliary features, so the filesystem can still be read from.
- OFLAGS / compat flags - Global checksums
Global checksums (currently planned) are extra checksums attached to
each mdir that when combined self-validate the filesystem.
But if you don't understand global checksums, you can still read and
write the filesystem without them. The only catch is that when you write
to the filesystem, you may end up invalidating the global checksum.
Clearing the global checksum bit in the OFLAGS is a cheap way to
signal that the global checksum is no longer valid, allowing you to
still write to the filesystem without this optional feature.
Other tweaks to note:
- Renamed BLOCKLIMIT/DISKLIMIT -> BLOCKSIZE/BLOCKCOUNT
Note these are still the _actual_ block_size/block_count minus 1. The
subtle difference here was the original reason for the name change,
but after working with it for a bit, I just don't think new, otherwise
unused, names are worth it.
The minus 1 stays, however, since it avoids overflow issues at
extreme boundaries of powers of 2.
- Introduces STAGLIMIT/SATTRLIMIT, sys-attribute parallels to
UTAGLIMIT/UATTRLIMIT.
These may be useful if only uattrs are supported, or vice-versa.
- Dropped UATTRLIMIT/SATTRLIMIT to 255 bytes.
This feels extreme, but matches NAMELIMIT. These _should_ be small,
and limiting the uattr/sattr size to a single-byte leads to really
nice packing of the utag+uattrsize in a single integer.
This can always be expanded in the future if this limit proves to be a
problem.
- Renamed MLEAFLIMIT -> MDIRLIMIT and (re?)introduced MTREELIMIT.
These may be useful for limiting the mtree when needed, though it's not
clear the exact use case quite yet.
It's probably better to have separate names for a tag category and any
specific name, but I can't think of a better name for this tag, and I
hadn't noticed that I was already ignoring the C prefix for CCKSUM tags
in many places.
NAME/CKSUM now mean both the specific tag and tag category, which is a
bit of a hack since both happen to be the 0th-subtype of their
categories.
I may be overthinking things, but I'm guessing of all the possible tag
modes we may want to add in the future, we will most likely want to add
something that looks vaguely tag-like. Like the shrub tags, for example.
It's beneficial, ordering wise, for these hypothetical future tags to
come before the cksum tags.
Current tag modes:
0x0ttt v--- tttt -ttt tttt normal tags
0x1ttt v--1 tttt -ttt tttt shrub tags
0x3tpp v-11 tttt ---- ---p cksum tags
0x4kkk v1dc kkkk -kkk kkkk alt tags
Note this is already showing better code reuse, which is a good sign,
though maybe that's just the benefit of reimplementing similar logic
multiple times.
Now both reading and carving end up in the same lfsr_btree_readnext and
lfsr_btree_buildcarve functions for both btrees and shrubs. Both btrees
and shrubs are fundamentally rbyds, so we can share a lot of
functionality as long as we redirect to the correct commit function at
the last minute. This surprising opportunity for deduplication was
noticed while putting together the dbg scripts.
Planned logic (not actual function names):
lfsr_file_readnext -> lfsr_shrub_readnext
| |
| v
'---------> lfsr_btree_readnext
lfsr_file_flushbuffer -> lfsr_shrub_carve ------------.
.---------------------' |
v v
lfsr_file_flushshrub -> lfsr_btree_carve -> lfsr_btree_buildcarve
Though the btree part of the above statement is only a hypothetical at
the moment. Not even the shrubs can survive compaction now.
The reason is the new SLICE tag which needs low-level support in rbyd
compact. SLICE introduces indirect references to data located in the same
rbyd, which removes any copying cost associated with coalescing.
Previously, a large coalesce_size risked O(n^2) runtime when
incrementally appending small amounts of data, but with SLICEs we can defer
coalescing to compaction time, where the copy is effectively free.
This compaction-time-coalescing is also hypothetical, which is why our
tests are failing. But the theory is promising.
I was originally against this idea because of how it crosses abstraction
layers, requiring some very low-level code that absolutely can not be
omitted in a simpler littlefs driver. But after working on the actual
file writing code for a while I've become convinced the tradeoff is
worth it.
Note coalesce_size will likely still need to be configurable. Data in
fragmenting/sparse btrees is still susceptible to coalescing, and it's
not clear the impacts of internal fragmentation when data sizes approach
the hard block_size/2 limit.
My current thinking is that these are conceptually different types, with
BTREE tags representing the entire btree, and BRANCH tags representing
only the inner btree nodes. We already have multiple btree tags anyways:
btrees attached to files, the mtree, and in the future maybe a bmaptree.
Having separate tags also makes it possible to store a btree in a btree,
though I don't think we'll ever use this functionality.
This also removes the redundant weight field from branches. The
redundant weight field is only a minor cost relative to storage, but it
also takes up a bit of RAM when encoding. Though measurements show this
isn't really significant.
New encodings:
btree encoding:          branch encoding:
.---+- -+- -+- -+- -.    .---+- -+- -+- -+- -.
|       weight      |    |       blocks      |
+---+- -+- -+- -+- -+    '                   '
|       blocks      |    '                   '
'                   '    +---+- -+- -+- -+- -+
'                   '    |       trunk       |
+---+- -+- -+- -+- -+    +---+- -+- -+- -+- -'
|       trunk       |    |     cksum     |
+---+- -+- -+- -+- -'    '---+---+---+---'
|     cksum     |
'---+---+---+---'
Code/RAM changes:
code stack
before: 30836 2088
after: 30944 (+0.4%) 2080 (-0.4%)
Also reordered other on-disk structs with weight/size, so such structs
always have weight/size as the first field. This may enable some
optimizations around decoding the weight/size without needing to know
the specific type in some cases.
---
This change shouldn't have affected functionality, but it revealed a bug
in a dtree test, where a did gets caught in an mdir split and the split
name makes the did unreachable.
Marking this as a TODO for now. The fix is going to be a bit involved
(fundamental changes to the opened-mdir list), and similar work is
already planned to make removed files work.
This is a pretty big rewrite, but is necessary to avoid "dagging".
"Dagging" (I just made this term up) is when you transform a pure tree
into a directed acyclic graph (DAG). Normally DAGs are perfectly fine in
a copy-on-write system, but in littlefs's cases, it creates havoc for
future block allocator plans, and its interaction with parity blocks
raises some uncomfortable questions.
How does dagging happen?
Consider an innocent little btree with a single block:
.-----.
|btree|
| |
'-----'
|
v
.-----.
|abcde|
| |
'-----'
Say we wanted to write a small amount of data in the middle of our
block. Since the data is so small, the previous scheme would simply
inline the data, carving the left and right sibling (in this case the
same block) to make space:
  .-----.
  |btree|
  |     |
  '-----'
 .'  v  '.
 |   c'  |
 '.     .'
   v   v
  .-----.
  |ab de|
  |     |
  '-----'
Oh no! A DAG!
With the potential for multiple pointers to reference the same block in
our btree, some invariants break down:
- Blocks no longer have a single reference
- If you remove a reference you can no longer assume the block is free
- Knowing when a block is free requires scanning the whole btree
- This split operation effectively creates two blocks; does that mean
we need to rewrite parity blocks?
---
To avoid this whole situation, this commit adopts a new crystallization
algorithm.
Instead of allowing crystallization data to be arbitrarily fragmented,
we eagerly coalesce any data under our crystallization threshold, and if
we can't coalesce, we compact everything into a block.
Much like a Knuth heap, simply checking both siblings to coalesce has
the effect that any data will always coalesce up to the maximum size
where possible. And when checking for siblings, we can easily find the
block alignment.
This also has the effect of always rewriting blocks if we are writing a
small amount of data into a block. Unfortunately I think this is just
necessary in order to avoid dagging.
At the very least crystallization is still useful for files not quite
block aligned at the edges, and sparse files. This also avoids concerns
of random writes inflating a file via sparse crystallization.
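As a sketch of the resulting write-path decision (all names and the
exact conditions are hypothetical, this isn't the actual littlefs code):

    #include <stddef.h>

    // rough sketch of the eager-coalescing decision described above
    enum action {CRYSTALLIZE, COALESCE, COMPACT};

    enum action plan_write(
            size_t frag_size,        // size of the data being written
            size_t left_size,        // left sibling in the same region
            size_t right_size,       // right sibling in the same region
            size_t crystal_thresh) { // crystallization threshold
        if (frag_size >= crystal_thresh) {
            // big enough on its own, write it out as a crystallized block
            return CRYSTALLIZE;
        }

        if (left_size + frag_size + right_size <= crystal_thresh) {
            // under the threshold, eagerly coalesce with both siblings,
            // so fragments keep merging up to the maximum possible size
            return COALESCE;
        }

        // can't coalesce, compact everything in this block-aligned
        // region into a new block, rewriting the block instead of
        // carving it (no dagging)
        return COMPACT;
    }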
Now when you mount littlefs, the debug print shows a bit more info:
lfs.c:7881:debug: Mounted littlefs v2.0 0x{0,1}.c63 w43.256, bd 4096x256
To disassemble this a bit:
littlefs v2.0 0x{0,1}.c63 w43.256, bd 4096x256
          ^ ^   '-+-'  ^   ^   ^        ^   ^
          '-|-----|----|---|---|--------|---|-- major version
            '-----|----|---|---|--------|---|-- minor version
                  '----|---|---|--------|---|-- mroot blocks
                       |   |   |        |   |   (1st is active)
                       '---|---|--------|---|-- mroot trunk
                           '---|--------|---|-- mtree weight
                               '--------|---|-- mleaf weight
                                        '---|-- block size
                                            '-- block count
dbglfs.py also shows the block device geometry now, as read from the
mroot:
$ ./scripts/dbglfs.py disk -B4096
littlefs v2.0 0x{0,1}.c63, rev 1, weight 43.256, bd 4096x256
...
This may be over-optimizing for testing, but the reason the mount debug
is only one line is to avoid slowing down/messying up test output. Both
powerloss testing and remounts completely fill the output with mount
prints that aren't actually all that useful.
Also switched to preferring parens in debug info, mainly for mismatched
things.
Mainly aligning things; it was easy for the previous repr to become a
visual mess.
This also represents the config more like how we represent other tags,
since they've changed from a monolithic config block to separate
attributes.
This is a compromise between padding the tag repr correctly and parsing
speed.
If we don't have to traverse an rbyd (for, say, tree printing), we don't
want to, since parsing rbyds can get quite slow when things get big
(remember this is a filesystem!). This makes tag padding a bit of a hard
sell.
Previously this was hardcoded to 22 characters, but with the new file
struct printing it quickly became apparent this would be a problematic
limit:
12288-15711 block w3424 0x1a.0 3424 67 64 79 70 61 69 6e 71 gdypainq
It's interesting to note that this has only become an issue for large
trees, where the weight/size in the tag can be arbitrarily large.
Fortunately we already have the weight of the rbyd after fetch, so we
can use a heuristic similar to the id padding:
tag padding = 21 + nlog10(max(weight,1)+1)
---
Also dropped extra information with the -x/--device flag. It hasn't
really been useful and was implemented inconsistently. Maybe -x/--device
should just be dropped completely...
You can now pass -s/--structs to dbglfs.py to show any file data
structures:
$ ./scripts/dbglfs.py disk -B4096 -f -s -t
littlefs v2.0 0x{0,1}.9cf, rev 3, weight 0.256
{0000,0001}: -1.1 hello reg 128, trunk 0x0.993 128
0000.0993:     .-> 0-15    shrubinlined w16 16  6b 75 72 65 65 67 73 63  kureegsc
             .-+-> 16-31   shrubinlined w16 16  6b 65 6a 79 68 78 6f 77  kejyhxow
             | .-> 32-47   shrubinlined w16 16  65 6f 66 75 76 61 6a 73  eofuvajs
           .-+-+-> 48-63   shrubinlined w16 16  6e 74 73 66 67 61 74 6a  ntsfgatj
           |   .-> 64-79   shrubinlined w16 16  70 63 76 79 6c 6e 72 66  pcvylnrf
           | .-+-> 80-95   shrubinlined w16 16  70 69 73 64 76 70 6c 6f  pisdvplo
           | | .-> 96-111  shrubinlined w16 16  74 73 65 69 76 7a 69 6c  tseivzil
           +-+-+-> 112-127 shrubinlined w16 16  7a 79 70 61 77 72 79 79  zypawryy
This supports the same -b/-t/-i options found in dbgbtree.py, with the
one exception being -z/--struct-depth which is lowercase to avoid
conflict with the -Z/--depth used to indicate the filesystem tree depth.
I think this is a surprisingly reasonable way to show the inner
structure of files without clobbering the user's console with file
contents.
Don't worry, if clobbering is desired, -T/--no-truncate still dumps all
of the file content.
Though it's still up to the user to manually apply the sprout/shrub
overlay. That step is still complex enough that it's not implemented in
this tool yet.
Still needs testing, though the byte-level fuzz tests were already causing
blocks to crystallize. I noticed this because of test failures which are
fixed now.
Note the block allocator currently doesn't understand file btrees. To
get the current tests passing requires -DDISK_SIZE=16777216 or greater.
It's probably also worth noting there's a lot that's not implemented
yet! Data checksums and write validation for one. Also ecksums. And we
should probably have some sort of special handling for linear writes
(the most common case) so they don't end up with a bunch of extra
crystallizing writes.
Also the fact that btrees can become DAGs now is an oversight and a bit
concerning. Will that work with a closed allocator? Block parity?
Previously our lower/upper bounds were initialized to -1..weight. This
made a lot of the math unintuitive and confusing, and it's not really
necessary to support -1 rids (-1 rids arise naturally in order-statistic
trees that can have weight=0).
The tweak here is to use lower/upper bounds initialized to 0..weight,
which makes the math behave as expected. -1 rids naturally arise from
rid = upper-1.
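A tiny sketch of the idea (hypothetical names, not the actual lookup
code):

    #include <stdint.h>

    // with bounds initialized to 0..weight, the rid we land on is
    // simply upper-1, and an empty tree (weight=0) naturally yields
    // rid=-1; no special -1 lower bound is needed for the math to work
    int32_t lookup_rid(uint32_t weight) {
        uint32_t lower = 0;
        uint32_t upper = weight;

        // (the rbyd traversal would narrow lower/upper here as alts
        // are followed; elided for this sketch)
        (void)lower;

        return (int32_t)upper - 1;
    }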
- Added shrub tags to tagrepr
- Modified dbgrbyd.py to use last non-shrub trunk by default
- Tweaked dbgrbyd's log mode to find maximum seen weight for id padding
The main improvement is moving the special inlined-file compaction logic
up into lfsr_mdir_compact__. We only need this logic for files stored in
mdirs, and thanks to its recursive nature, we weren't getting any
benefit from handling this at a lower level anyways.
This is a nice logical restructuring that probably saves a bit of code
cost in the end.
Another significant improvement is moving the staging copy of the
inlined tree's state up into the file struct itself. This solves the
problem of needing N copies of temporary inlined state when you have N
open files.
It also provides a central place to stage changes when compacting
inlined trees, which happens across several different places in the mdir
commit logic. Though some may see this as more a hack than a feature.
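A rough sketch of the shape of this (field and type names hypothetical,
not the actual file struct):

    // keeping the staging copy in the file struct itself means each
    // open file carries exactly one staging area for its inlined tree,
    // instead of needing N temporary copies somewhere else
    struct shrub {
        // inlined tree state (trunk, weight, etc.)
        unsigned trunk;
        unsigned weight;
    };

    struct file {
        // ... other open-file state ...
        struct shrub shrub;         // current inlined tree
        struct shrub shrub_staged;  // staged copy used during mdir commits
    };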
Also noteworthy, but minor: these changes required an additional
opened-mdir linked-list to know when the mdir is a file and may contain
an inlined tree.