LFSR_TAG_BNAME => LFSR_TAG_BRANCH
LFSR_TAG_BRANCH => LFSR_TAG_BTREE
Maybe this will be a problem in the future if our branch structure is
not the same as a standalone btree, but I don't really see that
happening.
This became surprisingly tricky.
The main issue is knowing when to split mdirs, and how to determine
this without wasting erase cycles.
Unlike splitting btree nodes, we can't salvage failed compacts here. As
soon as the salvage commit is written to disk, the commit becomes
immediately visible to the filesystem, because it still exists in the
mtree. This is a problem if we lose power.
We're likely going to need to implement rbyd estimates. This is
something I hoped to avoid because it brings in quite a bit of
complexity and might lead to an annoying amount of storage waste since
our estimates will need to be conservative to avoid unrecoverable
situations.
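To make the tradeoff concrete, here's a minimal sketch of the kind of
check an estimate would enable. This is purely illustrative - the names
and the half-block threshold are assumptions, not the actual lfsr
implementation:

    #include <stdbool.h>
    #include <stdint.h>

    // decide whether an mdir should split *before* attempting a compaction,
    // based on a conservative (over-approximating) estimate of its
    // compacted size
    static bool mdir_should_split(uint32_t estimate, uint32_t block_size) {
        // splitting once the conservative estimate passes half a block leaves
        // slack for the estimate to be wrong, so we never discover "doesn't
        // fit" only after a commit is already visible in the mtree
        return estimate > block_size/2;
    }

The important property is that the estimate only over-approximates: if it
says a compact fits, the compact really fits, so we never end up in an
unrecoverable state.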
---
Also changed the on-disk btree/branch struct to store a copy of the weight.
This was already required for the root of the btree; requiring the
weight to be stored in every btree pointer allows better code
deduplication, at the cost of some redundancy on btree branches, where
the weight is already implied by the rbyd structure.
This weight is usually a single byte for most branches anyway.
This may be worth revisiting at some point to see if there are any other
unexpected tradeoffs.
This work already indicates we need more data-related helper
functions. We shouldn't need this many function calls to do "simple"
operations such as fetching the superconfig if it exists.
This is an absurd optimization that stems from the observation that the
branch encoding for the inner-rbyds in a B-tree is enough information to
jump directly to the trunk of the rbyd without needing an lfsr_rbyd_fetch.
This results in a pretty ridiculous performance jump from O(m log_m(n/m))
to O(log(m) log_m(n/m)).
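To put rough, illustrative numbers on that: with m = 16 entries per rbyd
and n = 4096 entries total, the tree has log_16(4096/16) = 2 levels of
branches, so a lookup previously touched on the order of 16*2 = 32 tags,
while jumping straight to each trunk touches only around log2(16)*2 = 8.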
If the complexity analysis isn't impressive enough, look at some rough
benchmarking of read operations for 4KiB-block, 1K-entry B-trees:
 12KiB ^ :: :. :: .: .: :. : .: :. : : .. : : . : .: : : :
       | .:: .::.::.:: ::.::::::::::::.::::::::.::::::::::::.
       | : :::':: ::'::'::':: :' :':: :'::::::::': ::::::': :
before | ::: ::' :' :' :: :' '' ' ' '' : : : '' ' ' '
       | ::: ''
       |:
    0B :'------------------------------------------------------>

.17KiB ^ ............:::::::::::::::::::::::::::::
       | . .....:::::''''''''' ' ' '
       | .::::::::::::
 after | :':''
       |.::
      .:'
    0B :------------------------------------------------------->
       0                                                      1K
In order for this to work, the branch encoding did need to be tweaked
slightly. Before it stored block+off; now it stores block+trunk, where
"trunk" is the offset of the entry point into the rbyd tree. Both off
and trunk are enough info to know when to stop fetching, if necessary,
but trunk allows lookups to jump directly into the branch's rbyd tree
without a fetch.
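Roughly, a branch reference now needs to carry something like the
following (a sketch only - the names and types here are illustrative,
not the real lfsr structs):

    #include <stdint.h>

    // what a btree branch pointer effectively encodes after this change
    typedef struct branch {
        uint32_t block;  // block containing the child rbyd
        uint32_t trunk;  // offset of the child rbyd's trunk, its entry point
        uint32_t weight; // copy of the child's weight, per the earlier change
    } branch_t;

With block+trunk in hand, a lookup can start walking the child's
alt-pointers at trunk directly, skipping the O(m) scan that
lfsr_rbyd_fetch would otherwise need.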
With the change to trunk, lfsr_rbyd_fetch has also been extended to allow
fetching of any internal trunks, not just the last trunk in the commit.
This is very useful for dbgrbyd.py, but doesn't currently have a use in
littlefs itself. Still, it's valuable to have the feature available in
case it does become useful.
Note that two cases still require the slower O(m log_m(n/m)) lookup
with lfsr_rbyd_fetch:
1. Name lookups, since we currently use a linear-search O(m) to find names.
2. Validating B-tree rbyds, which requires a linear fetch O(m) to
   validate the checksums. We will need to do this at least once
   after mount.
It's also worth mentioning this will likely have a large impact on B-tree
traversal speed, which is huge, as I expect B-tree traversal to be
the main bottleneck once garbage-collection (or its replacement) is
involved.
I've been wanting to make this change for a while now (tag,id => id,tag).
The id,tag order matches the common lexicographic order used for sorting
tuples. Sorting tag,id tuples by their id first is less common.
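As a quick illustration of what lexicographic means here (a sketch, not
the actual lfs.c comparator):

    #include <stdint.h>

    // compare (id, tag) pairs lexicographically: id first, then tag
    static int attr_cmp(int32_t a_id, uint16_t a_tag,
            int32_t b_id, uint16_t b_tag) {
        if (a_id != b_id) {
            return (a_id < b_id) ? -1 : +1;
        }
        if (a_tag != b_tag) {
            return (a_tag < b_tag) ? -1 : +1;
        }
        return 0;
    }

With this ordering all of an id's attrs sort next to each other, which is
what you want when ids are the primary key of the tree.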
The reason for this order in the codebase is that all attrs on disk
start with their tag first, since the tag's decoding determines the
purpose of the id field (keep in mind this includes other non-tree tags
such as crcs, alts, etc). But with the move to storing weights instead
of ids on disk, this gives us a clear point to switch from tag,w to
id,tag ordering.
I may be thinking too much about this, but it does affect a significant
amount of the codebase.
While the previous renderer was "technically correct", the attempt to
map rotated alts to their nearest neighbor just made the resulting tree
an unreadable mess.
Now the renderer prunes alts with unreachable edges (like they would be
during lfsr_rbyd_append) and aligns all alts with their destination
trunk. This results in a much more readable, if slightly less accurate,
rendering of the tree.
Example:
$ ./scripts/dbgrbyd.py -B4096 disk 0 -t
rbyd 0x0, rev 1, size 1508, weight 40
off ids tag data (truncated)
0000032a: .-+-> 0 reg w1 1 73 s
00000026: | '-> 1-5 reg w5 1 62 b
00000259: .-------+---> 6-11 reg w6 1 6f o
00000224: | .-+-+-> 12-17 reg w6 1 6e n
0000028e: | | | '-> 18 reg w1 1 70 p
00000076: | | '---> 19-20 reg w2 1 64 d
0000038f: | | .-> 21-22 reg w2 1 75 u
0000041d: | .---+---+-> 23 reg w1 1 78 x
000001f3: | | .-> 24-27 reg w4 1 6d m
00000486: | | .-----+-> 28-29 reg w2 1 7a z
000004f3: | | | .-----> 30-31 reg w2 1 62 b
000004ba: | | | | .---> 32-35 reg w4 1 61 a
0000058d: | | | | | .-> 36-37 reg w2 1 65 e
000005c6: +-+-+-+-+-+-> 38-39 reg w2 1 66 f
Sorting weights instead of ids turned out to have a number of benefits,
suggesting this is a better design:
- Calculating the id and delta of each rbyd trunk is surprisingly
  easier - id is now just lower+w-1, and no extra conditions are
  needed for unr tags, which just have a weight of zero (see the sketch
  below).
- Removes ambiguity around which id unr tags should be assigned to,
  especially unrs that delete ids.
- No more +-1 weirdness when encoding/decoding tag ids - the weight
  can be written as-is and -1 ids are inferred from their weight and
  position in the tree (lower+w-1 = 0+0-1 = -1).
- Weights compress better under leb128 encoding, since they are usually
  quite small.
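A minimal sketch of the id calculation mentioned above (names are
illustrative, not the actual lfs.c variables):

    #include <stdint.h>

    // while descending an rbyd trunk we accumulate "lower", the total weight
    // skipped to the left of the branches we took; once we land on a tag
    // with weight w, its id is the last id covered by that weight
    static int32_t rbyd_id(int32_t lower, int32_t w) {
        // unr tags have w=0, which naturally yields lower-1, e.g. 0+0-1 = -1
        return lower + w - 1;
    }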
Sorting by weights does not work as is, though, due to ambiguity with
grows and insertions.
Before, these were disambiguated by separate grow and attr tags. You
effectively grew the neighboring id before claiming its weight
as yours. But now that the attr itself creates the grow/insertion,
it's ambiguous which one is intended.
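As a rough example of the ambiguity: appending a w=1 attr at id 4 could
now mean either "insert a brand-new weight-1 entry at position 4" or
"attach this attr to the existing id 4 and grow it by one" - the attr
alone no longer says which.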
Changed the always-follow alts that we use to terminate grow/shrink/remove
operations to use `altle 0xfff0` instead of `altgt 0`.
`altgt 0` gets the job done as long as you make sure tag 0 never ends up
in an rbyd query. But this kept showing up as a problem, and recent
debugging revealed some erroneous 0 tag lookups created vestigial alt
pointers (not necessarily a problem, but space-wasting).
Since we moved to a strict 16-bit tag, making these `altle 0xfff0`
doesn't really have a downside, and means we can expect rbyd lookups
around 0 to behave how one would normally expect.
As a (very minor) plus, the value zero usually has special encodings in
instruction sets, so being able to use it for rbyd_lookups offers a
(very minor) code size saving.
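For reference, a tiny sketch of the follow decision this is playing with
(simplified, not the actual lfs.c logic):

    #include <stdbool.h>
    #include <stdint.h>

    // altle k follows when tag <= k, altgt k follows when tag > k
    static bool alt_follows(bool is_le, uint16_t key, uint16_t tag) {
        return is_le ? (tag <= key) : (tag > key);
    }

An `altle 0xfff0` then follows for any tag a lookup would reasonably ask
about, 0 included, whereas `altgt 0` only acts as an always-follow if tag
0 never shows up in a query.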
---
Sidenote: The reasons altle/altgt are the way they are (asymmetric):
1. Flipping these alts is a single bit-flip, which only happens if they
   are asymmetric (only one includes the equal case).
2. Our branches are biased to prefer the larger tag. This makes
   traversal trivial. It might be possible to make this still work with
   altlt/altge, but would require some increments/decrements, which
   might cause problems with boundary conditions around the 16-bit tag
   limit.
I only recently noticed there is enough information in each rbyd trunk
to infer the effective grow/shrinks. This has a number of benefits:
- Cleans up the tag encoding a bit, no longer expecting tag size to
  sometimes contain a weight (though this could've been fixed other
  ways). 0x6 in the lower nibble is now reserved exclusively for
  in-device tags.
- grow/shrinks can be implicit to any tag. Will attempt to leverage this
  in the future.
- The weight of an rbyd can no longer go out-of-sync with itself. While
  this _shouldn't_ happen normally, if it does I imagine it'd be very
  hard to debug.
Now there is only one source of knowledge about the weight of the
rbyd: the most recent set of alt-pointers.
Note that remove/unreachable tags now behave _very_ differently when it
comes to weight calculation: remove tags require the tree to make the
tag unreachable. This is a tradeoff for the above.
The main motivation for this was issues fitting a good tag encoding into
14 bits. The extra 2 bits (though really only 1 bit was needed) from
making this not a leb encoding open up the space from 3 suptypes to
15 suptypes, which is nothing to shake a stick at.
The main downsides:
1. We can't rely on leb encoding for effectively-infinite extensions.
2. We can't shorten small tags (crcs, grows, shrinks) to one byte.
For 1., extending the leb encoding beyond 14 bits is already
unpalatable, because it would increase RAM costs in the tag
encoder/decoder, which must assume a worst-case tag size, and would likely
add storage cost to every alt pointer; more on this in the next section.
The current encoding is quite generous, so I think it is unlikely we
will exceed the 16-bit encoding space. But even if we do, it's possible
to use a spare bit for an "extended" set of tags in the future.
As for 2., the lack of compression is a downside, but I've realized the
only tags that really matter storage-wise are the alt pointers. In any
rbyd there will be roughly O(m log m) alt pointers, but at most O(m) of
any other tag. What this means is that the encoding of any other tag is
in the noise of the encoding of our alt pointers.
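For a rough sense of scale: with m = 128, that's on the order of
128*log2(128) ~= 900 alt pointers against at most ~128 of any other kind
of tag, so the alt encoding dominates the rbyd's storage footprint by a
wide margin.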
Our alt pointers are already pretty densely packed. But because the
sparse key part of alt-pointers is stored as-is, the worst-case
encoding of in-tree tags likely ends up as the encoding of our
alt-pointers. So going up to 3-byte tags adds a surprisingly large
storage cost.
As a minor plus, le16s should be slightly cheaper to encode/decode. It
should also be slightly easier to debug tags on-disk.
tag encoding:

    TTTTtttt ttttTTTv
    |   |        |  '- valid bit
    |   '--------|---- 8-bit subtype
    '------------'---- 4+3-bit suptype

    iiii iiiiiii iiiiiii iiiiiii iiiiiii
    ^- m-bit id/weight

    llll lllllll lllllll lllllll lllllll
    ^- m-bit length/jump
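Putting the pieces together, encoding an attr header under this scheme
looks roughly like the following. This is a sketch: leb128_encode is a
stand-in helper rather than the real lfs.c routine, and the field order
is simply taken from the diagram above:

    #include <stddef.h>
    #include <stdint.h>

    // append a leb128-encoded value, returning the number of bytes written
    static size_t leb128_encode(uint8_t *buf, uint32_t v) {
        size_t i = 0;
        do {
            uint8_t b = v & 0x7f;
            v >>= 7;
            buf[i++] = b | (v ? 0x80 : 0);
        } while (v);
        return i;
    }

    // encode tag (fixed le16) + weight + size/jump, returning the total
    // number of bytes written; buf must have room for up to 2+5+5 bytes
    static size_t tag_encode(uint8_t *buf,
            uint16_t tag, uint32_t weight, uint32_t size) {
        buf[0] = (uint8_t)(tag >> 0); // le16, low byte first
        buf[1] = (uint8_t)(tag >> 8);
        size_t off = 2;
        off += leb128_encode(&buf[off], weight);
        off += leb128_encode(&buf[off], size);
        return off;
    }

The fixed 2-byte tag gives up the 1-byte short forms mentioned above, but
it also means the tag encoder/decoder always knows it is dealing with
exactly 2 bytes.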
Also renamed the "mk" tags, since they no longer have special behavior
outside of providing names for entries:
- LFSR_TAG_MK => LFSR_TAG_NAME
- LFSR_TAG_MKBRANCH => LFSR_TAG_BNAME
- LFSR_TAG_MKREG => LFSR_TAG_REG
- LFSR_TAG_MKDIR => LFSR_TAG_DIR
B-trees with names are now working, though this required a number of
changes to the B-tree layout:
1. B-trees no longer require name entries (LFSR_TAG_MK) on each branch.
   This is a nice optimization to the design, since these name entries
   just waste space in purely weight-based B-trees, which are probably
   going to be most B-trees in the filesystem.
   If a name entry is missing, the struct entry, which is required,
   should have the effective weight of the entry.
   The first entry in every rbyd block is expected to have no name
   entry, since this is the default path for B-tree lookups.
2. The first entry in every rbyd block _may_ have a name entry, which
   is ignored. I'm calling these "vestigial names" to make them sound
   cooler than they actually are.
   These vestigial names show up in a couple complicated B-tree
   operations:
   - During B-tree split, since pending attributes are calculated before
     the split, we need to play out pending attributes into the rbyd
     before deciding what name becomes the name of the entry in the
     parent. This creates a vestigial name which we _could_ immediately
     remove, but the remove adds additional size to the must-fit split
     operation.
   - During B-tree pop/merge, if we remove the leading no-name entry,
     the second, named entry becomes the leading entry. This creates a
     vestigial name that _looks_ easy enough to remove when making the
     pending attributes for pop/merge, but turns out to be surprisingly
     tricky if the parent undergoes a split/merge at the same time.
It may be possible to remove all these vestigial names proactively,
but this adds additional rbyd lookups to figure out the exact tag to
remove, complicates things in a fragile way, and doesn't actually
reduce storage costs until the rbyd is compacted.
The main downside is that these B-trees may be a bit more confusing
to debug.
This was a rather simple exercise. lfsr_btree_commit does most of the
work already, so all this needed was setting up the pending attributes
correctly.
Also:
- Tweaked dbgrbyd.py's tree rendering to match dbgbtree.py's.
- Added a print to each B-tree test to help find the resulting B-tree
when debugging.
An example:
$ ./scripts/dbgbtree.py -B4096 disk 0xaa -t -i
btree 0xaa.1000, rev 35, weight 278
block ids name tag data (truncated)
00aa.1000: +-+ 0-16 branch id16 3 7e d4 10 ~..
007e.0854: | |-> 0 inlined id0 1 73 s
| |-> 1 inlined id1 1 74 t
| |-> 2 inlined id2 1 75 u
| |-> 3 inlined id3 1 76 v
| |-> 4 inlined id4 1 77 w
| |-> 5 inlined id5 1 78 x
| |-> 6 inlined id6 1 79 y
| |-> 7 inlined id7 1 7a z
| |-> 8 inlined id8 1 61 a
| |-> 9 inlined id9 1 62 b
...
This added the idea of block+limit addresses such as 0xaa.1000. Added
this as an option to dbgrbyd.py along with a couple other tweaks:
- Added block+limit support (0x<block>.<limit>).
- Fixed in-device representation indentation when trees are present.
- Changed fromtag to implicitly fix up the ids/weights off-by-one-ness;
  this is consistent with lfs.c.