These b*/m* struct tags have a common pattern that would be good to
emphasize in the encoding. The later struct tags get a bit messier, as
they leave space for possible future extensions.
New encoding:
LFSR_TAG_STRUCT   0x03tt  v--- --11 -ttt ttrr
LFSR_TAG_DATA     0x0300  v--- --11 ---- ----
LFSR_TAG_BLOCK    0x0304  v--- --11 ---- -1rr
LFSR_TAG_BSHRUB   0x0308  v--- --11 ---- 1---
LFSR_TAG_BTREE    0x030c  v--- --11 ---- 11rr
LFSR_TAG_MROOT    0x0310  v--- --11 ---1 --rr
LFSR_TAG_MDIR     0x0314  v--- --11 ---1 -1rr
LFSR_TAG_MSHRUB*  0x0318  v--- --11 ---1 1---
LFSR_TAG_MTREE    0x031c  v--- --11 ---1 11rr
LFSR_TAG_DID      0x0320  v--- --11 --1- ----
LFSR_TAG_BRANCH   0x032c  v--- --11 --1- 11rr
* Hypothetical
Note that all shrubs currently end with 1---, and all btrees, including
the awkward branch tag, end with 11rr.
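One nice consequence is that these patterns could be tested with simple
masks. A quick sketch of what that might look like (these helper macros
are hypothetical, not littlefs's actual API):

    // masks follow the v--- --11 -ttt ttrr layout above, ignoring
    // the v bit
    #define LFSR_TAG_ISSTRUCT(tag) (((tag) & 0x7f00) == 0x0300)
    // all shrubs end with 1---
    #define LFSR_TAG_ISSHRUB(tag) \
        (LFSR_TAG_ISSTRUCT(tag) && ((tag) & 0x000c) == 0x0008)
    // all btrees, including the awkward branch tag, end with 11rr
    #define LFSR_TAG_ISBTREE(tag) \
        (LFSR_TAG_ISSTRUCT(tag) && ((tag) & 0x000c) == 0x000c)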
This encoding change had no impact on code size:
         code          stack
before:  33564          2816
after:   33564 (+0.0%)  2816 (+0.0%)
---
Unfortunately, block-level erased-state checksums (becksums) don't
really work as intended.
An invalid becksum _does_ signal that a prog has been attempted, but a
valid becksum does _not_ prove that a prog has _not_ been attempted.
Rbyd ecksums work, but only thanks to a combination of prioritizing
valid commits and the use of perturb bits to force erased-state changes.
It _is_ possible to end up with an ecksum collision, but only if you
1. lose power before completing a commit, and 2. end up with a
non-trivial crc32c collision. If this does happen, at the very least the
resulting commit will likely end up corrupted and thrown away later.
Block-level becksums, at least as originally designed, don't have either
of these protections. To make matters worse, the blocks these becksums
reference contain only raw user data. Write 0xffs into a file and you
will likely end up with a becksum collision!
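To make the 0xff case concrete: a becksum is only useful if a valid
checksum implies untouched erased-state, but progged 0xff bytes are
bit-for-bit identical to erased bytes. A rough sketch (crc32c() here is
illustrative, not littlefs's actual internals):

    // a becksum over erased-state is a cksum over 0xff bytes
    uint8_t erased[16];
    memset(erased, 0xff, sizeof(erased));
    uint32_t becksum = crc32c(0, erased, sizeof(erased));

    // but a user can prog those exact same bytes
    uint8_t data[16];
    memset(data, 0xff, sizeof(data));
    int err = prog_and_verify(block, off, data, sizeof(data)); // hypothetical

    // the block's contents are unchanged, so the becksum still looks
    // valid even though a prog has been attempted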
These collisions are a problem for a couple of reasons:
1. Progging multiple times to the same erased-state is likely to
   result in corrupted data, though this is also likely to get caught
   by validating writes.

   Worst case, the resulting data looks valid, but with weakened data
   retention.

2. Because becksums are stored in the copy-on-write metadata of the
   file, attempting to open a file twice for writing (or more advanced
   copy-on-write operations in the future) can lead to a situation
   where a prog is attempted on _already committed_ data.

   This is very bad and breaks copy-on-write guarantees.
---
So clearly becksums are not fit for purpose and should be dropped. What
can we replace them with?
The first option, implemented here, is RAM-tracked erased-state. Give
each lfsr_file_t its own eblock/eoff fields to track the last known
good erased-state. And before each prog, clear eblock/eoff so we never
accidentally prog to the same erased-state twice.
It's interesting to note we don't currently clear eblock/eoff in all
file handles. This is ok only because we don't currently share
eblock/eoff across file handles: each eblock/eoff is exclusive to its
lfsr_file_t and does not appear anywhere else in the system.
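A minimal sketch of the idea, assuming a simplified lfsr_file_t (the
eblock/eoff fields match the text above, everything else here is
illustrative and not littlefs's actual internals):

    typedef struct lfsr_file {
        // ... other file state ...
        lfs_block_t eblock; // last known good erased block
        lfs_size_t eoff;    // first erased offset, -1 if unknown
    } lfsr_file_t;

    static int lfsr_file_prog_(lfsr_file_t *file,
            const void *buffer, lfs_size_t size) {
        // nothing to prog to if we have no known erased-state
        if (file->eoff == (lfs_size_t)-1) {
            return LFS_ERR_INVAL; // need an erase first (illustrative)
        }
        lfs_block_t block = file->eblock;
        lfs_size_t off = file->eoff;

        // clear eoff _before_ progging, so a power-loss mid-prog can
        // never leave us pointing at already-consumed erased-state
        file->eoff = -1;

        int err = lfsr_bd_prog(block, off, buffer, size); // hypothetical
        if (err) {
            return err;
        }

        // on success, the known erased-state advances past what we wrote
        file->eblock = block;
        file->eoff = off + size;
        return 0;
    }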
The main downside of this approach is that, well, the RAM-tracked
erased-state is only tracked in RAM. Block-level erased-state
effectively does not persist across reboots. I've considered adding
some sort of per-file erased-state tracking to the mdir that would need
to be cleared before use, but such a mechanism ends up quite
complicated.
At the moment, I think the best second option is to put erased-state
tracking in the future-planned bmap. This would let you opt-in to
on-disk tracking of all erased-state in the system.
One nice thing about RAM-tracked erased-state is that it's not on disk,
so it's not really a compatibility concern and won't get in the way of
additional future erased-state tracking.
---
Benchmarking becksums vs RAM-tracking has been quite interesting. While
in theory becksums can track much more erased-state, it's quite unlikely
anything but the most recent erased-state actually ends up used. The end
result is no real measurable performance loss, and actually a minor
speedup because we don't need to calculate becksums on every block
write.
There are some pathological cases, such as multiple write heads, but
these are out-of-scope right now (note! multiple explicit file handles
currently handle this case beautifully because we don't share
eblock/eoff!).
Becksums were also relatively complicated, and needed extra scaffolding
to pass around/propagate as secondary tags alongside the primary bptr.
So trading these for RAM-tracking also gives us a nice bit of code/stack
savings, albeit at a 2-word RAM cost in lfsr_file_t:
         code          stack         structs
before:  33888          2864          1096
after:   33564 (-1.0%)  2816 (-1.7%)  1104 (+0.7%)

lfsr_file_t before:  104
lfsr_file_t after:   112 (+7.7%)
---
By definition, altns should never be followed, so it doesn't really
matter where they point. But it's not like they can point literally
nowhere, so where should they point?
A couple options:

1. jump=jump - Wherever the old alt pointed
   - Easy, literally a noop
   - Unsafe, bugs could reveal outdated parts of the tree
   - Encoding size eh

2. jump=0 - Point to offset=0
   - Easier, +0 code
   - Safer, branching to 0 should assert
   - Worst possible encoding size

3. jump=itself - Point to itself
   - A bit tricky, +4 code
   - Safe, should assert, even without asserts worst case infinite loop
   - Optimal encoding size
An infinite loop isn't the best failure state, but we can catch this
with an assert, which we would need for jump=0 anyways. And this is only
a concern if there are other fs bugs. jump=0 is actually slightly worse
if asserts are disabled, since we'd end up reading the revision count as
garbage.
Adopting jump=itself gives us the optimal 4-byte encoding:
altbn w0 = 40 00 00 00
           '-+-' ^  ^
             '---|--|-- tag = altbn
                 '--|-- weight = 0
                    '-- jump = itself (branch - 0)
This requires tweaking the alt encoder a bit so we don't
relative-encode jump=0s, but this is pretty cheap:
             code          stack
jump=jump:   34068          2864
jump=0:      34068 (+0.0%)  2864 (+0.0%)
jump=itself: 34072 (+0.0%)  2864 (+0.0%)
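For reference, one plausible shape for that tweak, assuming alts are
written with relative, backwards-pointing jumps (the names here are
illustrative, the real encoder differs):

    // branch is the offset where this alt will be written
    lfs_size_t branch = rbyd->eoff;
    if (lfsr_tag_isaltn(alt)) {
        // altns point to themselves, and branch - branch = 0
        // leb128-encodes to a single zero byte, giving the optimal
        // 4-byte altbn
        jump = branch;
    }
    // encode the jump relative to the alt's own location
    lfs_size_t relative = branch - jump;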
I thought we might also need to tweak the decoder, so later trunk
copies don't accidentally point to the old location, but humorously our
pruning kicks in and redundantly resets altbn's jump=itself on every
trunk.
Note lfsr_rbyd_lookupnext was also rearranged a bit to make it easier
to assert on infinite loops, which also added some code, probably just
due to compiler noise:
         code          stack
before:  34068          2864
after:   34076 (+0.0%)  2864 (+0.0%)
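The infinite-loop assert itself can be cheap. A sketch, under the same
relative-jump assumption as above:

    // an alt that jumps to itself would loop forever, this should
    // only be reachable if something else has already gone wrong
    lfs_size_t jump = branch - relative;
    LFS_ASSERT(jump != branch);
    branch = jump;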
Also note that we still accept all of the above altbn encoding options.
This only affects encoding and dbg scripts.
---
This is mainly to avoid mistakes caused by names/encodings disagreeing:
LFSR_TAG_ALT  0x4kkk  v1cd kkkk -kkk kkkk
                      ^ ^^ '------+-----'
                      '-||--------|------- valid bit
                        '|--------|------- color
                         '--------|------- dir
                                  '------- key
Notably, the LFSR_TAG_ALT() macro has already caused issues by being
both 1. ambiguous, and 2. not really type-checkable. It's easy to get
the argument order wrong without anything really breaking, things just
behave poorly, which is really not great!
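To illustrate, suppose LFSR_TAG_ALT() simply ORs flag-style arguments
together (the actual macro may differ, and these flag values are
hypothetical):

    #define LFSR_TAG_R  0x2000 // color = red
    #define LFSR_TAG_GT 0x1000 // dir = greater-than
    #define LFSR_TAG_ALT(c, d, key) (0x4000 | (c) | (d) | (key))

    // both of these compile, and nothing ever catches the swapped
    // color/dir arguments:
    lfsr_tag_t a = LFSR_TAG_ALT(LFSR_TAG_R, LFSR_TAG_GT, 0x100);
    lfsr_tag_t b = LFSR_TAG_ALT(LFSR_TAG_GT, LFSR_TAG_R, 0x100);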
To be honest the exact order is a bit arbitrary; the color->dir naming
appeared by accident, I guess because it felt more natural. Maybe
because of English's weird implicit adjective ordering? Maybe because
of how often conditions show up as the last part of the name in other
instruction sets?
At least one plus is that this moves the dir-bit next to the key. This
makes it so all of the condition information is encoded in the lowest
13 bits of the tag, which may enable minor optimization tricks for
implementing flips and such.
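For example, a flip (altgt <-> altle) could become a single bit toggle,
and the full condition can be masked in one step. A sketch, with mask
values following the v1cd kkkk -kkk kkkk layout above (the macro names
are made up):

    #define LFSR_TAG_DIR 0x1000 // the d bit

    // flip an alt's direction
    static inline lfsr_tag_t lfsr_tag_flip_(lfsr_tag_t alt) {
        return alt ^ LFSR_TAG_DIR;
    }

    // grab dir+key, all of the condition information, in one mask
    #define LFSR_TAG_COND(alt) ((alt) & 0x1fff)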
Code changes:
         code          stack
before:  34080          2864
after:   34068 (-0.0%)  2864 (+0.0%)
---
Example:
$ ./scripts/dbgtag.py 0x3001
cksum 0x01
dbgtag.py inherits most of crc32c.py's decoding options, the most
useful probably being -x/--hex:
$ ./scripts/dbgtag.py -x e1 00 01 8a 09
altbgt 0x100 w1 -1162
dbgtag.py also supports reading from a block device if either
-b/--block-size or --off is provided. This is mainly for consistency
with the other dbg*.py scripts:
$ ./scripts/dbgtag.py disk -b4096 0x2.1e4
bookmark w1 1
This should help when debugging: when you find a raw tag/alt in some
register, manually decoding it is just an unnecessary road bump.