littlefs

Author	SHA1	Message	Date
Christopher Haster	1c5adf71b3	Implemented self-validating global-checksums (gcksums) This was quite a puzzle. The problem: How do we detect corrupt mdirs? Seems like a simple question, but we can't just rely on mdir cksums. Our mdirs are independently updateable logs, and logs have this annoying tendency to "rollback" to previously valid states when corrupted. Rollback issues aren't littlefs-specific, but what _is_ littlefs- specific is that when one mdir rolls back, it can disagree with other mdirs, resulting in wildly incorrect filesystem state. To solve this, or at least protect against disagreeable mdirs, we need to somehow include the state of all other mdirs in each mdir commit. --- The first thought: Why not use gstate? We already have a system for storing distributed state. If we add the xor of all of our mdir cksums, we can rebuild it during mount and verify that nothing changed: .--------. .--------. .--------. .--------. .\| mdir 0 \| .\| mdir 1 \| .\| mdir 2 \| .\| mdir 3 \| \|\| \| \|\| \| \|\| \| \|\| \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|'-----\|--' \|'-----\|--' \|'-----\|--' \|'-----\|--' '------\|-' '------\|-' '------\|-' '------\|-' '--.------' '--.------' '--.------' '--.------' cksum \| cksum \| cksum \| cksum \| \| \| v \| v \| v \| '---------> xor -------> xor -------> xor -------> gcksum \| v v v =? '---------> xor -------> xor -------> xor ---> gcksum Unfortunately it's not that easy. Consider what this looks like mathematically (g is our gcksum, c_i is an mdir cksum, d_i is a gcksumdelta, and +/-/sum is xor): g = sum(c_i) = sum(d_i) If we solve for a new gcksumdelta, d_i: d_i = g' - g d_i = g + c_i - g d_i = c_i The gcksum cancels itself out! We're left with an equation that depends only on the current mdir, which doesn't help us at all. Next thought: What if we permute the gcksum with a function t before distributing it over our gcksumdeltas? .--------. .--------. .--------. .--------. 
.\| mdir 0 \| .\| mdir 1 \| .\| mdir 2 \| .\| mdir 3 \| \|\| \| \|\| \| \|\| \| \|\| \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|'-----\|--' \|'-----\|--' \|'-----\|--' \|'-----\|--' '------\|-' '------\|-' '------\|-' '------\|-' '--.------' '--.------' '--.------' '--.------' cksum \| cksum \| cksum \| cksum \| \| \| v \| v \| v \| '---------> xor -------> xor -------> xor -------> gcksum \| \| \| \| .--t--' \| \| \| \| '-> t(gcksum) \| v v v =? '---------> xor -------> xor -------> xor ---> t(gcksum) In math terms: t(g) = t(sum(c_i)) = sum(d_i) In order for this to work, t needs to be non-linear. If t is linear, the same thing happens: d_i = t(g') - t(g) d_i = t(g + c_i) - t(g) d_i = t(g) + t(c_i) - t(g) d_i = t(c_i) This was quite funny/frustrating (funnistrating?) during development, because it means a lot of seemingly obvious functions don't work! - t(g) = g - Doesn't work - t(g) = crc32c(g) - Doesn't work because crc32cs are linear - t(g) = g^2 in GF(2^n) - g^2 is linear in GF(2^n)!? Fortunately, powers coprime with 2 finally give us a non-linear function in GF(2^n), so t(g) = g^3 works: d_i = g'^3 - g^3 d_i = (g + c_i)^3 - g^3 d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3 d_i = (g^2 + c_i^2)(g + c_i) - g^3 d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3 d_i = gc_i^2 + g^2c_i + c_i^3 --- Bleh, now we need to implement finite-field operations? Well, not entirely! Note that our algorithm never uses division. This means we don't need a full finite-field (+, -, , /), but can get away with a finite-ring (+, -, ). And conveniently for us, our crc32c polynomial defines a ring epimorphic to a 31-bit finite-field. All we need to do is define crc32c multiplication as polynomial multiplication mod our crc32c polynomial: crc32cmul(a, b) = pmod(pmul(a, b), P) And since crc32c is more-or-less just pmod(x, P), this lets us take advantage of any crc32c hardware/tables that may be available. 
--- Bunch of notes: - Our 2^n-bit crc-ring maps to a 2^n-1-bit finite-field because our crc polynomial is defined as P(x) = Q(x)(x + 1), where Q(x) is a 2^n-1-bit irreducible polynomial. This is a common crc construction as it provides optimal odd-bit/2-bit error detection, so it shouldn't be too difficult to adapt to other crc sizes. - t(g) = g^3 is not the only function that works, but it turns out to be a pretty good one: - 3 and 2^(2^n-1)-1 are coprime, which means our function t(g) = g^3 provides a one-to-one mapping in the underlying fields of all crc rings of size 2^(2^n). We know 3 and 2^(2^n-1)-1 are coprime because 2^(2^n-1)-1 = 2^(2^n)-1 (a Fermat number) - 2^(2^n-1) (a power-of-2), and 3 divides Fermat numbers >=3 (A023394) and is not 2. - Our delta, when viewed as a polynomial in g: d(g) = gc^2 + g^2c + c^3, has degree 2, which implies there are at most 2 solutions or 1-bit of information loss in the underlying field. This is optimal since the original definition already had 2 solutions before we even chose a function: d(g) = t(g + c) - t(g) d(g) = t(g + c) - t((g + c) - c) d(g) = t((g + c) + c) - t(g + c) d(g) = d(g + c) Though note the mapping of our crc-ring to the underlying field already represents 1-bit of information loss. - If you're using a cryptographic hash or other non-crc, you should probably just use an equal sized finite-field. Though note changing from a 2^n-1-bit field to a 2^n-bit field does change the math a bit, with t(g) = g^7 being a better non-linear function: - 7 is the smallest odd-number coprime with 2^n-1, a Fermat number, which makes t(g) = g^7 a one-to-one mapping. 3 humorously divides all 2^n-1 Fermat numbers. - Expanding delta with t(g) = g^7 gives us a 6 degree polynomial, which implies at most 6 solutions or ~3-bits of information loss. 
This isn't actually the best you can do, some exhaustive searching over small fields (<=2^16) suggests t(g) = g^(2^(n-1)-1) _might_ be optimal, but that's a heck of a lot more multiplications. - Because our crc32cs preserve parity/are epimorphic to parity bits, addition (xor) and multiplication (crc32cmul) also preserve parity, which can be used to show our entire gcksum system preserves parity. This is quite neat, and means we are guaranteed to detect any odd number of bit-errors across the entire filesystem. - Another idea was to use two different addition operations: xor and overflowing addition (or mod a prime). This probably would have worked, but lacks the rigor of the above solution. - You might think an RS-like construction would help here, where g = sum(c_ia^i), but this suffers from the same problem: d_i = g' - g d_i = g + c_ia^i - g d_i = c_ia^i Nothing here depends on anything outside of the current mdir. - Another question is should we be using an RS-like construction anyways to include location information in our gcksum? Maybe in another system, but I don't think it's necessary in littlefs. While our mdir are independently updateable, they aren't _entirely_ independent. The location of each mdir is stored in either the mtree or a parent mdir, so it always gets mixed into the gcksum somewhere. The only exception being the mrootanchor which is always at the fixed blocks 0x{0,1}. - This does _not_ catch "global-rollback" issues, where the most recent commit in the entire filesystem is corrupted, revealing an older, but still valid, filesystem state. But as far as I am aware this is just a fundamental limitation of powerloss-resilient filesystems, short of doing destructive operations. At the very least, exposing the gcksum would allow the user to store it externally and prevent this issue. --- Implementation details: - Our gcksumdelta depends on the rbyd's cksum, so there's a catch-22 if we include it in the rbyd itself. 
We can avoid this by including it in the commit tags (actually the separate canonical cksum makes this easier than it would have been earlier), but this does mean LFSR_TAG_GCKSUMDELTA is not an LFSR_TAG_GDELTA subtype. Unfortunate but not a dealbreaker. - Reading/writing the gcksumdelta gets a bit annoying with it not being in the rbyd. For now I've extended the low-level lfsr_rbyd_fetch_/lfsr_rbyd_appendcksum_ to accept an optional gcksumdelta pointer, which is a bit awkward, but I don't know of a better solution. - Unlike the grm, _every_ mdir commit involves the gcksum, which means we either need to propagate the gcksumdelta up the mroot chain correctly, or somehow keep track of partially flushed gcksumdeltas. To make this work I modified the low-level lfsr_mdir_commit__ functions to accept start_rid=-2 to indicate when gcksumdeltas should be flushed. It's a bit of a hack, but I think it might make sense to extend this to all gdeltas eventually. The gcksum costs both code and RAM, but I think it's well worth it for removing an entire category of filesystem corruption: code stack ctx before: 37796 2608 620 after: 38428 (+1.7%) 2640 (+1.2%) 644 (+3.9%)	2025-02-08 14:53:30 -06:00
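The linearity argument in the gcksum commit above can be checked numerically. The following is a minimal sketch, not the code from the commit: it works on plain (non-bit-reversed) polynomials over GF(2) modulo the crc32c polynomial 0x11edc6f41, and the names `ringmul`, `square`, `cube`, and `delta` are hypothetical. A real implementation would also have to deal with crc32c's bit-reversed representation.

```python
# Sketch only: carry-less polynomial arithmetic over GF(2), using the
# non-bit-reversed form of the crc32c polynomial.
P = 0x11edc6f41  # crc32c polynomial, including the x^32 term

def pmul(a, b):
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, p=P):
    """Remainder of polynomial a divided by polynomial p over GF(2)."""
    n = p.bit_length()
    while a.bit_length() >= n:
        a ^= p << (a.bit_length() - n)
    return a

def ringmul(a, b):
    """Multiplication in the ring GF(2)[x]/P (hypothetical name)."""
    return pmod(pmul(a, b))

def square(g):
    """t(g) = g^2, which turns out to be linear over xor."""
    return ringmul(g, g)

def cube(g):
    """t(g) = g^3, the non-linear function the commit settles on."""
    return ringmul(ringmul(g, g), g)

def delta(g, c):
    """d = g*c^2 + g^2*c + c^3, the expanded form of (g+c)^3 - g^3."""
    return ringmul(g, square(c)) ^ ringmul(square(g), c) ^ cube(c)
```

With arbitrary inputs such as `g = 0x12345678` and `c = 0x9abcdef0`, `square(g ^ c)` equals `square(g) ^ square(c)`, while `cube(g ^ c)` differs from `cube(g) ^ cube(c)`, and `delta(g, c)` matches `cube(g ^ c) ^ cube(g)`.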
Christopher Haster	b6ab323eb1	Dropped the q-bit (previous-perturb) from cksum tags Now that we perturb commit cksums with the odd-parity zero, the q-bit no longer serves a purpose other than extra debug info. But this is a double-edged sword, because redundant info just means another thing that can go wrong. For example, should we assert? If the q-bit doesn't reflect the previous-perturb state it's a bug, but the only thing that would break would be the q-bit itself. And if we don't assert what's the point of keeping the q-bit around? Dropping the q-bit avoids answering this question and saves a bit of code: code stack ctx before: 37772 2608 620 after: 37768 (-0.0%) 2608 (+0.0%) 620 (+0.0%)	2025-01-28 14:41:45 -06:00
Christopher Haster	66bf005bb8	Renamed LFSR_TAG_ORPHAN -> LFSR_TAG_STICKYNOTE I've been unhappy with LFSR_TAG_ORPHAN for a while now. While it's true these represent orphaned files, they also represent zombied files. And as long as a reference to the file exists in-RAM, I find it hard to say these files are truly "orphaned". We're also just using the term "orphan" for too many things. Really this tag just represents an mid reservation. The term stickynote works well enough for this, and fits in with the other internal tag, LFSR_TAG_BOOKMARK.	2025-01-28 14:41:45 -06:00
Christopher Haster	62cc4dbb14	scripts: Disabled local import hack on import Moved local import hack behind if __name__ == "__main__" These scripts aren't really intended to be used as python libraries. Still, it's useful to import them for debugging and to get access to their juicy internals.	2025-01-28 14:41:30 -06:00
Christopher Haster	7cfcc1af1d	scripts: Renamed summary.py -> csv.py This seems like a more fitting name now that this script has evolved into more of a general purpose high-level CSV tool. Unfortunately this does conflict with the standard csv module in Python, breaking every script that imports csv (which is most of them). Fortunately, Python is flexible enough to let us remove the current directory before imports with a bit of an ugly hack: # prevent local imports __import__('sys').path.pop(0) These scripts are intended to be standalone anyways, so this is probably a good pattern to adopt.	2024-11-09 12:31:16 -06:00
Christopher Haster	a0ab7bda26	scripts: Avoid rereading shrub blocks This extends Rbyd.fetch to accept another rbyd, in which case we inherit the RAM-backed block without rereading it from disk. This avoids an issue where shrubs can become corrupted if the disk is being simultaneously written and debugged. Normally we can detect the checksum mismatch and toss out the rbyd during fetch, but shrub pointers don't include a checksum since they assume the containing rbyd has already been checksummed. It's interesting to note this even avoids the memory copy thanks to Python's reference counting.	2024-11-08 02:24:56 -06:00
Christopher Haster	0260f0bcee	scripts: Added better branch cksum checks If we're fetching branches anyways, we might as well check that the checksums match. This helps protect against infinite loops in B-tree branches. Also fixed an issue where we weren't xoring perturb state on finding an explicit trunk. Note this is equivalent to LFS_M_CKFETCHES in lfs.c. --- This doesn't mean we always need LFS_M_CKFETCHES. Our dbg scripts just need to be a little bit tougher because 1. running tests with -j creates wildly corrupted and entangled littlefs images, and 2. Rbyd.fetch is almost too forgiving in choosing the nearest trunk.	2024-11-08 02:20:19 -06:00
Christopher Haster	e3fdc3dbd7	scripts: Added simple mroot cycle detectors to dbg scripts These work by keeping a set of all seen mroots as we descend down the mroot chain. Simple, but it works. The downside of this approach is that the mroot set grows unbounded, but it's unlikely we'll ever have enough mroots in a system for this to really matter. This fixes scripts like dbgbmap.py getting stuck on intentional mroot cycles created for testing. It's not a problem for a foreground script to get stuck in an infinite loop, since you can just kill it, but a background script getting stuck at 100% CPU is a bit more annoying.	2024-11-07 11:46:39 -06:00
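The seen-set approach described in the mroot cycle detector commit above might look roughly like this. This is an illustrative sketch, not the code in the dbg scripts: `walk_mroots` and the `next_mroot` callback are hypothetical names, and mroots are modeled as any hashable value (for example a block pair).

```python
def walk_mroots(first, next_mroot):
    """Yield mroots along the chain, stopping at the end or at a cycle.

    first      - the starting mroot (any hashable value, e.g. a block pair)
    next_mroot - hypothetical callback returning the next mroot, or None
    """
    seen = set()
    mroot = first
    while mroot is not None:
        if mroot in seen:
            # we've been here before, so the chain loops back on itself
            break
        seen.add(mroot)
        yield mroot
        mroot = next_mroot(mroot)
```

For example, with `chain = {(0, 1): (2, 3), (2, 3): (4, 5), (4, 5): (2, 3)}`, iterating `walk_mroots((0, 1), chain.get)` visits each of the three mroots once and then stops instead of looping forever. The set grows with the length of the chain, which is the unbounded-memory tradeoff the commit mentions.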
Christopher Haster	007ac97bec	scripts: Adopted double-indent on multiline expressions This matches the style used in C, which is good for consistency: a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)) We were already doing this for multiline control-flow statements, simply because I'm not sure how else you could indent this without making things really confusing: if a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)): do_the_thing() This was the only real difference style-wise between the Python code and C code, so now both should be following roughly the same style (80 cols, double-indent multiline exprs, prefix multiline binary ops, etc).	2024-11-06 15:31:17 -06:00
Christopher Haster	48c2e7784b	scripts: Renamed import math alias m -> mt Mainly to avoid conflicts with match results m, this frees up the single-letter variable m for other purposes. Choosing a two letter alias was surprisingly difficult, but mt is nice in that it somewhat matches it (for itertools) and ft (for functools).	2024-11-05 01:58:40 -06:00
Christopher Haster	4d8bfeae71	attrs: Reduced UATTR/SATTR range down to 7-bits It would be nice to have a full 8-bit range for both user attrs and system attrs, for both backwards compatibility and maximizing the available attr space, but I think it just doesn't make sense from an API perspective. Sure we could finagle the user/sys bit into a flags argument, or provide separate lfsr_getuattr/getsattr functions, but asking users to use a 9-bit int for higher-level operations (dynamic attrs, iteration, etc) is a bit much... So this reduces the two attr ranges down to 7-bits, requiring 8-bits total to store all possible attr types in the current system: TAG_ATTR 0x0400 v--- -1-a -aaa aaaa TAG_UATTR 0x04aa v--- -1-- -aaa aaaa TAG_SATTR 0x05aa v--- -1-1 -aaa aaaa This really just affects scripts, since we haven't actually implemented attributes yet. Worst case we still have the 9-bit encoding space carved out, so we can always add an additional set of attrs in the future if we start running into attr pressure. Or, you know, just turn on the subtype leb128 encoding the 8th subtype bit is reserved for. Then you'd only be limited by internal driver details, probably 24-bits per attr range if we make tags 32-bits internally. Though this would probably come with quite a code cost...	2024-08-22 00:59:09 -05:00
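The 7-bit attr encoding from the UATTR/SATTR commit above can be sketched as follows. The base values and bit layout are taken from the table in the commit message; the helper names `attr_tag` and `attr_untag` are hypothetical, and since the commit notes attributes were not implemented at the time, this is an illustration only.

```python
TAG_UATTR = 0x0400  # v--- -1-- -aaa aaaa  (written 0x04aa in the table)
TAG_SATTR = 0x0500  # v--- -1-1 -aaa aaaa  (written 0x05aa in the table)

def attr_tag(attr, system=False):
    """Build a tag from a 7-bit attr type (hypothetical helper)."""
    if not 0 <= attr <= 0x7f:
        raise ValueError('attr type must fit in 7 bits: 0x%x' % attr)
    return (TAG_SATTR if system else TAG_UATTR) | attr

def attr_untag(tag):
    """Split a tag into (is_system, attr), ignoring the valid bit."""
    return bool(tag & 0x0100), tag & 0x7f
```

Here `attr_tag(0x12)` gives `0x0412` and `attr_tag(0x7f, system=True)` gives `0x057f`, so the user/sys bit plus the 7-bit type fit in the 8 bits the commit describes, with the 8th subtype bit left unused.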
Christopher Haster	c00e0b2af6	Fixed explicit trunks messing with canonical checksums Updating the canonical checksum should only depend on if the tag is a trunkish tag (not a checksum tag), and not if the tag is in the current trunk. The trunk parameter to lfsr_rbyd_fetch should have no effect on the canonical checksum. Fixed in both lfsr_rbyd_fetch and scripts. Curiously no code changes: code stack before: 36416 2616 after: 36416 (+0.0%) 2616 (+0.0%)	2024-08-20 12:03:48 -05:00
Christopher Haster	1044c9d2b7	Adopted odd-parity-zero rbyd perturb scheme I've been scratching my head over our rbyd perturb scheme. It's gotten rather clunky with needing to xor valid bits and whatnot. But it's tricky with needing erased-state to be included in parity bits, while at the same time excluded from our canonical checksum. If only there was some way to flip the checksums parity without changing its value... Enter the crc32c odd-parity zero: 0xfca42daf! This bends the definition of zero a bit, but it is one of two numbers in our crc32c-ring with a very interesting property: crc32c(m) == crc32c(m xor 0xfca42daf) xor 0xfca42daf // odd-p zero crc32c(m) == crc32c(m xor 0x00000000) xor 0x00000000 // even-p zero Recall that crc32c's polynomial, 0x11edc6f41, is composed of two polynomials: 0x3, the parity polynomial, and 0xf5b4253f, a maximally sized irreducible polynomial. Because our polynomial breaks down into two smaller polynomials, our crc32c space turns out to not be a field, but rather a ring containing two smaller sub-fields. Because these sub-fields are defined by their polynomials, one is the 31-bit crc defined by the polynomial 0xf5b4253f, while the other is the current parity. We can move in the parity sub-field without changing our position in the 31-bit crc sub-field by xoring with a number that is one in the parity sub-field, but zero in the 31-bit crc sub-field. This number happens to be 0xf5b4253f (0xfca42daf bit-reversed)! (crcs being bit-reversed will never not be annoying) So long story short, xoring any crc32c with 0xfca42daf will change its parity but not its value. --- An that's basically our new perturb scheme. If we need to perturb, xor with 0xfca42daf to change the parity, and after calculating/validating the checksum, xor with 0xfca42daf to get our canonical checksum. Isn't that neat! 
There was one small hiccup: At first I assumed you could continue including the valid bits in the checksum, which would have been nice for bulk checksumming. But this doesn't work because while valid bits cancel out so the parity doesn't change, changing valid bits _does_ change the underlying 31-bit crc, poisoning our checksum and making everything a mess. So we still need to mask out valid bits, which is a bit annoying. But then I stumbled on the funny realization that by masking our valid bits, we accidentally end up with a fully functional parity scheme. Because valid bits _don't_ include the previous valid bit, we can figure out the parity for not only the entire commit, but also each individual tag: 80 03 00 08 6c 69 74 74 6c 65 66 73 80 ^'----------------.---------------' ^ \| \| \| v + parity = v' Or more simply: 80 03 00 08 6c 69 74 74 6c 65 66 73 80 '----------------.----------------' ^ \| \| parity = v' Double neat! Some other notes: - By keeping the commit checksum perturbed, but not the canonical checksum, the perturb state is self-validating. We no longer need to explicitly check the previous-perturb-bit (q) to avoid the perturb hole we ran into previously. I'm still keeping the previous-perturb-bit (q) around, since it's useful for debugging. We still need to know the perturb state internally at all times in order to xor out the canonical checksum correctly anyways. - Thanks to all of our perturb iterations, we now know how to remove the valid bits from the checksum easily: cksum ^= 0x00000080 & (tag >> 8) This makes the whole omitting-valid-bits thing less of a pain point. - It wasn't actually worth it to perturb the checksum when building commits, vs manually flipping each valid bit, as this would have made our internal appendattr API really weird. At least the perturbed checksum made fetch a bit simpler. Not sure exactly how to draw this with our perturb scheme diagrams, maybe something like this? .---+---+---+---. 
\ \ \ \ \|v\| tag \| \| \| \| \| +---+---+---+---+ \| \| \| \| \| commit \| \| \| \| \| \| \| +-. \| \| \| +---+---+---+---+ / \| \| \| \| \|v\|qp-------------->p>p-->p . +---+---+---+---+ \| . . . \| cksum \| \| . . . +---+---+---+---+ \| . . . \| padding \| \| . . . \| \| \| . . . +---+---+---+---+ \| \| \| \| \|v------------------' \| \| \| +---+---+---+---+ \| \| \| \| commit \| +-. \| +- rbyd \| \| \| \| \| \| cksum +---+---+---+---+ / \| +-. / \|v----------------------' \| \| +-------+---+---+ / \| \| cksum ----------------' +---+---+---+---+ \| padding \| \| \| +---+---+---+---+ \| erased \| \| \| . . . . --- Code changes were minimal, saving a tiny bit of code: code stack before: 36368 2664 after: 36352 (-0.0%) 2672 (+0.3%) There was a stack bump in lfsr_bd_readtag, but as far as I can tell it's just compiler noise? I poked around a bit but couldn't figure out why it changed...	2024-08-16 01:03:43 -05:00
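The odd-parity zero property from the commit above can be checked with a small bitwise crc32c. This is a sketch under one assumption: "crc32c(m xor 0xfca42daf)" is read here as xoring the constant into the running crc state passed in as the `crc` argument. The function below is a generic bitwise crc32c, not littlefs's implementation, and `ODD_ZERO` and `parity` are hypothetical names.

```python
def crc32c(data, crc=0):
    """Bitwise crc32c using the bit-reversed polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff

# the crc32c odd-parity zero, 0xf5b4253f bit-reversed
ODD_ZERO = 0xfca42daf

def parity(x):
    """Parity of the number of set bits in x."""
    return bin(x).count('1') & 1
```

For any message `m` and starting state `crc`, `crc32c(m, crc ^ ODD_ZERO)` should equal `crc32c(m, crc) ^ ODD_ZERO`. Because `parity(ODD_ZERO)` is 1, the xor flips the checksum's parity while leaving its position in the 31-bit sub-field alone.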
Christopher Haster	fb73f78c91	Updated comments to prefer "canonical checksum" for rbyd checksums I think this describes the goal of the non-perturbed rbyd checksums decently. At the very least it's less wrong than "data checksum", and calling it the "metadata checksum" would just be confusing. (Would our commit checksum be the "metametadata checksum" then?)	2024-07-31 12:29:13 -05:00
Christopher Haster	c739e18f6f	Renamed LFSR_TAG_NOISE -> LFSR_TAG_NOTE Sort of like SHT_NOTE in elf files, but with no defined format. Using LFSR_TAG_NOTE for additional noise/nonces is still encouraged, but it can also be used to add debug info.	2024-06-20 13:04:20 -05:00
Christopher Haster	ae0e3348fe	Added -l/--list to dbgtag.py Inspired by errno's/dbgerr.py's -l/--list, this gives a quick and easy list of the current tag encodings, which can be very useful: $ ./scripts/dbgtag.py -l LFSR_TAG_NULL 0x0000 v--- ---- ---- ---- LFSR_TAG_CONFIG 0x00tt v--- ---- -ttt tttt LFSR_TAG_MAGIC 0x0003 v--- ---- ---- --11 LFSR_TAG_VERSION 0x0004 v--- ---- ---- -1-- ... snip ... We already need to keep dbgtag.py in-sync or risk a bad debugging experience, so we might as well let it tell us all the information it currently knows. Also yay for self-inspecting code, I don't know if it's bad that I'm becoming a fan of parsing information out of comments...	2024-06-20 13:02:08 -05:00
Christopher Haster	898f916778	Fixed pl hole in perturb logic Turns out there's very _very_ small powerloss hole in our current perturb logic. We rely on tag valid bits to validate perturb bits, but these intentionally don't end up in the commit checksum. This means there will always be a powerloss hole when we write the last valid bit. If we lose power after writing that bit, suddenly the remaining commit and any following commits may appear as valid. Now, this is really unlikely considering we need to lose power exactly when we write the cksum tag's valid bit, and our nonce helps protect against this. But a hole is a hole. The solution here is to include the _current_ perturb bit (q) in the commit's cksum tag, alongside the _next_ perturb bit (p). This will be included in the commit's checksum, but _not_ in the canonical checksum, allowing the commit's checksum validate the current perturb state without ruining our erased-state agnostic checksums: .---+---+---+---. . . .---+---+---+---. \ \ \ \ \|v\| tag \| \|v\| tag \| \| \| \| \| +---+---+---+---+ +---+---+---+---+ \| \| \| \| \| commit \| \| commit \| \| \| \| \| \| \| \| \| +-. \| \| \| +---+---+---+---+ +---+---+---+---+ / \| \| \| \| \|v\|qp-------------. \|v\|qp\| tag \| \| . . . +---+---+---+---+ \| +---+---+---+---+ \| . . . \| cksum \| \| \| cksum \| \| . . . +---+---+---+---+ \| +---+---+---+---+ \| . . . \| padding \| \| \| padding \| \| . . . \| \| \| \| \| \| . . . +---+---+---+---+ \| . +---+---+---+---+ \| \| \| \| \| erased \| +-> \|v------------------' \| \| \| \| \| \| +---+---+---+---+ \| \| \| . . \| \| commit \| +-. \| +- rbyd . . \| \|.----------------. \| \| \| \| cksum \| +\| -+---+---+---+ \| / \| +-. / +-> \|v\|qp\| tag \| '-----' \| \| \| +- ^ ---+---+---+ / \| '------' cksum ----------------' +---+---+---+---+ \| padding \| \| \| +---+---+---+---+ \| erased \| \| \| . . . . (Ok maybe this diagram needs work...) 
This adds another thing that needs to be checked during rbyd fetch, and note, we _do_ need to explicitly check this, but it solves the problem. If power is lost after v, q would be invalid, and if power is lost after q, our cksum would be invalid. Note this would have also been an issue for the previous cksum + parity perturb scheme. Code changes: code stack before: 33570 2592 after: 33598 (+0.1%) 2592 (+0.0%)	2024-06-07 19:41:47 -05:00
Christopher Haster	8a4f6fcf68	Adopted a simpler rbyd perturb scheme The previous cksum + parity scheme worked, but needing to calculate both cksum + parity on slightly different sets of metadata felt overly complicated. After taking a step back, I've realized the problem is that we're trying to force perturb effects to be implicit via the parity. If we instead actually implement perturb effects explicitly, things get quite a bit simpler... This does add a bit more logic to the read path, but I don't think it's worse than the mess we needed to parse separate cksum + parity. Now, the perturb bit has the explicit behavior of inverting all tag valid bits in the following commit. Which is conveniently the same as xoring the crc32c with 00000080 before parsing each tag: .---+---+---+---. . . .---+---+---+---. \ \ \ \ \|v\| tag \| \|v\| tag \| \| \| \| \| +---+---+---+---+ +---+---+---+---+ \| \| \| \| \| commit \| \| commit \| \| \| \| \| \| \| \| \| +-. \| \| \| +---+---+---+---+ +---+---+---+---+ / \| \| \| \| \|v\|p--------------. \|v\|p\| tag \| \| . . . +---+---+---+---+ \| +---+---+---+---+ \| . . . \| cksum \| \| \| cksum \| \| . . . +---+---+---+---+ \| +---+---+---+---+ \| . . . \| padding \| \| \| padding \| \| . . . \| \| \| \| \| \| . . . +---+---+---+---+ \| . +---+---+---+---+ \| \| \| \| \| erased \| +-> \|v------------------' \| \| \| \| \| \| +---+---+---+---+ \| \| \| . . \| \| commit \| +-. \| +- rbyd . . \| \| \| \| \| \| \| cksum \| +---+---+---+---+ / \| +-. / '-> \|v----------------------' \| \| +---+---+---+---+ / \| \| cksum ----------------' +---+---+---+---+ \| padding \| \| \| +---+---+---+---+ \| erased \| \| \| . . . . With this scheme, we don't need to calculate a separate parity, because each valid bit effectively validates the current state of the perturb bit. We also don't need extra logic to omit valid bits from the cksum, because flipping all valid bits effectively makes perturb=0 the canonical metadata encoding and cksum. 
--- I also considered only inverting the first valid bit, which would have the additional benefit of allowing entire commits to be crc32ced at once, but since we don't actually track when we've started a commit this turned out to be quite a bit more complicated than I thought. We need some way to validate the first valid bit, otherwise it could be flipped by a failed prog and we'd never notice. This is fine, we can store a copy of the previous perturb bit in the next cksum tag, but it does mean we need to track the perturb bit for the duration of the commit. So we'd end up needing to track both start-of-commit and the perturb bit state, which starts getting difficult to fit into our rbyd struct... It's easier and simpler to just flip every valid bit. As a plus this means every valid bit contributes to validating the perturb bit. --- Also renamed LFSR_TAG_PERTURB -> LFSR_TAG_NOISE just to avoid confusion. Though not sure if this tag should stick around... The end result is a nice bit of code/stack savings, which is what we'd expect with a simpler scheme: code stack before: 33746 2600 after: 33570 (-0.5%) 2592 (-0.3%)	2024-06-07 18:24:13 -05:00
Christopher Haster	2e8012681b	Tweaked dbg script headers to match the mount info log The main difference being rendering the weight with a single letter "w" prefix: $ ./scripts/dbglfs.py disk -b4096 littlefs v0.0 4096x256 0x{1,0}.8b w2.512, rev eb7f2a0d ... This lets us add valuable weight info without too much noise. Adopting this in the dbg scripts is nice for consistency.	2024-05-24 14:56:11 -05:00
Christopher Haster	56b18dfd9a	Reworked revision count logic a bit, block_cycles -> block_recycles The original goal here was to restore all of the revision count/ wear-leveling features that were intentionally ignored during refactoring, but over time a few other ideas to better leverage our revision count bits crept in, so this is sort of the amalgamation of that... Note! None of these changes affect reading. mdir fetch strictly needs only to look at the revision count as a big 32-bit counter to determine which block is the most recent. The interesting thing about the original definition of the revision count, a simple 32-bit counter, is that it actually only needs 2-bits to work. Well, three states really: 1. most recent, 2. less recent, 3. future most recent. This means the remaining bits are sort of up for grabs to other things. Previously, we've used the extra revision count bits as a heuristic for wear-leveling. Here we reintroduce that, a bit more rigorously, while also carving out space for a nonce to help with commit collisions. Here's the new revision count breakdown: vvvvrrrr rrrrrrnn nnnnnnnn nnnnnnnn '-.''----.----''---------.--------' '------\|---------------\|---------- 4-bit relocation revision '---------------\|---------- recycle-bits recycle counter '---------- pseudorandom nonce - 4-bit relocation revision We technically only need 2-bits to tell which block is the most recent, but I've bumped it up to 4-bits just to be safe and to make it a bit more readable in hex form. - recycle-bits recycle counter A user configurable counter, this counter tracks how many times a metadata block has been erased. When it overflows we return the block to the allocator to participate in block-level wear-leveling again. This implements our copy-on-bounded-write strategy. - pseudorandom nonce The remaining bits we fill with a pseudorandom nonce derived from the filesystem's prng. 
Note this prng isn't the greatest (it's just the xor of all mdir cksums), but it gets the job done. It should also be reproducible, which can be a good thing. Suggested by ithinuel, the addition of a nonce should help with the commit collision issue caused by noop erases. It doesn't completely solve things, since we're only using crc32c cksums not collision resistant cryptographic hashes, but we still have the existing valid/perturb bit system to fall back on. When we allocate a new mdir, we want to zero the recycle counter. This is where our relocation revision is useful for indicating which block is the most recent: initial state: 10101010 10101010 10101010 10101010 '-.' +1 zero random v .----'----..---------'--------. lfsr_rev_init: 10110000 00000011 01110010 11101111 When we increment, we increment recycle counter and xor in a new nonce: initial state: 10110000 00000011 01110010 11101111 '--------.----''---------.--------' +1 xor <-- random v v lfsr_rev_init: 10110000 00000111 01010100 01000000 And when the recycle counter overflows, we relocate the mdir. If we aren't wear-leveling, we just increment the relocation revision to maximize the nonce. --- Some other notes: - Renamed block_cycles -> block_recycles. This is intended to help avoid confusing block_cycles with the actual physical number of erase cycles supported by the device. I've noticed this happening a few times, and it's unfortunately equivalent to disabling wear-leveling completely. This can be improved with better documentation, but also changing the name doesn't hurt. - We now relocate both blocks in the mdir at the same time. Previously we only relocated one block in the mdir per recycle. This was necessary to keep our threaded linked-list in sync, but the threaded linked-list is now no more! Relocating both blocks is simpler, updates the mtree less often, compatible with metadata redundancy, and avoids aliasing issues that were a problem when relocating one block. 
Note that block_recycles is internally multiplied by 2 so each block sees the correct number of erase cycles. - block_recycles is now rounded down to a power-of-2. This makes the counter logic easier to work with and takes up less RAM in lfs_t. This is a rough heuristic anyways. - Moved the lfs->seed updates into lfsr_mountinited + lfsr_mdir_commit. This avoids readonly operations affecting the seed and should help reproducibility. - Changed rev count in dbg scripts to render as hex, similar to cksums. Now that we're using most of the bits in the revision count, the decimal version is, uh, not helpful... Code changes: code stack before: 33342 2640 after: 33434 (+0.3%) 2640 (+0.0%)	2024-05-22 18:49:05 -05:00
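The revision count breakdown in the block_recycles commit above (4-bit relocation revision, recycle-bits recycle counter, pseudorandom nonce in the remaining bits) can be sketched as plain bit manipulation. This is an illustration under assumptions, not the lfs.c logic: the function names, the `recycle_bits` parameter, and returning an overflow flag from `rev_inc` are all hypothetical, and the caller supplies the random bits.

```python
REV_BITS = 32
VERS_BITS = 4  # relocation revision

def rev_unpack(rev, recycle_bits):
    """Split a revision count into (vers, recycles, nonce)."""
    nonce_bits = REV_BITS - VERS_BITS - recycle_bits
    vers = rev >> (REV_BITS - VERS_BITS)
    recycles = (rev >> nonce_bits) & ((1 << recycle_bits) - 1)
    nonce = rev & ((1 << nonce_bits) - 1)
    return vers, recycles, nonce

def rev_pack(vers, recycles, nonce, recycle_bits):
    """Combine the three fields, truncating each to its width."""
    nonce_bits = REV_BITS - VERS_BITS - recycle_bits
    return (((vers & ((1 << VERS_BITS) - 1)) << (REV_BITS - VERS_BITS))
            | ((recycles & ((1 << recycle_bits) - 1)) << nonce_bits)
            | (nonce & ((1 << nonce_bits) - 1)))

def rev_init(rev, recycle_bits, rand):
    """New mdir: bump relocation revision, zero recycles, fresh nonce."""
    vers, _, _ = rev_unpack(rev, recycle_bits)
    return rev_pack(vers + 1, 0, rand, recycle_bits)

def rev_inc(rev, recycle_bits, rand):
    """Increment the recycle counter and xor in new random bits.

    Returns (new_rev, overflowed), where overflowed signals that the
    caller should relocate the mdir (assumed interface).
    """
    vers, recycles, nonce = rev_unpack(rev, recycle_bits)
    recycles += 1
    overflowed = (recycles >> recycle_bits) != 0
    return rev_pack(vers, recycles, nonce ^ rand, recycle_bits), overflowed
```

With 10 recycle bits, this reproduces the two bit patterns drawn in the commit message: `rev_init(0xaaaaaaaa, 10, 0x372ef)` gives `0xb00372ef`, and `rev_inc(0xb00372ef, 10, 0x026af)` gives `0xb0075440` with no overflow.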
Christopher Haster	11c948678f	Renamed size_limit -> file_limit This limits the maximum size of a file, which also implies the maximum integer size required to mount. The exact name is a bit of a toss-up. I originally went with size_limit to avoid confusion around whether file_limit reflected the file size or the number of files, but since this ends up mapping to lfs_off_t and _not_ lfs_size_t, I think size_limit may be a bit of a bad choice.	2024-05-18 13:00:15 -05:00
Christopher Haster	8a75a68d8b	Made rbyd cksums erased-state agnostic Long story short, rbyd checksums are now fully reproducible. If you write the same set of tags to any block, you will end up with the same checksum. This is actually a bit tricky with littlefs's constraints. --- The main problem boils down to erased-state. littlefs has a fairly flexible model for erased-state, and this brings some challenges. In littlefs, storage goes through 2 states: 1. Erase - Prepare storage for progging. Reads after an erase may return arbitrary, but consistent, values. 2. Prog - Program storage with data. Storage must be erased and no progs attempted. Reads after a prog must return the new data. Note in this model erased-state may not be all 0xffs, though it likely will be for flash. This allows littlefs to support a wide range of other storage devices: SD, RAM, NVRAM, encryption, ECC, etc. But this model also means erased-state may be different from block to block, and even different on later erases of the same block. And if that wasn't enough of a challenge, _erased-state can contain perfectly valid commits_. Usually you can expect arbitrary valid cksums to be rare, but thanks to SD, RAM, etc, modeling erase as a noop, valid cksums in erased-state is actually very common. So how do we manage erased-state in our rbyds? First we need some way to detect it, since we can't prog if we're not erased. This is accomplished by the forward-looking erased-state cksum (ecksum): .---+---+---+---. \ \| commit \| \| \| \| \| \| \| \| +---+---+---+---+ +-. \| ecksum -------. \| \| <-- ecksum - cksum of erased state +---+---+---+---+ \| / \| \| cksum --------\|---' <-- cksum - cksum of commit, +---+---+---+---+ \| including ecksum \| padding \| \| \| \| \| +---+---+---+---+ \ \| \| erased \| +-' \| \| / . . . . You may have already noticed the start of our problems. The ecksum contains the erased-state, which is different per-block, and our rbyd cksum contains the ecksum. 
We need to include the ecksum so we know if it's valid, but this means our rbyd cksum changes block to block. Solving this is simple enough: Stop the rbyd's canonical cksum before the ecksum, but include the ecksum in the actual cksum we write to disk. Future commits will need to start from the canonical cksum, so the old ecksum won't be included in new commits, but this shouldn't be a problem: .---+---+---+---. . . \ . \ . . . . .---+---+---+---. \ \ \| commit \| \| \| \| commit \| \| \| \| \| \| +- rbyd \| \| \| \| \| \| \| \| cksum \| \| \| \| +---+---+---+---+ +-. / +---+---+---+---+ \| \| \| ecksum -------. \| \| \| ecksum \| . . +---+---+---+---+ \| / \| +---+---+---+---+ . . \| cksum --------\|---' \| cksum \| . . +---+---+---+---+ \| +---+---+---+---+ . . \| padding \| \| \| padding \| . . \| \| \| \| \| . . +---+---+---+---+ \ \| . . . . . . . +---+---+---+---+ \| \| \| erased \| +-' \| commit \| \| \| \| \| / \| \| \| +- rbyd . . \| \| \| \| cksum . . +---+---+---+---+ +-. / \| ecksum -------. \| \| +---+---+---+---+ \| / \| \| cksum ------------' +---+---+---+---+ \| \| padding \| \| \| \| \| +---+---+---+---+ \ \| \| erased \| +-' \| \| / . . . . The second challenge is the pesky possibility of existing valid commits. We need some way to ensure that erased-state following a commit does not accidentally contain a valid old commit. This is where our tags' valid bits come into play: The valid bit of each tag must match the parity of all preceding tags (equivalent to the parity of the crc32c), and we can use some perturb bits in the cksum tag to make sure any tags in our erased-state do _not_ match: .---+---+---+---. \ . . . . . .---+---+---+---. \ \ \ \|v\| tag \| \| \|v\| tag \| \| \| \| +---+---+---+---+ \| +---+---+---+---+ \| \| \| \| commit \| \| \| commit \| \| \| \| \| \| \| \| \| \| \| \| +---+---+---+---+ +-----. +---+---+---+---+ +-. 
\| \| \|v\|p\| tag \| \| \| \|v\|p\| tag \| \| \| \| \| +---+---+---+---+ / \| +---+---+---+---+ / \| \| \| \| cksum \| \| \| cksum \| \| . . +---+---+---+---+ \| +---+---+---+---+ \| . . \| padding \| \| \| padding \| \| . . \| \| \| \| \| \| . . +---+---+---+---+ . . . \| . . +---+---+---+---+ \| \| \| \|v---------------- != --' \|v------------------' \| \| \| erased \| +---+---+---+---+ \| \| . . \| commit \| \| \| . . \| \| \| \| +---+---+---+---+ +-. +-. \|v\|p\| tag \| \| \| \| \| +---+---+---+---+ / \| / \| \| cksum ----------------' +---+---+---+---+ \| \| padding \| \| \| \| \| +---+---+---+---+ \| \|v---------------- != --' \| erased \| . . . . New problem! The rbyd cksum contains the valid bits, which contain the perturb bits, which depend on the erased-state! And you can't just derive the valid bits from the rbyd's canonical cksum. This avoids erased-state poisoning, sure, but then nothing in the new commit depends on the perturb bits! The catch-22 here is that we need the valid bits to both depend on, and ignore, the erased-state poisoned perturb bits. As far as I can tell, the only way around this is to make the rbyd's canonical cksum not include the parity bits. Which is annoying, masking out bits is not great for bulk cksum calculation... But this does solve our problem: .---+---+---+---. \ . . . . . .---+---+---+---. \ \ \ \ \|v\| tag \| \| \|v\| tag \| \| \| o o +---+---+---+---+ \| +---+---+---+---+ \| \| \| \| \| commit \| \| \| commit \| \| \| \| \| \| \| \| \| \| \| \| \| \| +---+---+---+---+ +-----. +---+---+---+---+ +-. \| \| \| \|v\|p\| tag \| \| \| \|v\|p\| tag \| \| \| \| . . +---+---+---+---+ / \| +---+---+---+---+ / \| \| . . \| cksum \| \| \| cksum \| \| . . . +---+---+---+---+ \| +---+---+---+---+ \| . . . \| padding \| \| \| padding \| \| . . . \| \| \| \| \| \| . . . +---+---+---+---+ . . . \| . . +---+---+---+---+ \| \| \| \| \|v---------------- != --' \|v------------------' \| o o \| erased \| +---+---+---+---+ \| \| \| . . 
\| commit \| \| \| +- rbyd . . \| \| \| \| \| cksum +---+---+---+---+ +-. +-. / \|v\|p\| tag \| \| \| o \| +---+---+---+---+ / \| / \| \| cksum ----------------' +---+---+---+---+ \| \| padding \| \| \| \| \| +---+---+---+---+ \| \|v---------------- != --' \| erased \| . . . . Note that because each commit's cksum derives from the canonical cksum, the valid bits and commit cksums no longer contain the same data, so our parity(m) = parity(crc32c(m)) trick no longer works. However our crc32c still does tell us a bit about each tag's parity, so with a couple well-placed xors we can at least avoid needing two parallel calculations: cksum' = crc32c(cksum, m) valid' = parity(cksum' xor cksum) xor valid This also means our commit cksums don't include any information about the valid bits, since we mask these out before cksum calculation. Which is a bit concerning, but as far as I can tell not a real problem. --- An alternative design would be to just keep track of two cksums: A commit cksum and a canonical cksum. This would be much simpler, but would also require storing two cksums in RAM in our lfsr_rbyd_t struct. A bit annoying for our 4-byte crc32cs, and a bit more than a bit annoying for hypothetical 32-byte sha256s. It's also not entirely clear how you would update both crc32cs efficiently. There is a way to xor out the initial state before each tag, but I think it would still require O(n) cycles of crc32c calculation... As it is, the extra bit needed to keep track of commit parity is easy enough to sneak into some unused sign bits in our lfsr_rbyd_t struct. --- I've also gone ahead and mixed in the current commit parity into our cksum's perturb bits, so the commit cksum at least contains _some_ information about the previous parity. But it's not entirely clear this actually adds anything. Our perturb bits aren't _required_ to reflect the commit parity, so a very unlucky power-loss could in theory still make a cksum valid for the wrong parity. 
At least this situation will be caught by later valid bits... I've also carved out a tag encoding, LFSR_TAG_PERTURB, solely for adding more perturb bits to commit cksums: LFSR_TAG_CKSUM 0x3cpp v-11 cccc -ppp pppp LFSR_TAG_CKSUM 0x30pp v-11 ---- -ppp pppp LFSR_TAG_PERTURB 0x3100 v-11 ---1 ---- ---- LFSR_TAG_ECKSUM 0x3200 v-11 --1- ---- ---- LFSR_TAG_GCKSUMDELTA+ 0x3300 v-11 --11 ---- ---- + Planned This allows for more than 7 perturb bits, and could even mix in the entire previous commit cksum, if we ever think that is worth the RAM tradeoff. LFSR_TAG_PERTURB also has the advantage that it is validated by the cksum tag's valid bit before being included in the commit cksum, which indirectly includes the current commit parity. We may eventually want to use this instead of the cksum tag's perturb bits for this reason, but right now I'm not sure this tiny bit of extra safety is worth the minimum 5-byte per commit overhead... Note if you want perturb bits that are also included in the rbyd's canonical cksum, you can just use an LFSR_TAG_SHRUBDATA tag. Or any unreferenced shrub tag really. --- All of these changes required a decent amount of code, I think mostly just to keep track of the parity bit. But the isolation of rbyd cksums from erased-state is necessary for several future-planned features: code stack before: 33564 2816 after: 33916 (+1.0%) 2824 (+0.3%)	2024-05-04 17:25:01 -05:00
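The "couple well-placed xors" trick from this commit can be checked with a small model. A minimal sketch, assuming a plain bitwise CRC-32C with the usual pre/post inversion; crc32c, parity, and append are hypothetical helper names, and the sketch does not model masking the valid bits out of the canonical cksum. It only demonstrates that the parity of the newly appended data falls out of the old and new cksums.

```python
def crc32c(crc, data):
    """Bitwise CRC-32C (Castagnoli), continuing from a previous crc."""
    crc ^= 0xffffffff
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff


def parity(x):
    """Parity (xor of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1


def append(cksum, valid, data):
    """Update a running cksum and a running parity in one crc32c pass.

    cksum' = crc32c(cksum, m)
    valid' = parity(cksum' xor cksum) xor valid
    """
    cksum_ = crc32c(cksum, data)
    valid_ = parity(cksum_ ^ cksum) ^ valid
    return cksum_, valid_
```

This works because the CRC-32C polynomial has an even number of terms, so each crc32c step preserves parity: the parity of the new cksum is the parity of the old cksum xor the parity of the appended bytes.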
Christopher Haster	c4fcc78814	Tweaked file types/name tag encoding to be a bit less quirky The intention behind the quirky encoding was to leverage bit 1 to indicate if the underlying file type would be backed by the common file B-tree data structure. Looking forward, there may be several of these types, compressed files, contiguous files, etc, that for all intents and purposes are just normal files interpreted differently. But trying to leverage too many bits like this is probably going to give us a sparse, awkward, and confusing tag encoding, so I've reverted to a hopefully more normal encoding: LFSR_TAG_NAME 0x02tt v--- --1- -ttt tttt LFSR_TAG_NAME 0x0200 v--- --1- ---- ---- LFSR_TAG_REG 0x0201 v--- --1- ---- ---1 LFSR_TAG_DIR 0x0202 v--- --1- ---- --1- LFSR_TAG_SYMLINK* 0x0203 v--- --1- ---- --11 LFSR_TAG_BOOKMARK 0x0204 v--- --1- ---- -1-- LFSR_TAG_ORPHAN 0x0205 v--- --1- ---- -1-1 LFSR_TAG_COMPR* 0x0206 v--- --1- ---- -11- LFSR_TAG_CONTIG* 0x0207 v--- --1- ---- -111 * Hypothetical Note the carve-out for the hypothetical symlink tag. Symlinks are actually incredibly low in the priority list, but they are also the only current hypothetical file type that would need to be exposed to users. Grouping these up makes sense. This will get a bit messy if we ever end up with a 4th user-facing type, but there isn't any in POSIX at least (ignoring non-fs types, socket, fifo, character, block, etc). The gap also helps line things up so reg/orphan are a single bit flip, and the non-user facing types all share a bit. This had no impact on code size: code stack before: 33564 2816 after: 33564 (+0.0%) 2816 (+0.0%)	2024-05-04 17:24:48 -05:00
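The two bit-level properties claimed for this encoding (reg/orphan differ by one bit, non-user-facing types share a bit) are easy to check mechanically. A small sketch using only the tag values listed in the commit; the names VALID_BIT and INTERNAL_BIT and the helper functions are made up for illustration.

```python
# Tag values copied from the commit message; entries marked hypothetical
# there are marked hypothetical here too.
NAME_TYPES = {
    0x0200: "name",
    0x0201: "reg",
    0x0202: "dir",
    0x0203: "symlink",   # hypothetical
    0x0204: "bookmark",
    0x0205: "orphan",
    0x0206: "compr",     # hypothetical
    0x0207: "contig",    # hypothetical
}

VALID_BIT = 0x8000     # the leading 'v' in the bit patterns
INTERNAL_BIT = 0x0004  # illustrative name for the bit shared by 0x0204..0x0207


def name_type(tag):
    """Look up a name tag's type, ignoring the valid bit."""
    return NAME_TYPES.get(tag & ~VALID_BIT & 0xffff)


def is_user_facing(tag):
    """True for reg/dir/symlink, the types below the shared bit."""
    tag &= ~VALID_BIT & 0xffff
    return tag in NAME_TYPES and tag != 0x0200 and not tag & INTERNAL_BIT
```

For example, 0x0201 ^ 0x0205 is 0x0004, which is the single bit flip between reg and orphan mentioned above.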
Christopher Haster	6e5d314c20	Tweaked struct tag encoding so b/m tags are earlier These b/m struct tags have a common pattern that would be good to emphasize in the encoding. The later struct tags get a bit more messy as they leave space for future possible extensions. New encoding: LFSR_TAG_STRUCT 0x03tt v--- --11 -ttt ttrr LFSR_TAG_DATA 0x0300 v--- --11 ---- ---- LFSR_TAG_BLOCK 0x0304 v--- --11 ---- -1rr LFSR_TAG_BSHRUB 0x0308 v--- --11 ---- 1--- LFSR_TAG_BTREE 0x030c v--- --11 ---- 11rr LFSR_TAG_MROOT 0x0310 v--- --11 ---1 --rr LFSR_TAG_MDIR 0x0314 v--- --11 ---1 -1rr LFSR_TAG_MSHRUB* 0x0318 v--- --11 ---1 1--- LFSR_TAG_MTREE 0x031c v--- --11 ---1 11rr LFSR_TAG_DID 0x0320 v--- --11 --1- ---- LFSR_TAG_BRANCH 0x032c v--- --11 --1- 11rr * Hypothetical Note that all shrubs currently end with 1---, and all btrees, including the awkward branch tag, end with 11rr. This had no impact on code size: code stack before: 33564 2816 after: 33564 (+0.0%) 2816 (+0.0%)	2024-05-04 17:24:33 -05:00
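The shrub and btree patterns noted in this commit can be expressed as two masks. A rough sketch built only from the tag values listed above; decode_struct, is_shrub, and is_btree are hypothetical helpers, and the two low rr bits are returned uninterpreted.

```python
# name, and whether the encoding reserves the two low 'rr' bits
STRUCT_TAGS = {
    0x0300: ("data", False),
    0x0304: ("block", True),
    0x0308: ("bshrub", False),
    0x030c: ("btree", True),
    0x0310: ("mroot", True),
    0x0314: ("mdir", True),
    0x0318: ("mshrub", False),  # hypothetical
    0x031c: ("mtree", True),
    0x0320: ("did", False),
    0x032c: ("branch", True),
}


def decode_struct(tag):
    """Return (name, rr) for a struct tag, or None if it is not listed."""
    tag &= 0x7fff  # drop the valid bit
    name, has_rr = STRUCT_TAGS.get(tag & ~0x3, (None, False))
    if name is None or (not has_rr and tag & 0x3):
        return None
    return name, tag & 0x3


def is_shrub(tag):
    """All shrubs end with 1---."""
    return (tag & 0xf) == 0x8


def is_btree(tag):
    """All btrees, including branch, end with 11rr."""
    return (tag & 0xc) == 0xc
```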
Christopher Haster	5fa85583cd	Dropped block-level erased-state checksums for RAM-tracked erased-state Unfortunately block-level erased-state checksums (becksums) don't really work as intended. An invalid becksum _does_ signal that a prog has been attempted, but a valid becksum does _not_ prove that a prog has _not_ been attempted. Rbyd ecksums work, but only thanks to a combination of prioritizing valid commits and the use of perturb bits to force erased-state changes. It _is_ possible to end up with an ecksum collision, but only if you 1. lose power before completing a commit, and 2. end up with a non-trivial crc32c collision. If this does happen, at the very least the resulting commit will likely end up corrupted and thrown away later. Block-level becksums, at least as originally designed, don't have either of these protections. To make matters worse, the blocks these becksums reference contain only raw user data. Write 0xffs into a file and you will likely end up with a becksum collision! This is a problem for a couple of reasons: 1. Progging multiple times to erased-state is likely to result in corrupted data, though this is also likely to get caught with validating writes. Worst case, the resulting data looks valid, but with weakened data retention. 2. Because becksums are stored in the copy-on-write metadata of the file, attempting to open a file twice for writing (or more advanced copy-on-write operations in the future) can lead to a situation where a prog is attempted on _already committed_ data. This is very bad and breaks copy-on-write guarantees. --- So clearly becksums are not fit for purpose and should be dropped. What can we replace them with? The first option, implemented here, is RAM-tracked erased state. Give each lfsr_file_t its own eblock/eoff fields to track the last known good erased-state. And before each prog, clear eblock/eoff so we never accidentally prog to the same erased-state twice. 
It's interesting to note we don't currently clear eblock/eoff in all file handles, this is ok only because we don't currently share eblock/eoff across file handles. Each eblock/eoff is exclusive to the lfsr_file_t and does not appear anywhere else in the system. The main downside of this approach is that, well, the RAM-tracked erased-state is only tracked in RAM. Block-level erased-state effectively does not persist across reboots. I've considered adding some sort of per-file erased-state tracking to the mdir that would need to be cleared before use, but such a mechanism ends up quite complicated. At the moment, I think the best second option is to put erased-state tracking in the future-planned bmap. This would let you opt-in to on-disk tracking of all erased-state in the system. One nice thing about RAM-tracked erased-state is that it's not on disk, so it's not really a compatibility concern and won't get in the way of additional future erased-state tracking. --- Benchmarking becksums vs RAM-tracking has been quite interesting. While in theory becksums can track much more erased-state, it's quite unlikely anything but the most recent erased-state actually ends up used. The end result is no real measurable performance loss, and actually a minor speedup because we don't need to calculate becksums on every block write. There are some pathological cases, such as multiple write heads, but these are out-of-scope right now (note! multiple explicit file handles currently handle this case beautifully because we don't share eblock/eoff!) Becksums were also relatively complicated, and needed extra scaffolding to pass around/propagate as secondary tags alongside the primary bptr. So trading these for RAM-tracking also gives us a nice bit of code/stack savings, albeit at a 2-word RAM cost in lfsr_file_t: code stack structs before: 33888 2864 1096 after: 33564 (-1.0%) 2816 (-1.7%) 1104 (+0.7%) lfsr_file_t before: 104 lfsr_file_t after: 112 (+7.7%)	2024-05-04 17:22:56 -05:00
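The clear-before-prog rule for eblock/eoff can be illustrated with a toy model. Everything here except the eblock/eoff field names is an assumption: Disk, File, and file_prog are invented for the sketch and do not mirror the lfs.c code paths.

```python
class Disk:
    """Toy block device that refuses to prog the same bytes twice."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.progged = {}    # block -> set of already-progged offsets
        self.next_block = 0

    def alloc_erased(self):
        block = self.next_block
        self.next_block += 1
        self.progged[block] = set()
        return block

    def prog(self, block, off, data):
        span = set(range(off, off + len(data)))
        if span & self.progged[block]:
            raise IOError("prog to non-erased storage")
        self.progged[block] |= span


class File:
    """Toy stand-in for lfsr_file_t; only eblock/eoff come from the commit."""

    def __init__(self):
        self.eblock = None
        self.eoff = None


def file_prog(disk, file, data):
    # Reuse the tracked erased-state if it exists and the data fits,
    # otherwise fall back to a freshly erased block.
    if file.eblock is not None and file.eoff + len(data) <= disk.block_size:
        block, off = file.eblock, file.eoff
    else:
        block, off = disk.alloc_erased(), 0
    # Clear eblock/eoff before progging, so a failed prog can never leave
    # the handle pointing at storage that is no longer erased.
    file.eblock, file.eoff = None, None
    disk.prog(block, off, data)
    # Only a completed prog re-establishes known-good erased-state.
    file.eblock, file.eoff = block, off + len(data)
    return block, off
```

Because each File owns its eblock/eoff exclusively, two handles in this model never prog into the same erased region, which matches the observation about separate file handles above.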
Christopher Haster	81ccfbccd0	Dropped -x/--device from dbg*.py scripts This hasn't really proven useful. At one point showing the cksums in dbgrbyd.py was useful, but this is now possible and easier with dbgblock.py -x/--cksum.	2024-04-28 13:21:46 -05:00
Christopher Haster	86a8582445	Tweaked canonical altn to point to itself By definition, altns should never be followed, so it doesn't really matter where they point. But it's not like they can point literally nowhere, so where should they point? A couple options: 1. jump=jump - Wherever the old alt pointed - Easy, literally a noop - Unsafe, bugs could reveal outdated parts of the tree - Encoding size eh 2. jump=0 - Point to offset=0 - Easier, +0 code - Safer, branching to 0 should assert - Worst possible encoding size 3. jump=itself - Point to itself - A bit tricky, +4 code - Safe, should assert, even without asserts worst case infinite loop - Optimal encoding size An infinite loop isn't the best failure state, but we can catch this with an assert, which we would need for jump=0 anyways. And this is only a concern if there are other fs bugs. jump=0 is actually slightly worse if asserts are disabled, since we'd end up reading the revision count as garbage. Adopting jump=itself gives us the optimal 4-byte encoding: altbn w0 = 40 00 00 00 '-+-' ^ ^ '----\|--\|-- tag = altbn '--\|-- weight = 0 '-- jump = itself (branch - 0) This requires tweaking the alt encoder a bit, to avoid relative encoding jump=0s, but this is pretty cheap: code stack jump=jump: 34068 2864 jump=0: 34068 (+0.0%) 2864 (+0.0%) jump=itself: 34072 (+0.0%) 2864 (+0.0%) I thought we may need to also tweak the decoder, so later trunk copies don't accidentally point to the old location, but humorously our pruning kicks in redundantly to reset altbn's jump=itself on every trunk. Note lfsr_rbyd_lookupnext was also rearranged a bit to make it easier to assert on infinite loops and this also added some code. Probably just due to compiler noise: code stack before: 34068 2864 after: 34076 (+0.0%) 2864 (+0.0%) Also note that we still accept all of the above altbn encoding options. This only affects encoding and dbg scripts.	2024-04-28 13:21:46 -05:00
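The encoding-size argument for jump=itself can be made concrete. A minimal sketch, assuming an alt is laid out as a big-endian 16-bit tag followed by leb128 weight and leb128 jump, with the jump stored relative to the alt's own offset (read off the "branch - 0" annotation above); leb128 and encode_alt are hypothetical helpers, not the real encoder.

```python
def leb128(x):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = x & 0x7f
        x >>= 7
        if x:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_alt(tag, weight, alt_off, target_off):
    """Encode an alt at alt_off that jumps to target_off (assumed layout)."""
    jump = alt_off - target_off  # relative jump, 0 means "itself"
    return tag.to_bytes(2, "big") + leb128(weight) + leb128(jump)
```

Under these assumptions an altbn pointing at itself is always 40 00 00 00, while one pointing at offset 0 grows with the alt's position in the block, which is why jump=0 is listed as the worst case for encoding size.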
Christopher Haster	faf8c4b641	Tweaked alt-tag encoding to match color/dir naming order This is mainly to avoid mistakes caused by names/encodings disagreeing: LFSR_TAG_ALT 0x4kkk v1cd kkkk -kkk kkkk ^ ^^ '------+-----' '-\|\|--------\|------- valid bit '\|--------\|------- color '--------\|------- dir '------- key Notably, the LFSR_TAG_ALT() macro has already caused issues by being both 1. ambiguous, and 2. not really type-checkable. It's easy to get the order wrong and things not really break, just behave poorly, it's really not great! To be honest the exact order is a bit arbitrary, the color->dir naming appeared by accident because I guess it felt more natural. Maybe because of English's weird implicit adjective ordering? Maybe because of how often conditions show up as the last part of the name in other instruction sets? At least one plus is that this moves the dir-bit next to the key. This makes it so all of the condition information is encoded in the lowest 13 bits of the tag, which may lead to minor optimization tricks for implementing flips and such. Code changes: code stack before: 34080 2864 after: 34068 (-0.0%) 2864 (+0.0%)	2024-04-28 13:21:41 -05:00
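The v1cd layout can be unpacked with a few shifts and masks. A small sketch based only on the bit pattern shown in this commit; decode_alt and alt_cond are hypothetical names, the color and dir bits are returned as raw 0/1 values since the commit does not say which value means what, and the key mask covers the whole low 12 bits even though bit 7 is shown as unused.

```python
def decode_alt(tag):
    """Split a 16-bit alt tag into the fields of the v1cd layout."""
    return {
        "valid": (tag >> 15) & 1,
        "alt": (tag >> 14) & 1,    # the fixed '1' that marks an alt
        "color": (tag >> 13) & 1,
        "dir": (tag >> 12) & 1,
        "key": tag & 0x0fff,       # bit 7 is drawn as '-' (unused)
    }


def alt_cond(tag):
    """Condition information: dir bit plus key, the lowest 13 bits."""
    return tag & 0x1fff
```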
Christopher Haster	37c45e1afc	Fixed coloring conflicts in rbyd tree renderers A bit of a hack, but rather than handling conditional alt branches, our dbg rbyd tree renderers just represent single-pointer alts as an alt with both branches pointing to the same place. Unfortunately, the two branches technically have different colors. This resulted in a bit of contention when choosing how to color the tree. Basically Python's dict ordering would determine which color won. Which was a bit confusing when dbgrbyd.py displayed different tree colorings for the same rbyd. dbgrbyd.py should be idempotent! This is solved by adding another hack to check explicitly for same-destination branches.	2024-04-09 20:04:14 -05:00
Christopher Haster	8a646d5b8e	Added dbgtag.py for easy tag decoding on the command-line Example: $ ./scripts/dbgtag.py 0x3001 cksum 0x01 dbgtag.py inherits most of crc32c.py's decoding options. The most useful probably being -x/--hex: $ ./scripts/dbgtag.py -x e1 00 01 8a 09 altbgt 0x100 w1 -1162 dbgtag.py also supports reading from a block device if either -b/--block-size or --off are provided. This is mainly for consistency with the other dbg*.py scripts: $ ./scripts/dbgtag.py disk -b4096 0x2.1e4 bookmark w1 1 This should help when debugging and finding a raw tag/alt in some register. Manually decoding is just an unnecessary road bump when this happens.	2024-04-01 16:29:13 -05:00
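The -x/--hex example above can be reproduced by hand. A minimal sketch, assuming a raw tag is a big-endian 16-bit tag followed by two unsigned leb128 fields (weight, then size or jump); from_leb128 and parse_tag are hypothetical helpers and this is not dbgtag.py's actual code.

```python
def from_leb128(data, off):
    """Decode one unsigned LEB128 value, returning (value, next offset)."""
    x = 0
    shift = 0
    while True:
        byte = data[off]
        off += 1
        x |= (byte & 0x7f) << shift
        shift += 7
        if not byte & 0x80:
            return x, off


def parse_tag(data):
    """Parse (tag, weight, size_or_jump) from raw bytes (assumed layout)."""
    tag = int.from_bytes(data[:2], "big")
    weight, off = from_leb128(data, 2)
    size_or_jump, off = from_leb128(data, off)
    return tag, weight, size_or_jump
```

Feeding it e1 00 01 8a 09 gives tag 0xe100 (key 0x100 in the low 12 bits), weight 1, and 1162, which lines up with the `altbgt 0x100 w1 -1162` output shown above.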
Christopher Haster	54a03cfe3b	Enabled both pruning/non-pruning dbg reprs, -t/--tree and -R/--rbyd Now that altns/altas are more important structurally, including them in our dbg script's tree renderers is valuable for debugging. On the other hand, they do add quite a bit of visual noise when looking at large multi-rbyd trees topologically. This commit gives us the best of both worlds by making both tree renderings available under different options: -t/--tree, a simplified rbyd tree renderer with altn/alta pruning: .-> 0 reg w1 4 .-+-> uattr 0x01 2 \| .-> uattr 0x02 2 .---+-+-> uattr 0x03 2 \| .-> uattr 0x04 2 \| .-+-> uattr 0x05 2 \| .-+---> uattr 0x06 2 +-+-+-+-+-> 1 reg w1 4 \| \| '-> 2 reg w1 4 \| '---> uattr 0x01 2 '---+-+-+-> uattr 0x02 2 \| \| '-> uattr 0x03 2 \| '-+-> uattr 0x04 2 \| '-> uattr 0x05 2 \| .-> uattr 0x06 2 \| .-+-> uattr 0x07 2 \| \| .-> uattr 0x08 2 '-+-+-> uattr 0x09 2 -R/--rbyd, a full rbyd tree renderer: .---> 0 reg w1 4 .---+-+-> uattr 0x01 2 \| .---> uattr 0x02 2 .-+-+-+-+-> uattr 0x03 2 \| .---> uattr 0x04 2 \| .-+-+-> uattr 0x05 2 \| .-+---+-> uattr 0x06 2 +---+-+-+-+-+-> 1 reg w1 4 \| \| '-> 2 reg w1 4 \| '-----> uattr 0x01 2 '-+-+-+-+-+-+-> uattr 0x02 2 \| \| '---> uattr 0x03 2 \| '---+-+-> uattr 0x04 2 \| '---> uattr 0x05 2 \| .---> uattr 0x06 2 \| .-+-+-> uattr 0x07 2 \| \| .-> uattr 0x08 2 '-----+---+-> uattr 0x09 2 And of course -B/--btree, a simplified B-tree renderer (more useful for multi-rbyds): +-> 0 reg w1 4 \| uattr 0x01 2 \| uattr 0x02 2 \| uattr 0x03 2 \| uattr 0x04 2 \| uattr 0x05 2 \| uattr 0x06 2 \|-> 1 reg w1 4 '-> 2 reg w1 4 uattr 0x01 2 uattr 0x02 2 uattr 0x03 2 uattr 0x04 2 uattr 0x05 2 uattr 0x06 2 uattr 0x07 2 uattr 0x08 2 uattr 0x09 2	2024-04-01 16:23:31 -05:00
Christopher Haster	abe68c0844	rbyd-rr: Reworking rbyd range removal to try to preserve rby structure This is the start of (yet another) rework of rbyd range removals, this time in an effort to preserve the rby structure that maps to a balanced 2-3-4 tree. Specifically, the property that all search paths have the same number of black edges (2-3-4 nodes). This is currently incomplete, as you can probably tell from the mess, but this commit at least gets a working altn/alta encoding in place necessary for representing empty 2-3-4 nodes. More on that below. --- First the problem: My assumption, when implementing the previous range removal algorithms, was that we only needed to maintain the existing height of the tree. The existing rbyd operations limit the height to strictly log n. And while we can't _reduce_ the height to maintain perfect balance, we can at least avoid _increasing_ the height, which means the resulting tree should have a height <= log n. Since our rbyds are bounded by the block_size b, this means worst case our rbyd can never exceed a height <= log b, right? Well, not quite. This is true the instant after the remove operation. But there is an implicit assumption that future rbyd operations will still be able to maintain height <= log n after the remove operation. This turns out to not be true. The problem is that our rbyd appends only maintain height <= log n if our rby structure is preserved. If the rby structure is broken, rbyd append assumes an rby structure that doesn't exist, which can lead to an increasingly unbalanced tree. Consider this happily balanced tree: .-------o-------. .--------o .---o---. .---o---. .---o---. \| .-o-. .-o-. .-o-. .-o-. .-o-. .-o-. \| .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. \| a b c d e f g h i j k l m n o p => a b c d e f g h i '------+------' remove After a range removal it looks pretty bad, but note the height is still <= log n (old n not the new n). We are still <= log b. 
But note what happens if we start to insert attrs into the short half of the tree: .--------o .---o---. \| .-o-. .-o-. \| .o. .o. .o. .o. \| a b c d e f g h i .-----o .--------o .-+-r .---o---. \| \| \| \| .-o-. .-o-. \| \| \| \| .o. .o. .o. .o. \| \| \| \| a b c d e f g h i j'k'l' .-------------o .---o .---+-----r .--------o .-o .-o .-o .-+-r .---o---. \| \| \| \| \| \| \| \| \| \| .-o-. .-o-. \| \| \| \| \| \| \| \| \| \| .o. .o. .o. .o. \| \| \| \| \| \| \| \| \| \| a b c d e f g h i j'k'l'm'n'o'p'q'r' Our right side is generating a perfectly balanced tree as expected, but the left side is suddenly twice as far from the root! height(r')=3, height(a)=6! The problem is when we append l', we don't really know how tall the tree is. We only know l' has one black edge, which assuming rby structure is preserved, means all other attrs must have one black edge, so creating a new root is justified. In reality this just makes the tree grow increasingly unbalanced, increasing the height of the tree by worst case log n every range removal. --- It's interesting to note this was discovered while debugging test_fwrite_overwrite, specifically: test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10 It turns out the append fragments -> delete fragments -> append/carve block + becksum loop contains the perfect sequence of attrs necessary to turn this tree imbalance into a linked-list! 
.-> 0 data w1 1 .-b-> 1 data w1 1 \| .-> 2 data w1 1 .-b-b-> 3 data w1 1 \| .-> 4 data w1 1 \| .-b-> 5 data w1 1 \| \| .-> 6 data w1 1 .---b-b-b-> 7 data w1 1 \| .-> 8 data w1 1 \| .-b-> 9 data w1 1 \| \| .-> 10 data w1 1 \| .-b-b-> 11 data w1 1 \| .-b-----> 12 data w1 1 .-y-y-------> 13 data w1 1 \| .-> 14 data w1 1 .-y---------y-> 15 data w1 1 \| .-> 16 data w1 1 .-y-----------y-> 17 data w1 1 \| .-> 18 data w1 1 .-y-------------y-> 19 data w1 1 \| .-> 20 data w1 1 .-y---------------y-> 21 data w1 1 \| .-> 22 data w1 1 .-y-----------------y-> 23 data w1 1 \| .-> 24 data w1 1 .-y-------------------y-> 25 data w1 1 \| .---> 26 data w1 1 \| \| .-> 27-2047 block w2021 10 b-------------------r-b-> becksum 5 Note, to reproduce this you need to step through with a breakpoint on lfsr_bshrub_commit. This only shows up in the file's intermediary btree, which at the time of writing ends up at block 0xb8: $ ./scripts/test.py \ test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10 \ -ddisk --gdb -f $ ./scripts/watch.py -Kdisk -b \ ./scripts/dbgrbyd.py -b4096 disk 0xb8 -t (then b lfsr_bshrub_commit and continue a bunch) --- So, we need to preserve the rby structure. Note pruning red/yellow alts is not an issue. These aren't black, so we aren't changing the number of black edges in the tree. We've just effectively reduced a 3/4 node into a 2/3 node: .-> a .---b-> b .-> a <- 2 black \| .---> c .-b-> b \| \| .-> d \| .-> c b-r-b-> e <- rm => b-b-> d <- 2 black The tricky bit is pruning black alts. Naively this changes the number of black edges/2-3-4 nodes in the tree, which is bad: .-> a .-b-> b .-> a <- 2 black \| .-> c .-b-> b b-b-> d <- rm => b---> c <- 1 black It's tempting to just make the alt red at this point, effectively merging the sibling 2-3-4 node. 
This maintains balance in the subtree, but still removes a black edge, causing problems for our parent: .-> a .-b-> b .-> a <- 3 black \| .-> c .-b-> b .-b-b-> d \| .-> c \| .-> e .-b-b-> d \| .-b-> f \| .---> e \| \| .-> g \| \| .-> f b-b-b-> h <- rm => b-r-b-> g <- 2 black In theory you could propagate this all the way up to the root, and this _would_ probably give you a perfect self-balancing range removal algorithm... but it's recursive... and littlefs can't be recursive... .-> s .-b-> t .-> s \| .-> u .-----b-> t .-b-b-> v \| .-> u \| .-> w \| .---b-> v \| .-b-> x \| \| .---> w \| \| \| \| .-> y \| \| \| \| \| \| \| .-> x b-b- ... b-b-b-> z <- rm => r-b-r-b- ... r-b-r-b-> y So instead, an alternative solution. What if we allowed black alts that point nowhere? A sort of noop 2-3-4 node that serves only to maintain the rby structure? .-> a .-b-> b .-> a <- 2 black \| .-> c .-b-> b b-b-> d <- rm => b-b-> c <- 2 black I guess that would technically make this 1-2-3-4 tree. This does add extra overhead for writing noop alts, which are otherwise useless, but it seems to solve most of our problems: 1. does not increase the height of the tree, 2. maintains the rby structure, 3. tail-recursive. And, thanks to the preserved rby structure, we can say that in the worst case our rbyds will never exceed height <= log b again, even with range removals. If we apply this strategy to our original example, you can see how the preserved rby structure sort of "absorbs" new red alts, preventing further unbalancing: .-------o-------. .--------o .---o---. .---o---. .---o---. o .-o-. .-o-. .-o-. .-o-. .-o-. .-o-. o .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. .o. o a b c d e f g h i j k l m n o p => a b c d e f g h i '------+------' remove Reinserting: .--------o .---o---. o .-o-. .-o-. o .o. .o. .o. .o. o a b c d e f g h i .----------------o .---o---. o .-o-. .-o-. .------o .o. .o. .o. .o. .o. .-+-r a b c d e f g h i j'k'l'm' .----------------------------o .---o---. 
.-------------o .-o-. .-o-. .---o .---+-----r .o. .o. .o. .o. .-o .-o .-o .-o .-+-r a b c d e f g h i j'k'l'm'n'o'p'q'r's' Much better! --- This commit makes some big steps towards this solution, mainly codifying a now-special alt-never/alt-always (altn/alta) encoding to represent these noop 1 nodes. Technically, since null (0) tags are not allowed, these already exist as altle 0/altgt 0 and don't need any extra carve-out encoding-wise: LFSR_TAG_ALT 0x4kkk v1dc kkkk -kkk kkkk LFSR_TAG_ALTN 0x4000 v10c 0000 -000 0000 LFSR_TAG_ALTA 0x6000 v11c 0000 -000 0000 We actually already used altas to terminate unreachable tags during range removals, but this behavior was implicit. Now, altns have very special treatment as a part of determining bounds during appendattr (both unreachable gt/le alts are represented as altns). For this reason I think the new names are warranted. I've also added these encodings to the dbg*.py scripts for, well, debuggability, and added a special case to dbgrbyd.py -j to avoid unnecessary altn jump noise. As a part of debugging, I've also extended dbgrbyd.py's tree renderer to show trivial prunable alts. Unsure about keeping this. On one hand it's useful to visualize the exact alt structure, on the other hand it likely adds quite a bit of noise to the more complex dbg scripts. The current state of things is a mess, but at least tests are passing! Though we aren't actually reclaiming any altns yet... We're definitely _not_ preserving the rby structure at the moment, and if you look at the output from the tests, the resulting tree structure is hilariously bad. But at least the path forward is clear.	2024-04-01 16:23:14 -05:00
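The invariant this commit is trying to preserve, that every search path crosses the same number of black edges, can be stated as a tiny check. This is a toy model only: paths are hand-transcribed from the small a/b/c/d diagrams in the message as strings of edge colors, and black_height/preserves_rby are hypothetical helpers unrelated to the real rbyd code.

```python
def black_height(path):
    """Number of black edges on one search path, e.g. 'brb' -> 2."""
    return path.count("b")


def preserves_rby(paths):
    """True if every attr sits behind the same number of black edges."""
    return len({black_height(p) for p in paths.values()}) <= 1


# Before removing d: every attr is behind 2 black alts.
before = {"a": "bb", "b": "bb", "c": "bb", "d": "bb"}

# Naively pruning the black alt leaves c with only 1 black edge.
naive = {"a": "bb", "b": "bb", "c": "b"}

# Keeping a noop black alt (altn) in place restores c to 2 black edges.
with_altn = {"a": "bb", "b": "bb", "c": "bb"}
```

In this model only the naive removal fails the check, which is the situation the noop altn/alta alts are meant to avoid.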
Christopher Haster	62de865103	Eliminated null tag reachability in dbg scripts This was throwing off tree rendering in dbglfs.py, we attempt to lookup the null tag because we just want the first tag in the tree to stitch things together. Null tag reachability is tricky! You only notice if the tree happens to create a hole, which isn't that common. I think all lookup implementations should have this max(tag, 1) pattern from now on to avoid this. Note that most dbg scripts wouldn't run into this because we usually use the traversal tag+1 pattern. Still, the inconsistency in impl between the dbg scripts and lfs.c is bad.	2024-03-20 13:31:16 -05:00
Christopher Haster	9366674416	Replaced separate BLOCKSIZE/BLOCKCOUNT attrs with single GEOMETRY attr This saves a bit of rbyd overhead, since these almost always come together. Perhaps more interesting, it carves out space for storing mroot-anchor redundancy information. This uses the lowest two bits of the GEOMETRY tag to indicate how many redundant blocks belong to the mroot-anchor: LFSR_TAG_GEOMETRY 0x0008 v--- ---- ---- 1-rr This solves a bit of a hole in our redundancy encoding. The plan is for this info to be stored in the lowest two bits of every pointer, but the mroot-anchor doesn't really have a pointer. Though this is just future plans. Right now the redundancy information is unused. Current implementations should use the GEOMETRY tag 0x0009, which you may notice implies redundancy level-1. This matches our current 2-block per mdir default. Geometry attr encoding: .---+---+---+---. tag (0x0008+r): 1 be16 2 bytes \|x0008+r\| 0 \|siz\| weight (0): 1 leb128 1 byte +---+---+---+---+ size: 1 leb128 1 byte \| block_size \| block_size: 1 leb128 <=4 bytes +---+- -+- -+- -+- -. \| block_count \| block_count: 1 leb128 <=5 bytes '---+- -+- -+- -+- -' total: <=13 bytes Code changes: code stack before: 34092 2880 after: 34040 (-0.2%) 2880 (+0.0%)	2024-03-19 15:02:02 -05:00
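The geometry attr layout above (be16 tag, then leb128 weight, size, block_size, block_count) could be encoded roughly like this. It is a sketch under assumptions: the function names are hypothetical, the tag's valid bit (v) is left clear, and block_size/block_count are written as given, since this commit does not say whether they are stored minus 1.

```python
import struct

TAG_GEOMETRY = 0x0008  # low two bits carry the redundancy level (rr)

def leb128(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7f
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_geometry(block_size, block_count, redund=1):
    """Build tag + weight + size + payload for a GEOMETRY attr (sketch)."""
    payload = leb128(block_size) + leb128(block_count)
    tag = TAG_GEOMETRY | (redund & 0x3)
    # weight is always 0 for this attr; size is the payload length
    return struct.pack('>H', tag) + leb128(0) + leb128(len(payload)) + payload
```

For a 4096x256 disk this yields an 8-byte attr starting with tag 0x0009, well under the 13-byte worst case in the diagram.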
Christopher Haster	130281ac05	Reworked compat flags a bit Now with a bit more granularity for possibly-future-optional on-disk data structures: LFSR_RCOMPAT_NONSTANDARD 0x0001 ---- ---- ---- ---1 (reserved) LFSR_RCOMPAT_MLEAF 0x0002 ---- ---- ---- --1- LFSR_RCOMPAT_MSHRUB 0x0004 ---- ---- ---- -1-- (reserved) LFSR_RCOMPAT_MTREE 0x0008 ---- ---- ---- 1--- LFSR_RCOMPAT_BSPROUT 0x0010 ---- ---- ---1 ---- LFSR_RCOMPAT_BLEAF 0x0020 ---- ---- --1- ---- LFSR_RCOMPAT_BSHRUB 0x0040 ---- ---- -1-- ---- LFSR_RCOMPAT_BTREE 0x0080 ---- ---- 1--- ---- LFSR_RCOMPAT_GRM 0x0100 ---- ---1 ---- ---- LFSR_WCOMPAT_NONSTANDARD 0x0001 ---- ---- ---- ---1 (reserved) LFSR_OCOMPAT_NONSTANDARD 0x0001 ---- ---- ---- ---1 (reserved) This adds a couple reserved flags: - LFSR_COMPAT_NONSTANDARD - This flag will never be set by a standard version of littlefs. The idea is to allow implementations with non-standard extensions a way to signal potential compatibility issues without worrying about future compat flag conflicts. This is limited to a single bit, but hey, it's not like it's possible to predict all future extensions. If a non-standard extension needs more granularity, reservations of standard compat flags can always be requested, even if they don't end up implemented in standard littlefs. (Though such reservations will need a strong motivation, it's not like these flags are free). - LFSR_RCOMPAT_MSHRUB - In theory littlefs supports a shrubbed mtree, where the root is inlined into the mroot. But in practice this turned out to be more complicated than it was worth. Still, a future implementation may find an mshrub useful, so preserving a compat flag for such a case makes sense. That being said, I have no plans to add support for mshrubs even in the dbg scripts. I would like the expected feature-set for debug tools to be well-defined, but also conservative. 
This gets a bit tricky with theoretical features like the mshrubs, but until mshrubs are actually implemented in littlefs, I would like to consider them non-standard. The implication of this is that, while LFSR_RCOMPAT_MSHRUB is currently "reserved", it may be repurposed for some other meaning in the future. These changes also rename COMPATFLAGS -> COMPAT, and reorder the tags by decreasing importance. This ordering seems more valuable than the original intention of making rcompat/wcompat a single bit flip. Implementation-wise, it's interesting to note the internal-only LFSR_COMPAT_OVERFLOW flag. This gets set when out-of-range bits are set on-disk, and allows us to detect unrepresentable compat flags without too much extra complexity. The extra encoding/decoding overhead does add a bit of cost though: code stack before: 33944 2880 after: 34124 (+0.5%) 2880 (+0.0%)	2024-03-16 17:26:04 -05:00
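To illustrate how the rcompat bits listed above might be checked at mount, here is a small sketch. The flag values come from the table; the set of flags treated as "understood" (everything not marked reserved) and the helper names are assumptions, and the internal LFSR_COMPAT_OVERFLOW handling is not modeled.

```python
RCOMPAT_NONSTANDARD = 0x0001  # reserved
RCOMPAT_MLEAF       = 0x0002
RCOMPAT_MSHRUB      = 0x0004  # reserved
RCOMPAT_MTREE       = 0x0008
RCOMPAT_BSPROUT     = 0x0010
RCOMPAT_BLEAF       = 0x0020
RCOMPAT_BSHRUB      = 0x0040
RCOMPAT_BTREE       = 0x0080
RCOMPAT_GRM         = 0x0100

# Assumption: a standard implementation understands every non-reserved flag.
RCOMPAT_KNOWN = (RCOMPAT_MLEAF | RCOMPAT_MTREE | RCOMPAT_BSPROUT
                 | RCOMPAT_BLEAF | RCOMPAT_BSHRUB | RCOMPAT_BTREE
                 | RCOMPAT_GRM)

def rcompat_unknown(flags):
    """Return the bits of `flags` this implementation does not understand."""
    return flags & ~RCOMPAT_KNOWN

def can_read(flags):
    """Reading requires understanding every rcompat flag that is set."""
    return rcompat_unknown(flags) == 0
```

Under these assumptions, a filesystem with NONSTANDARD or MSHRUB set is rejected the same way as one with an out-of-range bit set.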
Christopher Haster	5128522fe2	Renamed script flag -Z/--depth -> -z/--depth Previously, the intention of upper case -Z was to match -W/--width and -H/--height, which are uppercase to avoid conflicts with -h/--help. But -z/--depth isn't _really_ related to -W/-H. This avoids a conflict with -Z/--lebesgue, but may conflict with -z/--cat. Fortunately we don't currently have any conflicts with the latter. Since -z/--depth and -Z/--lebesgue are both disk-layout related, the risk of conflicts is probably much higher there.	2024-02-14 14:04:45 -06:00
Christopher Haster	2d2c0f19ff	Renamed block-size flag in scripts from -B -> -b So now these should be invoked like so: $ ./scripts/dbglfs.py -b4096x256 disk The motivation for this change is to better match other filesystem tooling. Some prior art: - mkfs.btrfs - -n/--nodesize => node size in bytes, power of 2 >= sector - -s/--sectorsize => sector size in bytes, power of 2 - zfs create - -b => block size in bytes - mkfs.xfs - -b => block size in bytes, power of 2 >= sector - -s => sector size in bytes, power of 2 >= 512 - mkfs.ext[234] - -b => block size in bytes, power of 2 >= 1024 - mkfs.ntfs - -c/--cluster-size => cluster size in bytes, power of 2 >= sector - -s/--sector-size => sector size in bytes, power of 2 >= 256 - mkfs.fat - -s => cluster size in sectors, power of 2 - -S => sector size in bytes, power of 2 >= 512 Why care so much about the flag naming for internal scripts? The intention is for external tooling to eventually use the same set of flags. And maybe even create publicly consumable versions of the dbg scripts. It's important that if/when this happens flags stay consistent. Everyone familiar with the ssh -p/scp -P situation knows how annoying this can be. It's especially important for littlefs's -b/--block-size flag, since this will likely end up used everywhere. Unlike other filesystems, littlefs can't mount without knowing the block-size, so any tool that mounts littlefs is going to need the -b/--block-size flag. --- The original motivation for -B was to avoid conflicts with the -b/--by flag that was already in use in all of the measurement scripts. But these are internal, and not really littlefs-related, so I don't think that's a good reason any more. Worst case we can just make the --by flag -B, or just not have a short form (--by is only 4 letters after all). Somehow we ended up with no scripts needing both -b/--block-size and -b/--by so far. 
Some other conflict/inconsistency tweaks were needed; here are all the flag changes: - -B/--block-size -> -b/--block-size - -M/--mleaf-weight -> -m/--mleaf-weight - -b/--btree -> -B/--btree - -C/--block-cycles -> -c/--block-cycles (in tracebd.py) - -c/--coalesce -> -S/--coalesce (in tracebd.py) - -m/--mdirs -> -M/--mdirs (in dbgbmap.py) - -b/--btrees -> -B/--btrees (in dbgbmap.py) - -d/--datas -> -D/--datas (in dbgbmap.py)	2024-02-14 12:45:30 -06:00
Christopher Haster	bea13dcf8e	Use sign bit of rbyd.trunk to indicate shrubness of rbyds Shrubness should have always been a property of lfsr_rbyd_t. You know you've made a good design decision when things just sort of fall into place and the code somehow becomes cleaner. The downside of this change is accessing rbyd trunks requires a mask, which is annoying, but the upside is we don't need to signal shrubness via extra booleans in internal functions anymore. The funny thing is, the actual motivation for this change was just to free up a bit in our tag encoding. Simplifying some of the internal functions was just a nice side effect. code stack before: 33940 2928 after: 33928 (-0.0%) 2912 (-0.5%)	2024-02-03 18:16:45 -06:00
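The trunk/shrubness packing described above can be modeled in a few lines. This is a sketch, not the lfs.c code: it assumes a 32-bit trunk field whose top bit is the shrub flag, and the helper names are made up for illustration.

```python
SHRUB_BIT = 0x80000000   # assumption: sign bit of a 32-bit trunk field
TRUNK_MASK = 0x7fffffff  # remaining bits hold the trunk offset

def trunk_pack(off, shrub):
    """Combine a trunk offset and a shrub flag into one integer."""
    assert 0 <= off <= TRUNK_MASK
    return off | (SHRUB_BIT if shrub else 0)

def trunk_off(trunk):
    """Recover the trunk offset; this is the mask the commit mentions."""
    return trunk & TRUNK_MASK

def trunk_isshrub(trunk):
    """Shrubness travels with the trunk, so no extra boolean is needed."""
    return bool(trunk & SHRUB_BIT)
```

The trade-off is visible here: every read of the offset goes through `trunk_off`, but functions that receive the packed value no longer need a separate shrub parameter.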
Christopher Haster	15593ccc49	Renamed scratch files -> orphan files I was originally avoiding naming these orphans, as they're _technically_ not orphans. They do exist in the mtree. But the name orphan just describes this type's purpose too well. This does lead to some confusing terms, such as the fact that orphan files can be non-orphaned if there are any in-device references. But I think this makes sense? - LFSR_TAG_SCRATCH -> LFSR_TAG_ORPHAN - LFSR_F_UNCREAT -> LFSR_F_ORPHAN - test_fscratch.toml -> test_forphan.toml	2024-02-03 18:15:38 -06:00
Christopher Haster	ba505c2a37	Implemented scratch file basics "Scratch files" are a new file type added to solve the zero-sized file problem. Though they have a few other uses that may be quite valuable. The "zero-sized file problem" is a common surprise for users, where what seems like a simple file create+write operation: lfs_file_open(&lfs, &file, "hi", LFS_O_WRONLY \| LFS_O_CREAT \| LFS_O_EXCL); lfs_file_write(&lfs, &file, "hello!", strlen("hello!")); lfs_file_close(&lfs, &file); Can end up creating a zero-sized file under powerloss, breaking user assumptions and their code. The tricky thing is that this is actually correct behavior as defined by POSIX. `open` with O_CREAT creates a file entry immediately, which is initially zero-sized. And the fact that power can be lost between `open` and `close` isn't really avoidable. But this is a common enough footgun that it's probably worth deviating from POSIX here. But how to avoid zero-sized files exactly? First thought: Delay the file creation until sync/close, tracking uncreated files in-device until then. This solves the problem and avoids any intermediary state if we lose power, but came with a number of headaches: 1. Since we delay file creation, we don't immediately write the filename to disk on open. This implies we need to keep the filename allocated in RAM until the first sync/close call. The requirement to keep the filename allocated for new files until first sync/close could be added to open, and with the option to call sync immediately to save the filename (and accept the risk of zero-sized files), I don't think it would be _that_ bad of an API. But it would still be pretty bad. Extra bad because 1. there's no way to warn on misuse at compile-time, 2. use-after-free bugs have a tendency to go unnoticed annoyingly often, 3. it's a regression from the previous API, and 4. who the heck reads the more-or-less same `open` documentation for every filesystem they adopt. 2. 
Without an allocated mid, tracking files internally gets a lot harder. The best option I could think of was to keep the opened-file linked-list sorted by mid + (in-device) file name. This did not feel like a great solution and was going to add more code cost. 3. Handling mdir splits containing uncreated files adds another headache. This complicates lfsr_mdir_estimate further, as it needs to decide in which mdir the uncreated files will end up, and potentially split on a filename that isn't even created yet. 4. Since the number of uncreated files can be potentially unbounded, you can't prevent an mdir from filling up with only uncreated files. On disk this ends up looking like an "empty" mdir, which needs special handling in littlefs to reclaim after powerloss. Support for empty mdirs -- the orphaned mdir scan -- was already added earlier. We already scan each mdir to build gstate, so it doesn't really add much cost. Notice that last bullet point? We already scan each mdir during mount. Why not, instead of scanning for orphaned mdirs, scan for orphaned files? So this leads to the idea of "scratch files". Instead of actually delaying file creation, fake it. Create a scratch file during open, and on the first sync/close, convert it to a regular file. If we lose power, scan for scratch files during mount, and remove them on first write. Some tradeoffs: 1. The orphan scan for scratch files is a bit more expensive than for mdirs on storage with large block sizes. We need to look at each file entry vs just each mdir, which pushes the runtime up to O(BlogB) vs O(B). Though if you also consider large mtrees, the worst case is still O(nlogn). 2. Creating intermediate scratch files adds another commit to file creation. This is probably not a big issue for flash, but may be more of a concern on devices with large prog sizes. 3. Scratch files complicate unrelated mkdir/rename/etc code a bit, since we need to consider what happens when the dest is a scratch file. 
But the end result is simple. And simple is good. Both for implementation headaches, and code size. Even if the on-disk state is conceptually more complicated. You may have noticed these scratch files are basically isomorphic to just setting an "uncreated" flag on the file, and that's true. There may have been a simpler route to end up with the design, but hey, as long as it works. As a plus, scratch files present a solution for a couple other things: 1. Removing an open file can become a scratch file until closed. 2. Scratch files can be used as temporary files. Open a file with O_DESYNC and never call sync and you have yourself a temporary file. Maybe in the future we should add O_TMPFILE to avoid the need for unique filenames, but that is low priority.	2024-02-03 18:15:29 -06:00
Christopher Haster	f29a4982c4	Added block-level erased-state checksums Much like the erased-state checksums in our rbyds (ecksums), these block-level erased-state checksums (becksums) allow us to detect failed progs to erased parts of a block and are key to achieving efficient incremental write performance with large blocks and frequent power cycles/open-close cycles. These are also key to achieving _reasonable_ write performance for simple writes (linear, non-overwriting), since littlefs now relies solely on becksums to efficiently append to blocks. Though I suppose the previous block staging logic used with the CTZ skip-list could be brought back to make becksums optional and avoid btree lookups during simple writes (we do a _lot_ of btree lookups)... I'll leave this open as a future optimization... Unlike in-rbyd ecksums, becksums need to be stored out-of-band so our data blocks only contain raw data. Since they are optional, an additional tag in the file's btree makes sense. Becksums are relatively simple, but they bring some challenges: 1. Adding becksums to file btrees is the first case we have for multiple struct tags per btree id. This isn't too complicated a problem, but requires some new internal btree APIs. Looking forward, which I probably shouldn't be doing this often, multiple struct tags will also be useful for parity and content ids as a part of data redundancy and data deduplication, though I think it's uncontroversial to consider these both heavier-weight features... 2. Becksums only work if unfilled blocks are aligned to the prog_size. This is the whole point of crystal_size -- to provide temporary storage for unaligned writes -- but actually aligning the block during writes turns out to be a bit tricky without a bunch of unnecessary btree lookups (we already do too many btree lookups!). 
The current implementation here discards the pcache to force alignment, taking advantage of the requirement that cache_size >= prog_size, but this is corrupting our block checksums. Code cost: code stack before: 31248 2792 after: 32060 (+2.5%) 2864 (+2.5%) Also lfsr_ftree_flush needs work. I'm usually open to gotos in C when they improve internal logic, but even for me, the multiple goto jumps from every left-neighbor lookup into the block writing loop is a bit much...	2023-12-14 01:05:34 -06:00
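The idea behind a becksum, checking that the erased tail of a block is still in its expected state before appending to it, can be sketched as below. Several things here are assumptions rather than littlefs's actual design: the checksum is taken over the next prog_size bytes after the written data, `zlib.crc32` stands in for whatever checksum the implementation uses, and the function names are hypothetical. Only the prog_size alignment requirement comes from the commit message.

```python
import zlib

def becksum(block, off, prog_size):
    """Checksum the prog_size bytes at `off`, which should still be erased."""
    return zlib.crc32(bytes(block[off:off + prog_size]))

def can_append(block, off, prog_size, stored):
    """Decide whether it is safe to keep programming the block at `off`.

    `stored` is the becksum recorded out-of-band (in the file's btree)
    when the block was last written.
    """
    if off % prog_size != 0:
        # becksums only work if the unfilled block is prog-aligned
        return False
    # A mismatch means something touched the erased region, for example
    # a prog that was interrupted by power loss.
    return becksum(block, off, prog_size) == stored
```

If `can_append` fails, the remaining data would need to go to a fresh block instead of being appended in place.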
Christopher Haster	6ccd9eb598	Adopted different strategy for hypothetical future configs Instead of writing every possible config that has the potential to be useful in the future, stick to just writing the configs that we know are useful, and error if we see any configs we don't understand. This prevents unnecessary config bloat, while still allowing configs to be introduced in a backwards compatible way in the future. Currently unknown configs are treated as a mount error, but in theory you could still try to read the filesystem, just with potentially corrupted data. Maybe this could be behind some sort of "FORCE" mount flag. littlefs must never write to the filesystem if it finds unknown configs. --- This also creates a curious case for the hole in our tag encoding previously taken up by the OCOMPATFLAGS config. We can query for any config > SIZELIMIT with lookupnext, but the OCOMPATFLAGS flag would need an extra lookup which just isn't worth it. Instead I'm just adding OCOMPATFLAGS back in. To support OCOMPATFLAGS littlefs has to do literally nothing, so this is really more of a documentation change. And who knows, maybe OCOMPATFLAGS will have some weird use case in the future...	2023-12-08 14:03:56 -06:00
Christopher Haster	337bdf61ae	Rearranged tag encodings to make space for BECKSUM, ORPHAN, etc Also: - Renamed GSTATE -> GDELTA for gdelta tags. GSTATE tags added as separate in-device flags. The GSTATE tags were already serving this dual purpose. - Renamed BSHRUB* -> SHRUB when the tag is not necessarily operating on a file bshrub. - Renamed TRUNK -> BSHRUB The tag encoding space now has a couple funky holes: - 0x0005 - Hole for aligning config tags. I guess this could be used for OCOMPATFLAGS in the future? - 0x0203 - Hole so that ORPHAN can be a 1-bit difference from REG. This could be after BOOKMARK, but having a bit to differentiate littlefs specific file types (BOOKMARK, ORPHAN) from normal file types (REG, DIR) is nice. I guess this could be used for SYMLINK if we ever want symlinks in the future? - 0x0314-0x0318 - Hole so that the mdir related tags (MROOT, MDIR, MTREE) are nicely aligned. This is probably a good place for file-related tags to go in the future (BECKSUM, CID, COMPR), but we only have two slots, so will probably run out pretty quickly. - 0x3028 - Hole so that all btree related tags (BTREE, BRANCH, MTREE) share a common lower bit-pattern. I guess this could be used for MSHRUB if we ever want mshrubs in the future?	2023-12-08 13:28:47 -06:00
Christopher Haster	04c6b5a067	Added grm rcompat flag, dropped ocompat, tweaked compat flags a bit I'm just not seeing a use case for optional compat flags (ocompat), so dropping for now. It seems their *nix equivalent, feature_compat, is used to inform fsck of things, but this doesn't really make sense in littlefs since there is no fsck. Or from a different perspective, littlefs is always running fsck. Ocompat flags can always be added later (since they do nothing). Unfortunately this really ruins the alignment of the tag encoding. For whatever reason config limits tend to come in pairs. For now the best solution is just leave tag 0x0006 unused. I guess you can consider it reserved for hypothetical ocompat flags in the future. --- This adds an rcompat flag for the grm, since in theory a filesystem doesn't need to support grms if it never renames files (or creates directories?). But if a filesystem doesn't support grms and a grm gets written into the filesystem, this can lead to corruption. I think every piece of gstate will end up with its own compat flag for this reason. --- Also renamed r/w/oflags -> r/w/ocompatflags to make their purpose clearer. --- The code impact of adding the grm rcompat flag is minimal, and will probably be less for additional rcompat flags: code stack before: 31528 2752 after: 31584 (+0.2%) 2752 (+0.0%)	2023-12-07 15:05:51 -06:00
Christopher Haster	4793d2f144	Fixed new bshrub roots and related bug fixing It turned out by implicitly handling root allocation in lfsr_btree_commit_, we were never allowing lfsr_bshrub_commit to intercept new roots as new bshrubs. Fixing this required moving the root allocation logic up into lfsr_btree_commit. This resulted in quite a bit of small bug fixing because it turns out if you can never create non-inlined bshrubs you never test non-inlined bshrubs: - Our previous rbyd.weight == btree.weight check for if we've reached the root no longer works, changed to an explicit check that the blocks match. Fortunately, now that new roots set trunk=0 new roots are no longer a problematic case. - We need to only evict when we calculate an accurate estimate, the previous code had a bug where eviction occurred early based only on the progged-since-last-estimate. - We need to manually set bshrub.block=mdir.block on new bshrubs, otherwise the lfsr_bshrub_isbshrub check fails in mdir commit staging. Also updated btree/bshrub following code in the dbg scripts, which mostly meant making them accept both BRANCH and SHRUBBRANCH tags as btree/bshrub branches. Conveniently very little code needs to change to extend btree read operations to support bshrubs.	2023-11-21 00:06:08 -06:00
Christopher Haster	6b82e9fb25	Fixed dbg scripts to allow explicit trunks without checksums Note this is intentionally different from how lfsr_rbyd_fetch behaves in lfs.c. We only call lfsr_rbyd_fetch when we need validated checksums, otherwise we just don't fetch. The dbg scripts, on the other hand, always go through fetch, but it is useful to be able to inspect the state of incomplete trunks when debugging. This used to be how the dbg scripts behaved, but they broke because of some recent script work.	2023-11-20 23:28:27 -06:00
Christopher Haster	4ecf4cc654	Added dbgbmap.py, tweaked tracebd.py to match dbgbmap.py parses littlefs's mtree/btrees and displays the status of every block in use: $ ./scripts/dbgbmap.py disk -B4096x256 -Z -H8 -W64 bd 4096x256, 7.8% mdir, 10.2% btree, 78.1% data mmddbbddddddmmddddmmdd--bbbbddddddddddddddbbdddd--ddddddmmdddddd mmddddbbddbbddddddddddddddddbbddddbbddddddmmddbbdddddddddddddddd bbdddddddddddd--ddddddddddddddddbbddddmmmmddddddddddddmmmmdddddd ddddddddddbbdddddddddd--ddddddddddddddmmddddddddddddddddddddmmdd ddddddbbddddddddbb--ddddddddddddddddddddbb--mmmmddbbdddddddddddd ddddddddddddddddddddbbddbbdddddddddddddddddddddddddddddddddddddd dddddddddd--ddddbbddddddddmmbbdd--ddddddddddddddbbmmddddbbdddddd ddmmddddddddddmmddddddddmmddddbbbbdddddddd--ddbbddddddmmdd--ddbb (ok, it looks a bit better with colors) dbgbmap.py matches the layout and has the same options as tracebd.py, allowing the combination of both to provide valuable insight into what exactly littlefs is doing. This required a bit of tweaking of tracebd.py to get right, mostly around conflicting order-based arguments. This also reworks the internal Bmap class to be more resilient to out-of-window ops, and adds an optional informative header.	2023-10-30 15:52:33 -05:00
Christopher Haster	46b78de500	Tweaked tracebd.py in a couple of ways, adopted bdgeom/--off/-n - Tried to do the rescaling a bit better with truncating divisions, so there shouldn't be weird cross-pixel updates when things aren't well aligned. - Adopted optional -B<block_size>x<block_count> flag for explicitly specifying the block-device geometry in a way that is compatible with other scripts. Should adopt this in more places. - Adopted optional <block>.<off> argument for start of range. This should match dbgblock.py. - Adopted '-' for noop/zero-wear. - Renamed a few internal things. - Dropped subscript chars for wear, this didn't really add anything and can be accomplished by specifying the --wear-chars explicitly. Also changed dbgblock.py to match, this mostly affects the --off/-n/--size flags. For example, these are all the same: ./scripts/dbgblock.py disk -B4096 --off=10 --size=5 ./scripts/dbgblock.py disk -B4096 --off=10 -n5 ./scripts/dbgblock.py disk -B4096 --off=10,15 ./scripts/dbgblock.py disk -B4096 -n10,15 ./scripts/dbgblock.py disk -B4096 0.10 -n5 Also adopted the block-device geometry argument across scripts, where the -B flag can optionally be a full <block_size>x<block_count> geometry: ./scripts/tracebd.py disk -B4096x256 Though this is mostly unused outside of tracebd.py right now. It will be useful for anything that formats littlefs (littlefs-fuse?) and allowing the format everywhere is a bit of a nice convenience.	2023-10-30 15:52:20 -05:00
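The two argument formats introduced above, <block_size>x<block_count> and <block>.<off>, are simple to parse. The following is a sketch of one way to do it; the function names are hypothetical and the scripts' real parsers may accept more forms (for example the comma-separated ranges shown for --off/-n).

```python
def parse_geometry(s):
    """Parse '<block_size>' or '<block_size>x<block_count>'.

    Returns (block_size, block_count), with block_count None if omitted.
    """
    size, _, count = s.partition('x')
    return int(size, 0), (int(count, 0) if count else None)

def parse_addr(s):
    """Parse '<block>' or '<block>.<off>' into (block, off)."""
    block, _, off = s.partition('.')
    return int(block, 0), (int(off, 0) if off else 0)
```

Using `int(..., 0)` lets callers pass hex values such as `0x1000x256`-style sizes written as `0x1000`, though whether the real scripts allow that is not stated here.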
Christopher Haster	bfc8021176	Reworked config tags, adopted rflags/wflags/oflags The biggest change here is the breaking up of the FLAGS config into RFLAGS/WFLAGS/OFLAGS. This is directly inspired by, and honestly not much more than a renaming, of the compat/ro_compat/incompat flags found in Linux/Unix/POSIX filesystems. I think these were first introduced in ext2? But I need to do a bit more research on that. RFLAGS/WFLAGS/OFLAGS provide a much more flexible, and extensible, feature flag mechanism than the previous minor version bumps. The (re)naming of these flags is intended to make their requirements more clear. In order to do the relevant operation, you must understand every flag set in the relevant flag: - RFLAGS / incompat flags - All flags must be understood to read the filesystem, if not understood the only possible behavior is to fail. - WFLAGS / ro-compat flags - All flags must be understood to write to the filesystem, if not understood the filesystem may be mounted read-only. - OFLAGS / compat flags - Optional flags, if not understood the relevant flag must be cleared before the filesystem can be written to, but other than that these flags can mostly be ignored. Some hypothetical littlefs examples: - RFLAGS / incompat flags - Transparent compression Is this the same as a major disk-version break? Yes kinda? An implementation that doesn't understand compression can't read the filesystem. On the other hand, it's useful to have a filesystem that can read both compressed and uncompressed variants. - WFLAGS / ro-compat flags - Closed block-map The idea behind a closed block-map (currently planned), is that littlefs maintains in global space a complete mapping of all blocks in use by the filesystem. For such a mapping to remain consistent means that if you write to the filesystem you must understand the closed block-map. Or in other words, if you don't understand the closed block-map you must not write to the filesystem. 
Reading, on the other hand, can ignore many such write-related auxiliary features, so the filesystem can still be read from. - OFLAGS / compat flags - Global checksums Global checksums (currently planned) are extra checksums attached to each mdir that when combined self-validate the filesystem. But if you don't understand global checksums, you can still read and write the filesystem without them. The only catch is that when you write to the filesystem, you may end up invalidating the global checksum. Clearing the global checksum bit in the OFLAGS is a cheap way to signal that the global checksum is no longer valid, allowing you to still write to the filesystem without this optional feature. Other tweaks to note: - Renamed BLOCKLIMIT/DISKLIMIT -> BLOCKSIZE/BLOCKCOUNT Note these are still the _actual_ block_size/block_count minus 1. The subtle difference here was the original reason for the name change, but after working with it for a bit, I just don't think new, otherwise unused, names are worth it. The minus 1 stays, however, since it avoids overflow issues at extreme boundaries of powers of 2. - Introduces STAGLIMIT/SATTRLIMIT, sys-attribute parallels to UTAGLIMIT/UATTRLIMIT. These may be useful if only uattrs are supported, or vice-versa. - Dropped UATTRLIMIT/SATTRLIMIT to 255 bytes. This feels extreme, but matches NAMELIMIT. These _should_ be small, and limiting the uattr/sattr size to a single-byte leads to really nice packing of the utag+uattrsize in a single integer. This can always be expanded in the future if this limit proves to be a problem. - Renamed MLEAFLIMIT -> MDIRLIMIT and (re?)introduced MTREELIMIT. These may be useful for limiting the mtree when needed, though the exact use case isn't clear quite yet.	2023-10-25 12:08:58 -05:00
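The RFLAGS/WFLAGS/OFLAGS rules above boil down to a small decision at mount time. The sketch below is one reading of those rules, with a hypothetical function name and return shape: unknown RFLAGS fail the mount, unknown WFLAGS force read-only, and unknown OFLAGS are cleared before any write.

```python
def mount_mode(rflags, wflags, oflags, known_r, known_w, known_o):
    """Return (mode, oflags_to_keep) for the given on-disk flags.

    `known_*` are bitmasks of the flags this implementation understands.
    """
    if rflags & ~known_r:
        # cannot even read the filesystem safely
        raise ValueError('unknown rflags: 0x%x' % (rflags & ~known_r))
    if wflags & ~known_w:
        # reading is fine, but writing could break invariants
        return 'ro', oflags
    # Writable: optional flags we don't understand must be cleared before
    # writing, since our writes may invalidate what they describe.
    return 'rw', oflags & known_o
```

For example, an implementation that does not understand a global-checksum OFLAGS bit would mount read-write and drop that bit, signalling that the checksum can no longer be trusted.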
Christopher Haster	6dcdf1ed61	Renamed BNAME -> NAME, CCKSUM -> CKSUM It's probably better to have separate names for a tag category and any specific name, but I can't think of a better name for this tag, and I hadn't noticed that I was already ignoring the C prefix for CCKSUM tags in many places. NAME/CKSUM now mean both the specific tag and tag category, which is a bit of a hack since both happen to be the 0th-subtype of their categories.	2023-10-25 01:25:39 -05:00


103 Commits