Made rbyd cksums erased-state agnostic

Long story short, rbyd checksums are now fully reproducible. If you
write the same set of tags to any block, you will end up with the same
checksum.

This is actually a bit tricky with littlefs's constraints.

---

The main problem boils down to erased-state. littlefs has a fairly
flexible model for erased-state, and this brings some challenges. In
littlefs, storage goes through 2 states:

1. Erase - Prepare storage for progging. Reads after an erase may return
   arbitrary, but consistent, values.

2. Prog - Program storage with data. Storage must have been erased with
   no progs attempted since. Reads after a prog must return the new
   data.

Note in this model erased-state may not be all 0xffs, though it likely
will be for flash. This allows littlefs to support a wide range of
other storage devices: SD, RAM, NVRAM, encryption, ECC, etc.

But this model also means erased-state may be different from block to
block, and even different on later erases of the same block.

And if that wasn't enough of a challenge, _erased-state can contain
perfectly valid commits_. Usually you can expect arbitrary valid cksums
to be rare, but thanks to SD, RAM, etc, modeling erase as a noop, valid
cksums in erased-state are actually very common.

So how do we manage erased-state in our rbyds?

First we need some way to detect it, since we can't prog if we're not
erased. This is accomplished by the forward-looking erased-state cksum
(ecksum):

  .---+---+---+---.     \
  |     commit    |     |
  |               |     |
  |               |     |
  +---+---+---+---+     +-.
  |     ecksum -------. | | <-- ecksum - cksum of erased state
  +---+---+---+---+   | / |
  |     cksum --------|---' <-- cksum - cksum of commit,
  +---+---+---+---+   |         including ecksum
  |    padding    |   |
  |               |   |
  +---+---+---+---+ \ |
  |     erased    | +-'
  |               | /
  .               .
  .               .

You may have already noticed the start of our problems. The ecksum
contains the erased-state, which is different per-block, and our rbyd
cksum contains the ecksum. We need to include the ecksum so we know if
it's valid, but this means our rbyd cksum changes block to block.
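
To make the problem concrete, here is a minimal sketch of the same commit landing on two blocks with different erased-state. The payload bytes, the 16-byte erased window, and the `commit_cksum` helper are all hypothetical; in particular, treating the ecksum as a bare crc32c of the following erased bytes is an assumption for illustration, not littlefs's actual on-disk layout.

```python
def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ ((crc & 1) * 0x82f63b78)
    return crc ^ 0xffffffff

# hypothetical commit payload, identical on both blocks
commit = b'\x12\x34some tag data'

# two blocks whose erased-state differs, e.g. flash vs. a noop erase
erased_a = b'\xff' * 16
erased_b = b'\x00' * 16

def commit_cksum(commit, erased):
    # assumption: the ecksum is a plain crc32c of the erased bytes that
    # follow the commit, stored little-endian before the commit cksum
    ecksum = crc32c(erased).to_bytes(4, 'little')
    return crc32c(commit + ecksum)

cksum_a = commit_cksum(commit, erased_a)
cksum_b = commit_cksum(commit, erased_b)

# same tags, different blocks, different cksums
print('block a: %08x' % cksum_a)
print('block b: %08x' % cksum_b)
print('commit only: %08x' % crc32c(commit))
```

The cksum of the commit bytes alone is the same on both blocks; only the part that folds in the ecksum varies.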
Solving this is simple enough: Stop the rbyd's canonical cksum before
the ecksum, but include the ecksum in the actual cksum we write to disk.

Future commits will need to start from the canonical cksum, so the old
ecksum won't be included in new commits, but this shouldn't be a
problem:

  [Diagram: left, a single commit (commit, ecksum, cksum, padding,
  erased) where the on-disk cksum covers the commit and ecksum, but the
  rbyd cksum covers only the commit. Right, a second commit appended
  after the first, where the rbyd cksum continues into the second
  commit and skips the first commit's ecksum, cksum, and padding.]

The second challenge is the pesky possibility of existing valid commits.
We need some way to ensure that erased-state following a commit does not
accidentally contain a valid old commit.

This is where our tags' valid bits come into play: The valid bit of each
tag must match the parity of all preceding tags (equivalent to the
parity of the crc32c), and we can use some perturb bits in the cksum tag
to make sure any tags in our erased-state do _not_ match:

  [Diagram: each tag starts with a valid bit (v), and the cksum tag also
  carries perturb bits (p). The valid bit found at the start of the
  erased-state following a commit must not match (!=) the parity
  expected after the cksum tag. Shown for one commit and for two
  consecutive commits.]

New problem! The rbyd cksum contains the valid bits, which contain the
perturb bits, which depend on the erased-state!

And you can't just derive the valid bits from the rbyd's canonical
cksum. This avoids erased-state poisoning, sure, but then nothing in the
new commit depends on the perturb bits! The catch-22 here is that we
need the valid bits to both depend on, and ignore, the erased-state
poisoned perturb bits.

As far as I can tell, the only way around this is to make the rbyd's
canonical cksum not include the parity bits. Which is annoying, masking
out bits is not great for bulk cksum calculation...

But this does solve our problem:

  [Diagram: the same layout as the previous diagram, with an additional
  rbyd cksum bracket that spans the commits' tags but excludes the
  valid bits (marked o), while each commit cksum still covers its cksum
  tag and perturb bits.]
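
As a rough sketch of what masking out the valid bits buys us: if the top bit of each tag's first byte is cleared before it is fed into the crc32c (the same `& ~0x80` mask the dbg scripts use), flipping a valid bit changes a plain crc32c of the bytes but leaves the canonical cksum alone. The tag byte strings and the `canonical_cksum` helper below are made up for illustration.

```python
def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ ((crc & 1) * 0x82f63b78)
    return crc ^ 0xffffffff

def canonical_cksum(tags, cksum=0):
    # assumption: the valid bit is bit 7 of each tag's first byte
    for tag in tags:
        # checksum the first byte with the valid bit masked out
        cksum = crc32c(bytes([tag[0] & ~0x80]), cksum)
        # then the rest of the tag as-is
        cksum = crc32c(tag[1:], cksum)
    return cksum

# two hypothetical encoded tags, valid bits clear
tags = [b'\x02\x00\x03abc', b'\x04\x00\x02hi']
# the same tags with the first tag's valid bit flipped
flipped = [bytes([tags[0][0] ^ 0x80]) + tags[0][1:], tags[1]]

raw_same = crc32c(b''.join(tags)) == crc32c(b''.join(flipped))
canonical_same = canonical_cksum(tags) == canonical_cksum(flipped)
print('raw crc32c unchanged:      ', raw_same)
print('canonical cksum unchanged: ', canonical_same)
```

The cost is the one noted above: the first byte of every tag has to be handled separately, which gets in the way of bulk cksum calculation.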
Note that because each commit's cksum derives from the canonical cksum,
the valid bits and commit cksums no longer contain the same data, so our
parity(m) = parity(crc32c(m)) trick no longer works.

However our crc32c still does tell us a bit about each tag's parity, so
with a couple of well-placed xors we can at least avoid needing two
parallel calculations:

  cksum' = crc32c(cksum, m)
  valid' = parity(cksum' xor cksum) xor valid

This also means our commit cksums don't include any information about
the valid bits, since we mask these out before cksum calculation. Which
is a bit concerning, but as far as I can tell not a real problem.

---

An alternative design would be to just keep track of two cksums: A
commit cksum and a canonical cksum. This would be much simpler, but
would also require storing two cksums in RAM in our lfsr_rbyd_t struct.

A bit annoying for our 4-byte crc32cs, and a bit more than a bit
annoying for hypothetical 32-byte sha256s.

It's also not entirely clear how you would update both crc32cs
efficiently. There is a way to xor out the initial state before each
tag, but I think it would still require O(n) cycles of crc32c
calculation...

As it is, the extra bit needed to keep track of commit parity is easy
enough to sneak into some unused sign bits in our lfsr_rbyd_t struct.

---

I've also gone ahead and mixed in the current commit parity into our
cksum's perturb bits, so the commit cksum at least contains _some_
information about the previous parity.

But it's not entirely clear this actually adds anything. Our perturb
bits aren't _required_ to reflect the commit parity, so a very unlucky
power-loss could in theory still make a cksum valid for the wrong
parity.

At least this situation will be caught by later valid bits...
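
The parity update above can be checked with a small sketch. It relies on the crc32c generator polynomial being divisible by x+1, so each crc32c step changes the parity of the running cksum by exactly the parity of the bytes fed in. The message chunks and the `parity_bytes` helper are hypothetical; `crc32c` and `parity` mirror the helpers in the dbg scripts.

```python
def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ ((crc & 1) * 0x82f63b78)
    return crc ^ 0xffffffff

def parity(x):
    return bin(x).count('1') & 1

def parity_bytes(data):
    # parity of all bits in a byte string (hypothetical helper)
    return parity(int.from_bytes(data, 'little'))

# arbitrary chunks standing in for masked tags and their data
chunks = [b'', b'\x01', b'hello', bytes(range(37)), b'\xff' * 9]

cksum = 0
valid = 0       # running parity, tracked only via the crc32c
expected = 0    # running parity, computed the slow way
results = []
for m in chunks:
    cksum_ = crc32c(m, cksum)
    # valid' = parity(cksum' xor cksum) xor valid
    valid = parity(cksum_ ^ cksum) ^ valid
    expected ^= parity_bytes(m)
    results.append(valid == expected)
    cksum = cksum_

print(results)
```

Because only the xor of the old and new cksum matters, the running parity survives the cksum being reset to the canonical cksum after each commit, which is why a single extra bit of state is enough.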
I've also carved out a tag encoding, LFSR_TAG_PERTURB, solely for adding
more perturb bits to commit cksums:

  LFSR_TAG_CKSUM          0x3cpp  v-11 cccc -ppp pppp

  LFSR_TAG_CKSUM          0x30pp  v-11 ---- -ppp pppp
  LFSR_TAG_PERTURB        0x3100  v-11 ---1 ---- ----
  LFSR_TAG_ECKSUM         0x3200  v-11 --1- ---- ----
  LFSR_TAG_GCKSUMDELTA+   0x3300  v-11 --11 ---- ----

  + Planned

This allows for more than 7 perturb bits, and could even mix in the
entire previous commit cksum, if we ever think that is worth the RAM
tradeoff.

LFSR_TAG_PERTURB also has the advantage that it is validated by the
cksum tag's valid bit before being included in the commit cksum, which
indirectly includes the current commit parity. We may eventually want to
use this instead of the cksum tag's perturb bits for this reason, but
right now I'm not sure this tiny bit of extra safety is worth the
minimum 5-byte per commit overhead...

Note if you want perturb bits that are also included in the rbyd's
canonical cksum, you can just use an LFSR_TAG_SHRUBDATA tag. Or any
unreferenced shrub tag really.

---

All of these changes required a decent amount of code, I think mostly
just to keep track of the parity bit. But the isolation of rbyd cksums
from erased-state is necessary for several future-planned features:

           code          stack
  before: 33564           2816
  after:  33916 (+1.0%)   2824 (+0.3%)
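
For reference, a rough sketch of how the cksum-family encodings listed in the tag table above could be picked apart. The bit positions are read straight off that table (valid bit on top, type in the 0x7f00 bits, perturb bits in the low 7 bits of a cksum tag); the `decode` helper itself is hypothetical and not part of littlefs or its scripts.

```python
TAG_CKSUM   = 0x3000
TAG_PERTURB = 0x3100
TAG_ECKSUM  = 0x3200

def decode(tag16):
    """Split a 16-bit tag into (valid, type, perturb).

    Assumes the layout v-11 cccc -ppp pppp, with v as the top bit.
    """
    valid = (tag16 >> 15) & 1
    type_ = tag16 & 0x7f00
    # only cksum tags carry perturb bits in their low 7 bits
    perturb = tag16 & 0x7f if type_ == TAG_CKSUM else 0
    return valid, type_, perturb

# a cksum tag with its valid bit set and perturb bits 0x55
print([hex(x) for x in decode(0xb055)])
# a perturb tag with its valid bit clear
print([hex(x) for x in decode(0x3100)])
```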
2024-04-30 12:38:17 -05:00
parent c4fcc78814
commit 8a75a68d8b
9 changed files with 664 additions and 451 deletions
--- a/scripts/dbgbmap.py
+++ b/scripts/dbgbmap.py
@@ -41,7 +41,8 @@ TAG_UATTR           = 0x0400
 TAG_SATTR           = 0x0600
 TAG_SHRUB           = 0x1000
 TAG_CKSUM           = 0x3000
-TAG_ECKSUM          = 0x3100
+TAG_PERTURB         = 0x3100
+TAG_ECKSUM          = 0x3200
 TAG_ALT             = 0x4000
 TAG_R               = 0x2000
 TAG_GT              = 0x1000
@@ -133,6 +134,9 @@ def crc32c(data, crc=0):
 def popc(x):
    return bin(x).count('1')

+def parity(x):
+    return popc(x) & 1
+
 def fromle32(data):
    return struct.unpack('<I', data[0:4].ljust(4, b'\0'))[0]

@@ -582,13 +586,14 @@ class Bmap:

 # our core rbyd type
 class Rbyd:
-    def __init__(self, block, data, rev, eoff, trunk, weight):
+    def __init__(self, block, data, rev, eoff, trunk, weight, cksum):
        self.block = block
        self.data = data
        self.rev = rev
        self.eoff = eoff
        self.trunk = trunk
        self.weight = weight
+        self.cksum = cksum
        self.redund_blocks = []

    @property
@@ -643,7 +648,10 @@ class Rbyd:
        rev = fromle32(data[0:4])
        cksum = 0
        cksum_ = crc32c(data[0:4])
+        cksum__ = cksum_
+        parity__ = parity(cksum_)
        eoff = 0
+        eoff_ = None
        j_ = 4
        trunk_ = 0
        trunk__ = 0
@@ -651,13 +659,14 @@ class Rbyd:
        weight = 0
        weight_ = 0
        weight__ = 0
-        wastrunk = False
-        trunkeoff = None
        while j_ < len(data) and (not trunk or eoff <= trunk):
            v, tag, w, size, d = fromtag(data[j_:])
-            if v != (popc(cksum_) & 1):
+            if v != parity__:
                break
-            cksum_ = crc32c(data[j_:j_+d], cksum_)
+            parity__ ^= parity(cksum__)
+            cksum__ = crc32c([data[j_] & ~0x80], cksum__)
+            cksum__ = crc32c(data[j_+1:j_+d], cksum__)
+            parity__ ^= parity(cksum__)
            j_ += d
            if not tag & TAG_ALT and j_ + size > len(data):
                break
@@ -665,24 +674,27 @@ class Rbyd:
            # take care of cksums
            if not tag & TAG_ALT:
                if (tag & 0xff00) != TAG_CKSUM:
-                    cksum_ = crc32c(data[j_:j_+size], cksum_)
+                    parity__ ^= parity(cksum__)
+                    cksum__ = crc32c(data[j_:j_+size], cksum__)
+                    parity__ ^= parity(cksum__)
                # found a cksum?
                else:
-                    cksum__ = fromle32(data[j_:j_+4])
-                    if cksum_ != cksum__:
+                    cksum___ = fromle32(data[j_:j_+4])
+                    if cksum__ != cksum___:
                        break
                    # commit what we have
-                    eoff = trunkeoff if trunkeoff else j_ + size
+                    eoff = eoff_ if eoff_ else j_ + size
                    cksum = cksum_
                    trunk_ = trunk__
                    weight = weight_
+                    # revert to data cksum
+                    cksum__ = cksum_

            # evaluate trunks
            if (tag & 0xf000) != TAG_CKSUM and (
-                    not trunk or trunk >= j_-d or wastrunk):
+                    not trunk or j_-d <= trunk or trunk___):
                # new trunk?
-                if not wastrunk:
-                    wastrunk = True
+                if not trunk___:
                    trunk___ = j_-d
                    weight__ = 0

@@ -691,24 +703,26 @@ class Rbyd:

                # end of trunk?
                if not tag & TAG_ALT:
-                    wastrunk = False
+                    # update data checksum
+                    cksum_ = cksum__
                    # update trunk/weight unless we found a shrub or an
                    # explicit trunk (which may be a shrub) is requested
-                    if not tag & TAG_SHRUB or trunk:
+                    if not tag & TAG_SHRUB or trunk___ == trunk:
                        trunk__ = trunk___
                        weight_ = weight__
                        # keep track of eoff for best matching trunk
                        if trunk and j_ + size > trunk:
-                            trunkeoff = j_ + size
-                            eoff = trunkeoff
+                            eoff_ = j_ + size
+                            eoff = eoff_
                            cksum = cksum_
                            trunk_ = trunk__
                            weight = weight_
+                    trunk___ = 0

            if not tag & TAG_ALT:
                j_ += size

-        return cls(block, data, rev, eoff, trunk_, weight)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum)

    def lookup(self, rid, tag):
        if not self: