Implemented self-validating global-checksums (gcksums)

This was quite a puzzle.

The problem: How do we detect corrupt mdirs? Seems like a simple
question, but we can't just rely on mdir cksums. Our mdirs are
independently updateable logs, and logs have this annoying tendency to
"rollback" to previously valid states when corrupted.

Rollback issues aren't littlefs-specific, but what _is_
littlefs-specific is that when one mdir rolls back, it can disagree with
other mdirs, resulting in wildly incorrect filesystem state.

To solve this, or at least protect against disagreeable mdirs, we need
to somehow include the state of all other mdirs in each mdir commit.

---

The first thought: Why not use gstate? We already have a system for
storing distributed state. If we add the xor of all of our mdir cksums,
we can rebuild it during mount and verify that nothing changed:

   .--------.   .--------.   .--------.   .--------.
  .| mdir 0 |  .| mdir 1 |  .| mdir 2 |  .| mdir 3 |
  ||        |  ||        |  ||        |  ||        |
  || gdelta |  || gdelta |  || gdelta |  || gdelta |
  |'-----|--'  |'-----|--'  |'-----|--'  |'-----|--'
  '------|-'   '------|-'   '------|-'   '------|-'
  '--.------'  '--.------'  '--.------'  '--.------'
   cksum |      cksum |      cksum |      cksum |
     |   |        v   |        v   |        v   |
     '---------> xor -------> xor -------> xor -------> gcksum
         |            v            v            v         =?
         '---------> xor -------> xor -------> xor ---> gcksum

Unfortunately it's not that easy. Consider what this looks like
mathematically (g is our gcksum, c_i is an mdir cksum, d_i is a
gcksumdelta, and +/-/sum is xor):

  g = sum(c_i) = sum(d_i)

If we solve for a new gcksumdelta, d_i:

  d_i = g' - g
  d_i = g + c_i - g
  d_i = c_i

The gcksum cancels itself out! We're left with an equation that depends
only on the current mdir, which doesn't help us at all.

Next thought: What if we permute the gcksum with a function t before
distributing it over our gcksumdeltas?

   .--------.   .--------.   .--------.   .--------.
  .| mdir 0 |  .| mdir 1 |  .| mdir 2 |  .| mdir 3 |
  ||        |  ||        |  ||        |  ||        |
  || gdelta |  || gdelta |  || gdelta |  || gdelta |
  |'-----|--'  |'-----|--'  |'-----|--'  |'-----|--'
  '------|-'   '------|-'   '------|-'   '------|-'
  '--.------'  '--.------'  '--.------'  '--.------'
   cksum |      cksum |      cksum |      cksum |
     |   |        v   |        v   |        v   |
     '---------> xor -------> xor -------> xor -------> gcksum
         |            |            |            |   .--t--'
         |            |            |            |   '-> t(gcksum)
         |            v            v            v          =?
         '---------> xor -------> xor -------> xor ---> t(gcksum)

In math terms:

  t(g) = t(sum(c_i)) = sum(d_i)

In order for this to work, t needs to be non-linear. If t is linear, the
same thing happens:

  d_i = t(g') - t(g)
  d_i = t(g + c_i) - t(g)
  d_i = t(g) + t(c_i) - t(g)
  d_i = t(c_i)

This was quite funny/frustrating (funnistrating?) during development,
because it means a lot of seemingly obvious functions don't work!

- t(g) = g - Doesn't work
- t(g) = crc32c(g) - Doesn't work because crc32cs are linear
- t(g) = g^2 in GF(2^n) - g^2 is linear in GF(2^n)!?

Fortunately, odd powers >1 (coprime with 2) finally give us a
non-linear function in GF(2^n), so t(g) = g^3 works:

  d_i = g'^3 - g^3
  d_i = (g + c_i)^3 - g^3
  d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3
  d_i = (g^2 + c_i^2)(g + c_i) - g^3
  d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3
  d_i = gc_i^2 + g^2c_i + c_i^3

---

Bleh, now we need to implement finite-field operations? Well, not
entirely!

Note that our algorithm never uses division. This means we don't need a
full finite-field (+, -, *, /), but can get away with a finite-ring
(+, -, *). And conveniently for us, our crc32c polynomial defines a ring
epimorphic to a 31-bit finite-field.

All we need to do is define crc32c multiplication as polynomial
multiplication mod our crc32c polynomial:

  crc32cmul(a, b) = pmod(pmul(a, b), P)

And since crc32c is more-or-less just pmod(x, P), this lets us take
advantage of any crc32c hardware/tables that may be available.

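A minimal sketch of the crc-ring multiplication and the linear vs
non-linear behavior described above. Everything here is hypothetical and
for illustration only: the toy_* names are made up, and the sketch
assumes plain msb-first polynomial bit order (P = x^32 + 0x1edc6f41)
rather than the reflected representation crc32c normally uses, so the
numeric values are not expected to match littlefs's lfs_crc32c_mul:

```c
#include <stdint.h>

// crc32c polynomial without the implicit x^32 term
// (assumption: bit i of a word is the coefficient of x^i)
#define TOY_CRC32C_POLY 0x1edc6f41u

// hypothetical crc-ring multiplication, pmod(pmul(a, b), P), computed
// with shift-and-xor so intermediate values never exceed 32 bits
static uint32_t toy_crc32c_mul(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    while (b) {
        // add (xor) the current multiple of a if this bit of b is set
        if (b & 1) {
            r ^= a;
        }
        b >>= 1;
        // a = a*x mod P
        a = (a << 1) ^ ((a & 0x80000000u) ? TOY_CRC32C_POLY : 0);
    }
    return r;
}

// t(g) = g^2, linear over xor, so useless as a permutation
static uint32_t toy_square(uint32_t g) {
    return toy_crc32c_mul(g, g);
}

// t(g) = g^3, non-linear over xor
static uint32_t toy_cube(uint32_t g) {
    return toy_crc32c_mul(toy_square(g), g);
}

// d_i = t(g + c_i) - t(g) with t(g) = g^3, where +/- is xor
static uint32_t toy_gcksumdelta(uint32_t g, uint32_t c) {
    return toy_cube(g ^ c) ^ toy_cube(g);
}

// the expanded form, d_i = gc_i^2 + g^2c_i + c_i^3
static uint32_t toy_gcksumdelta_expanded(uint32_t g, uint32_t c) {
    return toy_crc32c_mul(g, toy_square(c))
            ^ toy_crc32c_mul(toy_square(g), c)
            ^ toy_cube(c);
}
```

In this sketch toy_square(a ^ b) always equals
toy_square(a) ^ toy_square(b), which is the cancellation problem, while
toy_cube(1 ^ 2) = 0xf differs from toy_cube(1) ^ toy_cube(2) = 0x9. The
two gcksumdelta functions agree for any inputs, and the result changes
with g, which is the property we wanted.
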
---

Bunch of notes:

- Our 2^n-bit crc-ring maps to a 2^n-1-bit finite-field because our crc
  polynomial is defined as P(x) = Q(x)(x + 1), where Q(x) is a
  2^n-1-bit irreducible polynomial.

  This is a common crc construction as it provides optimal odd-bit/2-bit
  error detection, so it shouldn't be too difficult to adapt to other
  crc sizes.

- t(g) = g^3 is not the only function that works, but it turns out to be
  a pretty good one:

  - 3 and 2^(2^n-1)-1 are coprime, which means our function t(g) = g^3
    provides a one-to-one mapping in the underlying fields of all crc
    rings of size 2^(2^n).

    We know 3 and 2^(2^n-1)-1 are coprime because
    2^(2^n-1)-1 = 2^(2^n)-1 (a product of Fermat numbers, the smallest
    being 3, see A023394) - 2^(2^n-1) (a power-of-2). 3 divides the
    former but, not being 2, can't divide the latter, so it can't
    divide the difference.

  - Our delta, when viewed as a polynomial in g:
    d(g) = gc^2 + g^2c + c^3, has degree 2, which implies there are at
    most 2 solutions or 1-bit of information loss in the underlying
    field.

    This is optimal since the original definition already had 2
    solutions before we even chose a function:

      d(g) = t(g + c) - t(g)
      d(g) = t(g + c) - t((g + c) - c)
      d(g) = t((g + c) + c) - t(g + c)
      d(g) = d(g + c)

  Though note the mapping of our crc-ring to the underlying field
  already represents 1-bit of information loss.

- If you're using a cryptographic hash or other non-crc, you should
  probably just use an equal sized finite-field.

  Though note changing from a 2^n-1-bit field to a 2^n-bit field does
  change the math a bit, with t(g) = g^7 being a better non-linear
  function:

  - 7 is the smallest odd number >1 coprime with 2^(2^n)-1 (for n >= 2),
    the order of the field's multiplicative group, which makes
    t(g) = g^7 a one-to-one mapping.

    3 humorously divides all of these 2^(2^n)-1 numbers.

  - Expanding delta with t(g) = g^7 gives us a 6 degree polynomial,
    which implies at most 6 solutions or ~3-bits of information loss.

    This isn't actually the best you can do, some exhaustive searching
    over small fields (<=2^16) suggests t(g) = g^(2^(n-1)-1) _might_ be
    optimal, but that's a heck of a lot more multiplications.

- Because our crc32cs preserve parity/are epimorphic to parity bits,
  addition (xor) and multiplication (crc32cmul) also preserve parity,
  which can be used to show our entire gcksum system preserves parity.

  This is quite neat, and means we are guaranteed to detect any odd
  number of bit-errors across the entire filesystem.

- Another idea was to use two different addition operations: xor and
  overflowing addition (or mod a prime).

  This probably would have worked, but lacks the rigor of the above
  solution.

- You might think an RS-like construction would help here, where
  g = sum(c_ia^i), but this suffers from the same problem:

    d_i = g' - g
    d_i = g + c_ia^i - g
    d_i = c_ia^i

  Nothing here depends on anything outside of the current mdir.

- Another question is should we be using an RS-like construction anyways
  to include location information in our gcksum?

  Maybe in another system, but I don't think it's necessary in littlefs.

  While our mdirs are independently updateable, they aren't _entirely_
  independent. The location of each mdir is stored in either the mtree
  or a parent mdir, so it always gets mixed into the gcksum somewhere.

  The only exception being the mrootanchor which is always at the fixed
  blocks 0x{0,1}.

- This does _not_ catch "global-rollback" issues, where the most recent
  commit in the entire filesystem is corrupted, revealing an older, but
  still valid, filesystem state.

  But as far as I am aware this is just a fundamental limitation of
  powerloss-resilient filesystems, short of doing destructive
  operations.

  At the very least, exposing the gcksum would allow the user to store
  it externally and prevent this issue.

---

Implementation details:

- Our gcksumdelta depends on the rbyd's cksum, so there's a catch-22 if
  we include it in the rbyd itself.

  We can avoid this by including it in the commit tags (actually the
  separate canonical cksum makes this easier than it would have been
  earlier), but this does mean LFSR_TAG_GCKSUMDELTA is not an
  LFSR_TAG_GDELTA subtype. Unfortunate but not a dealbreaker.

- Reading/writing the gcksumdelta gets a bit annoying with it not being
  in the rbyd. For now I've extended the low-level
  lfsr_rbyd_fetch_/lfsr_rbyd_appendcksum_ to accept an optional
  gcksumdelta pointer, which is a bit awkward, but I don't know of a
  better solution.

- Unlike the grm, _every_ mdir commit involves the gcksum, which means
  we either need to propagate the gcksumdelta up the mroot chain
  correctly, or somehow keep track of partially flushed gcksumdeltas.

  To make this work I modified the low-level lfsr_mdir_commit__
  functions to accept start_rid=-2 to indicate when gcksumdeltas should
  be flushed.

  It's a bit of a hack, but I think it might make sense to extend this
  to all gdeltas eventually.

The gcksum costs both code and RAM, but I think it's well worth it for
removing an entire category of filesystem corruption:

           code          stack          ctx
  before: 37796           2608          620
  after:  38428 (+1.7%)   2640 (+1.2%)  644 (+3.9%)

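To make the rollback argument concrete, here is a toy model of the
scheme. This is a sketch under stated assumptions, not the actual
implementation: the toy_* names are hypothetical, mdirs are reduced to a
bare cksum + gcksumdelta pair, cksums are small made-up values, and the
ring multiplication assumes plain msb-first bit order instead of
littlefs's crc32c representation:

```c
#include <stdint.h>

#define TOY_MDIRS 3

// hypothetical crc-ring multiplication mod P = x^32 + 0x1edc6f41
static uint32_t toy_mul(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    while (b) {
        if (b & 1) {
            r ^= a;
        }
        b >>= 1;
        a = (a << 1) ^ ((a & 0x80000000u) ? 0x1edc6f41u : 0);
    }
    return r;
}

// candidate permutations, one non-linear, one linear
static uint32_t toy_cube(uint32_t g) { return toy_mul(toy_mul(g, g), g); }
static uint32_t toy_ident(uint32_t g) { return g; }

typedef uint32_t (*toy_t)(uint32_t);
typedef struct { uint32_t cksum; uint32_t gcksumdelta; } toy_mdir_t;
typedef struct { toy_mdir_t mdirs[TOY_MDIRS]; } toy_fs_t;

// gcksum = xor of all mdir cksums
static uint32_t toy_gcksum(const toy_fs_t *fs) {
    uint32_t g = 0;
    for (int i = 0; i < TOY_MDIRS; i++) {
        g ^= fs->mdirs[i].cksum;
    }
    return g;
}

// commit a new cksum to mdir i, folding t(g') - t(g) into its delta
static void toy_commit(toy_fs_t *fs, toy_t t, int i, uint32_t cksum) {
    uint32_t g = toy_gcksum(fs);
    uint32_t g_ = g ^ fs->mdirs[i].cksum ^ cksum;
    fs->mdirs[i].gcksumdelta ^= t(g) ^ t(g_);
    fs->mdirs[i].cksum = cksum;
}

// mount-time check: t(xor of cksums) must equal xor of gcksumdeltas
static int toy_mount_ok(const toy_fs_t *fs, toy_t t) {
    uint32_t d = 0;
    for (int i = 0; i < TOY_MDIRS; i++) {
        d ^= fs->mdirs[i].gcksumdelta;
    }
    return t(toy_gcksum(fs)) == d;
}

// roll mdir 1 back behind a later commit to mdir 2, returns 1 if the
// mount check notices, 0 if it doesn't, -1 if the model itself is broken
static int toy_rollback_detected(toy_t t) {
    toy_fs_t fs = {0};
    toy_commit(&fs, t, 0, 0x78);
    toy_commit(&fs, t, 1, 0x12);
    toy_mdir_t stale = fs.mdirs[1];
    toy_commit(&fs, t, 1, 0x34);
    toy_commit(&fs, t, 2, 0x56);
    if (!toy_mount_ok(&fs, t)) {
        return -1;
    }
    // mdir 1 "rolls back" to its previous, still valid, state
    fs.mdirs[1] = stale;
    return !toy_mount_ok(&fs, t);
}
```

With these inputs toy_rollback_detected(toy_cube) returns 1, while
toy_rollback_detected(toy_ident) returns 0, since with a linear t every
gcksumdelta depends only on its own mdir. Note that rolling back the
most recent commit in the whole toy filesystem would pass either check,
which is the "global-rollback" limitation mentioned above.
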
2025-01-12 16:01:39 -06:00
parent 0eee57017d
commit 1c5adf71b3
11 changed files with 684 additions and 168 deletions
--- a/lfs.c
+++ b/lfs.c
@@ -1158,6 +1158,7 @@ enum lfsr_tag {
    LFSR_TAG_P              = 0x0001,
    LFSR_TAG_NOTE           = 0x3100,
    LFSR_TAG_ECKSUM         = 0x3200,
+    LFSR_TAG_GCKSUMDELTA    = 0x3300,

    // in-device only tags, these should never get written to disk
    LFSR_TAG_INTERNAL       = 0x0800,
@@ -2725,7 +2726,8 @@ static int lfsr_rbyd_ckecksum(lfs_t *lfs, const lfsr_rbyd_t *rbyd,
 }

 // fetch an rbyd
-static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
+static int lfsr_rbyd_fetch_(lfs_t *lfs,
+        lfsr_rbyd_t *rbyd, uint32_t *gcksumdelta,
        lfs_block_t block, lfs_size_t trunk) {
    // set up some initial state
    rbyd->blocks[0] = block;
@@ -2752,8 +2754,11 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
    lfsr_rid_t weight_ = 0;

    // assume unerased until proven otherwise
-    lfsr_data_t ecksum = LFSR_DATA_NULL();
-    lfsr_data_t ecksum_ = LFSR_DATA_NULL();
+    lfsr_ecksum_t ecksum = {.cksize=-1};
+    lfsr_ecksum_t ecksum_ = {.cksize=-1};
+
+    // also find gcksumdelta, though this is only used by mdirs
+    uint32_t gcksumdelta_ = 0;

    // scan tags, checking valid bits, cksums, etc
    while (off < lfs->cfg->block_size
@@ -2793,7 +2798,33 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,

                // found an ecksum? save for later
                if (tag == LFSR_TAG_ECKSUM) {
-                    ecksum_ = LFSR_DATA_DISK(block, off_, size);
+                    err = lfsr_data_readecksum(lfs,
+                            &LFSR_DATA_DISK(block, off_,
+                                // note this size is to make the hint do
+                                // what we want
+                                lfs->cfg->block_size - off_),
+                            &ecksum_);
+                    if (err) {
+                        if (err == LFS_ERR_CORRUPT) {
+                            break;
+                        }
+                        return err;
+                    }
+
+                // found gcksumdelta? save for later
+                } else if (tag == LFSR_TAG_GCKSUMDELTA) {
+                    err = lfsr_data_readle32(lfs,
+                            &LFSR_DATA_DISK(block, off_,
+                                // note this size is to make the hint do
+                                // what we want
+                                lfs->cfg->block_size - off_),
+                            &gcksumdelta_);
+                    if (err) {
+                        if (err == LFS_ERR_CORRUPT) {
+                            break;
+                        }
+                        return err;
+                    }
                }

            // is an end-of-commit cksum
@@ -2824,13 +2855,17 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
                rbyd->trunk = (LFSR_RBYD_ISSHRUB & rbyd->trunk) | trunk_;
                rbyd->weight = weight;
                ecksum = ecksum_;
+                ecksum_.cksize = -1;
+                if (gcksumdelta) {
+                    *gcksumdelta = gcksumdelta_;
+                }
+                gcksumdelta_ = 0;

                // revert to canonical checksum and perturb if necessary
                cksum_ = cksum
                        ^ ((lfsr_rbyd_isperturb(rbyd))
                            ? LFS_CRC32C_ODDZERO
                            : 0);
-                ecksum_ = LFSR_DATA_NULL();
            }
        }

@@ -2888,18 +2923,9 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,

    // did we end on a valid commit? we may have erased-state
    bool erased = false;
-    if (lfsr_data_size(ecksum) != 0) {
-        // read the erased-state checksum
-        lfsr_ecksum_t ecksum__;
-        err = lfsr_data_readecksum(lfs, &ecksum,
-                &ecksum__);
-        if (err && err != LFS_ERR_CORRUPT) {
-            return err;
-        }
-
-        if (err != LFS_ERR_CORRUPT) {
+    if (ecksum.cksize != -1) {
        // check the erased-state checksum
-            err = lfsr_rbyd_ckecksum(lfs, rbyd, &ecksum__);
+        err = lfsr_rbyd_ckecksum(lfs, rbyd, &ecksum);
        if (err && err != LFS_ERR_CORRUPT) {
            return err;
        }
@@ -2907,7 +2933,6 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
        // found valid erased-state?
        erased = (err != LFS_ERR_CORRUPT);
    }
-    }

    // used eoff=-1 to indicate when there is no erased-state
    if (!erased) {
@@ -2917,6 +2942,11 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
    return 0;
 }

+static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
+        lfs_block_t block, lfs_size_t trunk) {
+    return lfsr_rbyd_fetch_(lfs, rbyd, NULL, block, trunk);
+}
+
 // a more aggressive fetch when checksum is known
 static int lfsr_rbyd_fetchck(lfs_t *lfs, lfsr_rbyd_t *rbyd,
        lfs_block_t block, lfs_size_t trunk,
@@ -3937,7 +3967,11 @@ leaf:;
    return 0;
 }

-static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
+// needed in lfsr_rbyd_appendcksum
+static uint32_t lfsr_gcksum_cube(uint32_t gcksum);
+
+static int lfsr_rbyd_appendcksum_(lfs_t *lfs,
+        lfsr_rbyd_t *rbyd, uint32_t *gcksumdelta) {
    // begin appending
    int err = lfsr_rbyd_appendinit(lfs, rbyd);
    if (err) {
@@ -3947,6 +3981,28 @@ static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
    // save the canonical checksum
    uint32_t cksum = rbyd->cksum;

+    // append gcksumdelta?
+    //
+    // the only requirement for gcksumdelta is we append after
+    // calculating the canonical checksum, it's a bit more convenient to
+    // append before the ecksum because of end-of-commit calculations
+    if (gcksumdelta) {
+        // figure out changes to our gcksumdelta
+        uint32_t gcksumdelta_ = *gcksumdelta
+                ^ lfsr_gcksum_cube(lfs->gcksum_p)
+                ^ lfsr_gcksum_cube(lfs->gcksum)
+                ^ lfs->gcksum_d;
+        *gcksumdelta = gcksumdelta_;
+
+        uint8_t gcksumdelta_buf[LFSR_LE32_DSIZE];
+        err = lfsr_rbyd_appendrat_(lfs, rbyd, LFSR_RAT(
+                LFSR_TAG_GCKSUMDELTA, 0, LFSR_DATA_LE32(
+                    gcksumdelta_, gcksumdelta_buf)));
+        if (err) {
+            return err;
+        }
+    }
+
    // align to the next prog unit
    //
    // this gets a bit complicated as we have two types of cksums:
@@ -4081,6 +4137,10 @@ static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
    return 0;
 }

+static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
+    return lfsr_rbyd_appendcksum_(lfs, rbyd, NULL);
+}
+
 static int lfsr_rbyd_appendrats(lfs_t *lfs, lfsr_rbyd_t *rbyd,
        lfsr_srid_t rid, lfsr_srid_t start_rid, lfsr_srid_t end_rid,
        const lfsr_rat_t *rats, lfs_size_t rat_count) {
@@ -6808,6 +6868,14 @@ static inline void lfsr_gdelta_xor(
 }


+// gcksum (global checksum) things
+
+// cubing the gcksum prevents trivial gcksumdeltas
+static uint32_t lfsr_gcksum_cube(uint32_t gcksum) {
+    return lfs_crc32c_mul(lfs_crc32c_mul(gcksum, gcksum), gcksum);
+}
+
+
 // grm (global remove) things
 static inline uint8_t lfsr_grm_count_(const lfsr_grm_t *grm) {
    return (grm->mids[0] >= 0) + (grm->mids[1] >= 0);
@@ -6895,6 +6963,8 @@ static int lfsr_data_readgrm(lfs_t *lfs, lfsr_data_t *data,

 // some mdir-related gstate things we need
 static void lfsr_fs_flushgdelta(lfs_t *lfs) {
+    // zero any pending gdeltas
+    lfs->gcksum_d = 0;
    lfs_memset(lfs->grm_d, 0, LFSR_GRM_DSIZE);
 }

@@ -6911,6 +6981,8 @@ static void lfsr_fs_preparegdelta(lfs_t *lfs) {

 static void lfsr_fs_revertgdelta(lfs_t *lfs) {
    // revert gstate to on-disk state
+    lfs->gcksum = lfs->gcksum_p;
+
    int err = lfsr_data_readgrm(lfs,
            &LFSR_DATA_BUF(lfs->grm_p, LFSR_GRM_DSIZE),
            &lfs->grm);
@@ -6921,11 +6993,15 @@ static void lfsr_fs_revertgdelta(lfs_t *lfs) {

 static void lfsr_fs_commitgdelta(lfs_t *lfs) {
    // commit any pending gdeltas
+    lfs->gcksum_p = lfs->gcksum;
    lfsr_data_fromgrm(&lfs->grm, lfs->grm_p);
 }

 // append and consume any pending gstate
 static int lfsr_rbyd_appendgdelta(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
+    // gcksums are a special case and handled directly in
+    // lfsr_mdir_commit__/lfsr_rbyd_appendcksum_
+
    // need grm delta?
    if (!lfsr_gdelta_iszero(lfs->grm_d, LFSR_GRM_DSIZE)) {
        // make sure to xor any existing delta
@@ -6964,6 +7040,9 @@ static int lfsr_rbyd_appendgdelta(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
 }

 static int lfsr_fs_consumegdelta(lfs_t *lfs, const lfsr_mdir_t *mdir) {
+    // consume any gcksum deltas
+    lfs->gcksum_d ^= mdir->gcksumdelta;
+
    // consume any grm deltas
    lfsr_data_t data;
    int err = lfsr_rbyd_lookup(lfs, &mdir->rbyd, -1, LFSR_TAG_GRMDELTA,
@@ -7065,7 +7144,9 @@ static int lfsr_mdir_fetch(lfs_t *lfs, lfsr_mdir_t *mdir,

    // try to fetch rbyds in the order of most recent to least recent
    for (int i = 0; i < 2; i++) {
-        int err = lfsr_rbyd_fetch(lfs, &mdir->rbyd, blocks[0], 0);
+        int err = lfsr_rbyd_fetch_(lfs,
+                &mdir->rbyd, &mdir->gcksumdelta,
+                blocks[0], 0);
        if (err && err != LFS_ERR_CORRUPT) {
            return err;
        }
@@ -7265,6 +7346,7 @@ static int lfsr_mtree_lookup(lfs_t *lfs, lfsr_smid_t mid,
    if (lfsr_mtree_isnull(&lfs->mtree)) {
        mdir_->mid = mid;
        mdir_->rbyd = lfs->mroot.rbyd;
+        mdir_->gcksumdelta = lfs->mroot.gcksumdelta;
        return 0;

    // looking up direct mdir?
@@ -7308,6 +7390,8 @@ static int lfsr_mdir_alloc__(lfs_t *lfs, lfsr_mdir_t *mdir,
        lfsr_smid_t mid, bool partial) {
    // assign the mid
    mdir->mid = mid;
+    // default to zero gcksumdelta
+    mdir->gcksumdelta = 0;

    if (!partial) {
        // allocate one block without an erase
@@ -7362,6 +7446,8 @@ static int lfsr_mdir_swap__(lfs_t *lfs, lfsr_mdir_t *mdir_,
        const lfsr_mdir_t *mdir, bool force) {
    // assign the mid
    mdir_->mid = mdir->mid;
+    // reset to zero gcksumdelta, upper layers should handle this
+    mdir_->gcksumdelta = 0;

    // first thing we need to do is read our current revision count
    uint32_t rev;
@@ -7686,22 +7772,38 @@ static int lfsr_mdir_commit__(lfs_t *lfs, lfsr_mdir_t *mdir,
    }

    // append any gstate?
-    if (start_rid == -1) {
+    if (start_rid <= -1) {
        int err = lfsr_rbyd_appendgdelta(lfs, &mdir->rbyd);
        if (err) {
            return err;
        }
    }

+    // TODO should lfsr_rbyd_appendcksum_ revert cksum on failure?
+    // save cksum in case we fail
+    uint32_t cksum = mdir->rbyd.cksum;
+    // xor our new cksum
+    lfs->gcksum ^= mdir->rbyd.cksum;
+
    // finalize commit
-    int err = lfsr_rbyd_appendcksum(lfs, &mdir->rbyd);
+    int err = lfsr_rbyd_appendcksum_(lfs, &mdir->rbyd,
+            // include gcksumdelta if we're not relocating
+            (start_rid <= -2) ? &mdir->gcksumdelta : NULL);
    if (err) {
+        // undo cksum xor on failure
+        lfs->gcksum ^= cksum;
        return err;
    }

    // success? flush gstate?
-    if (start_rid == -1) {
+    if (start_rid <= -1) {
+        // TODO this is a hack
+        // we only flush gcksumdelta if rid == -2
+        uint32_t gcksum_d = lfs->gcksum_d;
        lfsr_fs_flushgdelta(lfs);
+        if (start_rid > -2) {
+            lfs->gcksum_d = gcksum_d;
+        }
    }

    return 0;
@@ -7719,7 +7821,7 @@ static lfs_ssize_t lfsr_mdir_estimate__(lfs_t *lfs, const lfsr_mdir_t *mdir,

    // calculate dsize by starting from the outside ids and working inwards,
    // this naturally gives us a split rid
-    lfsr_srid_t a_rid = start_rid;
+    lfsr_srid_t a_rid = lfs_smax(start_rid, -1);
    lfsr_srid_t b_rid = lfs_min(mdir->rbyd.weight, end_rid);
    lfs_size_t a_dsize = 0;
    lfs_size_t b_dsize = 0;
@@ -7827,7 +7929,7 @@ static lfs_ssize_t lfsr_mdir_estimate__(lfs_t *lfs, const lfsr_mdir_t *mdir,
            }
        }

-        if (a_rid == -1) {
+        if (a_rid <= -1) {
            mdir_dsize += dsize_;
        } else {
            a_dsize += dsize_;
@@ -7858,8 +7960,14 @@ static int lfsr_mdir_compact__(lfs_t *lfs, lfsr_mdir_t *mdir_,
    // (btree), not the staged state (btree_), this is important,
    // we can't trust btree_ after a failed commit

+    // assume we keep any gcksumdelta, this will get fixed the first time
+    // we commit anything
+    if (start_rid == -2) {
+        mdir_->gcksumdelta = mdir->gcksumdelta;
+    }
+
    // copy over tags in the rbyd in order
-    lfsr_srid_t rid = start_rid;
+    lfsr_srid_t rid = lfs_smax(start_rid, -1);
    lfsr_tag_t tag = 0;
    while (true) {
        lfsr_rid_t weight;
@@ -8075,8 +8183,14 @@ relocate:;
    }

 compact:;
+    // don't copy over gcksum if relocating
+    lfsr_srid_t start_rid_ = start_rid;
+    if (relocated && !overcompacted) {
+        start_rid_ = lfs_smax(start_rid_, -1);
+    }
+
    // compact our mdir
-    err = lfsr_mdir_compact__(lfs, &mdir_, mdir, start_rid, end_rid);
+    err = lfsr_mdir_compact__(lfs, &mdir_, mdir, start_rid_, end_rid);
    if (err) {
        LFS_ASSERT(err != LFS_ERR_RANGE);
        // bad prog? try another block
@@ -8090,7 +8204,7 @@ compact:;
    //
    // upper layers should make sure this can't fail by limiting the
    // maximum commit size
-    err = lfsr_mdir_commit__(lfs, &mdir_, start_rid, end_rid,
+    err = lfsr_mdir_commit__(lfs, &mdir_, start_rid_, end_rid,
            mid, rats, rat_count);
    if (err) {
        LFS_ASSERT(err != LFS_ERR_RANGE);
@@ -8101,6 +8215,10 @@ compact:;
        return err;
    }

+    // consume gcksumdelta if relocated
+    if (relocated && !overcompacted) {
+        lfs->gcksum_d ^= mdir->gcksumdelta;
+    }
    // update mdir
    *mdir = mdir_;
    return 0;
@@ -8196,6 +8314,9 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
    // setup any pending gdeltas
    lfsr_fs_preparegdelta(lfs);

+    // xor our old cksum
+    lfs->gcksum ^= mdir->rbyd.cksum;
+
    // create a copy
    lfsr_mdir_t mdir_[2];
    mdir_[0] = *mdir;
@@ -8218,7 +8339,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,

    // attempt to commit/compact the mdir normally
    lfsr_srid_t split_rid;
-    int err = lfsr_mdir_commit_(lfs, &mdir_[0], -1, -1, &split_rid,
+    int err = lfsr_mdir_commit_(lfs, &mdir_[0], -2, -1, &split_rid,
            mdir->mid, rats, rat_count);
    if (err && err != LFS_ERR_RANGE
            && err != LFS_ERR_NOENT) {
@@ -8229,6 +8350,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
    lfsr_mdir_t mroot_ = lfs->mroot;
    if (!err && lfsr_mdir_cmp(mdir, &lfs->mroot) == 0) {
        mroot_.rbyd = mdir_[0].rbyd;
+        mroot_.gcksumdelta = mdir_[0].gcksumdelta;
    }

    // handle possible mtree updates, this gets a bit messy
@@ -8328,6 +8450,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
                    mdir_[0].mid >> lfs->mdir_bits,
                    mdir_[0].rbyd.blocks[0], mdir_[0].rbyd.blocks[1]);
            mdir_[0].rbyd = mdir_[1].rbyd;
+            mdir_[0].gcksumdelta = mdir_[1].gcksumdelta;
            goto relocated;

        // other sibling reduced to zero
@@ -8509,6 +8632,18 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
        // mtree should never go to zero since we always have a root bookmark
        LFS_ASSERT(lfsr_mtree_weight_(&mtree_) > 0);

+        // make sure mtree/mroot changes are on-disk before committing
+        // metadata
+        err = lfsr_bd_sync(lfs);
+        if (err) {
+            goto failed;
+        }
+
+        // xor mroot's cksum if we haven't already
+        if (lfsr_mdir_cmp(mdir, &lfs->mroot) != 0) {
+            lfs->gcksum ^= lfs->mroot.rbyd.cksum;
+        }
+
        // mark any copies of our mroot as unerased
        lfs->mroot.rbyd.eoff = -1;
        for (lfsr_omdir_t *o = lfs->omdirs; o; o = o->next) {
@@ -8517,19 +8652,12 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
            }
        }

-        // make sure mtree/mroot changes are on-disk before committing
-        // metadata
-        err = lfsr_bd_sync(lfs);
-        if (err) {
-            goto failed;
-        }
-
        // commit new mtree into our mroot
        //
        // note end_rid=0 here will delete any files leftover from a split
        // in our mroot
        uint8_t mtree_buf[LFS_MAX(LFSR_MPTR_DSIZE, LFSR_BTREE_DSIZE)];
-        err = lfsr_mdir_commit_(lfs, &mroot_, -1, 0, NULL,
+        err = lfsr_mdir_commit_(lfs, &mroot_, -2, 0, NULL,
                -1, LFSR_RATS(
                    (lfsr_mtree_ismptr(&mtree_))
                        ? LFSR_RAT(
@@ -8580,9 +8708,12 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
                goto failed;
            }

+            // xor mrootchild's cksum
+            lfs->gcksum ^= mrootparent_.rbyd.cksum;
+
            // commit mrootchild
            uint8_t mrootchild_buf[LFSR_MPTR_DSIZE];
-            err = lfsr_mdir_commit_(lfs, &mrootparent_, -1, -1, NULL,
+            err = lfsr_mdir_commit_(lfs, &mrootparent_, -2, -1, NULL,
                    -1, LFSR_RATS(
                        LFSR_RAT(
                            LFSR_TAG_MROOT, 0,
@@ -8630,7 +8761,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
            }

            uint8_t mrootchild_buf[LFSR_MPTR_DSIZE];
-            err = lfsr_mdir_commit__(lfs, &mrootanchor_, -1, -1,
+            err = lfsr_mdir_commit__(lfs, &mrootanchor_, -2, -1,
                    -1, LFSR_RATS(
                        LFSR_RAT(
                            LFSR_TAG_MAGIC, 0,
@@ -8656,6 +8787,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
    }

    // gstate must have been committed by a lower-level function at this point
+    LFS_ASSERT(lfs->gcksum_d == 0);
    LFS_ASSERT(lfsr_gdelta_iszero(lfs->grm_d, LFSR_GRM_DSIZE));

    // sync on-disk state
@@ -8745,8 +8877,10 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
                        >= (lfsr_srid_t)mdir_[0].rbyd.weight) {
                o->mdir.mid += (1 << lfs->mdir_bits) - mdir_[0].rbyd.weight;
                o->mdir.rbyd = mdir_[1].rbyd;
+                o->mdir.gcksumdelta = mdir_[1].gcksumdelta;
            } else {
                o->mdir.rbyd = mdir_[0].rbyd;
+                o->mdir.gcksumdelta = mdir_[0].gcksumdelta;
            }
        } else if (o->mdir.mid > mdir->mid) {
            o->mdir.mid += mdelta;
@@ -8757,13 +8891,16 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
    if (mdelta > 0
            && mdir->mid == -1) {
        mdir->rbyd = mroot_.rbyd;
+        mdir->gcksumdelta = mroot_.gcksumdelta;
    } else if (mdelta > 0
            && lfsr_mid_rid(lfs, mdir->mid)
                >= (lfsr_srid_t)mdir_[0].rbyd.weight) {
        mdir->mid += (1 << lfs->mdir_bits) - mdir_[0].rbyd.weight;
        mdir->rbyd = mdir_[1].rbyd;
+        mdir->gcksumdelta = mdir_[1].gcksumdelta;
    } else {
        mdir->rbyd = mdir_[0].rbyd;
+        mdir->gcksumdelta = mdir_[0].gcksumdelta;
    }

    // update mroot and mtree
@@ -13331,6 +13468,12 @@ static int lfs_init(lfs_t *lfs, uint32_t flags,
    lfs->omdirs = NULL;

    // zero gstate
+    lfs->gcksum = 0;
+    lfs->gcksum_p = 0;
+    lfs->gcksum_d = 0;
+
+    lfs->grm.mids[0] = -1;
+    lfs->grm.mids[1] = -1;
    lfs_memset(lfs->grm_p, 0, LFSR_GRM_DSIZE);
    lfs_memset(lfs->grm_d, 0, LFSR_GRM_DSIZE);

@@ -13796,6 +13939,9 @@ static int lfsr_mountinited(lfs_t *lfs) {
            // numbers
            lfs->seed ^= mdir->rbyd.cksum;

+            // build gcksum out of mdir cksums
+            lfs->gcksum_p ^= mdir->rbyd.cksum;
+
            // collect any gdeltas from this mdir
            err = lfsr_fs_consumegdelta(lfs, mdir);
            if (err) {
@@ -13815,6 +13961,42 @@ static int lfsr_mountinited(lfs_t *lfs) {
        }
    }

+    // keep track of the current gcksum
+    lfs->gcksum = lfs->gcksum_p;
+
+    // validate gcksum by comparing its cube against the gcksumdeltas
+    //
+    // The use of cksum^3 here is important to avoid trivial
+    // gcksumdeltas. If we use a linear function (cksum, crc32c(cksum),
+    // cksum^2, etc), the state of the filesystem cancels out when
+    // calculating a new gcksumdelta:
+    //
+    //   d_i = t(g') - t(g)
+    //   d_i = t(g + c_i) - t(g)
+    //   d_i = t(g) + t(c_i) - t(g)
+    //   d_i = t(c_i)
+    //
+    // Using cksum^3 prevents this from happening:
+    //
+    //   d_i = (g + c_i)^3 - g^3
+    //   d_i = (g + c_i)(g + c_i)(g + c_i) - g^3
+    //   d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3
+    //   d_i = (g^2 + c_i^2)(g + c_i) - g^3
+    //   d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3
+    //   d_i = gc_i^2 + g^2c_i + c_i^3
+    //
+    // cksum^3 also has some other nice properties, providing a perfect
+    // 1->1 mapping of t(g) in 2^31 fields, and losing at most 3-bits of
+    // info when calculating d_i.
+    //
+    if (lfsr_gcksum_cube(lfs->gcksum) != lfs->gcksum_d) {
+        LFS_ERROR("Found gcksum mismatch, cksum^3 %08"PRIx32" "
+                    "(!= %08"PRIx32")",
+                lfsr_gcksum_cube(lfs->gcksum),
+                lfs->gcksum_d);
+        return LFS_ERR_CORRUPT;
+    }
+
    // once we've mounted and derived a pseudo-random seed, initialize our
    // block allocator
    //
@@ -13924,7 +14106,8 @@ int lfsr_mount(lfs_t *lfs, uint32_t flags,

    // TODO this should use any configured values
    LFS_DEBUG("Mounted littlefs v%"PRId32".%"PRId32" %"PRId32"x%"PRId32" "
-                "0x{%"PRIx32",%"PRIx32"}.%"PRIx32" w%"PRId32".%"PRId32,
+                "0x{%"PRIx32",%"PRIx32"}.%"PRIx32" w%"PRId32".%"PRId32", "
+                "cksum %08"PRIx32,
            LFS_DISK_VERSION_MAJOR,
            LFS_DISK_VERSION_MINOR,
            lfs->cfg->block_size,
@@ -13933,7 +14116,8 @@ int lfsr_mount(lfs_t *lfs, uint32_t flags,
            lfs->mroot.rbyd.blocks[1],
            lfsr_rbyd_trunk(&lfs->mroot.rbyd),
            lfsr_mtree_weight_(&lfs->mtree) >> lfs->mdir_bits,
-            1 << lfs->mdir_bits);
+            1 << lfs->mdir_bits,
+            lfs->gcksum);

    return 0;

@@ -13991,7 +14175,7 @@ static int lfsr_formatinited(lfs_t *lfs) {
        uint8_t name_limit_buf[LFSR_LLEB128_DSIZE];
        uint8_t file_limit_buf[LFSR_LEB128_DSIZE];
        uint8_t bookmark_buf[LFSR_LEB128_DSIZE];
-        err = lfsr_rbyd_commit(lfs, &rbyd, -1, LFSR_RATS(
+        err = lfsr_rbyd_appendrats(lfs, &rbyd, -1, -1, -1, LFSR_RATS(
                LFSR_RAT(
                    LFSR_TAG_MAGIC, 0,
                    LFSR_DATA_BUF("littlefs", 8)),
@@ -14025,6 +14209,13 @@ static int lfsr_formatinited(lfs_t *lfs) {
        if (err) {
            return err;
        }
+
+        // prepare initial gcksum and commit
+        lfs->gcksum = rbyd.cksum;
+        err = lfsr_rbyd_appendcksum_(lfs, &rbyd, &(uint32_t){0});
+        if (err) {
+            return err;
+        }
    }

    // sync on-disk state
--- a/lfs.h
+++ b/lfs.h
@@ -611,6 +611,7 @@ typedef struct {
 typedef struct lfsr_mdir {
    lfsr_smid_t mid;
    lfsr_rbyd_t rbyd;
+    uint32_t gcksumdelta;
 } lfsr_mdir_t;

 typedef struct lfsr_omdir {
@@ -874,6 +875,10 @@ typedef struct lfs {
        uint8_t *buffer;
    } lookahead;

+    uint32_t gcksum;
+    uint32_t gcksum_p;
+    uint32_t gcksum_d;
+
    lfsr_grm_t grm;
    uint8_t grm_p[LFSR_GRM_DSIZE];
    uint8_t grm_d[LFSR_GRM_DSIZE];
--- a/lfs_util.c
+++ b/lfs_util.c
@@ -76,37 +76,9 @@ ssize_t lfs_fromleb128(uint32_t *word, const void *buffer, size_t size) {
 //    return crc;
 //}

-// Calculate crc32c incrementally
-uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
-    // init with 0xffffffff so prefixed zeros affect the crc
-    const uint8_t *data = buffer;
-    crc ^= 0xffffffff;

-    // A couple crc32c implementations to choose from.
-    //
-    // The default, "small-table" implementation offers a decent performance
-    // without much additional code-size, reasonable for microcontrollers. For
-    // anything larger where you really don't care about an extra 1KiB of code
-    // the "big-table" implementation is probably better.
-    //
-    // Some quick measurements with GCC 11 using -Os -mcpu=cortex-m55, with
-    // instruction counts from QEMU and an input size of 4KiB. Note these are
-    // not cycle-accurate:
-    //
-    //                code   stack     ins   ld/st  branch
-    // naive            48      12  221192    4099   36865
-    // small-table     124      12   49160   12291    4097
-    // big-table      1064       8   32776    8195    4097
-    //
-    #if defined(LFS_SMALLER_CRC32C)
-    for (size_t i = 0; i < size; i++) {
-        crc = crc ^ data[i];
-        for (size_t j = 0; j < 8; j++) {
-            crc = (crc >> 1) ^ ((crc & 1) ? 0x82f63b78 : 0);
-        }
-    }
-
-    #elif !defined(LFS_FASTER_CRC32C)
+// crc32c tables (see lfs_crc32c for more info)
+#if !defined(LFS_FASTER_CRC32C)
 static const uint32_t lfs_crc32c_table[16] = {
    0x00000000, 0x105ec76f, 0x20bd8ede, 0x30e349b1,
    0x417b1dbc, 0x5125dad3, 0x61c69362, 0x7198540d,
@@ -114,11 +86,6 @@ uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
    0xc38d26c4, 0xd3d3e1ab, 0xe330a81a, 0xf36e6f75,
 };

-    for (size_t i = 0; i < size; i++) {
-        crc = (crc >> 4) ^ lfs_crc32c_table[0xf & (crc ^ (data[i] >> 0))];
-        crc = (crc >> 4) ^ lfs_crc32c_table[0xf & (crc ^ (data[i] >> 4))];
-    }
-
 #else
 static const uint32_t lfs_crc32c_table[256] = {
    0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4,
@@ -186,7 +153,46 @@ uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
    0x79b737ba, 0x8bdcb4b9, 0x988c474d, 0x6ae7c44e,
    0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351,
 };
+#endif

+
+// Calculate crc32c incrementally
+uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
+    // init with 0xffffffff so prefixed zeros affect the crc
+    const uint8_t *data = buffer;
+    crc ^= 0xffffffff;
+
+    // A couple crc32c implementations to choose from.
+    //
+    // The default, "small-table" implementation offers a decent performance
+    // without much additional code-size, reasonable for microcontrollers. For
+    // anything larger where you really don't care about an extra 1KiB of code
+    // the "big-table" implementation is probably better.
+    //
+    // Some quick measurements with GCC 11 using -Os -mcpu=cortex-m55, with
+    // instruction counts from QEMU and an input size of 4KiB. Note these are
+    // not cycle-accurate:
+    //
+    //                code   stack     ins   ld/st  branch
+    // naive            48      12  221192    4099   36865
+    // small-table     124      12   49160   12291    4097
+    // big-table      1064       8   32776    8195    4097
+    //
+    #if defined(LFS_SMALLER_CRC32C)
+    for (size_t i = 0; i < size; i++) {
+        crc = crc ^ data[i];
+        for (size_t j = 0; j < 8; j++) {
+            crc = (crc >> 1) ^ ((crc & 1) ? 0x82f63b78 : 0);
+        }
+    }
+
+    #elif !defined(LFS_FASTER_CRC32C)
+    for (size_t i = 0; i < size; i++) {
+        crc = (crc >> 4) ^ lfs_crc32c_table[0xf & (crc ^ (data[i] >> 0))];
+        crc = (crc >> 4) ^ lfs_crc32c_table[0xf & (crc ^ (data[i] >> 4))];
+    }
+
+    #else
    for (size_t i = 0; i < size; i++) {
        crc = (crc >> 8) ^ lfs_crc32c_table[0xff & (crc ^ data[i])];
    }
@@ -197,4 +203,42 @@ uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
    return crc;
 }

+// Multiply two crc32cs in the crc32c ring
+uint32_t lfs_crc32c_mul(uint32_t a, uint32_t b) {
+    // Multiplication in a crc32c ring involves polynomial
+    // multiplication modulo the crc32c polynomial to keep things
+    // finite:
+    //
+    // r = a * b mod P
+    //
+    // Note because our crc32c polynomial is not irreducible, this
+    // does not give us a finite-field, i.e. division is not always
+    // defined. Still, multiplication has useful properties.
+
+    // This gets a bit funky because crc32cs are little-endian, but
+    // fortunately pmul is symmetric. Unfortunately the result is
+    // only 63 bits wide, so we need to shift by 1.
+    uint64_t r = lfs_pmul(a, b) << 1;
+
+    // We can accelerate our modulo with crc32c tables if present, these
+    // loops may look familiar.
+    #if defined(LFS_SMALLER_CRC32C)
+    for (int i = 0; i < 32; i++) {
+        r = (r >> 1) ^ ((r & 1) ? 0x82f63b78 : 0);
+    }
+
+    #elif !defined(LFS_FASTER_CRC32C)
+    for (int i = 0; i < 8; i++) {
+        r = (r >> 4) ^ lfs_crc32c_table[0xf & r];
+    }
+
+    #else
+    for (int i = 0; i < 4; i++) {
+        r = (r >> 8) ^ lfs_crc32c_table[0xff & r];
+    }
+    #endif
+
+    return (uint32_t)r;
+}
+
 #endif
--- a/lfs_util.h
+++ b/lfs_util.h
@@ -281,6 +281,25 @@ static inline int lfs_scmp(uint32_t a, uint32_t b) {
    return (int)(unsigned)(a - b);
 }

+// Perform polynomial/carry-less multiplication
+//
+// This is a multiply where all adds are replaced with xors. If we view
+// a and b as binary polynomials, xor is polynomial addition and pmul is
+// polynomial multiplication.
+static inline uint64_t lfs_pmul(uint32_t a, uint32_t b) {
+    uint64_t r = 0;
+    uint64_t a_ = a;
+    while (b) {
+        if (b & 1) {
+            r ^= a_;
+        }
+        a_ <<= 1;
+        b >>= 1;
+    }
+    return r;
+}
+
+
 // Convert between 32-bit little-endian and native order
 static inline uint32_t lfs_fromle32(uint32_t a) {
 #if !defined(LFS_NO_BUILTINS) && defined(LFS_LITTLE_ENDIAN)
@@ -603,6 +622,9 @@ static inline size_t lfs_strcspn(const char *a, const char *cs) {
 //
 uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size);

+// Multiply two crc32cs in the crc32c ring
+uint32_t lfs_crc32c_mul(uint32_t a, uint32_t b);
+

 // Allocate memory, only used if buffers are not provided to littlefs
 // Note, memory must be 64-bit aligned
--- a/scripts/dbgbmap.py
+++ b/scripts/dbgbmap.py
@@ -50,10 +50,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----


 CHARS = 'mbd-'
@@ -594,7 +595,8 @@ class Bmap:

 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta):
        if isinstance(blocks, int):
            blocks = (blocks,)

@@ -605,6 +607,7 @@ class Rbyd:
        self.trunk = trunk
        self.weight = weight
        self.cksum = cksum
+        self.gcksumdelta = gcksumdelta

    @property
    def block(self):
@@ -680,6 +683,8 @@ class Rbyd:
        weight = 0
        weight_ = 0
        weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
        while j_ < len(data) and (not trunk or eoff <= trunk):
            # read next tag
            v, tag, w, size, d = fromtag(data[j_:])
@@ -695,6 +700,11 @@ class Rbyd:
            if not tag & TAG_ALT:
                if (tag & 0xff00) != TAG_CKSUM:
                    cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                # found a cksum?
                else:
                    # check cksum
@@ -706,6 +716,8 @@ class Rbyd:
                    cksum_ = cksum__
                    trunk_ = trunk__
                    weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                    # update perturb bit
                    perturb = tag & TAG_P
                    # revert to data cksum and perturb
@@ -737,6 +749,7 @@ class Rbyd:
                                        0xfca42daf if perturb else 0)
                                trunk_ = trunk__
                                weight = weight_
+                                gcksumdelta = gcksumdelta_
                        trunk___ = 0

                # update canonical checksum, xoring out any perturb state
@@ -747,9 +760,9 @@ class Rbyd:

        # cksum mismatch?
        if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)

-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)

    def lookup(self, rid, tag):
        if not self:
--- a/scripts/dbgbtree.py
+++ b/scripts/dbgbtree.py
@@ -48,10 +48,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----


 # some ways of block geometry representations
@@ -253,6 +254,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                ' w%d' % w if w else '',
                ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
    else:
        return '0x%04x%s%s' % (
                tag,
@@ -265,7 +271,8 @@ TBranch = co.namedtuple('TBranch', 'a, b, d, c')

 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta):
        if isinstance(blocks, int):
            blocks = (blocks,)

@@ -276,6 +283,7 @@ class Rbyd:
        self.trunk = trunk
        self.weight = weight
        self.cksum = cksum
+        self.gcksumdelta = gcksumdelta

    @property
    def block(self):
@@ -351,6 +359,8 @@ class Rbyd:
        weight = 0
        weight_ = 0
        weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
        while j_ < len(data) and (not trunk or eoff <= trunk):
            # read next tag
            v, tag, w, size, d = fromtag(data[j_:])
@@ -366,6 +376,11 @@ class Rbyd:
            if not tag & TAG_ALT:
                if (tag & 0xff00) != TAG_CKSUM:
                    cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                # found a cksum?
                else:
                    # check cksum
@@ -377,6 +392,8 @@ class Rbyd:
                    cksum_ = cksum__
                    trunk_ = trunk__
                    weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                    # update perturb bit
                    perturb = tag & TAG_P
                    # revert to data cksum and perturb
@@ -408,6 +425,7 @@ class Rbyd:
                                        0xfca42daf if perturb else 0)
                                trunk_ = trunk__
                                weight = weight_
+                                gcksumdelta = gcksumdelta_
                        trunk___ = 0

                # update canonical checksum, xoring out any perturb state
@@ -418,9 +436,9 @@ class Rbyd:

        # cksum mismatch?
        if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)

-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)

    def lookup(self, rid, tag):
        if not self:
--- a/scripts/dbglfs.py
+++ b/scripts/dbglfs.py
@@ -49,10 +49,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----


 # some ways of block geometry representations
@@ -123,6 +124,21 @@ def crc32c(data, crc=0):
            crc = (crc >> 1) ^ ((crc & 1) * 0x82f63b78)
    return 0xffffffff ^ crc

+def pmul(a, b):
+    r = 0
+    while b:
+        if b & 1:
+            r ^= a
+        a <<= 1
+        b >>= 1
+    return r
+
+def crc32cmul(a, b):
+    r = pmul(a, b)
+    for _ in range(31):
+        r = (r >> 1) ^ ((r & 1) * 0x82f63b78)
+    return r
+
 def popc(x):
    return bin(x).count('1')

@@ -284,6 +300,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                ' w%d' % w if w else '',
                ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
    else:
        return '0x%04x%s%s' % (
                tag,
@@ -296,7 +317,8 @@ TBranch = co.namedtuple('TBranch', 'a, b, d, c')

 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta=None):
        if isinstance(blocks, int):
            blocks = (blocks,)

@@ -307,6 +329,7 @@ class Rbyd:
        self.trunk = trunk
        self.weight = weight
        self.cksum = cksum
+        self.gcksumdelta = gcksumdelta

    @property
    def block(self):
@@ -382,6 +405,8 @@ class Rbyd:
        weight = 0
        weight_ = 0
        weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
        while j_ < len(data) and (not trunk or eoff <= trunk):
            # read next tag
            v, tag, w, size, d = fromtag(data[j_:])
@@ -397,6 +422,11 @@ class Rbyd:
            if not tag & TAG_ALT:
                if (tag & 0xff00) != TAG_CKSUM:
                    cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                # found a cksum?
                else:
                    # check cksum
@@ -408,6 +438,8 @@ class Rbyd:
                    cksum_ = cksum__
                    trunk_ = trunk__
                    weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                    # update perturb bit
                    perturb = tag & TAG_P
                    # revert to data cksum and perturb
@@ -439,6 +471,7 @@ class Rbyd:
                                        0xfca42daf if perturb else 0)
                                trunk_ = trunk__
                                weight = weight_
+                                gcksumdelta = gcksumdelta_
                        trunk___ = 0

                # update canonical checksum, xoring out any perturb state
@@ -449,9 +482,9 @@ class Rbyd:

        # cksum mismatch?
        if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)

-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)

    def lookup(self, rid, tag):
        if not self:
@@ -922,8 +955,11 @@ class Rbyd:
            # have mdir?
            done, rid, tag, w, j, _, data, _ = self.lookup(-1, TAG_MDIR)
            if not done and rid == -1 and tag == TAG_MDIR:
+                if mbid == 0:
                    blocks = frommdir(data)
                    return False, 0, 0, Rbyd.fetch(f, block_size, blocks)
+                else:
+                    return True, 0, 0, None

            else:
                # I guess we're inlined?
@@ -1192,15 +1228,11 @@ class GState:
    def __init__(self, mleaf_weight):
        self.gstate = {}
        self.gdelta = {}
+        self.gcksum = 0
        self.mleaf_weight = mleaf_weight

    def xor(self, mbid, mw, mdir):
-        tag = TAG_GDELTA-0x1
-        while True:
-            done, rid, tag, w, j, d, data, _ = mdir.lookup(-1, tag+0x1)
-            if done or rid != -1 or (tag & 0xff00) != TAG_GDELTA:
-                break
-
+        def gxor(rid, tag, w, j, d, data):
            # keep track of gdeltas
            if tag not in self.gdelta:
                self.gdelta[tag] = []
@@ -1213,7 +1245,35 @@ class GState:
                    a^b for a,b in it.zip_longest(
                        self.gstate[tag], data, fillvalue=0))

+        # gcksum deltas are a bit of a special case
+        self.gcksum ^= mdir.cksum
+        if mdir.gcksumdelta is not None:
+            tag, w, j, d, data = mdir.gcksumdelta
+            gxor(-1, tag, w, j, d, data)
+
+        # other gstate deltas
+        tag = TAG_GDELTA-0x1
+        while True:
+            done, rid, tag, w, j, d, data, _ = mdir.lookup(-1, tag+0x1)
+            if done or rid != -1 or (tag & 0xff00) != TAG_GDELTA:
+                break
+
+            gxor(rid, tag, w, j, d, data)
+
    # parsers for some gstate
+    @ft.cached_property
+    def gcksum_(self):
+        # cubed gcksum
+        return crc32cmul(crc32cmul(self.gcksum, self.gcksum), self.gcksum)
+
+    @ft.cached_property
+    def gcksum__(self):
+        # gcksumdelta based cubed gcksum
+        if TAG_GCKSUMDELTA not in self.gstate:
+            return 0
+
+        return fromle32(self.gstate[TAG_GCKSUMDELTA])
+
    @ft.cached_property
    def grm(self):
        if TAG_GRMDELTA not in self.gstate:
@@ -1233,7 +1293,10 @@ class GState:

    def repr(self):
        def grepr(tag, data):
-            if tag == TAG_GRMDELTA:
+            if tag == TAG_GCKSUMDELTA:
+                gcksum = fromle32(data)
+                return 'gcksum %08x' % gcksum
+            elif tag == TAG_GRMDELTA:
                count, _ = fromleb128(data)
                return 'grm %s' % (
                        'none' if count == 0
@@ -1826,7 +1889,7 @@ def main(disk, mroots=None, *,
                corrupted = True
            else:
                rweight = max(rweight, mdir.weight)
-                gstate.xor(0, mdir)
+                gstate.xor(0, 0, mdir)

                # find any dids
                for rid, tag, w, j, d, data in mdir:
@@ -1908,14 +1971,14 @@ def main(disk, mroots=None, *,
        if grmed_dir_dids != grmed_bookmark_dids:
            corrupted = True

-        # are we going to end up rendering the dtree?
-        dtree = args.get('files') or not (
+        # are we going to end up rendering the ftree?
+        ftree = args.get('files') or not (
                args.get('config') or args.get('gstate'))

        # do a pass to find the width that fits file names+tree, this
        # may not terminate! It's up to the user to use -Z in that case
        f_width = 0
-        if dtree:
+        if ftree:
            def rec_f_width(did, depth):
                depth_ = 0
                width_ = 0
@@ -1941,13 +2004,15 @@ def main(disk, mroots=None, *,
        #### actual debugging begins here

        # print some information about the filesystem
-        print('littlefs v%s.%s %dx%d %s w%d.%d, rev %08x' % (
+        print('littlefs v%s.%s %dx%d %s w%d.%d, rev %08x, cksum %08x%s' % (
                config.version[0] if config.version[0] is not None else '?',
                config.version[1] if config.version[1] is not None else '?',
                (config.geometry[0] or 0), (config.geometry[1] or 0),
                mroot.addr(),
                bweight//mleaf_weight, 1*mleaf_weight,
-                mroot.rev))
+                mroot.rev,
+                gstate.gcksum,
+                '' if gstate.gcksum_ == gstate.gcksum__ else '!'))

        # dynamically size the id field
        w_width = max(
@@ -1982,14 +2047,24 @@ def main(disk, mroots=None, *,
        # print gstate?
        if args.get('gstate'):
            for i, (repr_, tag, data) in enumerate(gstate.repr()):
-                print('%12s %*s %-*s  %s' % (
+                # some special situations worth reporting
+                notes = []
+                # gcksum mismatch?
+                if (tag == TAG_GCKSUMDELTA
+                        and gstate.gcksum_ != gstate.gcksum__):
+                    notes.append('gcksum!=%08x' % gstate.gcksum_)
+
+                print('%s%12s %*s %-*s  %s%s%s' % (
+                        '\x1b[31m' if color and notes else '',
                        'gstate:' if i == 0 else '',
                        2*w_width+1, 'g' if i == 0 else '',
                        21+w_width, repr_,
                        next(xxd(data, 8), '')
                            if not args.get('raw')
                                and not args.get('no_truncate')
-                            else ''))
+                            else '',
+                        ' (%s)' % ', '.join(notes) if notes else '',
+                        '\x1b[m' if color and notes else ''))

                # show on-disk encoding
                if args.get('raw') or args.get('no_truncate'):
@@ -2029,8 +2104,8 @@ def main(disk, mroots=None, *,
                                        2*w_width+1, '',
                                        line))

-        # print dtree?
-        if dtree:
+        # print ftree?
+        if ftree:
            # only show mdir on change
            pmbid = None
            # recursively print directories
@@ -2091,7 +2166,7 @@ def main(disk, mroots=None, *,
                            if did_ not in grmed_dir_dids:
                                notes.append('orphaned')

-                    # print human readable dtree entry
+                    # print human readable ftree entry
                    print('%s%12s %*s %-*s  %s%s%s' % (
                            '\x1b[31m' if color and not grmed and notes
                                else '\x1b[90m'
--- a/scripts/dbgmtree.py
+++ b/scripts/dbgmtree.py
@@ -48,10 +48,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----


 # some ways of block geometry representations
@@ -268,6 +269,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                ' w%d' % w if w else '',
                ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
    else:
        return '0x%04x%s%s' % (
                tag,
@@ -280,7 +286,8 @@ TBranch = co.namedtuple('TBranch', 'a, b, d, c')

 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta):
        if isinstance(blocks, int):
            blocks = (blocks,)

@@ -291,6 +298,7 @@ class Rbyd:
        self.trunk = trunk
        self.weight = weight
        self.cksum = cksum
+        self.gcksumdelta = gcksumdelta

    @property
    def block(self):
@@ -366,6 +374,8 @@ class Rbyd:
        weight = 0
        weight_ = 0
        weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
        while j_ < len(data) and (not trunk or eoff <= trunk):
            # read next tag
            v, tag, w, size, d = fromtag(data[j_:])
@@ -381,6 +391,11 @@ class Rbyd:
            if not tag & TAG_ALT:
                if (tag & 0xff00) != TAG_CKSUM:
                    cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                # found a cksum?
                else:
                    # check cksum
@@ -392,6 +407,8 @@ class Rbyd:
                    cksum_ = cksum__
                    trunk_ = trunk__
                    weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                    # update perturb bit
                    perturb = tag & TAG_P
                    # revert to data cksum and perturb
@@ -423,6 +440,7 @@ class Rbyd:
                                        0xfca42daf if perturb else 0)
                                trunk_ = trunk__
                                weight = weight_
+                                gcksumdelta = gcksumdelta_
                        trunk___ = 0

                # update canonical checksum, xoring out any perturb state
@@ -433,9 +451,9 @@ class Rbyd:

        # cksum mismatch?
        if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)

-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)

    def lookup(self, rid, tag):
        if not self:
--- a/scripts/dbgrbyd.py
+++ b/scripts/dbgrbyd.py
@@ -58,10 +58,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----


 # some ways of block geometry representations
@@ -256,6 +257,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                ' w%d' % w if w else '',
                ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
    else:
        return '0x%04x%s%s' % (
                tag,
--- a/scripts/dbgtag.py
+++ b/scripts/dbgtag.py
@@ -46,10 +46,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----


 # some ways of block geometry representations
@@ -210,6 +211,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                ' w%d' % w if w else '',
                ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
    else:
        return '0x%04x%s%s' % (
                tag,
--- a/tests/test_ck.toml
+++ b/tests/test_ck.toml
@@ -2,6 +2,124 @@
 after = ['test_traversal', 'test_gc', 'test_mount']


+code = '''
+// naive crc32c
+static uint32_t test_ck_naive_crc32c(
+        uint32_t crc, const void *buffer, size_t size) {
+    const uint8_t *buffer_ = buffer;
+    crc ^= 0xffffffff;
+
+    for (size_t i = 0; i < size; i++) {
+        crc = crc ^ buffer_[i];
+        for (size_t j = 0; j < 8; j++) {
+            crc = (crc >> 1) ^ ((crc & 1) ? 0x82f63b78 : 0);
+        }
+    }
+
+    crc ^= 0xffffffff;
+    return crc;
+}
+
+// naive crc32c multiplication
+static uint32_t test_ck_naive_crc32c_mul(uint32_t a, uint32_t b) {
+    // pmul
+    uint64_t r = 0;
+    for (int i = 0; i < 32; i++) {
+        if (b & ((uint32_t)1 << i)) {
+            r ^= (uint64_t)a << i;
+        }
+    }
+
+    // mod crc32c
+    for (int i = 0; i < 31; i++) {
+        r = (r >> 1) ^ ((r & 1) ? 0x82f63b78 : 0);
+    }
+
+    return (uint32_t)r;
+}
+'''
+
+
+# let's first check that our crc32c math probably works
+
+# try some random inputs and compare with a naive implementation
+[cases.test_ck_crc32c]
+defines.SIZE = [1, 2, 4, 8, 16, 32, 64]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint8_t buffer[SIZE];
+        for (lfs_size_t j = 0; j < SIZE; j++) {
+            buffer[j] = TEST_PRNG(&prng);
+        }
+
+        uint32_t a = test_ck_naive_crc32c(0, buffer, SIZE);
+        uint32_t b = lfs_crc32c(0, buffer, SIZE);
+        assert(a == b);
+    }
+'''
+
+# test incremental crc32cs
+[cases.test_ck_crc32c_incr]
+defines.SIZE = [1, 2, 4, 8, 16, 32, 64]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint8_t buffer[SIZE];
+        for (lfs_size_t j = 0; j < SIZE; j++) {
+            buffer[j] = TEST_PRNG(&prng);
+        }
+
+        uint32_t a = lfs_crc32c(0, buffer, SIZE);
+        uint32_t b = 0;
+        for (lfs_size_t j = 0; j < SIZE; j++) {
+            b = lfs_crc32c(b, &buffer[j], 1);
+        }
+        assert(a == b);
+    }
+'''
+
+# try some random inputs and compare with a naive implementation
+[cases.test_ck_crc32c_mul]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint32_t x = TEST_PRNG(&prng);
+        uint32_t y = TEST_PRNG(&prng);
+
+        uint32_t a = test_ck_naive_crc32c_mul(x, y);
+        uint32_t b = lfs_crc32c_mul(x, y);
+        assert(a == b);
+    }
+'''
+
+# test that multiplication is distributive
+[cases.test_ck_crc32c_mul_dist]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint32_t x = TEST_PRNG(&prng);
+        uint32_t y = TEST_PRNG(&prng);
+        uint32_t z = TEST_PRNG(&prng);
+
+        uint32_t a = lfs_crc32c_mul(x, y ^ z);
+        uint32_t b = lfs_crc32c_mul(x, y) ^ lfs_crc32c_mul(x, z);
+        assert(a == b);
+    }
+'''
+

 # Test filesystem-level checksum things