From 1c5adf71b38f8eb26e5a00798bfdb5eac6c20a74 Mon Sep 17 00:00:00 2001
From: Christopher Haster <geky@geky.net>
Date: Sun, 12 Jan 2025 16:01:39 -0600
Subject: [PATCH] Implemented self-validating global-checksums (gcksums)

This was quite a puzzle.

The problem: How do we detect corrupt mdirs?

Seems like a simple question, but we can't just rely on mdir cksums. Our
mdirs are independently updateable logs, and logs have this annoying
tendency to "rollback" to previously valid states when corrupted.

Rollback issues aren't littlefs-specific, but what _is_ littlefs-
specific is that when one mdir rolls back, it can disagree with other
mdirs, resulting in wildly incorrect filesystem state.

To solve this, or at least protect against disagreeable mdirs, we need
to somehow include the state of all other mdirs in each mdir commit.

---

The first thought: Why not use gstate?

We already have a system for storing distributed state. If we add the
xor of all of our mdir cksums, we can rebuild it during mount and verify
that nothing changed:

   .--------.   .--------.   .--------.   .--------.
  .| mdir 0 |  .| mdir 1 |  .| mdir 2 |  .| mdir 3 |
  ||        |  ||        |  ||        |  ||        |
  || gdelta |  || gdelta |  || gdelta |  || gdelta |
  |'-----|--'  |'-----|--'  |'-----|--'  |'-----|--'
  '------|-'   '------|-'   '------|-'   '------|-'
  '--.------'  '--.------'  '--.------'  '--.------'
   cksum |      cksum |      cksum |      cksum |
     |   |        v   |        v   |        v   |
     '---------> xor -------> xor -------> xor -------> gcksum
         |            v            v            v         =?
         '---------> xor -------> xor -------> xor ---> gcksum

Unfortunately it's not that easy. Consider what this looks like
mathematically (g is our gcksum, c_i is an mdir cksum, d_i is a
gcksumdelta, and +/-/sum is xor):

  g = sum(c_i) = sum(d_i)

If we solve for a new gcksumdelta, d_i:

  d_i = g' - g
  d_i = g + c_i - g
  d_i = c_i

The gcksum cancels itself out! We're left with an equation that depends
only on the current mdir, which doesn't help us at all.
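
To make the cancellation concrete, here is a toy C sketch of this first attempt, where t is the identity. It is not littlefs code: mdir_t, naive_gcksum, naive_commit, and naive_check are hypothetical names, and an mdir is modelled as nothing more than a (cksum, delta) pair. Each commit xors g' - g into the committing mdir's delta. Since that difference is just the mdir's own cksum change, every delta ends up equal to its own cksum, and nothing global survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// hypothetical toy model of an mdir: its current cksum and gcksumdelta
typedef struct {
    uint32_t cksum;
    uint32_t delta;
} mdir_t;

// gcksum = xor of every mdir's cksum
static uint32_t naive_gcksum(const mdir_t *mdirs, int count) {
    uint32_t g = 0;
    for (int i = 0; i < count; i++) {
        g ^= mdirs[i].cksum;
    }
    return g;
}

// commit a new cksum to mdir i, xoring g' - g into its delta
static void naive_commit(mdir_t *mdirs, int count, int i, uint32_t cksum) {
    assert(i >= 0 && i < count);
    uint32_t g = naive_gcksum(mdirs, count);
    uint32_t g_ = g ^ mdirs[i].cksum ^ cksum;
    // g_ ^ g is just mdirs[i].cksum ^ cksum, the global state cancels
    mdirs[i].delta ^= g_ ^ g;
    mdirs[i].cksum = cksum;
}

// mount-time check: do the deltas sum to the gcksum?
static bool naive_check(const mdir_t *mdirs, int count) {
    uint32_t d = 0;
    for (int i = 0; i < count; i++) {
        d ^= mdirs[i].delta;
    }
    return d == naive_gcksum(mdirs, count);
}
```

For example, commit to mdirs 0, 1, and 2, then commit again to mdirs 1 and 2. If mdir 1 is then rolled back to its previous (cksum, delta) pair, naive_check() still returns true, because that old pair's delta is just its own cksum.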

Next thought: What if we permute the gcksum with a function t before
distributing it over our gcksumdeltas?

   .--------.   .--------.   .--------.   .--------.
  .| mdir 0 |  .| mdir 1 |  .| mdir 2 |  .| mdir 3 |
  ||        |  ||        |  ||        |  ||        |
  || gdelta |  || gdelta |  || gdelta |  || gdelta |
  |'-----|--'  |'-----|--'  |'-----|--'  |'-----|--'
  '------|-'   '------|-'   '------|-'   '------|-'
  '--.------'  '--.------'  '--.------'  '--.------'
   cksum |      cksum |      cksum |      cksum |
     |   |        v   |        v   |        v   |
     '---------> xor -------> xor -------> xor -------> gcksum
         |            |            |            |   .--t--'
         |            |            |            |   '-> t(gcksum)
         |            v            v            v          =?
         '---------> xor -------> xor -------> xor ---> t(gcksum)

In math terms:

  t(g) = t(sum(c_i)) = sum(d_i)

In order for this to work, t needs to be non-linear. If t is linear, the
same thing happens:

  d_i = t(g') - t(g)
  d_i = t(g + c_i) - t(g)
  d_i = t(g) + t(c_i) - t(g)
  d_i = t(c_i)

This was quite funny/frustrating (funnistrating?) during development,
because it means a lot of seemingly obvious functions don't work!

- t(g) = g              - Doesn't work
- t(g) = crc32c(g)      - Doesn't work because crc32cs are linear
- t(g) = g^2 in GF(2^n) - g^2 is linear in GF(2^n)!?
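
The squaring case is easy to check numerically. This is a minimal sketch that assumes a plain, non-bit-reflected representation: bit i holds the coefficient of x^i, and P is the crc32c polynomial x^32 + 0x1edc6f41. crc32c implementations usually work bit-reflected, so this is not necessarily the bit order littlefs uses. The names pmulmod, sq, and cube are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// crc32c polynomial, non-reflected, with the implicit x^32 term dropped
#define CRC32C_P 0x1edc6f41u

// multiply a*b in GF(2)[x]/P, scanning b from its top bit down (Horner)
static uint32_t pmulmod(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        // r = r*x mod P, an x^32 term folds back in as P's low bits
        uint32_t overflow = r & 0x80000000u;
        r <<= 1;
        if (overflow) {
            r ^= CRC32C_P;
        }
        // add a when bit i of b is set, it gets multiplied by x^i in total
        if ((b >> i) & 1) {
            r ^= a;
        }
    }
    return r;
}

static uint32_t sq(uint32_t g) {
    return pmulmod(g, g);
}

static uint32_t cube(uint32_t g) {
    return pmulmod(pmulmod(g, g), g);
}
```

With a = 1 and b = x, sq(a ^ b) == sq(a) ^ sq(b), because the 2ab cross term vanishes mod 2. For cubes, the cross terms a^2b + ab^2 = x + x^2 survive, which is the 0x6 left over by cube(0x3) ^ cube(0x1) ^ cube(0x2).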

Fortunately, odd powers greater than 1 finally give us a non-linear
function in GF(2^n), so t(g) = g^3 works:

  d_i = g'^3 - g^3
  d_i = (g + c_i)^3 - g^3
  d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3
  d_i = (g^2 + c_i^2)(g + c_i) - g^3
  d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3
  d_i = gc_i^2 + g^2c_i + c_i^3
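
Plugging t(g) = g^3 into a toy model shows the difference. This is a sketch under stated assumptions, not littlefs code. The mdir_t model (an mdir as a (cksum, delta) pair), the function names, and the non-reflected pmulmod are all hypothetical stand-ins. Each commit xors g'^3 - g^3 into the committing mdir's delta, so the deltas always sum to g^3 as long as they did before the commit.

```c
#include <assert.h>
#include <stdint.h>

// crc32c polynomial, non-reflected, with the implicit x^32 term dropped
#define CRC32C_P 0x1edc6f41u

// multiply a*b in GF(2)[x]/P (Horner-style, high bit of b first)
static uint32_t pmulmod(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        uint32_t overflow = r & 0x80000000u;
        r <<= 1;
        if (overflow) {
            r ^= CRC32C_P;
        }
        if ((b >> i) & 1) {
            r ^= a;
        }
    }
    return r;
}

static uint32_t cube(uint32_t g) {
    return pmulmod(pmulmod(g, g), g);
}

// hypothetical toy model of an mdir: its current cksum and gcksumdelta
typedef struct {
    uint32_t cksum;
    uint32_t delta;
} mdir_t;

static uint32_t gcksum(const mdir_t *mdirs, int count) {
    uint32_t g = 0;
    for (int i = 0; i < count; i++) {
        g ^= mdirs[i].cksum;
    }
    return g;
}

// commit a new cksum to mdir i, xoring g'^3 - g^3 into its delta
static void commit(mdir_t *mdirs, int count, int i, uint32_t cksum) {
    assert(i >= 0 && i < count);
    uint32_t g = gcksum(mdirs, count);
    uint32_t g_ = g ^ mdirs[i].cksum ^ cksum;
    mdirs[i].delta ^= cube(g_) ^ cube(g);
    mdirs[i].cksum = cksum;
}

// mount-time check, returns g^3 - sum(d_i), which is 0 when consistent
static uint32_t mismatch(const mdir_t *mdirs, int count) {
    uint32_t d = 0;
    for (int i = 0; i < count; i++) {
        d ^= mdirs[i].delta;
    }
    return cube(gcksum(mdirs, count)) ^ d;
}
```

Start from all-zero mdirs, where the deltas trivially sum to 0^3 = 0. Committing to mdirs 0, 1, and 2 keeps mismatch() at 0. Now suppose mdir 1 changes its cksum by a = 1, then mdir 2 changes by b = x (0x2), and then mdir 1 is rolled back to its previous (cksum, delta) pair. In that case mismatch() returns exactly 0x6, because the leftover is (g+a+b)^3 + (g+a)^3 + (g+b)^3 + g^3 = ab(a + b) = x + x^2, independent of the rest of the filesystem. Rolling back only mdir 2, the most recent commit, leaves mismatch() at 0. That is the global-rollback limitation in the notes below.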

---

Bleh, now we need to implement finite-field operations? Well, not
entirely!

Note that our algorithm never uses division. This means we don't need a
full finite-field (+, -, *, /), but can get away with a finite-ring (+,
-, *). And conveniently for us, our crc32c polynomial defines a ring
epimorphic to a 31-bit finite-field.

All we need to do is define crc32c multiplication as polynomial
multiplication mod our crc32c polynomial:

  crc32cmul(a, b) = pmod(pmul(a, b), P)

And since crc32c is more-or-less just pmod(x, P), this lets us take
advantage of any crc32c hardware/tables that may be available.
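
Here is a direct, unoptimized C sketch of this definition. It assumes the non-reflected bit order (bit i is the coefficient of x^i) and ignores crc32c's usual bit reflection and init/xor-out. It therefore illustrates the math rather than matching lfs_crc32c_mul bit-for-bit. pmul, pmod, and crc32cmul are hypothetical names taken from the formula above. A real implementation would presumably do the reduction step with crc32c tables or instructions instead of this bit loop.

```c
#include <assert.h>
#include <stdint.h>

// crc32c polynomial including its x^32 term, non-reflected
#define CRC32C_P 0x11edc6f41ull

// carry-less (GF(2)[x]) multiplication, 32x32 -> 63 bits
static uint64_t pmul(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 32; i++) {
        if ((b >> i) & 1) {
            r ^= (uint64_t)a << i;
        }
    }
    return r;
}

// polynomial remainder of x mod P via GF(2) long division,
// clearing bits 63..32 by xoring in shifted copies of P
static uint32_t pmod(uint64_t x) {
    for (int i = 63; i >= 32; i--) {
        if ((x >> i) & 1) {
            x ^= CRC32C_P << (i - 32);
        }
    }
    return (uint32_t)x;
}

// crc32cmul(a, b) = pmod(pmul(a, b), P)
static uint32_t crc32cmul(uint32_t a, uint32_t b) {
    return pmod(pmul(a, b));
}
```

For example, x * x^31 = x^32, which reduces to P's low 32 bits, so crc32cmul(0x2, 0x80000000) returns 0x1edc6f41.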

---

Bunch of notes:

- Our 2^n-bit crc-ring maps to a (2^n-1)-bit finite-field because our
  crc polynomial is defined as P(x) = Q(x)(x + 1), where Q(x) is a
  degree 2^n-1 irreducible polynomial.

  This is a common crc construction as it provides optimal odd-bit/2-bit
  error detection, so it shouldn't be too difficult to adapt to other
  crc sizes.

- t(g) = g^3 is not the only function that works, but it turns out to be
  a pretty good one:

  - 3 and 2^(2^n-1)-1 are coprime, which means our function t(g) = g^3
    provides a one-to-one mapping in the underlying fields of all crc
    rings of size 2^(2^n).

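    We know 3 and 2^(2^n-1)-1 are coprime because 2^(2^n-1)-1 =
    (2^(2^n)-1) - 2^(2^n-1). Here 2^(2^n)-1 = F_0F_1...F_(n-1) is a
    product of Fermat numbers (F_k = 2^(2^k)+1) that includes F_0 = 3
    (A023394), so 3 divides it. But 3 does not divide the power-of-2
    2^(2^n-1), so 3 cannot divide their difference.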

  - Our delta, when viewed as a polynomial in g: d(g) = gc^2 + g^2c +
    c^3, has degree 2, which implies there are at most 2 solutions or
    1-bit of information loss in the underlying field.

    This is optimal since the original definition already had 2
    solutions before we even chose a function:

      d(g) = t(g + c) - t(g)
      d(g) = t(g + c) - t((g + c) - c)
      d(g) = t((g + c) + c) - t(g + c)
      d(g) = d(g + c)

  Though note the mapping of our crc-ring to the underlying field
  already represents 1-bit of information loss.

- If you're using a cryptographic hash or other non-crc, you should
  probably just use an equal sized finite-field.

  Though note changing from a (2^n-1)-bit field to a 2^n-bit field does
  change the math a bit, with t(g) = g^7 being a better non-linear
  function:

  - 7 is the smallest odd number > 1 coprime with 2^(2^n)-1 (for
    n >= 2), a product of Fermat numbers, which makes t(g) = g^7 a
    one-to-one mapping.

    3 humorously divides all of these, since F_0 = 3 is always one of
    the factors (and F_1 = 5 rules out g^5).

  - Expanding delta with t(g) = g^7 gives us a degree 6 polynomial,
    which implies at most 6 solutions or ~3-bits of information loss.

    This isn't actually the best you can do; some exhaustive searching
    over small fields (<=2^16) suggests t(g) = g^(2^(n-1)-1) _might_ be
    optimal, but that's a heck of a lot more multiplications.

- Because our crc32cs preserve parity/are epimorphic to parity bits,
  addition (xor) and multiplication (crc32cmul) also preserve parity,
  which can be used to show our entire gcksum system preserves parity.

  This is quite neat, and means we are guaranteed to detect any odd
  number of bit-errors across the entire filesystem.

- Another idea was to use two different addition operations: xor and
  overflowing addition (or mod a prime).

  This probably would have worked, but lacks the rigor of the above
  solution.

- You might think an RS-like construction would help here, where g =
  sum(c_ia^i), but this suffers from the same problem:

    d_i = g' - g
    d_i = g + c_ia^i - g
    d_i = c_ia^i

  Nothing here depends on anything outside of the current mdir.

- Another question: should we be using an RS-like construction anyway
  to include location information in our gcksum?

  Maybe in another system, but I don't think it's necessary in littlefs.

  While our mdirs are independently updateable, they aren't _entirely_
  independent. The location of each mdir is stored in either the mtree
  or a parent mdir, so it always gets mixed into the gcksum somewhere.

  The only exception is the mrootanchor, which is always at the fixed
  blocks 0x{0,1}.

- This does _not_ catch "global-rollback" issues, where the most recent
  commit in the entire filesystem is corrupted, revealing an older, but
  still valid, filesystem state.

  But as far as I am aware this is just a fundamental limitation of
  powerloss-resilient filesystems, short of doing destructive
  operations.

  At the very least, exposing the gcksum would allow the user to store
  it externally and prevent this issue.
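
Several of the number-theoretic claims in the notes above are cheap to check by brute force. This is a sketch with hypothetical helper names. is_bijective_power relies on the standard fact that g -> g^e permutes GF(2^m) exactly when gcd(e, 2^m - 1) = 1. The parity check reuses a non-reflected, Horner-style multiply mod P, which is an assumption about bit order and not littlefs's lfs_crc32c_mul.

```c
#include <assert.h>
#include <stdint.h>

// crc32c polynomial, non-reflected, with the implicit x^32 term dropped
#define CRC32C_P 0x1edc6f41u

static uint64_t gcd64(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// g -> g^e permutes GF(2^m) exactly when gcd(e, 2^m - 1) == 1, since the
// nonzero elements form a cyclic group of order 2^m - 1 (and 0^e = 0)
static int is_bijective_power(uint64_t e, unsigned m) {
    return gcd64(e, (UINT64_C(1) << m) - 1) == 1;
}

// smallest odd exponent e > 1 that is one-to-one on GF(2^m), m <= 63
static uint64_t smallest_odd_bijective_power(unsigned m) {
    uint64_t e = 3;
    while (!is_bijective_power(e, m)) {
        e += 2;
    }
    return e;
}

// parity of a 32-bit word, i.e. the polynomial evaluated at x = 1
static unsigned parity32(uint32_t v) {
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1;
}

// multiply a*b in GF(2)[x]/P (Horner-style, high bit of b first)
static uint32_t pmulmod(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        uint32_t overflow = r & 0x80000000u;
        r <<= 1;
        if (overflow) {
            r ^= CRC32C_P;
        }
        if ((b >> i) & 1) {
            r ^= a;
        }
    }
    return r;
}
```

CRC32C_P has 17 set bits, so together with the x^32 term P has an even number of terms, meaning (x + 1) divides P. Reducing mod P therefore never changes a value's parity, and parity32(pmulmod(a, b)) equals parity32(a) & parity32(b). The same checks confirm that g^3 is one-to-one on GF(2^31), while g^7 is the first odd power that is one-to-one on GF(2^32).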

---

Implementation details:

- Our gcksumdelta depends on the rbyd's cksum, so there's a catch-22 if
  we include it in the rbyd itself.

  We can avoid this by including it in the commit tags (actually the
  separate canonical cksum makes this easier than it would have been
  earlier), but this does mean LFSR_TAG_GCKSUMDELTA is not an
  LFSR_TAG_GDELTA subtype. Unfortunate but not a dealbreaker.

- Reading/writing the gcksumdelta gets a bit annoying with it not being
  in the rbyd. For now I've extended the low-level lfsr_rbyd_fetch_/
  lfsr_rbyd_appendcksum_ to accept an optional gcksumdelta pointer,
  which is a bit awkward, but I don't know of a better solution.

- Unlike the grm, _every_ mdir commit involves the gcksum, which means
  we either need to propagate the gcksumdelta up the mroot chain
  correctly, or somehow keep track of partially flushed gcksumdeltas.

  To make this work I modified the low-level lfsr_mdir_commit__
  functions to accept start_rid=-2 to indicate when gcksumdeltas should
  be flushed.

  It's a bit of a hack, but I think it might make sense to extend this
  to all gdeltas eventually.

The gcksum costs both code and RAM, but I think it's well worth it for
removing an entire category of filesystem corruption:

           code          stack          ctx
  before: 37796           2608          620
  after:  38428 (+1.7%)   2640 (+1.2%)  644 (+3.9%)
---
 lfs.c               | 279 +++++++++++++++++++++++++++++++++++++-------
 lfs.h               |   5 +
 lfs_util.c          | 192 ++++++++++++++++++------------
 lfs_util.h          |  22 ++++
 scripts/dbgbmap.py  |  25 +++-
 scripts/dbgbtree.py |  30 ++++-
 scripts/dbglfs.py   | 127 +++++++++++++++-----
 scripts/dbgmtree.py |  30 ++++-
 scripts/dbgrbyd.py  |  12 +-
 scripts/dbgtag.py   |  12 +-
 tests/test_ck.toml  | 118 +++++++++++++++++++
 11 files changed, 684 insertions(+), 168 deletions(-)

diff --git a/lfs.c b/lfs.c
index e5e0fd8b..25c260a9 100644
--- a/lfs.c
+++ b/lfs.c
@@ -1158,6 +1158,7 @@ enum lfsr_tag {
     LFSR_TAG_P              = 0x0001,
     LFSR_TAG_NOTE           = 0x3100,
     LFSR_TAG_ECKSUM         = 0x3200,
+    LFSR_TAG_GCKSUMDELTA    = 0x3300,
 
     // in-device only tags, these should never get written to disk
     LFSR_TAG_INTERNAL       = 0x0800,
@@ -2725,7 +2726,8 @@ static int lfsr_rbyd_ckecksum(lfs_t *lfs, const lfsr_rbyd_t *rbyd,
 }
 
 // fetch an rbyd
-static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
+static int lfsr_rbyd_fetch_(lfs_t *lfs,
+        lfsr_rbyd_t *rbyd, uint32_t *gcksumdelta,
         lfs_block_t block, lfs_size_t trunk) {
     // set up some initial state
     rbyd->blocks[0] = block;
@@ -2752,8 +2754,11 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
     lfsr_rid_t weight_ = 0;
 
     // assume unerased until proven otherwise
-    lfsr_data_t ecksum = LFSR_DATA_NULL();
-    lfsr_data_t ecksum_ = LFSR_DATA_NULL();
+    lfsr_ecksum_t ecksum = {.cksize=-1};
+    lfsr_ecksum_t ecksum_ = {.cksize=-1};
+
+    // also find gcksumdelta, though this is only used by mdirs
+    uint32_t gcksumdelta_ = 0;
 
     // scan tags, checking valid bits, cksums, etc
     while (off < lfs->cfg->block_size
@@ -2793,7 +2798,33 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
 
                 // found an ecksum? save for later
                 if (tag == LFSR_TAG_ECKSUM) {
-                    ecksum_ = LFSR_DATA_DISK(block, off_, size);
+                    err = lfsr_data_readecksum(lfs,
+                            &LFSR_DATA_DISK(block, off_,
+                                // note this size is to make the hint do
+                                // what we want
+                                lfs->cfg->block_size - off_),
+                            &ecksum_);
+                    if (err) {
+                        if (err == LFS_ERR_CORRUPT) {
+                            break;
+                        }
+                        return err;
+                    }
+
+                // found gcksumdelta? save for later
+                } else if (tag == LFSR_TAG_GCKSUMDELTA) {
+                    err = lfsr_data_readle32(lfs,
+                            &LFSR_DATA_DISK(block, off_,
+                                // note this size is to make the hint do
+                                // what we want
+                                lfs->cfg->block_size - off_),
+                            &gcksumdelta_);
+                    if (err) {
+                        if (err == LFS_ERR_CORRUPT) {
+                            break;
+                        }
+                        return err;
+                    }
                 }
 
             // is an end-of-commit cksum
@@ -2824,13 +2855,17 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
                 rbyd->trunk = (LFSR_RBYD_ISSHRUB & rbyd->trunk) | trunk_;
                 rbyd->weight = weight;
                 ecksum = ecksum_;
+                ecksum_.cksize = -1;
+                if (gcksumdelta) {
+                    *gcksumdelta = gcksumdelta_;
+                }
+                gcksumdelta_ = 0;
 
                 // revert to canonical checksum and perturb if necessary
                 cksum_ = cksum
                         ^ ((lfsr_rbyd_isperturb(rbyd))
                             ? LFS_CRC32C_ODDZERO
                             : 0);
-                ecksum_ = LFSR_DATA_NULL();
             }
         }
 
@@ -2888,25 +2923,15 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
 
     // did we end on a valid commit? we may have erased-state
     bool erased = false;
-    if (lfsr_data_size(ecksum) != 0) {
-        // read the erased-state checksum
-        lfsr_ecksum_t ecksum__;
-        err = lfsr_data_readecksum(lfs, &ecksum,
-                &ecksum__);
+    if (ecksum.cksize != -1) {
+        // check the erased-state checksum
+        err = lfsr_rbyd_ckecksum(lfs, rbyd, &ecksum);
         if (err && err != LFS_ERR_CORRUPT) {
             return err;
         }
 
-        if (err != LFS_ERR_CORRUPT) {
-            // check the erased-state checksum
-            err = lfsr_rbyd_ckecksum(lfs, rbyd, &ecksum__);
-            if (err && err != LFS_ERR_CORRUPT) {
-                return err;
-            }
-
-            // found valid erased-state?
-            erased = (err != LFS_ERR_CORRUPT);
-        }
+        // found valid erased-state?
+        erased = (err != LFS_ERR_CORRUPT);
     }
 
     // used eoff=-1 to indicate when there is no erased-state
@@ -2917,6 +2942,11 @@ static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
     return 0;
 }
 
+static int lfsr_rbyd_fetch(lfs_t *lfs, lfsr_rbyd_t *rbyd,
+        lfs_block_t block, lfs_size_t trunk) {
+    return lfsr_rbyd_fetch_(lfs, rbyd, NULL, block, trunk);
+}
+
 // a more aggressive fetch when checksum is known
 static int lfsr_rbyd_fetchck(lfs_t *lfs, lfsr_rbyd_t *rbyd,
         lfs_block_t block, lfs_size_t trunk,
@@ -3937,7 +3967,11 @@ leaf:;
     return 0;
 }
 
-static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
+// needed in lfsr_rbyd_appendcksum
+static uint32_t lfsr_gcksum_cube(uint32_t gcksum);
+
+static int lfsr_rbyd_appendcksum_(lfs_t *lfs,
+        lfsr_rbyd_t *rbyd, uint32_t *gcksumdelta) {
     // begin appending
     int err = lfsr_rbyd_appendinit(lfs, rbyd);
     if (err) {
@@ -3947,6 +3981,28 @@ static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
     // save the canonical checksum
     uint32_t cksum = rbyd->cksum;
 
+    // append gcksumdelta?
+    //
+    // the only requirement for gcksumdelta is we append after
+    // calculating the canonical checksum, it's a bit more convenient to
+    // append before the ecksum because of end-of-commit calculations
+    if (gcksumdelta) {
+        // figure out changes to our gcksumdelta
+        uint32_t gcksumdelta_ = *gcksumdelta
+                ^ lfsr_gcksum_cube(lfs->gcksum_p)
+                ^ lfsr_gcksum_cube(lfs->gcksum)
+                ^ lfs->gcksum_d;
+        *gcksumdelta = gcksumdelta_;
+
+        uint8_t gcksumdelta_buf[LFSR_LE32_DSIZE];
+        err = lfsr_rbyd_appendrat_(lfs, rbyd, LFSR_RAT(
+                LFSR_TAG_GCKSUMDELTA, 0, LFSR_DATA_LE32(
+                    gcksumdelta_, gcksumdelta_buf)));
+        if (err) {
+            return err;
+        }
+    }
+
     // align to the next prog unit
     //
     // this gets a bit complicated as we have two types of cksums:
@@ -4081,6 +4137,10 @@ static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
     return 0;
 }
 
+static int lfsr_rbyd_appendcksum(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
+    return lfsr_rbyd_appendcksum_(lfs, rbyd, NULL);
+}
+
 static int lfsr_rbyd_appendrats(lfs_t *lfs, lfsr_rbyd_t *rbyd,
         lfsr_srid_t rid, lfsr_srid_t start_rid, lfsr_srid_t end_rid,
         const lfsr_rat_t *rats, lfs_size_t rat_count) {
@@ -6808,6 +6868,14 @@ static inline void lfsr_gdelta_xor(
 }
 
 
+// gcksum (global checksum) things
+
+// cubing the gcksum prevents trivial gcksumdeltas
+static uint32_t lfsr_gcksum_cube(uint32_t gcksum) {
+    return lfs_crc32c_mul(lfs_crc32c_mul(gcksum, gcksum), gcksum);
+}
+
+
 // grm (global remove) things
 static inline uint8_t lfsr_grm_count_(const lfsr_grm_t *grm) {
     return (grm->mids[0] >= 0) + (grm->mids[1] >= 0);
@@ -6895,6 +6963,8 @@ static int lfsr_data_readgrm(lfs_t *lfs, lfsr_data_t *data,
 
 // some mdir-related gstate things we need
 static void lfsr_fs_flushgdelta(lfs_t *lfs) {
+    // zero any pending gdeltas
+    lfs->gcksum_d = 0;
     lfs_memset(lfs->grm_d, 0, LFSR_GRM_DSIZE);
 }
 
@@ -6911,6 +6981,8 @@ static void lfsr_fs_preparegdelta(lfs_t *lfs) {
 
 static void lfsr_fs_revertgdelta(lfs_t *lfs) {
     // revert gstate to on-disk state
+    lfs->gcksum = lfs->gcksum_p;
+
     int err = lfsr_data_readgrm(lfs,
             &LFSR_DATA_BUF(lfs->grm_p, LFSR_GRM_DSIZE),
             &lfs->grm);
@@ -6921,11 +6993,15 @@ static void lfsr_fs_revertgdelta(lfs_t *lfs) {
 
 static void lfsr_fs_commitgdelta(lfs_t *lfs) {
     // commit any pending gdeltas
+    lfs->gcksum_p = lfs->gcksum;
     lfsr_data_fromgrm(&lfs->grm, lfs->grm_p);
 }
 
 // append and consume any pending gstate
 static int lfsr_rbyd_appendgdelta(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
+    // gcksums are a special case and handled directly in
+    // lfsr_mdir_commit__/lfsr_rbyd_appendcksum_
+
     // need grm delta?
     if (!lfsr_gdelta_iszero(lfs->grm_d, LFSR_GRM_DSIZE)) {
         // make sure to xor any existing delta
@@ -6964,6 +7040,9 @@ static int lfsr_rbyd_appendgdelta(lfs_t *lfs, lfsr_rbyd_t *rbyd) {
 }
 
 static int lfsr_fs_consumegdelta(lfs_t *lfs, const lfsr_mdir_t *mdir) {
+    // consume any gcksum deltas
+    lfs->gcksum_d ^= mdir->gcksumdelta;
+
     // consume any grm deltas
     lfsr_data_t data;
     int err = lfsr_rbyd_lookup(lfs, &mdir->rbyd, -1, LFSR_TAG_GRMDELTA,
@@ -7065,7 +7144,9 @@ static int lfsr_mdir_fetch(lfs_t *lfs, lfsr_mdir_t *mdir,
 
     // try to fetch rbyds in the order of most recent to least recent
     for (int i = 0; i < 2; i++) {
-        int err = lfsr_rbyd_fetch(lfs, &mdir->rbyd, blocks[0], 0);
+        int err = lfsr_rbyd_fetch_(lfs,
+                &mdir->rbyd, &mdir->gcksumdelta,
+                blocks[0], 0);
         if (err && err != LFS_ERR_CORRUPT) {
             return err;
         }
@@ -7265,6 +7346,7 @@ static int lfsr_mtree_lookup(lfs_t *lfs, lfsr_smid_t mid,
     if (lfsr_mtree_isnull(&lfs->mtree)) {
         mdir_->mid = mid;
         mdir_->rbyd = lfs->mroot.rbyd;
+        mdir_->gcksumdelta = lfs->mroot.gcksumdelta;
         return 0;
 
     // looking up direct mdir?
@@ -7308,6 +7390,8 @@ static int lfsr_mdir_alloc__(lfs_t *lfs, lfsr_mdir_t *mdir,
         lfsr_smid_t mid, bool partial) {
     // assign the mid
     mdir->mid = mid;
+    // default to zero gcksumdelta
+    mdir->gcksumdelta = 0;
 
     if (!partial) {
         // allocate one block without an erase
@@ -7362,6 +7446,8 @@ static int lfsr_mdir_swap__(lfs_t *lfs, lfsr_mdir_t *mdir_,
         const lfsr_mdir_t *mdir, bool force) {
     // assign the mid
     mdir_->mid = mdir->mid;
+    // reset to zero gcksumdelta, upper layers should handle this
+    mdir_->gcksumdelta = 0;
 
     // first thing we need to do is read our current revision count
     uint32_t rev;
@@ -7686,22 +7772,38 @@ static int lfsr_mdir_commit__(lfs_t *lfs, lfsr_mdir_t *mdir,
     }
 
     // append any gstate?
-    if (start_rid == -1) {
+    if (start_rid <= -1) {
         int err = lfsr_rbyd_appendgdelta(lfs, &mdir->rbyd);
         if (err) {
             return err;
         }
     }
 
+    // TODO should lfsr_rbyd_appendcksum_ revert cksum on failure?
+    // save cksum in case we fail
+    uint32_t cksum = mdir->rbyd.cksum;
+    // xor our new cksum
+    lfs->gcksum ^= mdir->rbyd.cksum;
+
     // finalize commit
-    int err = lfsr_rbyd_appendcksum(lfs, &mdir->rbyd);
+    int err = lfsr_rbyd_appendcksum_(lfs, &mdir->rbyd,
+            // include gcksumdelta if we're not relocating
+            (start_rid <= -2) ? &mdir->gcksumdelta : NULL);
     if (err) {
+        // undo cksum xor on failure
+        lfs->gcksum ^= cksum;
         return err;
     }
 
     // success? flush gstate?
-    if (start_rid == -1) {
+    if (start_rid <= -1) {
+        // TODO this is a hack
+        // we only flush gcksumdelta if rid == -2
+        uint32_t gcksum_d = lfs->gcksum_d;
         lfsr_fs_flushgdelta(lfs);
+        if (start_rid > -2) {
+            lfs->gcksum_d = gcksum_d;
+        }
     }
 
     return 0;
@@ -7719,7 +7821,7 @@ static lfs_ssize_t lfsr_mdir_estimate__(lfs_t *lfs, const lfsr_mdir_t *mdir,
 
     // calculate dsize by starting from the outside ids and working inwards,
     // this naturally gives us a split rid
-    lfsr_srid_t a_rid = start_rid;
+    lfsr_srid_t a_rid = lfs_smax(start_rid, -1);
     lfsr_srid_t b_rid = lfs_min(mdir->rbyd.weight, end_rid);
     lfs_size_t a_dsize = 0;
     lfs_size_t b_dsize = 0;
@@ -7827,7 +7929,7 @@ static lfs_ssize_t lfsr_mdir_estimate__(lfs_t *lfs, const lfsr_mdir_t *mdir,
             }
         }
 
-        if (a_rid == -1) {
+        if (a_rid <= -1) {
             mdir_dsize += dsize_;
         } else {
             a_dsize += dsize_;
@@ -7858,8 +7960,14 @@ static int lfsr_mdir_compact__(lfs_t *lfs, lfsr_mdir_t *mdir_,
     // (btree), not the staged state (btree_), this is important,
     // we can't trust btree_ after a failed commit
 
+    // assume we keep any gcksumdelta, this will get fixed the first time
+    // we commit anything
+    if (start_rid == -2) {
+        mdir_->gcksumdelta = mdir->gcksumdelta;
+    }
+
     // copy over tags in the rbyd in order
-    lfsr_srid_t rid = start_rid;
+    lfsr_srid_t rid = lfs_smax(start_rid, -1);
     lfsr_tag_t tag = 0;
     while (true) {
         lfsr_rid_t weight;
@@ -8075,8 +8183,14 @@ relocate:;
     }
 
 compact:;
+    // don't copy over gcksum if relocating
+    lfsr_srid_t start_rid_ = start_rid;
+    if (relocated && !overcompacted) {
+        start_rid_ = lfs_smax(start_rid_, -1);
+    }
+
     // compact our mdir
-    err = lfsr_mdir_compact__(lfs, &mdir_, mdir, start_rid, end_rid);
+    err = lfsr_mdir_compact__(lfs, &mdir_, mdir, start_rid_, end_rid);
     if (err) {
         LFS_ASSERT(err != LFS_ERR_RANGE);
         // bad prog? try another block
@@ -8090,7 +8204,7 @@ compact:;
     //
     // upper layers should make sure this can't fail by limiting the
     // maximum commit size
-    err = lfsr_mdir_commit__(lfs, &mdir_, start_rid, end_rid,
+    err = lfsr_mdir_commit__(lfs, &mdir_, start_rid_, end_rid,
             mid, rats, rat_count);
     if (err) {
         LFS_ASSERT(err != LFS_ERR_RANGE);
@@ -8101,6 +8215,10 @@ compact:;
         return err;
     }
 
+    // consume gcksumdelta if relocated
+    if (relocated && !overcompacted) {
+        lfs->gcksum_d ^= mdir->gcksumdelta;
+    }
     // update mdir
     *mdir = mdir_;
     return 0;
@@ -8196,6 +8314,9 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
     // setup any pending gdeltas
     lfsr_fs_preparegdelta(lfs);
 
+    // xor our old cksum
+    lfs->gcksum ^= mdir->rbyd.cksum;
+
     // create a copy
     lfsr_mdir_t mdir_[2];
     mdir_[0] = *mdir;
@@ -8218,7 +8339,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
 
     // attempt to commit/compact the mdir normally
     lfsr_srid_t split_rid;
-    int err = lfsr_mdir_commit_(lfs, &mdir_[0], -1, -1, &split_rid,
+    int err = lfsr_mdir_commit_(lfs, &mdir_[0], -2, -1, &split_rid,
             mdir->mid, rats, rat_count);
     if (err && err != LFS_ERR_RANGE
             && err != LFS_ERR_NOENT) {
@@ -8229,6 +8350,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
     lfsr_mdir_t mroot_ = lfs->mroot;
     if (!err && lfsr_mdir_cmp(mdir, &lfs->mroot) == 0) {
         mroot_.rbyd = mdir_[0].rbyd;
+        mroot_.gcksumdelta = mdir_[0].gcksumdelta;
     }
 
     // handle possible mtree updates, this gets a bit messy
@@ -8328,6 +8450,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
                     mdir_[0].mid >> lfs->mdir_bits,
                     mdir_[0].rbyd.blocks[0], mdir_[0].rbyd.blocks[1]);
             mdir_[0].rbyd = mdir_[1].rbyd;
+            mdir_[0].gcksumdelta = mdir_[1].gcksumdelta;
             goto relocated;
 
         // other sibling reduced to zero
@@ -8509,6 +8632,18 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
         // mtree should never go to zero since we always have a root bookmark
         LFS_ASSERT(lfsr_mtree_weight_(&mtree_) > 0);
 
+        // make sure mtree/mroot changes are on-disk before committing
+        // metadata
+        err = lfsr_bd_sync(lfs);
+        if (err) {
+            goto failed;
+        }
+
+        // xor mroot's cksum if we haven't already
+        if (lfsr_mdir_cmp(mdir, &lfs->mroot) != 0) {
+            lfs->gcksum ^= lfs->mroot.rbyd.cksum;
+        }
+
         // mark any copies of our mroot as unerased
         lfs->mroot.rbyd.eoff = -1;
         for (lfsr_omdir_t *o = lfs->omdirs; o; o = o->next) {
@@ -8517,19 +8652,12 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
             }
         }
 
-        // make sure mtree/mroot changes are on-disk before committing
-        // metadata
-        err = lfsr_bd_sync(lfs);
-        if (err) {
-            goto failed;
-        }
-
         // commit new mtree into our mroot
         //
         // note end_rid=0 here will delete any files leftover from a split
         // in our mroot
         uint8_t mtree_buf[LFS_MAX(LFSR_MPTR_DSIZE, LFSR_BTREE_DSIZE)];
-        err = lfsr_mdir_commit_(lfs, &mroot_, -1, 0, NULL,
+        err = lfsr_mdir_commit_(lfs, &mroot_, -2, 0, NULL,
                 -1, LFSR_RATS(
                     (lfsr_mtree_ismptr(&mtree_))
                         ? LFSR_RAT(
@@ -8580,9 +8708,12 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
                 goto failed;
             }
 
+            // xor mrootchild's cksum
+            lfs->gcksum ^= mrootparent_.rbyd.cksum;
+
             // commit mrootchild
             uint8_t mrootchild_buf[LFSR_MPTR_DSIZE];
-            err = lfsr_mdir_commit_(lfs, &mrootparent_, -1, -1, NULL,
+            err = lfsr_mdir_commit_(lfs, &mrootparent_, -2, -1, NULL,
                     -1, LFSR_RATS(
                         LFSR_RAT(
                             LFSR_TAG_MROOT, 0,
@@ -8630,7 +8761,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
             }
 
             uint8_t mrootchild_buf[LFSR_MPTR_DSIZE];
-            err = lfsr_mdir_commit__(lfs, &mrootanchor_, -1, -1,
+            err = lfsr_mdir_commit__(lfs, &mrootanchor_, -2, -1,
                     -1, LFSR_RATS(
                         LFSR_RAT(
                             LFSR_TAG_MAGIC, 0,
@@ -8656,6 +8787,7 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
     }
 
     // gstate must have been committed by a lower-level function at this point
+    LFS_ASSERT(lfs->gcksum_d == 0);
     LFS_ASSERT(lfsr_gdelta_iszero(lfs->grm_d, LFSR_GRM_DSIZE));
 
     // sync on-disk state
@@ -8745,8 +8877,10 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
                         >= (lfsr_srid_t)mdir_[0].rbyd.weight) {
                 o->mdir.mid += (1 << lfs->mdir_bits) - mdir_[0].rbyd.weight;
                 o->mdir.rbyd = mdir_[1].rbyd;
+                o->mdir.gcksumdelta = mdir_[1].gcksumdelta;
             } else {
                 o->mdir.rbyd = mdir_[0].rbyd;
+                o->mdir.gcksumdelta = mdir_[0].gcksumdelta;
             }
         } else if (o->mdir.mid > mdir->mid) {
             o->mdir.mid += mdelta;
@@ -8757,13 +8891,16 @@ static int lfsr_mdir_commit(lfs_t *lfs, lfsr_mdir_t *mdir,
     if (mdelta > 0
             && mdir->mid == -1) {
         mdir->rbyd = mroot_.rbyd;
+        mdir->gcksumdelta = mroot_.gcksumdelta;
     } else if (mdelta > 0
             && lfsr_mid_rid(lfs, mdir->mid)
                 >= (lfsr_srid_t)mdir_[0].rbyd.weight) {
         mdir->mid += (1 << lfs->mdir_bits) - mdir_[0].rbyd.weight;
         mdir->rbyd = mdir_[1].rbyd;
+        mdir->gcksumdelta = mdir_[1].gcksumdelta;
     } else {
         mdir->rbyd = mdir_[0].rbyd;
+        mdir->gcksumdelta = mdir_[0].gcksumdelta;
     }
 
     // update mroot and mtree
@@ -13331,6 +13468,12 @@ static int lfs_init(lfs_t *lfs, uint32_t flags,
     lfs->omdirs = NULL;
 
     // zero gstate
+    lfs->gcksum = 0;
+    lfs->gcksum_p = 0;
+    lfs->gcksum_d = 0;
+
+    lfs->grm.mids[0] = -1;
+    lfs->grm.mids[1] = -1;
     lfs_memset(lfs->grm_p, 0, LFSR_GRM_DSIZE);
     lfs_memset(lfs->grm_d, 0, LFSR_GRM_DSIZE);
 
@@ -13796,6 +13939,9 @@ static int lfsr_mountinited(lfs_t *lfs) {
             // numbers
             lfs->seed ^= mdir->rbyd.cksum;
 
+            // build gcksum out of mdir cksums
+            lfs->gcksum_p ^= mdir->rbyd.cksum;
+
             // collect any gdeltas from this mdir
             err = lfsr_fs_consumegdelta(lfs, mdir);
             if (err) {
@@ -13815,6 +13961,42 @@ static int lfsr_mountinited(lfs_t *lfs) {
         }
     }
 
+    // keep track of the current gcksum
+    lfs->gcksum = lfs->gcksum_p;
+
+    // validate gcksum by comparing its cube against the gcksumdeltas
+    //
+    // The use of cksum^3 here is important to avoid trivial
+    // gcksumdeltas. If we use a linear function (cksum, crc32c(cksum),
+    // cksum^2, etc), the state of the filesystem cancels out when
+    // calculating a new gcksumdelta:
+    //
+    //   d_i = t(g') - t(g)
+    //   d_i = t(g + c_i) - t(g)
+    //   d_i = t(g) + t(c_i) - t(g)
+    //   d_i = t(c_i)
+    //
+    // Using cksum^3 prevents this from happening:
+    //
+    //   d_i = (g + c_i)^3 - g^3
+    //   d_i = (g + c_i)(g + c_i)(g + c_i) - g^3
+    //   d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3
+    //   d_i = (g^2 + c_i^2)(g + c_i) - g^3
+    //   d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3
+    //   d_i = gc_i^2 + g^2c_i + c_i^3
+    //
+    // cksum^3 also has some other nice properties, providing a perfect
+    // 1->1 mapping of t(g) in 2^31 fields, and losing at most 3-bits of
+    // info when calculating d_i.
+    //
+    if (lfsr_gcksum_cube(lfs->gcksum) != lfs->gcksum_d) {
+        LFS_ERROR("Found gcksum mismatch, cksum^3 %08"PRIx32" "
+                    "(!= %08"PRIx32")",
+                lfsr_gcksum_cube(lfs->gcksum),
+                lfs->gcksum_d);
+        return LFS_ERR_CORRUPT;
+    }
+
     // once we've mounted and derived a pseudo-random seed, initialize our
     // block allocator
     //
@@ -13924,7 +14106,8 @@ int lfsr_mount(lfs_t *lfs, uint32_t flags,
 
     // TODO this should use any configured values
     LFS_DEBUG("Mounted littlefs v%"PRId32".%"PRId32" %"PRId32"x%"PRId32" "
-                "0x{%"PRIx32",%"PRIx32"}.%"PRIx32" w%"PRId32".%"PRId32,
+                "0x{%"PRIx32",%"PRIx32"}.%"PRIx32" w%"PRId32".%"PRId32", "
+                "cksum %08"PRIx32,
             LFS_DISK_VERSION_MAJOR,
             LFS_DISK_VERSION_MINOR,
             lfs->cfg->block_size,
@@ -13933,7 +14116,8 @@ int lfsr_mount(lfs_t *lfs, uint32_t flags,
             lfs->mroot.rbyd.blocks[1],
             lfsr_rbyd_trunk(&lfs->mroot.rbyd),
             lfsr_mtree_weight_(&lfs->mtree) >> lfs->mdir_bits,
-            1 << lfs->mdir_bits);
+            1 << lfs->mdir_bits,
+            lfs->gcksum);
 
     return 0;
 
@@ -13991,7 +14175,7 @@ static int lfsr_formatinited(lfs_t *lfs) {
         uint8_t name_limit_buf[LFSR_LLEB128_DSIZE];
         uint8_t file_limit_buf[LFSR_LEB128_DSIZE];
         uint8_t bookmark_buf[LFSR_LEB128_DSIZE];
-        err = lfsr_rbyd_commit(lfs, &rbyd, -1, LFSR_RATS(
+        err = lfsr_rbyd_appendrats(lfs, &rbyd, -1, -1, -1, LFSR_RATS(
                 LFSR_RAT(
                     LFSR_TAG_MAGIC, 0,
                     LFSR_DATA_BUF("littlefs", 8)),
@@ -14025,6 +14209,13 @@ static int lfsr_formatinited(lfs_t *lfs) {
         if (err) {
             return err;
         }
+
+        // prepare initial gcksum and commit
+        lfs->gcksum = rbyd.cksum;
+        err = lfsr_rbyd_appendcksum_(lfs, &rbyd, &(uint32_t){0});
+        if (err) {
+            return err;
+        }
     }
 
     // sync on-disk state
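The cksum^3 argument in the lfsr_mountinited comment can be checked numerically. Below is a minimal Python sketch. Its arithmetic mirrors pmul/crc32cmul from scripts/dbglfs.py, but the helper names (`crc32c_mul`, `square`, `cube`, `gcksumdelta`) are illustrative only and are not part of littlefs. With a linear t such as squaring, the delta is the same for every current gcksum g. With t = cube, the delta changes with g.

```python
# Illustrative sketch: helper names are hypothetical, the arithmetic
# mirrors pmul/crc32cmul in scripts/dbglfs.py.
CRC32C_POLY = 0x82f63b78  # bit-reversed crc32c polynomial, x^32 implied

ONE = 0x80000000  # x^0 in the bit-reversed representation
X = 0x40000000    # x^1


def pmul(a, b):
    # carry-less multiply: every add is an xor
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def crc32c_mul(a, b):
    # reduce the 63-bit product modulo the crc32c polynomial
    r = pmul(a, b)
    for _ in range(31):
        r = (r >> 1) ^ ((r & 1) * CRC32C_POLY)
    return r


def square(x):
    return crc32c_mul(x, x)


def cube(x):
    return crc32c_mul(crc32c_mul(x, x), x)


def gcksumdelta(t, g, c):
    # delta that keeps xor(deltas) == t(gcksum) after an mdir with
    # cksum c is xored into gcksum g (xor is both + and -)
    return t(g ^ c) ^ t(g)


# squaring is linear over GF(2): the delta is the same for any g
print(hex(gcksumdelta(square, 0, ONE)), hex(gcksumdelta(square, X, ONE)))
# 0x80000000 0x80000000

# cubing is not: the delta depends on the current gcksum
print(hex(gcksumdelta(cube, 0, ONE)), hex(gcksumdelta(cube, X, ONE)))
# 0x80000000 0xe0000000
```

The cube case can be worked out by hand: (x+1)^3 - x^3 = x^2 + x + 1. In bit-reversed form that is 0xe0000000.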
diff --git a/lfs.h b/lfs.h
index bd6055fe..7f8e27e1 100644
--- a/lfs.h
+++ b/lfs.h
@@ -611,6 +611,7 @@ typedef struct {
 typedef struct lfsr_mdir {
     lfsr_smid_t mid;
     lfsr_rbyd_t rbyd;
+    uint32_t gcksumdelta;
 } lfsr_mdir_t;
 
 typedef struct lfsr_omdir {
@@ -874,6 +875,10 @@ typedef struct lfs {
         uint8_t *buffer;
     } lookahead;
 
+    uint32_t gcksum;
+    uint32_t gcksum_p;
+    uint32_t gcksum_d;
+
     lfsr_grm_t grm;
     uint8_t grm_p[LFSR_GRM_DSIZE];
     uint8_t grm_d[LFSR_GRM_DSIZE];
diff --git a/lfs_util.c b/lfs_util.c
index cc3f3ee9..1f339ab6 100644
--- a/lfs_util.c
+++ b/lfs_util.c
@@ -76,6 +76,86 @@ ssize_t lfs_fromleb128(uint32_t *word, const void *buffer, size_t size) {
 //    return crc;
 //}
 
+
+// crc32c tables (see lfs_crc32c for more info)
+#if !defined(LFS_FASTER_CRC32C)
+static const uint32_t lfs_crc32c_table[16] = {
+    0x00000000, 0x105ec76f, 0x20bd8ede, 0x30e349b1,
+    0x417b1dbc, 0x5125dad3, 0x61c69362, 0x7198540d,
+    0x82f63b78, 0x92a8fc17, 0xa24bb5a6, 0xb21572c9,
+    0xc38d26c4, 0xd3d3e1ab, 0xe330a81a, 0xf36e6f75,
+};
+
+#else
+static const uint32_t lfs_crc32c_table[256] = {
+    0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4,
+    0xc79a971f, 0x35f1141c, 0x26a1e7e8, 0xd4ca64eb,
+    0x8ad958cf, 0x78b2dbcc, 0x6be22838, 0x9989ab3b,
+    0x4d43cfd0, 0xbf284cd3, 0xac78bf27, 0x5e133c24,
+    0x105ec76f, 0xe235446c, 0xf165b798, 0x030e349b,
+    0xd7c45070, 0x25afd373, 0x36ff2087, 0xc494a384,
+    0x9a879fa0, 0x68ec1ca3, 0x7bbcef57, 0x89d76c54,
+    0x5d1d08bf, 0xaf768bbc, 0xbc267848, 0x4e4dfb4b,
+    0x20bd8ede, 0xd2d60ddd, 0xc186fe29, 0x33ed7d2a,
+    0xe72719c1, 0x154c9ac2, 0x061c6936, 0xf477ea35,
+    0xaa64d611, 0x580f5512, 0x4b5fa6e6, 0xb93425e5,
+    0x6dfe410e, 0x9f95c20d, 0x8cc531f9, 0x7eaeb2fa,
+    0x30e349b1, 0xc288cab2, 0xd1d83946, 0x23b3ba45,
+    0xf779deae, 0x05125dad, 0x1642ae59, 0xe4292d5a,
+    0xba3a117e, 0x4851927d, 0x5b016189, 0xa96ae28a,
+    0x7da08661, 0x8fcb0562, 0x9c9bf696, 0x6ef07595,
+    0x417b1dbc, 0xb3109ebf, 0xa0406d4b, 0x522bee48,
+    0x86e18aa3, 0x748a09a0, 0x67dafa54, 0x95b17957,
+    0xcba24573, 0x39c9c670, 0x2a993584, 0xd8f2b687,
+    0x0c38d26c, 0xfe53516f, 0xed03a29b, 0x1f682198,
+    0x5125dad3, 0xa34e59d0, 0xb01eaa24, 0x42752927,
+    0x96bf4dcc, 0x64d4cecf, 0x77843d3b, 0x85efbe38,
+    0xdbfc821c, 0x2997011f, 0x3ac7f2eb, 0xc8ac71e8,
+    0x1c661503, 0xee0d9600, 0xfd5d65f4, 0x0f36e6f7,
+    0x61c69362, 0x93ad1061, 0x80fde395, 0x72966096,
+    0xa65c047d, 0x5437877e, 0x4767748a, 0xb50cf789,
+    0xeb1fcbad, 0x197448ae, 0x0a24bb5a, 0xf84f3859,
+    0x2c855cb2, 0xdeeedfb1, 0xcdbe2c45, 0x3fd5af46,
+    0x7198540d, 0x83f3d70e, 0x90a324fa, 0x62c8a7f9,
+    0xb602c312, 0x44694011, 0x5739b3e5, 0xa55230e6,
+    0xfb410cc2, 0x092a8fc1, 0x1a7a7c35, 0xe811ff36,
+    0x3cdb9bdd, 0xceb018de, 0xdde0eb2a, 0x2f8b6829,
+    0x82f63b78, 0x709db87b, 0x63cd4b8f, 0x91a6c88c,
+    0x456cac67, 0xb7072f64, 0xa457dc90, 0x563c5f93,
+    0x082f63b7, 0xfa44e0b4, 0xe9141340, 0x1b7f9043,
+    0xcfb5f4a8, 0x3dde77ab, 0x2e8e845f, 0xdce5075c,
+    0x92a8fc17, 0x60c37f14, 0x73938ce0, 0x81f80fe3,
+    0x55326b08, 0xa759e80b, 0xb4091bff, 0x466298fc,
+    0x1871a4d8, 0xea1a27db, 0xf94ad42f, 0x0b21572c,
+    0xdfeb33c7, 0x2d80b0c4, 0x3ed04330, 0xccbbc033,
+    0xa24bb5a6, 0x502036a5, 0x4370c551, 0xb11b4652,
+    0x65d122b9, 0x97baa1ba, 0x84ea524e, 0x7681d14d,
+    0x2892ed69, 0xdaf96e6a, 0xc9a99d9e, 0x3bc21e9d,
+    0xef087a76, 0x1d63f975, 0x0e330a81, 0xfc588982,
+    0xb21572c9, 0x407ef1ca, 0x532e023e, 0xa145813d,
+    0x758fe5d6, 0x87e466d5, 0x94b49521, 0x66df1622,
+    0x38cc2a06, 0xcaa7a905, 0xd9f75af1, 0x2b9cd9f2,
+    0xff56bd19, 0x0d3d3e1a, 0x1e6dcdee, 0xec064eed,
+    0xc38d26c4, 0x31e6a5c7, 0x22b65633, 0xd0ddd530,
+    0x0417b1db, 0xf67c32d8, 0xe52cc12c, 0x1747422f,
+    0x49547e0b, 0xbb3ffd08, 0xa86f0efc, 0x5a048dff,
+    0x8ecee914, 0x7ca56a17, 0x6ff599e3, 0x9d9e1ae0,
+    0xd3d3e1ab, 0x21b862a8, 0x32e8915c, 0xc083125f,
+    0x144976b4, 0xe622f5b7, 0xf5720643, 0x07198540,
+    0x590ab964, 0xab613a67, 0xb831c993, 0x4a5a4a90,
+    0x9e902e7b, 0x6cfbad78, 0x7fab5e8c, 0x8dc0dd8f,
+    0xe330a81a, 0x115b2b19, 0x020bd8ed, 0xf0605bee,
+    0x24aa3f05, 0xd6c1bc06, 0xc5914ff2, 0x37faccf1,
+    0x69e9f0d5, 0x9b8273d6, 0x88d28022, 0x7ab90321,
+    0xae7367ca, 0x5c18e4c9, 0x4f48173d, 0xbd23943e,
+    0xf36e6f75, 0x0105ec76, 0x12551f82, 0xe03e9c81,
+    0x34f4f86a, 0xc69f7b69, 0xd5cf889d, 0x27a40b9e,
+    0x79b737ba, 0x8bdcb4b9, 0x988c474d, 0x6ae7c44e,
+    0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351,
+};
+#endif
+
+
 // Calculate crc32c incrementally
 uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
     // init with 0xffffffff so prefixed zeros affect the crc
@@ -107,86 +187,12 @@ uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
     }
 
     #elif !defined(LFS_FASTER_CRC32C)
-    static const uint32_t lfs_crc32c_table[16] = {
-        0x00000000, 0x105ec76f, 0x20bd8ede, 0x30e349b1,
-        0x417b1dbc, 0x5125dad3, 0x61c69362, 0x7198540d,
-        0x82f63b78, 0x92a8fc17, 0xa24bb5a6, 0xb21572c9,
-        0xc38d26c4, 0xd3d3e1ab, 0xe330a81a, 0xf36e6f75,
-    };
-
     for (size_t i = 0; i < size; i++) {
         crc = (crc >> 4) ^ lfs_crc32c_table[0xf & (crc ^ (data[i] >> 0))];
         crc = (crc >> 4) ^ lfs_crc32c_table[0xf & (crc ^ (data[i] >> 4))];
     }
 
     #else
-    static const uint32_t lfs_crc32c_table[256] = {
-        0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4,
-        0xc79a971f, 0x35f1141c, 0x26a1e7e8, 0xd4ca64eb,
-        0x8ad958cf, 0x78b2dbcc, 0x6be22838, 0x9989ab3b,
-        0x4d43cfd0, 0xbf284cd3, 0xac78bf27, 0x5e133c24,
-        0x105ec76f, 0xe235446c, 0xf165b798, 0x030e349b,
-        0xd7c45070, 0x25afd373, 0x36ff2087, 0xc494a384,
-        0x9a879fa0, 0x68ec1ca3, 0x7bbcef57, 0x89d76c54,
-        0x5d1d08bf, 0xaf768bbc, 0xbc267848, 0x4e4dfb4b,
-        0x20bd8ede, 0xd2d60ddd, 0xc186fe29, 0x33ed7d2a,
-        0xe72719c1, 0x154c9ac2, 0x061c6936, 0xf477ea35,
-        0xaa64d611, 0x580f5512, 0x4b5fa6e6, 0xb93425e5,
-        0x6dfe410e, 0x9f95c20d, 0x8cc531f9, 0x7eaeb2fa,
-        0x30e349b1, 0xc288cab2, 0xd1d83946, 0x23b3ba45,
-        0xf779deae, 0x05125dad, 0x1642ae59, 0xe4292d5a,
-        0xba3a117e, 0x4851927d, 0x5b016189, 0xa96ae28a,
-        0x7da08661, 0x8fcb0562, 0x9c9bf696, 0x6ef07595,
-        0x417b1dbc, 0xb3109ebf, 0xa0406d4b, 0x522bee48,
-        0x86e18aa3, 0x748a09a0, 0x67dafa54, 0x95b17957,
-        0xcba24573, 0x39c9c670, 0x2a993584, 0xd8f2b687,
-        0x0c38d26c, 0xfe53516f, 0xed03a29b, 0x1f682198,
-        0x5125dad3, 0xa34e59d0, 0xb01eaa24, 0x42752927,
-        0x96bf4dcc, 0x64d4cecf, 0x77843d3b, 0x85efbe38,
-        0xdbfc821c, 0x2997011f, 0x3ac7f2eb, 0xc8ac71e8,
-        0x1c661503, 0xee0d9600, 0xfd5d65f4, 0x0f36e6f7,
-        0x61c69362, 0x93ad1061, 0x80fde395, 0x72966096,
-        0xa65c047d, 0x5437877e, 0x4767748a, 0xb50cf789,
-        0xeb1fcbad, 0x197448ae, 0x0a24bb5a, 0xf84f3859,
-        0x2c855cb2, 0xdeeedfb1, 0xcdbe2c45, 0x3fd5af46,
-        0x7198540d, 0x83f3d70e, 0x90a324fa, 0x62c8a7f9,
-        0xb602c312, 0x44694011, 0x5739b3e5, 0xa55230e6,
-        0xfb410cc2, 0x092a8fc1, 0x1a7a7c35, 0xe811ff36,
-        0x3cdb9bdd, 0xceb018de, 0xdde0eb2a, 0x2f8b6829,
-        0x82f63b78, 0x709db87b, 0x63cd4b8f, 0x91a6c88c,
-        0x456cac67, 0xb7072f64, 0xa457dc90, 0x563c5f93,
-        0x082f63b7, 0xfa44e0b4, 0xe9141340, 0x1b7f9043,
-        0xcfb5f4a8, 0x3dde77ab, 0x2e8e845f, 0xdce5075c,
-        0x92a8fc17, 0x60c37f14, 0x73938ce0, 0x81f80fe3,
-        0x55326b08, 0xa759e80b, 0xb4091bff, 0x466298fc,
-        0x1871a4d8, 0xea1a27db, 0xf94ad42f, 0x0b21572c,
-        0xdfeb33c7, 0x2d80b0c4, 0x3ed04330, 0xccbbc033,
-        0xa24bb5a6, 0x502036a5, 0x4370c551, 0xb11b4652,
-        0x65d122b9, 0x97baa1ba, 0x84ea524e, 0x7681d14d,
-        0x2892ed69, 0xdaf96e6a, 0xc9a99d9e, 0x3bc21e9d,
-        0xef087a76, 0x1d63f975, 0x0e330a81, 0xfc588982,
-        0xb21572c9, 0x407ef1ca, 0x532e023e, 0xa145813d,
-        0x758fe5d6, 0x87e466d5, 0x94b49521, 0x66df1622,
-        0x38cc2a06, 0xcaa7a905, 0xd9f75af1, 0x2b9cd9f2,
-        0xff56bd19, 0x0d3d3e1a, 0x1e6dcdee, 0xec064eed,
-        0xc38d26c4, 0x31e6a5c7, 0x22b65633, 0xd0ddd530,
-        0x0417b1db, 0xf67c32d8, 0xe52cc12c, 0x1747422f,
-        0x49547e0b, 0xbb3ffd08, 0xa86f0efc, 0x5a048dff,
-        0x8ecee914, 0x7ca56a17, 0x6ff599e3, 0x9d9e1ae0,
-        0xd3d3e1ab, 0x21b862a8, 0x32e8915c, 0xc083125f,
-        0x144976b4, 0xe622f5b7, 0xf5720643, 0x07198540,
-        0x590ab964, 0xab613a67, 0xb831c993, 0x4a5a4a90,
-        0x9e902e7b, 0x6cfbad78, 0x7fab5e8c, 0x8dc0dd8f,
-        0xe330a81a, 0x115b2b19, 0x020bd8ed, 0xf0605bee,
-        0x24aa3f05, 0xd6c1bc06, 0xc5914ff2, 0x37faccf1,
-        0x69e9f0d5, 0x9b8273d6, 0x88d28022, 0x7ab90321,
-        0xae7367ca, 0x5c18e4c9, 0x4f48173d, 0xbd23943e,
-        0xf36e6f75, 0x0105ec76, 0x12551f82, 0xe03e9c81,
-        0x34f4f86a, 0xc69f7b69, 0xd5cf889d, 0x27a40b9e,
-        0x79b737ba, 0x8bdcb4b9, 0x988c474d, 0x6ae7c44e,
-        0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351,
-    };
-
     for (size_t i = 0; i < size; i++) {
         crc = (crc >> 8) ^ lfs_crc32c_table[0xff & (crc ^ data[i])];
     }
@@ -197,4 +203,42 @@ uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size) {
     return crc;
 }
 
+// Multiply two crc32cs in the crc32c ring
+uint32_t lfs_crc32c_mul(uint32_t a, uint32_t b) {
+    // Multiplication in a crc32c ring involves polynomial
+    // multiplication modulo the crc32c polynomial to keep things
+    // finite:
+    //
+    // r = a * b mod P
+    //
+    // Note that because the crc32c polynomial is not irreducible, this
+    // does not give us a finite field, i.e. division is not always
+    // defined. Still, multiplication has useful properties.
+
+    // This gets a bit funky because crc32cs are little-endian
+    // (bit-reversed), but fortunately pmul is symmetric. Unfortunately
+    // the product is only 63 bits wide, so we need to shift it by 1.
+    uint64_t r = lfs_pmul(a, b) << 1;
+
+    // We can accelerate our modulo with the crc32c tables if present;
+    // these loops may look familiar.
+    #if defined(LFS_SMALLER_CRC32C)
+    for (int i = 0; i < 32; i++) {
+        r = (r >> 1) ^ ((r & 1) ? 0x82f63b78 : 0);
+    }
+
+    #elif !defined(LFS_FASTER_CRC32C)
+    for (int i = 0; i < 8; i++) {
+        r = (r >> 4) ^ lfs_crc32c_table[0xf & r];
+    }
+
+    #else
+    for (int i = 0; i < 4; i++) {
+        r = (r >> 8) ^ lfs_crc32c_table[0xff & r];
+    }
+    #endif
+
+    return (uint32_t)r;
+}
+
 #endif
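lfs_crc32c_mul depends on the nibble and byte tables performing the same reduction as the bit-at-a-time loop. The Python sketch below checks this. It uses only the constants and loop shapes from the patch, and its function names are hypothetical. Four forms are compared: the C shift-by-1-then-32-steps form, the 16-entry and 256-entry table forms, and the unshifted 31-step form from scripts/dbglfs.py. All four produce the same result.

```python
# Hypothetical names; constants and loops follow the patch.
CRC32C_POLY = 0x82f63b78


def pmul(a, b):
    # carry-less multiply
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_bits(r, steps):
    # one bit of crc32c reduction per step, like LFS_SMALLER_CRC32C
    for _ in range(steps):
        r = (r >> 1) ^ ((r & 1) * CRC32C_POLY)
    return r


# tables: the 4-/8-step reduction of every nibble/byte value, which
# should match lfs_crc32c_table[16] and lfs_crc32c_table[256]
TABLE4 = [reduce_bits(i, 4) for i in range(16)]
TABLE8 = [reduce_bits(i, 8) for i in range(256)]


def mul_smaller(a, b):
    # C: r = lfs_pmul(a, b) << 1, then 32 single-bit steps
    return reduce_bits(pmul(a, b) << 1, 32)


def mul_default(a, b):
    # C: r = lfs_pmul(a, b) << 1, then 8 table-driven nibble steps
    r = pmul(a, b) << 1
    for _ in range(8):
        r = (r >> 4) ^ TABLE4[r & 0xf]
    return r


def mul_faster(a, b):
    # C (LFS_FASTER_CRC32C): 4 table-driven byte steps
    r = pmul(a, b) << 1
    for _ in range(4):
        r = (r >> 8) ^ TABLE8[r & 0xff]
    return r


def mul_script(a, b):
    # scripts/dbglfs.py: no shift, 31 single-bit steps
    return reduce_bits(pmul(a, b), 31)


for a, b in [(0x80000000, 0xdeadbeef), (0x12345678, 0x9abcdef0)]:
    assert (mul_smaller(a, b) == mul_default(a, b)
            == mul_faster(a, b) == mul_script(a, b))
```

The shift by 1 only costs one extra reduction step. After the shift, the low bit is always zero, so the first single-bit step is a plain right shift that undoes it.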
diff --git a/lfs_util.h b/lfs_util.h
index 55c44da7..a6e5908d 100644
--- a/lfs_util.h
+++ b/lfs_util.h
@@ -281,6 +281,25 @@ static inline int lfs_scmp(uint32_t a, uint32_t b) {
     return (int)(unsigned)(a - b);
 }
 
+// Perform polynomial/carry-less multiplication
+//
+// This is a multiply where all adds are replaced with xors. If we view
+// a and b as binary polynomials, xor is polynomial addition and pmul is
+// polynomial multiplication.
+static inline uint64_t lfs_pmul(uint32_t a, uint32_t b) {
+    uint64_t r = 0;
+    uint64_t a_ = a;
+    while (b) {
+        if (b & 1) {
+            r ^= a_;
+        }
+        a_ <<= 1;
+        b >>= 1;
+    }
+    return r;
+}
+
+
 // Convert between 32-bit little-endian and native order
 static inline uint32_t lfs_fromle32(uint32_t a) {
 #if !defined(LFS_NO_BUILTINS) && defined(LFS_LITTLE_ENDIAN)
@@ -603,6 +622,9 @@ static inline size_t lfs_strcspn(const char *a, const char *cs) {
 //
 uint32_t lfs_crc32c(uint32_t crc, const void *buffer, size_t size);
 
+// Multiply two crc32cs in the crc32c ring
+uint32_t lfs_crc32c_mul(uint32_t a, uint32_t b);
+
 
 // Allocate memory, only used if buffers are not provided to littlefs
 // Note, memory must be 64-bit aligned
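The following Python sketch makes the carry-less multiply in lfs_pmul concrete by reproducing the same loop. The name `pmul` is borrowed from scripts/dbglfs.py. The sketch shows that (x+1)(x+1) = x^2 + 1, because the cross terms cancel under xor. It also shows that a 32x32-bit product needs at most 63 bits, which is why lfs_pmul returns a uint64_t.

```python
def pmul(a, b):
    # same loop as lfs_pmul: shift-and-xor instead of shift-and-add
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


# (x + 1)(x + 1) = x^2 + x + x + 1 = x^2 + 1 over GF(2)
print(bin(pmul(0b11, 0b11)))  # 0b101, whereas 3 * 3 = 0b1001
# the highest term of a 32x32-bit product is x^62, i.e. 63 bits
print(pmul(0xffffffff, 0xffffffff).bit_length())  # 63
```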
diff --git a/scripts/dbgbmap.py b/scripts/dbgbmap.py
index 7fb02563..39d619f6 100755
--- a/scripts/dbgbmap.py
+++ b/scripts/dbgbmap.py
@@ -50,10 +50,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----
 
 
 CHARS = 'mbd-'
@@ -594,7 +595,8 @@ class Bmap:
 
 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta):
         if isinstance(blocks, int):
             blocks = (blocks,)
 
@@ -605,6 +607,7 @@ class Rbyd:
         self.trunk = trunk
         self.weight = weight
         self.cksum = cksum
+        self.gcksumdelta = gcksumdelta
 
     @property
     def block(self):
@@ -680,6 +683,8 @@ class Rbyd:
         weight = 0
         weight_ = 0
         weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
         while j_ < len(data) and (not trunk or eoff <= trunk):
             # read next tag
             v, tag, w, size, d = fromtag(data[j_:])
@@ -695,6 +700,11 @@ class Rbyd:
             if not tag & TAG_ALT:
                 if (tag & 0xff00) != TAG_CKSUM:
                     cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                 # found a cksum?
                 else:
                     # check cksum
@@ -706,6 +716,8 @@ class Rbyd:
                     cksum_ = cksum__
                     trunk_ = trunk__
                     weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                     # update perturb bit
                     perturb = tag & TAG_P
                     # revert to data cksum and perturb
@@ -737,6 +749,7 @@ class Rbyd:
                                         0xfca42daf if perturb else 0)
                                 trunk_ = trunk__
                                 weight = weight_
+                                gcksumdelta = gcksumdelta_
                         trunk___ = 0
 
                 # update canonical checksum, xoring out any perturb state
@@ -747,9 +760,9 @@ class Rbyd:
 
         # cksum mismatch?
         if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)
 
-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)
 
     def lookup(self, rid, tag):
         if not self:
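The fetch changes accept a gcksumdelta only once a later cksum tag closes its commit. If a commit is torn, its delta is dropped along with the rest of that commit. Below is a simplified Python model of this bookkeeping. It assumes a pre-parsed list of (tag, payload) pairs in which a cksum tag's payload is just a flag saying whether it validated; the real scripts compute and compare crc32cs instead. `committed_gcksumdelta` and the 0x0400 data tag are hypothetical.

```python
TAG_CKSUM       = 0x3000    # 0x300p, p = perturb bit
TAG_GCKSUMDELTA = 0x3300


def committed_gcksumdelta(tags):
    # tags: list of (tag, payload); for cksum tags the payload is a
    # bool saying whether the commit's cksum matched
    gcksumdelta = None    # delta from the last valid commit
    gcksumdelta_ = None   # delta seen in the commit being parsed
    for tag, payload in tags:
        if (tag & 0xff00) != TAG_CKSUM:
            # everything but 0x300p, including notes, ecksums, and
            # gcksumdeltas, is checksummed as data
            if (tag & 0xff00) == TAG_GCKSUMDELTA:
                gcksumdelta_ = payload
        else:
            if not payload:
                # torn or corrupt commit, keep the last valid state
                break
            gcksumdelta = gcksumdelta_
            gcksumdelta_ = None
    return gcksumdelta


log = [
    (0x0400, b'name'),            # some hypothetical data tag
    (TAG_GCKSUMDELTA, 0x11111111),
    (TAG_CKSUM, True),            # first commit validates
    (TAG_GCKSUMDELTA, 0x22222222),
    (TAG_CKSUM | 0x0001, False),  # second commit is torn
]
print(hex(committed_gcksumdelta(log)))  # 0x11111111
```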
diff --git a/scripts/dbgbtree.py b/scripts/dbgbtree.py
index 4590030f..2cd66637 100755
--- a/scripts/dbgbtree.py
+++ b/scripts/dbgbtree.py
@@ -48,10 +48,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----
 
 
 # some ways of block geometry representations
@@ -253,6 +254,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                 ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                 ' w%d' % w if w else '',
                 ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
     else:
         return '0x%04x%s%s' % (
                 tag,
@@ -265,7 +271,8 @@ TBranch = co.namedtuple('TBranch', 'a, b, d, c')
 
 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta):
         if isinstance(blocks, int):
             blocks = (blocks,)
 
@@ -276,6 +283,7 @@ class Rbyd:
         self.trunk = trunk
         self.weight = weight
         self.cksum = cksum
+        self.gcksumdelta = gcksumdelta
 
     @property
     def block(self):
@@ -351,6 +359,8 @@ class Rbyd:
         weight = 0
         weight_ = 0
         weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
         while j_ < len(data) and (not trunk or eoff <= trunk):
             # read next tag
             v, tag, w, size, d = fromtag(data[j_:])
@@ -366,6 +376,11 @@ class Rbyd:
             if not tag & TAG_ALT:
                 if (tag & 0xff00) != TAG_CKSUM:
                     cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                 # found a cksum?
                 else:
                     # check cksum
@@ -377,6 +392,8 @@ class Rbyd:
                     cksum_ = cksum__
                     trunk_ = trunk__
                     weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                     # update perturb bit
                     perturb = tag & TAG_P
                     # revert to data cksum and perturb
@@ -408,6 +425,7 @@ class Rbyd:
                                         0xfca42daf if perturb else 0)
                                 trunk_ = trunk__
                                 weight = weight_
+                                gcksumdelta = gcksumdelta_
                         trunk___ = 0
 
                 # update canonical checksum, xoring out any perturb state
@@ -418,9 +436,9 @@ class Rbyd:
 
         # cksum mismatch?
         if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)
 
-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)
 
     def lookup(self, rid, tag):
         if not self:
diff --git a/scripts/dbglfs.py b/scripts/dbglfs.py
index 5690cf3e..0ea428c7 100755
--- a/scripts/dbglfs.py
+++ b/scripts/dbglfs.py
@@ -49,10 +49,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----
 
 
 # some ways of block geometry representations
@@ -123,6 +124,21 @@ def crc32c(data, crc=0):
             crc = (crc >> 1) ^ ((crc & 1) * 0x82f63b78)
     return 0xffffffff ^ crc
 
+def pmul(a, b):
+    r = 0
+    while b:
+        if b & 1:
+            r ^= a
+        a <<= 1
+        b >>= 1
+    return r
+
+def crc32cmul(a, b):
+    r = pmul(a, b)
+    for _ in range(31):
+        r = (r >> 1) ^ ((r & 1) * 0x82f63b78)
+    return r
+
 def popc(x):
     return bin(x).count('1')
 
@@ -284,6 +300,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                 ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                 ' w%d' % w if w else '',
                 ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
     else:
         return '0x%04x%s%s' % (
                 tag,
@@ -296,7 +317,8 @@ TBranch = co.namedtuple('TBranch', 'a, b, d, c')
 
 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta=None):
         if isinstance(blocks, int):
             blocks = (blocks,)
 
@@ -307,6 +329,7 @@ class Rbyd:
         self.trunk = trunk
         self.weight = weight
         self.cksum = cksum
+        self.gcksumdelta = gcksumdelta
 
     @property
     def block(self):
@@ -382,6 +405,8 @@ class Rbyd:
         weight = 0
         weight_ = 0
         weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
         while j_ < len(data) and (not trunk or eoff <= trunk):
             # read next tag
             v, tag, w, size, d = fromtag(data[j_:])
@@ -397,6 +422,11 @@ class Rbyd:
             if not tag & TAG_ALT:
                 if (tag & 0xff00) != TAG_CKSUM:
                     cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                 # found a cksum?
                 else:
                     # check cksum
@@ -408,6 +438,8 @@ class Rbyd:
                     cksum_ = cksum__
                     trunk_ = trunk__
                     weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                     # update perturb bit
                     perturb = tag & TAG_P
                     # revert to data cksum and perturb
@@ -439,6 +471,7 @@ class Rbyd:
                                         0xfca42daf if perturb else 0)
                                 trunk_ = trunk__
                                 weight = weight_
+                                gcksumdelta = gcksumdelta_
                         trunk___ = 0
 
                 # update canonical checksum, xoring out any perturb state
@@ -449,9 +482,9 @@ class Rbyd:
 
         # cksum mismatch?
         if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)
 
-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)
 
     def lookup(self, rid, tag):
         if not self:
@@ -922,8 +955,11 @@ class Rbyd:
             # have mdir?
             done, rid, tag, w, j, _, data, _ = self.lookup(-1, TAG_MDIR)
             if not done and rid == -1 and tag == TAG_MDIR:
-                blocks = frommdir(data)
-                return False, 0, 0, Rbyd.fetch(f, block_size, blocks)
+                if mbid == 0:
+                    blocks = frommdir(data)
+                    return False, 0, 0, Rbyd.fetch(f, block_size, blocks)
+                else:
+                    return True, 0, 0, None
 
             else:
                 # I guess we're inlined?
@@ -1192,15 +1228,11 @@ class GState:
     def __init__(self, mleaf_weight):
         self.gstate = {}
         self.gdelta = {}
+        self.gcksum = 0
         self.mleaf_weight = mleaf_weight
 
     def xor(self, mbid, mw, mdir):
-        tag = TAG_GDELTA-0x1
-        while True:
-            done, rid, tag, w, j, d, data, _ = mdir.lookup(-1, tag+0x1)
-            if done or rid != -1 or (tag & 0xff00) != TAG_GDELTA:
-                break
-
+        def gxor(rid, tag, w, j, d, data):
             # keep track of gdeltas
             if tag not in self.gdelta:
                 self.gdelta[tag] = []
@@ -1213,7 +1245,35 @@ class GState:
                     a^b for a,b in it.zip_longest(
                         self.gstate[tag], data, fillvalue=0))
 
+        # gcksum deltas are a bit of a special case
+        self.gcksum ^= mdir.cksum
+        if mdir.gcksumdelta is not None:
+            tag, w, j, d, data = mdir.gcksumdelta
+            gxor(-1, tag, w, j, d, data)
+
+        # other gstate deltas
+        tag = TAG_GDELTA-0x1
+        while True:
+            done, rid, tag, w, j, d, data, _ = mdir.lookup(-1, tag+0x1)
+            if done or rid != -1 or (tag & 0xff00) != TAG_GDELTA:
+                break
+
+            gxor(rid, tag, w, j, d, data)
+
     # parsers for some gstate
+    @ft.cached_property
+    def gcksum_(self):
+        # cubed gcksum
+        return crc32cmul(crc32cmul(self.gcksum, self.gcksum), self.gcksum)
+
+    @ft.cached_property
+    def gcksum__(self):
+        # gcksumdelta based cubed gcksum
+        if TAG_GCKSUMDELTA not in self.gstate:
+            return 0
+
+        return fromle32(self.gstate[TAG_GCKSUMDELTA])
+
     @ft.cached_property
     def grm(self):
         if TAG_GRMDELTA not in self.gstate:
@@ -1233,7 +1293,10 @@ class GState:
 
     def repr(self):
         def grepr(tag, data):
-            if tag == TAG_GRMDELTA:
+            if tag == TAG_GCKSUMDELTA:
+                gcksum = fromle32(data)
+                return 'gcksum %08x' % gcksum
+            elif tag == TAG_GRMDELTA:
                 count, _ = fromleb128(data)
                 return 'grm %s' % (
                         'none' if count == 0
@@ -1826,7 +1889,7 @@ def main(disk, mroots=None, *,
                 corrupted = True
             else:
                 rweight = max(rweight, mdir.weight)
-                gstate.xor(0, mdir)
+                gstate.xor(0, 0, mdir)
 
                 # find any dids
                 for rid, tag, w, j, d, data in mdir:
@@ -1908,14 +1971,14 @@ def main(disk, mroots=None, *,
         if grmed_dir_dids != grmed_bookmark_dids:
             corrupted = True
 
-        # are we going to end up rendering the dtree?
-        dtree = args.get('files') or not (
+        # are we going to end up rendering the ftree?
+        ftree = args.get('files') or not (
                 args.get('config') or args.get('gstate'))
 
         # do a pass to find the width that fits file names+tree, this
         # may not terminate! It's up to the user to use -Z in that case
         f_width = 0
-        if dtree:
+        if ftree:
             def rec_f_width(did, depth):
                 depth_ = 0
                 width_ = 0
@@ -1941,13 +2004,15 @@ def main(disk, mroots=None, *,
         #### actual debugging begins here
 
         # print some information about the filesystem
-        print('littlefs v%s.%s %dx%d %s w%d.%d, rev %08x' % (
+        print('littlefs v%s.%s %dx%d %s w%d.%d, rev %08x, cksum %08x%s' % (
                 config.version[0] if config.version[0] is not None else '?',
                 config.version[1] if config.version[1] is not None else '?',
                 (config.geometry[0] or 0), (config.geometry[1] or 0),
                 mroot.addr(),
                 bweight//mleaf_weight, 1*mleaf_weight,
-                mroot.rev))
+                mroot.rev,
+                gstate.gcksum,
+                '' if gstate.gcksum_ == gstate.gcksum__ else '!'))
 
         # dynamically size the id field
         w_width = max(
@@ -1982,14 +2047,24 @@ def main(disk, mroots=None, *,
         # print gstate?
         if args.get('gstate'):
             for i, (repr_, tag, data) in enumerate(gstate.repr()):
-                print('%12s %*s %-*s  %s' % (
+                # some special situations worth reporting
+                notes = []
+                # gcksum mismatch?
+                if (tag == TAG_GCKSUMDELTA
+                        and gstate.gcksum_ != gstate.gcksum__):
+                    notes.append('gcksum!=%08x' % gstate.gcksum_)
+
+                print('%s%12s %*s %-*s  %s%s%s' % (
+                        '\x1b[31m' if color and notes else '',
                         'gstate:' if i == 0 else '',
                         2*w_width+1, 'g' if i == 0 else '',
                         21+w_width, repr_,
                         next(xxd(data, 8), '')
                             if not args.get('raw')
                                 and not args.get('no_truncate')
-                            else ''))
+                            else '',
+                        ' (%s)' % ', '.join(notes) if notes else '',
+                        '\x1b[m' if color and notes else ''))
 
                 # show on-disk encoding
                 if args.get('raw') or args.get('no_truncate'):
@@ -2029,8 +2104,8 @@ def main(disk, mroots=None, *,
                                         2*w_width+1, '',
                                         line))
 
-        # print dtree?
-        if dtree:
+        # print ftree?
+        if ftree:
             # only show mdir on change
             pmbid = None
             # recursively print directories
@@ -2091,7 +2166,7 @@ def main(disk, mroots=None, *,
                             if did_ not in grmed_dir_dids:
                                 notes.append('orphaned')
 
-                    # print human readable dtree entry
+                    # print human readable ftree entry
                     print('%s%12s %*s %-*s  %s%s%s' % (
                             '\x1b[31m' if color and not grmed and notes
                                 else '\x1b[90m'
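GState puts the pieces together: it rebuilds the gcksum by xoring every mdir cksum, then compares its cube against the xor of the gcksumdeltas. The toy Python model below sketches this under several assumptions:

- mdir cksums are arbitrary numbers, not real rbyd crc32cs;
- each mdir simply remembers its latest cksum and its accumulated gcksumdelta;
- `ToyFs` and `rollback_detected` are hypothetical names.

The scenario has one mdir lose a commit after another mdir has already committed. With t = cube, this rollback is caught. With a linear t such as squaring, it goes undetected.

```python
CRC32C_POLY = 0x82f63b78
ONE = 0x80000000  # x^0, bit-reversed
X = 0x40000000    # x^1, bit-reversed


def pmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def crc32c_mul(a, b):
    r = pmul(a, b)
    for _ in range(31):
        r = (r >> 1) ^ ((r & 1) * CRC32C_POLY)
    return r


def cube(x):
    return crc32c_mul(crc32c_mul(x, x), x)


def square(x):
    return crc32c_mul(x, x)


class ToyFs:
    # hypothetical model: one (cksum, gcksumdelta) pair per mdir
    def __init__(self, n, t):
        self.t = t
        self.cksums = [0] * n
        self.deltas = [0] * n

    def gcksum(self):
        g = 0
        for c in self.cksums:
            g ^= c
        return g

    def commit(self, i, cksum):
        # fold the change into this mdir's gcksumdelta
        g = self.gcksum()
        g_ = g ^ self.cksums[i] ^ cksum
        self.deltas[i] ^= self.t(g_) ^ self.t(g)
        self.cksums[i] = cksum

    def mount_ok(self):
        # like the mount check: t(gcksum) == xor of all deltas
        d = 0
        for delta in self.deltas:
            d ^= delta
        return self.t(self.gcksum()) == d


def rollback_detected(t):
    fs = ToyFs(2, t)
    before = (fs.cksums[0], fs.deltas[0])
    fs.commit(0, ONE)   # mdir 0 commits
    fs.commit(1, X)     # then mdir 1 commits
    assert fs.mount_ok()
    # mdir 0 rolls back to before its commit, mdir 1 does not
    fs.cksums[0], fs.deltas[0] = before
    return not fs.mount_ok()


print(rollback_detected(cube), rollback_detected(square))  # True False
```

Rolling back only the most recent commit in the whole filesystem still produces a consistent state, and the check cannot distinguish it from a clean one. The mismatch appears when the lost commit predates another mdir's commit, which is the disagreement case described in the commit message.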
diff --git a/scripts/dbgmtree.py b/scripts/dbgmtree.py
index a5a4f114..1f622a7f 100755
--- a/scripts/dbgmtree.py
+++ b/scripts/dbgmtree.py
@@ -48,10 +48,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----
 
 
 # some ways of block geometry representations
@@ -268,6 +269,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                 ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                 ' w%d' % w if w else '',
                 ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
     else:
         return '0x%04x%s%s' % (
                 tag,
@@ -280,7 +286,8 @@ TBranch = co.namedtuple('TBranch', 'a, b, d, c')
 
 # our core rbyd type
 class Rbyd:
-    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum):
+    def __init__(self, blocks, data, rev, eoff, trunk, weight, cksum,
+            gcksumdelta):
         if isinstance(blocks, int):
             blocks = (blocks,)
 
@@ -291,6 +298,7 @@ class Rbyd:
         self.trunk = trunk
         self.weight = weight
         self.cksum = cksum
+        self.gcksumdelta = gcksumdelta
 
     @property
     def block(self):
@@ -366,6 +374,8 @@ class Rbyd:
         weight = 0
         weight_ = 0
         weight__ = 0
+        gcksumdelta = None
+        gcksumdelta_ = None
         while j_ < len(data) and (not trunk or eoff <= trunk):
             # read next tag
             v, tag, w, size, d = fromtag(data[j_:])
@@ -381,6 +391,11 @@ class Rbyd:
             if not tag & TAG_ALT:
                 if (tag & 0xff00) != TAG_CKSUM:
                     cksum___ = crc32c(data[j_:j_+size], cksum___)
+
+                    # found a gcksumdelta?
+                    if (tag & 0xff00) == TAG_GCKSUMDELTA:
+                        gcksumdelta_ = (tag, w, j_-d, d, data[j_:j_+size])
+
                 # found a cksum?
                 else:
                     # check cksum
@@ -392,6 +407,8 @@ class Rbyd:
                     cksum_ = cksum__
                     trunk_ = trunk__
                     weight = weight_
+                    gcksumdelta = gcksumdelta_
+                    gcksumdelta_ = None
                     # update perturb bit
                     perturb = tag & TAG_P
                     # revert to data cksum and perturb
@@ -423,6 +440,7 @@ class Rbyd:
                                         0xfca42daf if perturb else 0)
                                 trunk_ = trunk__
                                 weight = weight_
+                                gcksumdelta = gcksumdelta_
                         trunk___ = 0
 
                 # update canonical checksum, xoring out any perturb state
@@ -433,9 +451,9 @@ class Rbyd:
 
         # cksum mismatch?
         if cksum is not None and cksum_ != cksum:
-            return cls(block, data, rev, 0, 0, 0, cksum_)
+            return cls(block, data, rev, 0, 0, 0, cksum_, gcksumdelta)
 
-        return cls(block, data, rev, eoff, trunk_, weight, cksum_)
+        return cls(block, data, rev, eoff, trunk_, weight, cksum_, gcksumdelta)
 
     def lookup(self, rid, tag):
         if not self:
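The Rbyd fetch changes in dbgmtree.py stage any gcksumdelta tag in `gcksumdelta_` and promote it to `gcksumdelta` only when the commit's cksum tag checks out. A torn or corrupted commit therefore cannot leak its delta. The sketch below is a simplified model of that staging pattern only. It assumes a hypothetical pre-parsed list of `(kind, payload)` tags, where a boolean payload on `'cksum'` stands in for the real crc32c comparison. It also ignores trunk, weight, ecksum, and perturb handling, and `last_valid_gcksumdelta` is a hypothetical name.

```python
# Simplified model of the staging pattern in Rbyd.fetch (hypothetical).
# Each tag is (kind, payload):
#   ('gcksumdelta', bytes) stages a delta in the current commit
#   ('cksum', ok)          ends a commit; ok stands in for the crc32c check
def last_valid_gcksumdelta(tags):
    gcksumdelta = None   # delta from the last commit that checked out
    gcksumdelta_ = None  # delta staged in the commit being parsed
    for kind, payload in tags:
        if kind == 'gcksumdelta':
            gcksumdelta_ = payload
        elif kind == 'cksum':
            if not payload:
                # corrupt commit, stop and keep the last valid state
                break
            # commit checks out, promote whatever was staged (may be None)
            gcksumdelta = gcksumdelta_
            gcksumdelta_ = None
    return gcksumdelta


tags = [
    ('gcksumdelta', b'\x01'), ('cksum', True),
    ('gcksumdelta', b'\x02'), ('cksum', False),
]
print(last_valid_gcksumdelta(tags))  # b'\x01'
```

As in the diff, a valid commit that carries no gcksumdelta promotes `None`. The result always reflects only the most recent valid commit.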
diff --git a/scripts/dbgrbyd.py b/scripts/dbgrbyd.py
index f6320f64..4fdb2046 100755
--- a/scripts/dbgrbyd.py
+++ b/scripts/dbgrbyd.py
@@ -58,10 +58,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----
 
 
 # some ways of block geometry representations
@@ -256,6 +257,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                 ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                 ' w%d' % w if w else '',
                 ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
     else:
         return '0x%04x%s%s' % (
                 tag,
diff --git a/scripts/dbgtag.py b/scripts/dbgtag.py
index 98271da5..669e857d 100755
--- a/scripts/dbgtag.py
+++ b/scripts/dbgtag.py
@@ -46,10 +46,11 @@ TAG_B           = 0x0000
 TAG_R           = 0x2000
 TAG_LE          = 0x0000
 TAG_GT          = 0x1000
-TAG_CKSUM       = 0x3000    ## 0x3c0p  v-11 cccc ---- ---p
+TAG_CKSUM       = 0x3000    ## 0x300p  v-11 ---- ---- ---p
 TAG_P           = 0x0001
-TAG_NOTE        = 0x3100    #  0x3100  v-11 ---1 ---- ----
-TAG_ECKSUM      = 0x3200    #  0x3200  v-11 --1- ---- ----
+TAG_NOTE        = 0x3100    ## 0x3100  v-11 ---1 ---- ----
+TAG_ECKSUM      = 0x3200    ## 0x3200  v-11 --1- ---- ----
+TAG_GCKSUMDELTA = 0x3300    ## 0x3300  v-11 --11 ---- ----
 
 
 # some ways of block geometry representations
@@ -210,6 +211,11 @@ def tagrepr(tag, w=None, size=None, off=None):
                 ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
                 ' w%d' % w if w else '',
                 ' %s' % size if size is not None else '')
+    elif (tag & 0x7f00) == TAG_GCKSUMDELTA:
+        return 'gcksumdelta%s%s%s' % (
+                ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
+                ' w%d' % w if w else '',
+                ' %s' % size if size is not None else '')
     else:
         return '0x%04x%s%s' % (
                 tag,
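dbgmtree.py, dbgrbyd.py, and dbgtag.py all add the same tagrepr branch for the new tag. The `0x7f00` mask drops the top `v` bit and the low byte, so any tag whose middle bits read `-11 --11` is shown as a gcksumdelta. The standalone sketch below copies only that branch into a hypothetical `gcksumdelta_repr` helper. It returns `None` for other tags instead of falling through to the remaining tagrepr cases.

```python
TAG_GCKSUMDELTA = 0x3300  # v-11 --11 ---- ----

# hypothetical helper mirroring only the new tagrepr branch
def gcksumdelta_repr(tag, w=None, size=None):
    # 0x7f00 masks out the top (v) bit and the low byte
    if (tag & 0x7f00) != TAG_GCKSUMDELTA:
        return None
    return 'gcksumdelta%s%s%s' % (
            ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')


print(gcksumdelta_repr(0x3300, size=4))        # gcksumdelta 4
print(gcksumdelta_repr(0xb300))                # gcksumdelta
print(gcksumdelta_repr(0x3302, w=1, size=4))   # gcksumdelta 0x02 w1 4
print(gcksumdelta_repr(0x3200))                # None
```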
diff --git a/tests/test_ck.toml b/tests/test_ck.toml
index a3f7ab75..c0a7389c 100644
--- a/tests/test_ck.toml
+++ b/tests/test_ck.toml
@@ -2,6 +2,124 @@
 after = ['test_traversal', 'test_gc', 'test_mount']
 
 
+code = '''
+// naive crc32c
+static uint32_t test_ck_naive_crc32c(
+        uint32_t crc, const void *buffer, size_t size) {
+    const uint8_t *buffer_ = buffer;
+    crc ^= 0xffffffff;
+
+    for (size_t i = 0; i < size; i++) {
+        crc = crc ^ buffer_[i];
+        for (size_t j = 0; j < 8; j++) {
+            crc = (crc >> 1) ^ ((crc & 1) ? 0x82f63b78 : 0);
+        }
+    }
+
+    crc ^= 0xffffffff;
+    return crc;
+}
+
+// naive crc32c multiplication
+static uint32_t test_ck_naive_crc32c_mul(uint32_t a, uint32_t b) {
+    // carry-less multiply (pmul), up to a 63-bit result
+    uint64_t r = 0;
+    for (int i = 0; i < 32; i++) {
+        if (b & ((uint32_t)1 << i)) {
+            r ^= (uint64_t)a << i;
+        }
+    }
+
+    // reduce mod the crc32c polynomial (bit-reflected)
+    for (int i = 0; i < 31; i++) {
+        r = (r >> 1) ^ ((r & 1) ? 0x82f63b78 : 0);
+    }
+
+    return (uint32_t)r;
+}
+'''
+
+
+# let's first check that our crc32c math probably works
+
+# try some random inputs and compare with a naive implementation
+[cases.test_ck_crc32c]
+defines.SIZE = [1, 2, 4, 8, 16, 32, 64]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint8_t buffer[SIZE];
+        for (lfs_size_t j = 0; j < SIZE; j++) {
+            buffer[j] = TEST_PRNG(&prng);
+        }
+
+        uint32_t a = test_ck_naive_crc32c(0, buffer, SIZE);
+        uint32_t b = lfs_crc32c(0, buffer, SIZE);
+        assert(a == b);
+    }
+'''
+
+# test incremental crc32cs
+[cases.test_ck_crc32c_incr]
+defines.SIZE = [1, 2, 4, 8, 16, 32, 64]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint8_t buffer[SIZE];
+        for (lfs_size_t j = 0; j < SIZE; j++) {
+            buffer[j] = TEST_PRNG(&prng);
+        }
+
+        uint32_t a = lfs_crc32c(0, buffer, SIZE);
+        uint32_t b = 0;
+        for (lfs_size_t j = 0; j < SIZE; j++) {
+            b = lfs_crc32c(b, &buffer[j], 1);
+        }
+        assert(a == b);
+    }
+'''
+
+# try some random inputs and compare with a naive implementation
+[cases.test_ck_crc32c_mul]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint32_t x = TEST_PRNG(&prng);
+        uint32_t y = TEST_PRNG(&prng);
+
+        uint32_t a = test_ck_naive_crc32c_mul(x, y);
+        uint32_t b = lfs_crc32c_mul(x, y);
+        assert(a == b);
+    }
+'''
+
+# test that multiplication is distributive
+[cases.test_ck_crc32c_mul_dist]
+defines.SEED = 'range(10)'
+defines.N = 1000
+fuzz = 'SEED'
+code = '''
+    uint32_t prng = SEED;
+    for (lfs_size_t i = 0; i < N; i++) {
+        uint32_t x = TEST_PRNG(&prng);
+        uint32_t y = TEST_PRNG(&prng);
+        uint32_t z = TEST_PRNG(&prng);
+
+        uint32_t a = lfs_crc32c_mul(x, y ^ z);
+        uint32_t b = lfs_crc32c_mul(x, y) ^ lfs_crc32c_mul(x, z);
+        assert(a == b);
+    }
+'''
+
 
 # Test filesystem-level checksum things