Made rbyd cksums erased-state agnostic

Long story short, rbyd checksums are now fully reproducible. If you
write the same set of tags to any block, you will end up with the same
checksum.

This is actually a bit tricky with littlefs's constraints.

---

The main problem boils down to erased-state. littlefs has a fairly
flexible model for erased-state, and this brings some challenges. In
littlefs, storage goes through 2 states:

1. Erase - Prepare storage for progging. Reads after an erase may return
   arbitrary, but consistent, values.

2. Prog - Program storage with data. Storage must be erased and no progs
   attempted. Reads after a prog must return the new data.

Note in this model erased-state may not be all 0xffs, though it likely
will be for flash. This allows littlefs to support a wide range of
other storage devices: SD, RAM, NVRAM, encryption, ECC, etc.

But this model also means erased-state may be different from block to
block, and even different on later erases of the same block.

And if that wasn't enough of a challenge, _erased-state can contain
perfectly valid commits_. Usually you can expect arbitrary valid cksums
to be rare, but thanks to SD, RAM, etc, modeling erase as a noop, valid
cksums in erased-state is actually very common.

So how do we manage erased-state in our rbyds?

First we need some way to detect it, since we can't prog if we're not
erased. This is accomplished by the forward-looking erased-state cksum
(ecksum):

  .---+---+---+---.     \
  |     commit    |     |
  |               |     |
  |               |     |
  +---+---+---+---+     +-.
  |     ecksum -------. | | <-- ecksum - cksum of erased state
  +---+---+---+---+   | / |
  |     cksum --------|---' <-- cksum - cksum of commit,
  +---+---+---+---+   |                 including ecksum
  |    padding    |   |
  |               |   |
  +---+---+---+---+ \ |
  |     erased    | +-'
  |               | /
  .               .
  .               .

You may have already noticed the start of our problems. The ecksum
contains the erased-state, which is different per-block, and our rbyd
cksum contains the ecksum. We need to include the ecksum so we know if
it's valid, but this means our rbyd cksum changes block to block.

Solving this is simple enough: Stop the rbyd's canonical cksum before
the ecksum, but include the ecksum in the actual cksum we write to disk.

Future commits will need to start from the canonical cksum, so the old
ecksum won't be included in new commits, but this shouldn't be a
problem:

  .---+---+---+---. . . \ . \ . . . . .---+---+---+---.     \   \
  |     commit    |     |   |         |     commit    |     |   |
  |               |     |   +- rbyd   |               |     |   |
  |               |     |   |  cksum  |               |     |   |
  +---+---+---+---+     +-. /         +---+---+---+---+     |   |
  |     ecksum -------. | |           |     ecksum    |     .   .
  +---+---+---+---+   | / |           +---+---+---+---+     .   .
  |     cksum --------|---'           |     cksum     |     .   .
  +---+---+---+---+   |               +---+---+---+---+     .   .
  |    padding    |   |               |    padding    |     .   .
  |               |   |               |               |     .   .
  +---+---+---+---+ \ | . . . . . . . +---+---+---+---+     |   |
  |     erased    | +-'               |     commit    |     |   |
  |               | /                 |               |     |   +- rbyd
  .               .                   |               |     |   |  cksum
  .               .                   +---+---+---+---+     +-. /
                                      |     ecksum -------. | |
                                      +---+---+---+---+   | / |
                                      |     cksum ------------'
                                      +---+---+---+---+   |
                                      |    padding    |   |
                                      |               |   |
                                      +---+---+---+---+ \ |
                                      |     erased    | +-'
                                      |               | /
                                      .               .
                                      .               .

The second challenge is the pesky possibility of existing valid commits.
We need some way to ensure that erased-state following a commit does not
accidentally contain a valid old commit.

This is where are tag's valid bits come into play: The valid bit of each
tag must match the parity of all preceding tags (equivalent to the
parity of the crc32c), and we can use some perturb bits in the cksum tag
to make sure any tags in our erased-state do _not_ match:

  .---+---+---+---. \ . . . . . .---+---+---+---. \   \   \
  |v|    tag      | |           |v|    tag      | |   |   |
  +---+---+---+---+ |           +---+---+---+---+ |   |   |
  |     commit    | |           |     commit    | |   |   |
  |               | |           |               | |   |   |
  +---+---+---+---+ +-----.     +---+---+---+---+ +-. |   |
  |v|p|  tag      | |     |     |v|p|  tag      | | | |   |
  +---+---+---+---+ /     |     +---+---+---+---+ / | |   |
  |     cksum     |       |     |     cksum     |   | .   .
  +---+---+---+---+       |     +---+---+---+---+   | .   .
  |    padding    |       |     |    padding    |   | .   .
  |               |       |     |               |   | .   .
  +---+---+---+---+ . . . | . . +---+---+---+---+   | |   |
  |v---------------- != --'     |v------------------' |   |
  |     erased    |             +---+---+---+---+     |   |
  .               .             |     commit    |     |   |
  .               .             |               |     |   |
                                +---+---+---+---+     +-. +-.
                                |v|p|  tag      |     | | | |
                                +---+---+---+---+     / | / |
                                |     cksum ----------------'
                                +---+---+---+---+       |
                                |    padding    |       |
                                |               |       |
                                +---+---+---+---+       |
                                |v---------------- != --'
                                |     erased    |
                                .               .
                                .               .

New problem! The rbyd cksum contains the valid bits, which contain the
perturb bits, which depends on the erased-state!

And you can't just derive the valid bits from the rbyd's canonical
cksum. This avoids erased-state poisoning, sure, but then nothing in the
new commit depends on the perturb bits! The catch-22 here is that we
need the valid bits to both depend on, and ignore, the erased-state
poisoned perturb bits.

As far as I can tell, the only way around this is to make the rybd's
canonical cksum not include the parity bits. Which is annoying, masking
out bits is not great for bulk cksum calculation...

But this does solve our problem:

  .---+---+---+---. \ . . . . . .---+---+---+---. \   \   \   \
  |v|    tag      | |           |v|    tag      | |   |   o   o
  +---+---+---+---+ |           +---+---+---+---+ |   |   |   |
  |     commit    | |           |     commit    | |   |   |   |
  |               | |           |               | |   |   |   |
  +---+---+---+---+ +-----.     +---+---+---+---+ +-. |   |   |
  |v|p|  tag      | |     |     |v|p|  tag      | | | |   .   .
  +---+---+---+---+ /     |     +---+---+---+---+ / | |   .   .
  |     cksum     |       |     |     cksum     |   | .   .   .
  +---+---+---+---+       |     +---+---+---+---+   | .   .   .
  |    padding    |       |     |    padding    |   | .   .   .
  |               |       |     |               |   | .   .   .
  +---+---+---+---+ . . . | . . +---+---+---+---+   | |   |   |
  |v---------------- != --'     |v------------------' |   o   o
  |     erased    |             +---+---+---+---+     |   |   |
  .               .             |     commit    |     |   |   +- rbyd
  .               .             |               |     |   |   |  cksum
                                +---+---+---+---+     +-. +-. /
                                |v|p|  tag      |     | | o |
                                +---+---+---+---+     / | / |
                                |     cksum ----------------'
                                +---+---+---+---+       |
                                |    padding    |       |
                                |               |       |
                                +---+---+---+---+       |
                                |v---------------- != --'
                                |     erased    |
                                .               .
                                .               .

Note that because each commit's cksum derives from the canonical cksum,
the valid bits and commit cksums no longer contain the same data, so our
parity(m) = parity(crc32c(m)) trick no longer works.

However our crc32c still does tell us a bit about each tag's parity, so
with a couple well-placed xors we can at least avoid needing two
parallel calculations:

  cksum' = crc32c(cksum, m)
  valid' = parity(cksum' xor cksum) xor valid

This also means our commit cksums don't include any information about
the valid bits, since we mask these out before cksum calculation. Which
is a bit concerning, but as far as I can tell not a real problem.

---

An alternative design would be to just keep track of two cksums: A
commit cksum and a canonical cksum.

This would be much simpler, but would also require storing two cksums in
RAM in our lfsr_rbyd_t struct. A bit annoying for our 4-byte crc32cs,
and a bit more than a bit annoying for hypothetical 32-byte sha256s.

It's also not entirely clear how you would update both crc32cs
efficiently. There is a way to xor out the initial state before each
tag, but I think it would still require O(n) cycles of crc32c
calculation...

As it is, the extra bit needed to keep track of commit parity is easy
enough to sneak into some unused sign bits in our lfsr_rbyd_t struct.

---

I've also gone ahead and mixed in the current commit parity into our
cksum's perturb bits, so the commit cksum at least contains _some_
information about the previous parity.

But it's not entirely clear this actually adds anything. Our perturb
bits aren't _required_ to reflect the commit parity, so a very unlucky
power-loss could in theory still make a cksum valid for the wrong
parity.

At least this situation will be caught by later valid bits...

I've also carved out a tag encoding, LFSR_TAG_PERTURB, solely for adding
more perturb bits to commit cksums:

  LFSR_TAG_CKSUM          0x3cpp  v-11 cccc -ppp pppp

  LFSR_TAG_CKSUM          0x30pp  v-11 ---- -ppp pppp
  LFSR_TAG_PERTURB        0x3100  v-11 ---1 ---- ----
  LFSR_TAG_ECKSUM         0x3200  v-11 --1- ---- ----
  LFSR_TAG_GCKSUMDELTA+   0x3300  v-11 --11 ---- ----

  + Planned

This allows for more than 7 perturb bits, and could even mix in the
entire previous commit cksum, if we ever think that is worth the RAM
tradeoff.

LFSR_TAG_PERTURB also has the advantage that it is validated by the
cksum tag's valid bit before being included in the commit cksum, which
indirectly includes the current commit parity. We may eventually want to
use this instead of the cksum tag's perturb bits for this reason, but
right now I'm not sure this tiny bit of extra safety is worth the
minimum 5-byte per commit overhead...

Note if you want perturb bits that are also included in the rbyd's
canonical cksum, you can just use an LFSR_TAG_SHRUBDATA tag. Or any
unreferenced shrub tag really.

---

All of these changes required a decent amount of code, I think mostly
just to keep track of the parity bit. But the isolation of rbyd cksums
from erased-state is necessary for several future-planned features:

           code          stack
  before: 33564           2816
  after:  33916 (+1.0%)   2824 (+0.3%)
This commit is contained in:
Christopher Haster
2024-04-30 12:38:17 -05:00
parent c4fcc78814
commit 8a75a68d8b
9 changed files with 664 additions and 451 deletions

View File

@@ -41,7 +41,8 @@ TAG_UATTR = 0x0400
TAG_SATTR = 0x0600
TAG_SHRUB = 0x1000
TAG_CKSUM = 0x3000
TAG_ECKSUM = 0x3100
TAG_PERTURB = 0x3100
TAG_ECKSUM = 0x3200
TAG_ALT = 0x4000
TAG_R = 0x2000
TAG_GT = 0x1000
@@ -133,6 +134,9 @@ def crc32c(data, crc=0):
def popc(x):
return bin(x).count('1')
def parity(x):
return popc(x) & 1
def fromle32(data):
return struct.unpack('<I', data[0:4].ljust(4, b'\0'))[0]
@@ -582,13 +586,14 @@ class Bmap:
# our core rbyd type
class Rbyd:
def __init__(self, block, data, rev, eoff, trunk, weight):
def __init__(self, block, data, rev, eoff, trunk, weight, cksum):
self.block = block
self.data = data
self.rev = rev
self.eoff = eoff
self.trunk = trunk
self.weight = weight
self.cksum = cksum
self.redund_blocks = []
@property
@@ -643,7 +648,10 @@ class Rbyd:
rev = fromle32(data[0:4])
cksum = 0
cksum_ = crc32c(data[0:4])
cksum__ = cksum_
parity__ = parity(cksum_)
eoff = 0
eoff_ = None
j_ = 4
trunk_ = 0
trunk__ = 0
@@ -651,13 +659,14 @@ class Rbyd:
weight = 0
weight_ = 0
weight__ = 0
wastrunk = False
trunkeoff = None
while j_ < len(data) and (not trunk or eoff <= trunk):
v, tag, w, size, d = fromtag(data[j_:])
if v != (popc(cksum_) & 1):
if v != parity__:
break
cksum_ = crc32c(data[j_:j_+d], cksum_)
parity__ ^= parity(cksum__)
cksum__ = crc32c([data[j_] & ~0x80], cksum__)
cksum__ = crc32c(data[j_+1:j_+d], cksum__)
parity__ ^= parity(cksum__)
j_ += d
if not tag & TAG_ALT and j_ + size > len(data):
break
@@ -665,24 +674,27 @@ class Rbyd:
# take care of cksums
if not tag & TAG_ALT:
if (tag & 0xff00) != TAG_CKSUM:
cksum_ = crc32c(data[j_:j_+size], cksum_)
parity__ ^= parity(cksum__)
cksum__ = crc32c(data[j_:j_+size], cksum__)
parity__ ^= parity(cksum__)
# found a cksum?
else:
cksum__ = fromle32(data[j_:j_+4])
if cksum_ != cksum__:
cksum___ = fromle32(data[j_:j_+4])
if cksum__ != cksum___:
break
# commit what we have
eoff = trunkeoff if trunkeoff else j_ + size
eoff = eoff_ if eoff_ else j_ + size
cksum = cksum_
trunk_ = trunk__
weight = weight_
# revert to data cksum
cksum__ = cksum_
# evaluate trunks
if (tag & 0xf000) != TAG_CKSUM and (
not trunk or trunk >= j_-d or wastrunk):
not trunk or j_-d <= trunk or trunk___):
# new trunk?
if not wastrunk:
wastrunk = True
if not trunk___:
trunk___ = j_-d
weight__ = 0
@@ -691,24 +703,26 @@ class Rbyd:
# end of trunk?
if not tag & TAG_ALT:
wastrunk = False
# update data checksum
cksum_ = cksum__
# update trunk/weight unless we found a shrub or an
# explicit trunk (which may be a shrub) is requested
if not tag & TAG_SHRUB or trunk:
if not tag & TAG_SHRUB or trunk___ == trunk:
trunk__ = trunk___
weight_ = weight__
# keep track of eoff for best matching trunk
if trunk and j_ + size > trunk:
trunkeoff = j_ + size
eoff = trunkeoff
eoff_ = j_ + size
eoff = eoff_
cksum = cksum_
trunk_ = trunk__
weight = weight_
trunk___ = 0
if not tag & TAG_ALT:
j_ += size
return cls(block, data, rev, eoff, trunk_, weight)
return cls(block, data, rev, eoff, trunk_, weight, cksum)
def lookup(self, rid, tag):
if not self: