Long story short, rbyd checksums are now fully reproducible. If you
write the same set of tags to any block, you will end up with the same
checksum.
This is actually a bit tricky with littlefs's constraints.
---
The main problem boils down to erased-state. littlefs has a fairly
flexible model for erased-state, and this brings some challenges. In
littlefs, storage goes through 2 states:
1. Erase - Prepare storage for progging. Reads after an erase may return
   arbitrary, but consistent, values.
2. Prog - Program storage with data. Storage must be erased, with no
   progs attempted since the erase. Reads after a prog must return the
   new data.
Note in this model erased-state may not be all 0xffs, though it likely
will be for flash. This allows littlefs to support a wide range of
other storage devices: SD, RAM, NVRAM, encryption, ECC, etc.
But this model also means erased-state may be different from block to
block, and even different on later erases of the same block.
And if that wasn't enough of a challenge, _erased-state can contain
perfectly valid commits_. Usually you can expect arbitrary valid cksums
to be rare, but thanks to SD, RAM, etc, modeling erase as a noop, valid
cksums in erased-state are actually very common.
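
To make the erase model concrete, here is a toy sketch of two block devices, one flash-like and one RAM-like. The `ToyBd` class and its `erase_value` parameter are hypothetical, invented purely for illustration; they are not part of littlefs.

```python
class ToyBd:
    """A toy single-block device (hypothetical, for illustration only)."""
    def __init__(self, size, erase_value=None):
        self.data = bytearray(size)
        # erase_value=0xff models flash, erase_value=None models devices
        # like RAM or SD where erase is a noop
        self.erase_value = erase_value

    def erase(self):
        if self.erase_value is not None:
            self.data[:] = bytes([self.erase_value]) * len(self.data)

    def prog(self, off, buf):
        self.data[off:off+len(buf)] = buf

    def read(self, off, size):
        return bytes(self.data[off:off+size])

flash = ToyBd(16, erase_value=0xff)
ram = ToyBd(16)
for bd in (flash, ram):
    bd.erase()
    bd.prog(0, b'old commit')
    bd.erase()

# flash reads back 0xffs, but the RAM-like device still holds the old
# commit in what littlefs must treat as erased-state
print(flash.read(0, 10))
print(ram.read(0, 10))
```

Both devices satisfy the model, since reads after an erase are consistent in each case. Only the second leaves old, possibly valid, data behind.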
So how do we manage erased-state in our rbyds?
First we need some way to detect it, since we can't prog if we're not
erased. This is accomplished by the forward-looking erased-state cksum
(ecksum):
.---+---+---+---.     \
|     commit    |     |
|               |     |
|               |     |
+---+---+---+---+     +-.
|     ecksum -------. | | <-- ecksum - cksum of erased state
+---+---+---+---+   | / |
|     cksum --------|---' <-- cksum - cksum of commit,
+---+---+---+---+   |                including ecksum
|    padding    |   |
|               |   |
+---+---+---+---+ \ |
|     erased    | +-'
|               | /
.               .
.               .
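
A minimal sketch of the erased-state check, assuming the ecksum records a size and a crc32c of the bytes that follow the commit. The `crc32c`, `make_ecksum`, and `is_erased` helpers are hypothetical names, and the exact on-disk layout in littlefs may differ.

```python
def crc32c(crc, data):
    """Bitwise crc32c (Castagnoli), reflected polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff

def make_ecksum(block, off, size):
    # at commit time: checksum whatever the erased region currently reads as
    return (size, crc32c(0, block[off:off+size]))

def is_erased(block, off, ecksum):
    # before the next prog: the region is still erased only if it reads
    # back the same bytes the ecksum was computed over
    size, cksum = ecksum
    return crc32c(0, block[off:off+size]) == cksum

# erased-state need not be 0xff, here it happens to read as 0xa5
block = bytearray(b'commit..' + bytes([0xa5])*8)
ecksum = make_ecksum(block, 8, 4)
still_erased = is_erased(block, 8, ecksum)

# a partial prog disturbs the erased region
block[8] = 0x00
after_prog = is_erased(block, 8, ecksum)
print(still_erased, after_prog)
```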
You may have already noticed the start of our problems. The ecksum
contains the erased-state, which is different per-block, and our rbyd
cksum contains the ecksum. We need to include the ecksum so we know if
it's valid, but this means our rbyd cksum changes block to block.
Solving this is simple enough: Stop the rbyd's canonical cksum before
the ecksum, but include the ecksum in the actual cksum we write to disk.
Future commits will need to start from the canonical cksum, so the old
ecksum won't be included in new commits, but this shouldn't be a
problem:
.---+---+---+---. . . \ . \ . . . . .---+---+---+---.     \   \
|     commit    |     |   |         |     commit    |     |   |
|               |     |   +- rbyd   |               |     |   |
|               |     |   |  cksum  |               |     |   |
+---+---+---+---+     +-. /         +---+---+---+---+     |   |
|     ecksum -------. | |           |     ecksum    |     .   .
+---+---+---+---+   | / |           +---+---+---+---+     .   .
|     cksum --------|---'           |     cksum     |     .   .
+---+---+---+---+   |               +---+---+---+---+     .   .
|    padding    |   |               |    padding    |     .   .
|               |   |               |               |     .   .
+---+---+---+---+ \ | . . . . . . . +---+---+---+---+     |   |
|     erased    | +-'               |     commit    |     |   |
|               | /                 |               |     |   +- rbyd
.               .                   |               |     |   |  cksum
.               .                   +---+---+---+---+     +-. /
                                    |     ecksum -------. | |
                                    +---+---+---+---+   | / |
                                    |     cksum ------------'
                                    +---+---+---+---+   |
                                    |    padding    |   |
                                    |               |   |
                                    +---+---+---+---+ \ |
                                    |     erased    | +-'
                                    |               | /
                                    .               .
                                    .               .
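
The split between canonical and on-disk cksums can be sketched as follows. This is a simplified model: tags are opaque byte strings, the ecksum is reduced to a 4-byte crc32c of the erased region, and `commit` is a hypothetical helper, not a littlefs function.

```python
import struct

def crc32c(crc, data):
    """Bitwise crc32c (Castagnoli), reflected polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff

def commit(canonical, tags, erased):
    """Return (new canonical cksum, cksum written to disk)."""
    # the canonical cksum only covers the tags themselves
    for tag in tags:
        canonical = crc32c(canonical, tag)
    # the on-disk cksum continues on to cover the block-specific ecksum
    ecksum = struct.pack('<I', crc32c(0, erased))
    ondisk = crc32c(canonical, ecksum)
    return canonical, ondisk

tags1 = [b'tag-a', b'tag-b']
tags2 = [b'tag-c']

# same tags, two blocks with different erased-state
canon_x, disk_x = commit(0, tags1, b'\xff'*8)
canon_y, disk_y = commit(0, tags1, b'\x00'*8)
print(canon_x == canon_y, disk_x == disk_y)

# the next commit starts from the canonical cksum, so the old ecksum
# never leaks into later cksums
canon_x2, _ = commit(canon_x, tags2, b'\xff'*8)
canon_y2, _ = commit(canon_y, tags2, b'\x5a'*8)
print(canon_x2 == canon_y2)
```

In this model the on-disk cksums differ between the two blocks while the canonical cksums agree, both after the first commit and after the second.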
The second challenge is the pesky possibility of existing valid commits.
We need some way to ensure that erased-state following a commit does not
accidentally contain a valid old commit.
This is where our tags' valid bits come into play: The valid bit of each
tag must match the parity of all preceding tags (equivalent to the
parity of the crc32c), and we can use some perturb bits in the cksum tag
to make sure any tags in our erased-state do _not_ match:
.---+---+---+---. \ . . . . . .---+---+---+---. \   \   \
|v|     tag     | |           |v|     tag     | |   |   |
+---+---+---+---+ |           +---+---+---+---+ |   |   |
|     commit    | |           |     commit    | |   |   |
|               | |           |               | |   |   |
+---+---+---+---+ +-----.     +---+---+---+---+ +-. |   |
|v|p|    tag    | |     |     |v|p|    tag    | | | |   |
+---+---+---+---+ /     |     +---+---+---+---+ / | |   |
|     cksum     |       |     |     cksum     |   | .   .
+---+---+---+---+       |     +---+---+---+---+   | .   .
|    padding    |       |     |    padding    |   | .   .
|               |       |     |               |   | .   .
+---+---+---+---+ . . . | . . +---+---+---+---+   | |   |
|v---------------- != --'     |v------------------' |   |
|     erased    |             +---+---+---+---+     |   |
.               .             |     commit    |     |   |
.               .             |               |     |   |
                              +---+---+---+---+     +-. +-.
                              |v|p|    tag    |     | | | |
                              +---+---+---+---+     / | / |
                              |     cksum ----------------'
                              +---+---+---+---+       |
                              |    padding    |       |
                              |               |       |
                              +---+---+---+---+       |
                              |v---------------- != --'
                              |     erased    |
                              .               .
                              .               .
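
The parity equivalence the valid bits rely on can be checked numerically. This sketch assumes the standard crc32c with an inverted initial state and final xor; `crc32c` and `parity` are illustrative helpers, not littlefs functions.

```python
import random

def crc32c(crc, data):
    """Bitwise crc32c (Castagnoli), reflected polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff

def parity(x):
    """Parity (xor of all bits) of an int or a bytes-like object."""
    if not isinstance(x, int):
        x = int.from_bytes(x, 'little')
    return bin(x).count('1') & 1

random.seed(42)
for _ in range(100):
    m = bytes(random.getrandbits(8)
        for _ in range(random.randrange(1, 32)))
    # the crc32c polynomial has an even number of terms, so the crc
    # carries the same parity as the message it was computed over
    assert parity(crc32c(0, m)) == parity(m)
print('parity(m) == parity(crc32c(m)) held for all samples')
```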
New problem! The rbyd cksum contains the valid bits, which contain the
perturb bits, which depend on the erased-state!
And you can't just derive the valid bits from the rbyd's canonical
cksum. This avoids erased-state poisoning, sure, but then nothing in the
new commit depends on the perturb bits! The catch-22 here is that we
need the valid bits to both depend on, and ignore, the erased-state
poisoned perturb bits.
As far as I can tell, the only way around this is to make the rbyd's
canonical cksum not include the parity bits. Which is annoying, since
masking out bits is not great for bulk cksum calculation...
But this does solve our problem:
.---+---+---+---. \ . . . . . .---+---+---+---. \   \   \   \
|v|     tag     | |           |v|     tag     | |   |   o   o
+---+---+---+---+ |           +---+---+---+---+ |   |   |   |
|     commit    | |           |     commit    | |   |   |   |
|               | |           |               | |   |   |   |
+---+---+---+---+ +-----.     +---+---+---+---+ +-. |   |   |
|v|p|    tag    | |     |     |v|p|    tag    | | | |   .   .
+---+---+---+---+ /     |     +---+---+---+---+ / | |   .   .
|     cksum     |       |     |     cksum     |   | .   .   .
+---+---+---+---+       |     +---+---+---+---+   | .   .   .
|    padding    |       |     |    padding    |   | .   .   .
|               |       |     |               |   | .   .   .
+---+---+---+---+ . . . | . . +---+---+---+---+   | |   |   |
|v---------------- != --'     |v------------------' |   o   o
|     erased    |             +---+---+---+---+     |   |   |
.               .             |     commit    |     |   |   +- rbyd
.               .             |               |     |   |   |  cksum
                              +---+---+---+---+     +-. +-. /
                              |v|p|    tag    |     | | o |
                              +---+---+---+---+     / | / |
                              |     cksum ----------------'
                              +---+---+---+---+       |
                              |    padding    |       |
                              |               |       |
                              +---+---+---+---+       |
                              |v---------------- != --'
                              |     erased    |
                              .               .
                              .               .
Note that because each commit's cksum derives from the canonical cksum,
the valid bits and commit cksums no longer contain the same data, so our
parity(m) = parity(crc32c(m)) trick no longer works.
However our crc32c still does tell us a bit about each tag's parity, so
with a couple well-placed xors we can at least avoid needing two
parallel calculations:
cksum' = crc32c(cksum, m)
valid' = parity(cksum' xor cksum) xor valid
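
A sketch of how these two updates might be carried along while streaming data through the cksum. `update` is a hypothetical helper, and the masking of valid bits that happens before cksum calculation is omitted here. The sketch only demonstrates that the running `valid` bit tracks the parity of everything fed into the crc32c, without a second pass over the data.

```python
def crc32c(crc, data):
    """Bitwise crc32c (Castagnoli), reflected polynomial 0x82f63b78."""
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff

def parity(x):
    return bin(x).count('1') & 1

def update(cksum, valid, m):
    cksum_ = crc32c(cksum, m)
    # parity(cksum' xor cksum) equals parity(m), so one crc32c pass
    # updates both the cksum and the running parity
    valid_ = parity(cksum_ ^ cksum) ^ valid
    return cksum_, valid_

cksum, valid = 0, 0
chunks = [b'\x02\x01\x00\x05', b'hello', b'\x30\x00\x00\x04']
for m in chunks:
    cksum, valid = update(cksum, valid, m)

# compare against a direct parity calculation over all of the data
direct = parity(int.from_bytes(b''.join(chunks), 'little'))
print(valid == direct)
```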
This also means our commit cksums don't include any information about
the valid bits, since we mask these out before cksum calculation. Which
is a bit concerning, but as far as I can tell not a real problem.
---
An alternative design would be to just keep track of two cksums: A
commit cksum and a canonical cksum.
This would be much simpler, but would also require storing two cksums in
RAM in our lfsr_rbyd_t struct. A bit annoying for our 4-byte crc32cs,
and a bit more than a bit annoying for hypothetical 32-byte sha256s.
It's also not entirely clear how you would update both crc32cs
efficiently. There is a way to xor out the initial state before each
tag, but I think it would still require O(n) cycles of crc32c
calculation...
As it is, the extra bit needed to keep track of commit parity is easy
enough to sneak into some unused sign bits in our lfsr_rbyd_t struct.
---
I've also gone ahead and mixed in the current commit parity into our
cksum's perturb bits, so the commit cksum at least contains _some_
information about the previous parity.
But it's not entirely clear this actually adds anything. Our perturb
bits aren't _required_ to reflect the commit parity, so a very unlucky
power-loss could in theory still make a cksum valid for the wrong
parity.
At least this situation will be caught by later valid bits...
I've also carved out a tag encoding, LFSR_TAG_PERTURB, solely for adding
more perturb bits to commit cksums:
LFSR_TAG_CKSUM          0x3cpp  v-11 cccc -ppp pppp
LFSR_TAG_CKSUM          0x30pp  v-11 ---- -ppp pppp
LFSR_TAG_PERTURB        0x3100  v-11 ---1 ---- ----
LFSR_TAG_ECKSUM         0x3200  v-11 --1- ---- ----
LFSR_TAG_GCKSUMDELTA+   0x3300  v-11 --11 ---- ----
+ Planned
This allows for more than 7 perturb bits, and could even mix in the
entire previous commit cksum, if we ever think that is worth the RAM
tradeoff.
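
Reading the table above as bit patterns over a 16-bit tag, a decoder for the cksum-family tags might look like this. The masks are inferred from the table (`v` as the top bit, seven `p` bits in the low byte), and the helper names are hypothetical.

```python
TAG_CKSUM   = 0x3000
TAG_PERTURB = 0x3100
TAG_ECKSUM  = 0x3200

def valid_bit(tag):
    # 'v' is the top bit of the 16-bit tag
    return (tag >> 15) & 1

def cksum_kind(tag):
    # ignore the valid bit and the low byte when classifying
    kinds = {
        TAG_CKSUM: 'cksum',
        TAG_PERTURB: 'perturb',
        TAG_ECKSUM: 'ecksum',
    }
    return kinds.get(tag & 0x7f00)

def perturb_bits(tag):
    # only meaningful for cksum tags: the seven low 'p' bits
    return tag & 0x7f

tag = 0x8000 | TAG_CKSUM | 0x55
print(cksum_kind(tag), valid_bit(tag), hex(perturb_bits(tag)))
```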
LFSR_TAG_PERTURB also has the advantage that it is validated by the
cksum tag's valid bit before being included in the commit cksum, which
indirectly includes the current commit parity. We may eventually want to
use this instead of the cksum tag's perturb bits for this reason, but
right now I'm not sure this tiny bit of extra safety is worth the
minimum 5-byte per commit overhead...
Note if you want perturb bits that are also included in the rbyd's
canonical cksum, you can just use an LFSR_TAG_SHRUBDATA tag. Or any
unreferenced shrub tag really.
---
All of these changes required a decent amount of code, I think mostly
just to keep track of the parity bit. But the isolation of rbyd cksums
from erased-state is necessary for several future-planned features:
         code            stack
before: 33564            2816
after:  33916 (+1.0%)    2824 (+0.3%)
319 lines
10 KiB
Python
Executable File
#!/usr/bin/env python3

import io
import os
import struct
import sys


TAG_NULL        = 0x0000
TAG_CONFIG      = 0x0000
TAG_MAGIC       = 0x0003
TAG_VERSION     = 0x0004
TAG_RCOMPAT     = 0x0005
TAG_WCOMPAT     = 0x0006
TAG_OCOMPAT     = 0x0007
TAG_GEOMETRY    = 0x0009
TAG_NAMELIMIT   = 0x000c
TAG_SIZELIMIT   = 0x000d
TAG_GDELTA      = 0x0100
TAG_GRMDELTA    = 0x0100
TAG_NAME        = 0x0200
TAG_REG         = 0x0201
TAG_DIR         = 0x0202
TAG_BOOKMARK    = 0x0204
TAG_ORPHAN      = 0x0205
TAG_STRUCT      = 0x0300
TAG_DATA        = 0x0300
TAG_BLOCK       = 0x0304
TAG_BSHRUB      = 0x0308
TAG_BTREE       = 0x030c
TAG_MROOT       = 0x0311
TAG_MDIR        = 0x0315
TAG_MTREE       = 0x031c
TAG_DID         = 0x0320
TAG_BRANCH      = 0x032c
TAG_UATTR       = 0x0400
TAG_SATTR       = 0x0600
TAG_SHRUB       = 0x1000
TAG_CKSUM       = 0x3000
TAG_PERTURB     = 0x3100
TAG_ECKSUM      = 0x3200
TAG_ALT         = 0x4000
TAG_R           = 0x2000
TAG_GT          = 0x1000


# some ways of block geometry representations
# 512      -> 512
# 512x16   -> (512, 16)
# 0x200x10 -> (512, 16)
def bdgeom(s):
    s = s.strip()
    b = 10
    if s.startswith('0x') or s.startswith('0X'):
        s = s[2:]
        b = 16
    elif s.startswith('0o') or s.startswith('0O'):
        s = s[2:]
        b = 8
    elif s.startswith('0b') or s.startswith('0B'):
        s = s[2:]
        b = 2

    if 'x' in s:
        s, s_ = s.split('x', 1)
        return (int(s, b), int(s_, b))
    else:
        return int(s, b)

# parse some rbyd addr encodings
# 0xa       -> [0xa]
# 0xa.c     -> [(0xa, 0xc)]
# 0x{a,b}   -> [0xa, 0xb]
# 0x{a,b}.c -> [(0xa, 0xc), (0xb, 0xc)]
def rbydaddr(s):
    s = s.strip()
    b = 10
    if s.startswith('0x') or s.startswith('0X'):
        s = s[2:]
        b = 16
    elif s.startswith('0o') or s.startswith('0O'):
        s = s[2:]
        b = 8
    elif s.startswith('0b') or s.startswith('0B'):
        s = s[2:]
        b = 2

    trunk = None
    if '.' in s:
        s, s_ = s.split('.', 1)
        trunk = int(s_, b)

    if s.startswith('{') and '}' in s:
        ss = s[1:s.find('}')].split(',')
    else:
        ss = [s]

    addr = []
    for s in ss:
        if trunk is not None:
            addr.append((int(s, b), trunk))
        else:
            addr.append(int(s, b))

    return addr

def fromleb128(data):
    word = 0
    for i, b in enumerate(data):
        word |= ((b & 0x7f) << 7*i)
        word &= 0xffffffff
        if not b & 0x80:
            return word, i+1
    return word, len(data)

def tagrepr(tag, w=None, size=None, off=None):
    if (tag & 0x6fff) == TAG_NULL:
        return '%snull%s%s' % (
            'shrub' if tag & TAG_SHRUB else '',
            ' w%d' % w if w else '',
            ' %d' % size if size else '')
    elif (tag & 0x6f00) == TAG_CONFIG:
        return '%s%s%s%s' % (
            'shrub' if tag & TAG_SHRUB else '',
            'magic' if (tag & 0xfff) == TAG_MAGIC
                else 'version' if (tag & 0xfff) == TAG_VERSION
                else 'rcompat' if (tag & 0xfff) == TAG_RCOMPAT
                else 'wcompat' if (tag & 0xfff) == TAG_WCOMPAT
                else 'ocompat' if (tag & 0xfff) == TAG_OCOMPAT
                else 'geometry' if (tag & 0xfff) == TAG_GEOMETRY
                else 'sizelimit' if (tag & 0xfff) == TAG_SIZELIMIT
                else 'namelimit' if (tag & 0xfff) == TAG_NAMELIMIT
                else 'config 0x%02x' % (tag & 0xff),
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x6f00) == TAG_GDELTA:
        return '%s%s%s%s' % (
            'shrub' if tag & TAG_SHRUB else '',
            'grmdelta' if (tag & 0xfff) == TAG_GRMDELTA
                else 'gdelta 0x%02x' % (tag & 0xff),
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x6f00) == TAG_NAME:
        return '%s%s%s%s' % (
            'shrub' if tag & TAG_SHRUB else '',
            'name' if (tag & 0xfff) == TAG_NAME
                else 'reg' if (tag & 0xfff) == TAG_REG
                else 'dir' if (tag & 0xfff) == TAG_DIR
                else 'orphan' if (tag & 0xfff) == TAG_ORPHAN
                else 'bookmark' if (tag & 0xfff) == TAG_BOOKMARK
                else 'name 0x%02x' % (tag & 0xff),
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x6f00) == TAG_STRUCT:
        return '%s%s%s%s' % (
            'shrub' if tag & TAG_SHRUB else '',
            'data' if (tag & 0xfff) == TAG_DATA
                else 'block' if (tag & 0xfff) == TAG_BLOCK
                else 'bshrub' if (tag & 0xfff) == TAG_BSHRUB
                else 'btree' if (tag & 0xfff) == TAG_BTREE
                else 'mroot' if (tag & 0xfff) == TAG_MROOT
                else 'mdir' if (tag & 0xfff) == TAG_MDIR
                else 'mtree' if (tag & 0xfff) == TAG_MTREE
                else 'did' if (tag & 0xfff) == TAG_DID
                else 'branch' if (tag & 0xfff) == TAG_BRANCH
                else 'struct 0x%02x' % (tag & 0xff),
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x6e00) == TAG_UATTR:
        return '%suattr 0x%02x%s%s' % (
            'shrub' if tag & TAG_SHRUB else '',
            ((tag & 0x100) >> 1) | (tag & 0xff),
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x6e00) == TAG_SATTR:
        return '%ssattr 0x%02x%s%s' % (
            'shrub' if tag & TAG_SHRUB else '',
            ((tag & 0x100) >> 1) | (tag & 0xff),
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x7f00) == TAG_CKSUM:
        return 'cksum 0x%02x%s%s' % (
            tag & 0xff,
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x7f00) == TAG_PERTURB:
        return 'perturb%s%s%s' % (
            ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif (tag & 0x7f00) == TAG_ECKSUM:
        return 'ecksum%s%s%s' % (
            ' 0x%02x' % (tag & 0xff) if tag & 0xff else '',
            ' w%d' % w if w else '',
            ' %s' % size if size is not None else '')
    elif tag & TAG_ALT:
        return 'alt%s%s%s%s%s' % (
            'r' if tag & TAG_R else 'b',
            'a' if tag & 0x0fff == 0 and tag & TAG_GT
                else 'n' if tag & 0x0fff == 0
                else 'gt' if tag & TAG_GT
                else 'le',
            ' 0x%x' % (tag & 0x0fff) if tag & 0x0fff != 0 else '',
            ' w%d' % w if w is not None else '',
            ' 0x%x' % (0xffffffff & (off-size))
                if size and off is not None
                else ' -%d' % size if size
                else '')
    else:
        return '0x%04x%s%s' % (
            tag,
            ' w%d' % w if w is not None else '',
            ' %d' % size if size is not None else '')


def dbg_tag(data):
    if isinstance(data, int):
        tag = data
        weight = None
        size = None
    else:
        data = data.ljust(2, b'\0')
        tag = (data[0] << 8) | data[1]
        weight, d = fromleb128(data[2:]) if 2 < len(data) else (None, 2)
        size, d_ = fromleb128(data[2+d:]) if d < len(data) else (None, d)

    print(tagrepr(tag, weight, size))

def main(tags, *,
        block_size=None,
        block_count=None,
        off=None,
        **args):
    # interpret as a sequence of hex bytes
    if args.get('hex'):
        dbg_tag(bytes(int(tag, 16) for tag in tags))

    # interpret as strings
    elif args.get('string'):
        for tag in tags:
            dbg_tag(tag.encode('utf8'))

    # default to interpreting as ints
    elif block_size is None and off is None:
        for tag in tags:
            dbg_tag(int(tag, 0))

    # if either block_size or off provided interpret as a block device
    else:
        disk, *blocks = tags
        blocks = [rbydaddr(block) for block in blocks]

        # is bd geometry specified?
        if isinstance(block_size, tuple):
            block_size, block_count_ = block_size
            if block_count is None:
                block_count = block_count_

        # flatten block, default to block 0
        if not blocks:
            blocks = [[0]]
        blocks = [block for blocks_ in blocks for block in blocks_]

        with open(disk, 'rb') as f:
            # if block_size is omitted, assume the block device is one big block
            if block_size is None:
                f.seek(0, os.SEEK_END)
                block_size = f.tell()

            # blocks may also encode offsets
            blocks, offs = (
                [block[0] if isinstance(block, tuple) else block
                    for block in blocks],
                [off if off is not None
                        else block[1] if isinstance(block, tuple)
                        else None
                    for block in blocks])

            # read each tag
            for block, off in zip(blocks, offs):
                f.seek((block * block_size) + (off or 0))
                # read maximum tag size
                data = f.read(2+5+5)
                dbg_tag(data)

if __name__ == "__main__":
    import argparse
    import sys
    parser = argparse.ArgumentParser(
        description="Decode littlefs tags.",
        allow_abbrev=False)
    parser.add_argument(
        'tags',
        nargs='*',
        help="Tags to decode.")
    parser.add_argument(
        '-x', '--hex',
        action='store_true',
        help="Interpret as a sequence of hex bytes.")
    parser.add_argument(
        '-s', '--string',
        action='store_true',
        help="Interpret as strings.")
    parser.add_argument(
        '-b', '--block-size',
        type=bdgeom,
        help="Block size/geometry in bytes.")
    parser.add_argument(
        '--block-count',
        type=lambda x: int(x, 0),
        help="Block count in blocks.")
    parser.add_argument(
        '--off',
        type=lambda x: int(x, 0),
        help="Use this offset.")
    sys.exit(main(**{k: v
        for k, v in vars(parser.parse_intermixed_args()).items()
        if v is not None}))
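
For experimenting with the `--hex` mode, it can help to build tag bytes by hand. The sketch below assumes the layout `dbg_tag` reads: a 2-byte big-endian tag followed by a leb128 weight and a leb128 size. `toleb128` and `encode_tag` are hypothetical helpers that are not part of the script; `fromleb128` is copied from it so the round trip is self-contained.

```python
def fromleb128(data):
    # copied from the script above
    word = 0
    for i, b in enumerate(data):
        word |= ((b & 0x7f) << 7*i)
        word &= 0xffffffff
        if not b & 0x80:
            return word, i+1
    return word, len(data)

def toleb128(word):
    # hypothetical inverse of fromleb128: 7 bits per byte, low bits
    # first, with the top bit set on every byte except the last
    out = bytearray()
    while True:
        b = word & 0x7f
        word >>= 7
        if word:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_tag(tag, weight, size):
    return bytes([tag >> 8, tag & 0xff]) + toleb128(weight) + toleb128(size)

# TAG_REG with weight 1 and size 300
data = encode_tag(0x0201, 1, 300)
print(' '.join('%02x' % b for b in data))

# decode the same way dbg_tag walks the bytes
weight, d = fromleb128(data[2:])
size, d_ = fromleb128(data[2+d:])
print(weight, size)
```

The printed hex bytes can then be passed as arguments together with `-x`.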