Adopted new struct encoding scheme with redund tag bits

In littlefs, struct tags generally encode pointers to different on-disk
data structures. At this point they've gotten a bit complex; the btree
struct, for example, contains 1. a block address, 2. the trunk offset,
3. the weight of the trunk, and 4. a checksum.

There are also some future plans that complicate this:

1. Block redundancy will make it so these pointers may have a variable
   number of block addresses to contend with.

2. Different checksum types may make the checksum field itself variable
   length, at least on larger builds of littlefs.

   This may also happen if we support truncated checksums in littlefs
   to save storage.

Having two variable-sized fields becomes a bit of a pain. We can use
the encoded tag size to figure out the size of one of these fields, but
not both.

With this change, the tag size now determines the checksum size, so the
redundancy amount needs to go somewhere else. This lets checksums be
variably sized, and an explicit redundancy amount tells us how many
block addresses to expect without needing to fully parse the leb128s.
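A minimal Python sketch of how an explicit redundancy amount and a size-derived checksum fit together. It assumes a struct payload of redund+1 leb128 block addresses, then a leb128 trunk offset, with the checksum taking up whatever bytes the tag's size leaves over. That field order, reading redund as "extra blocks beyond the first", and the helper names `toleb128`, `fromleb128`, and `parse_struct` are illustrative assumptions, not littlefs's actual parser.

```python
def toleb128(word):
    """Encode a non-negative int as an unsigned leb128."""
    data = bytearray()
    while True:
        b = word & 0x7f
        word >>= 7
        if not word:
            data.append(b)
            return bytes(data)
        data.append(b | 0x80)

def fromleb128(data):
    """Decode an unsigned leb128, returning (value, bytes consumed)."""
    word = 0
    for i, b in enumerate(data):
        word |= (b & 0x7f) << (7*i)
        if not b & 0x80:
            return word, i+1
    raise ValueError('truncated leb128')

def parse_struct(redund, data):
    """Hypothetical struct layout: redund+1 block addresses, a trunk
    offset, then a checksum sized by whatever the tag's size leaves."""
    d = 0
    blocks = []
    # redund (from the tag) says how many addresses to read up front
    for _ in range(redund+1):
        block, d_ = fromleb128(data[d:]); d += d_
        blocks.append(block)
    trunk, d_ = fromleb128(data[d:]); d += d_
    # everything left over is the checksum, so its size can vary
    cksum = data[d:]
    return blocks, trunk, cksum

# a 2-block pointer with a 4-byte checksum...
full = toleb128(1) + toleb128(300) + toleb128(12) + bytes.fromhex('deadbeef')
print(parse_struct(1, full))
# ...and a 1-block pointer with a truncated 2-byte checksum
short = toleb128(7) + toleb128(3) + bytes.fromhex('1234')
print(parse_struct(0, short))
```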

But where to put the redundancy amount?

This commit carves out 2 bits from the struct tag to store the amount of
redundancy, allowing up to 3 redundant blocks:

  v0000011 0TTTTTrr
  ^--^---^-^----^-^- valid bit
     '---|-|----|-|- 3-bit mode (0x0 for structs)
         '-|----|-|- 4-bit suptype (0x3 for structs)
           '----|-|- 0 bit (reserved for leb128)
                '-|- 5-bit subtype
                  '- 2-bit redund
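Reading the diagram as shifts and masks, here's a hypothetical Python helper (not littlefs's own decoder) that splits a raw 16-bit tag into its fields, plus an inverse for struct tags. Decoding `TAG_MDIR = 0x0311` from the updated script gives subtype 4 with redund 1, presumably because mdirs are stored as block pairs, while the other new struct tags in the script all decode with redund 0.

```python
def decode_struct_tag(tag):
    """Split a raw 16-bit tag into the fields in the diagram.

    Hypothetical helper: the shifts and masks just follow the
    v0000011 0TTTTTrr layout, most significant bit first.
    """
    return {
        'valid':   (tag >> 15) & 0x1,   # v
        'mode':    (tag >> 12) & 0x7,   # 3-bit mode, 0x0 for structs
        'suptype': (tag >> 8) & 0xf,    # 4-bit suptype, 0x3 for structs
        'leb128':  (tag >> 7) & 0x1,    # reserved 0 bit
        'subtype': (tag >> 2) & 0x1f,   # 5-bit subtype
        'redund':  tag & 0x3,           # 2-bit redundancy amount
    }

def mkstructtag(subtype, redund=0):
    """Hypothetical inverse: build a struct tag with the top (valid)
    bit left at 0, matching the TAG_* constants in the script."""
    assert 0 <= subtype < 32 and 0 <= redund < 4
    return 0x0300 | (subtype << 2) | redund

# TAG_MDIR = 0x0311 in the updated script
print(decode_struct_tag(0x0311))
# → {'valid': 0, 'mode': 0, 'suptype': 3, 'leb128': 0, 'subtype': 4, 'redund': 1}
print(hex(mkstructtag(7)))  # TAG_BRANCH
# → 0x31c
```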

3 blocks may sound extremely limiting, but it's a common limit for
filesystems, 1. because each redundant block adds that much more
read/write overhead, and 2. because the fact that 2^(2^n)-1 is always
divisible by 3 (for n >= 1) makes >3 parity blocks much more complicated
mathematically.
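The divisibility claim follows from 2^(2^n) = 4^(2^(n-1)) and 4 ≡ 1 (mod 3), so it holds for every n >= 1 (n = 0 gives 2^1-1 = 1). A quick Python check of the first few cases:

```python
def divisible_by_3(n):
    """True if 2^(2^n) - 1 is divisible by 3.

    For n >= 1, 2^(2^n) = 4^(2^(n-1)) and 4 % 3 == 1, so
    2^(2^n) % 3 == 1 and subtracting 1 leaves a multiple of 3.
    """
    return (2**(2**n) - 1) % 3 == 0

# n = 1..6 covers 2^2-1 = 3 up through 2^64-1
assert all(divisible_by_3(n) for n in range(1, 7))
# the n = 0 case, 2^1-1 = 1, is the exception
assert not divisible_by_3(0)
```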

Worst case, if we ever need >3 redundant blocks, we can create new
struct subtypes, perhaps extended struct types that prefix the block
addresses with a leb128 encoding the redundancy amount.
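The extended struct idea could look something like this Python sketch. It is purely hypothetical: no such struct type exists in this commit, and the names `toextended`/`fromextended` and the choice to hand back trailing bytes unparsed are assumptions for illustration. The point is only that a leb128 prefix has no fixed upper bound, unlike the 2-bit redund field.

```python
def toleb128(word):
    """Encode a non-negative int as an unsigned leb128."""
    data = bytearray()
    while True:
        b = word & 0x7f
        word >>= 7
        if not word:
            data.append(b)
            return bytes(data)
        data.append(b | 0x80)

def fromleb128(data):
    """Decode an unsigned leb128, returning (value, bytes consumed)."""
    word = 0
    for i, b in enumerate(data):
        word |= (b & 0x7f) << (7*i)
        if not b & 0x80:
            return word, i+1
    raise ValueError('truncated leb128')

def toextended(blocks, rest=b''):
    """Hypothetical extended struct: leb128(redund), then redund+1
    leb128 block addresses, then any struct-specific trailing bytes."""
    if not blocks:
        raise ValueError('need at least one block')
    data = toleb128(len(blocks)-1)
    for block in blocks:
        data += toleb128(block)
    return data + rest

def fromextended(data):
    """Inverse of toextended, returning (blocks, trailing bytes)."""
    redund, d = fromleb128(data)
    blocks = []
    for _ in range(redund+1):
        block, d_ = fromleb128(data[d:]); d += d_
        blocks.append(block)
    return blocks, data[d:]

# 5 blocks means redund = 4, which wouldn't fit in the 2-bit tag field
data = toextended([10, 20, 30, 40, 50], rest=b'\xaa')
print(fromextended(data))
# → ([10, 20, 30, 40, 50], b'\xaa')
```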

---

As a part of this, reorganized the on-disk btree and ecksum encodings to
put the checksum last.

Also split out the btree and inner btree branches as separate struct
types. The btree includes the weight, whereas the weight is implicit in
inner btree branches. This came about after realizing context-specific
prefixes are relatively easy to add thanks to the composability of our
parsers.
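To illustrate the composability point, here's a Python sketch. `frombranch` follows the field order from the diff (block, trunk, then the checksum last), but uses `struct.unpack` in place of the script's `fromle32`. How the btree struct lays out its weight isn't shown in this commit, so `frombtree` here simply assumes a leb128 weight prefixed onto a branch; treat that layout and the `to*` encoders as hypothetical. This `frombtree` is also not the one removed in the diff, which put the checksum first.

```python
import struct

def toleb128(word):
    """Encode a non-negative int as an unsigned leb128."""
    data = bytearray()
    while True:
        b = word & 0x7f
        word >>= 7
        if not word:
            data.append(b)
            return bytes(data)
        data.append(b | 0x80)

def fromleb128(data):
    """Decode an unsigned leb128, returning (value, bytes consumed)."""
    word = 0
    for i, b in enumerate(data):
        word |= (b & 0x7f) << (7*i)
        if not b & 0x80:
            return word, i+1
    raise ValueError('truncated leb128')

def tobranch(block, trunk, cksum):
    # block address, trunk offset, then the checksum last
    return toleb128(block) + toleb128(trunk) + struct.pack('<I', cksum)

def frombranch(data):
    # same field order as frombranch in the diff, with
    # struct.unpack standing in for the script's fromle32
    d = 0
    block, d_ = fromleb128(data[d:]); d += d_
    trunk, d_ = fromleb128(data[d:]); d += d_
    cksum, = struct.unpack('<I', data[d:d+4]); d += 4
    return block, trunk, cksum

def tobtree(weight, block, trunk, cksum):
    # hypothetical: a btree is just a branch with a weight prefix
    return toleb128(weight) + tobranch(block, trunk, cksum)

def frombtree(data):
    # parse the prefix, then hand the rest to the branch parser
    weight, d = fromleb128(data)
    return (weight,) + frombranch(data[d:])

print(frombtree(tobtree(1000, 0x123, 456, 0xdeadbeef)))
# → (1000, 291, 456, 3735928559)
```

Because the prefix parser just hands the remaining bytes to `frombranch`, adding another context-specific prefix would not require touching the branch parser at all.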

This led to some name collisions though:

- BRANCH   -> BNAME
- BOOKMARK -> DMARK

Christopher Haster
2023-08-11 01:44:51 -05:00
parent d069fed3ed
commit 314c832588
5 changed files with 221 additions and 169 deletions

@@ -14,18 +14,19 @@ TAG_SUPERCONFIG = 0x0004
 TAG_GSTATE = 0x0100
 TAG_GRM = 0x0100
 TAG_NAME = 0x0200
-TAG_BRANCH = 0x0200
-TAG_BOOKMARK = 0x0201
+TAG_BNAME = 0x0200
+TAG_DMARK = 0x0201
 TAG_REG = 0x0202
 TAG_DIR = 0x0203
 TAG_STRUCT = 0x0300
 TAG_INLINED = 0x0300
-TAG_BLOCK = 0x0302
-TAG_BTREE = 0x0303
-TAG_MROOT = 0x0304
-TAG_MDIR = 0x0305
-TAG_MTREE = 0x0306
-TAG_DID = 0x0307
+TAG_BLOCK = 0x0308
+TAG_BTREE = 0x030c
+TAG_MDIR = 0x0311
+TAG_MTREE = 0x0314
+TAG_MROOT = 0x0318
+TAG_BRANCH = 0x031c
+TAG_DID = 0x0320
 TAG_UATTR = 0x0400
 TAG_SATTR = 0x0500
 TAG_ALT = 0x4000
@@ -97,12 +98,12 @@ def fromtag(data):
 size, d_ = fromleb128(data[2+d:])
 return tag>>15, tag&0x7fff, weight, size, 2+d+d_
 
-def frombtree(data):
-cksum = fromle32(data)
-w, d1 = fromleb128(data[4:])
-trunk, d2 = fromleb128(data[4+d1:])
-block, d3 = fromleb128(data[4+d1+d2:])
-return w, trunk, block, cksum
+def frombranch(data):
+d = 0
+block, d_ = fromleb128(data[d:]); d += d_
+trunk, d_ = fromleb128(data[d:]); d += d_
+cksum = fromle32(data[d:]); d += 4
+return block, trunk, cksum
 
 def popc(x):
 return bin(x).count('1')
@@ -138,8 +139,8 @@ def tagrepr(tag, w, size, off=None):
 size)
 elif (tag & 0xff00) == TAG_NAME:
 return '%s%s %d' % (
-'branch' if tag == TAG_BRANCH
-else 'bookmark' if tag == TAG_BOOKMARK
+'bname' if tag == TAG_BNAME
+else 'dmark' if tag == TAG_DMARK
 else 'reg' if tag == TAG_REG
 else 'dir' if tag == TAG_DIR
 else 'name 0x%02x' % (tag & 0xff),
@@ -150,9 +151,10 @@ def tagrepr(tag, w, size, off=None):
 'inlined' if tag == TAG_INLINED
 else 'block' if tag == TAG_BLOCK
 else 'btree' if tag == TAG_BTREE
-else 'mroot' if tag == TAG_MROOT
 else 'mdir' if tag == TAG_MDIR
 else 'mtree' if tag == TAG_MTREE
+else 'mroot' if tag == TAG_MROOT
+else 'branch' if tag == TAG_BRANCH
 else 'did' if tag == TAG_DID
 else 'struct 0x%02x' % (tag & 0xff),
 ' w%d' % w if w else '',
@@ -542,7 +544,7 @@ def main(disk, roots=None, *,
 rid_, w = rid__, w_
 # catch any branches
-if tag == TAG_BTREE:
+if tag == TAG_BRANCH:
 branch = (tag, j, d, data)
 tags.append((tag, j, d, data))
@@ -554,7 +556,7 @@ def main(disk, roots=None, *,
 if branch is not None and (
 not depth or depth_ < depth):
 tag, j, d, data = branch
-w_, trunk, block, cksum = frombtree(data)
+block, trunk, cksum = frombranch(data)
 rbyd = Rbyd.fetch(f, block_size, block, trunk)
 # corrupted? bail here so we can keep traversing the tree
@@ -648,7 +650,7 @@ def main(disk, roots=None, *,
 ))
 d_ += max(bdepths.get(d, 0), 1)
-leaf = (bid-(w-1), d, rid-(w-1), TAG_BTREE)
+leaf = (bid-(w-1), d, rid-(w-1), TAG_BRANCH)
 # remap branches to leaves if we aren't showing inner branches
 if not args.get('inner'):