129 Commits

Christopher Haster
fb73f78c91 Updated comments to prefer "canonical checksum" for rbyd checksums
I think this describes the goal of the non-perturbed rbyd checksums
decently. At the very least it's less wrong than "data checksum", and
calling it the "metadata checksum" would just be confusing. (Would our
commit checksum be the "metametadata checksum" then?)
2024-07-31 12:29:13 -05:00
Christopher Haster
c739e18f6f Renamed LFSR_TAG_NOISE -> LFSR_TAG_NOTE
Sort of like SHT_NOTE in ELF files, but with no defined format. Using
LFSR_TAG_NOTE for additional noise/nonces is still encouraged, but it
can also be used to add debug info.
2024-06-20 13:04:20 -05:00
Christopher Haster
ae0e3348fe Added -l/--list to dbgtag.py
Inspired by errno's/dbgerr.py's -l/--list, this gives a quick and easy
list of the current tag encodings, which can be very useful:

  $ ./scripts/dbgtag.py -l
  LFSR_TAG_NULL       0x0000  v--- ---- ---- ----
  LFSR_TAG_CONFIG     0x00tt  v--- ---- -ttt tttt
  LFSR_TAG_MAGIC      0x0003  v--- ---- ---- --11
  LFSR_TAG_VERSION    0x0004  v--- ---- ---- -1--
  ... snip ...

We already need to keep dbgtag.py in-sync or risk a bad debugging
experience, so we might as well let it tell us all the information it
currently knows.

Also yay for self-inspecting code, I don't know if it's bad that I'm
becoming a fan of parsing information out of comments...
2024-06-20 13:02:08 -05:00
Christopher Haster
898f916778 Fixed pl hole in perturb logic
Turns out there's a very _very_ small powerloss hole in our current
perturb logic.

We rely on tag valid bits to validate perturb bits, but these
intentionally don't end up in the commit checksum. This means there will
always be a powerloss hole when we write the last valid bit. If we lose
power after writing that bit, suddenly the remaining commit and any
following commits may appear as valid.

Now, this is really unlikely considering we need to lose power exactly
when we write the cksum tag's valid bit, and our nonce helps protect
against this. But a hole is a hole.

The solution here is to include the _current_ perturb bit (q) in the
commit's cksum tag, alongside the _next_ perturb bit (p). This will be
included in the commit's checksum, but _not_ in the canonical checksum,
allowing the commit's checksum to validate the current perturb state
without ruining our erased-state agnostic checksums:

  .---+---+---+---. . . .---+---+---+---. \   \   \   \
  |v|    tag      |     |v|    tag      | |   |   |   |
  +---+---+---+---+     +---+---+---+---+ |   |   |   |
  |     commit    |     |     commit    | |   |   |   |
  |               |     |               | +-. |   |   |
  +---+---+---+---+     +---+---+---+---+ / | |   |   |
  |v|qp-------------.   |v|qp| tag      |   | .   .   .
  +---+---+---+---+ |   +---+---+---+---+   | .   .   .
  |     cksum     | |   |     cksum     |   | .   .   .
  +---+---+---+---+ |   +---+---+---+---+   | .   .   .
  |    padding    | |   |    padding    |   | .   .   .
  |               | |   |               |   | .   .   .
  +---+---+---+---+ | . +---+---+---+---+   | |   |   |
  |     erased    | +-> |v------------------' |   |   |
  |               | |   +---+---+---+---+     |   |   |
  .               . |   |     commit    |     +-. |   +- rbyd
  .               . |   |.----------------.   | | |   |  cksum
                    |   +| -+---+---+---+ |   / | +-. /
                    +-> |v|qp| tag      | '-----' | |
                    |   +- ^ ---+---+---+         / |
                    '------'  cksum ----------------'
                        +---+---+---+---+
                        |    padding    |
                        |               |
                        +---+---+---+---+
                        |     erased    |
                        |               |
                        .               .
                        .               .

(Ok maybe this diagram needs work...)

This adds another thing that needs to be checked during rbyd fetch (and
note, we _do_ need to explicitly check this), but it solves the problem.
If power is lost after v, q would be invalid, and if power is lost after
q, our cksum would be invalid.

Note this would have also been an issue for the previous cksum + parity
perturb scheme.
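
For illustration, a rough sketch of the fetch-time q check, with
hypothetical names and an assumed q bit position (the real
lfsr_rbyd_fetch logic differs in detail):

  #include <stdbool.h>
  #include <stdint.h>

  // while parsing we track the perturb state, and the q bit stored in
  // the cksum tag must agree with it
  static bool cksum_q_consistent(uint16_t cksum_tag, bool perturb,
          unsigned q_shift) { // q's bit position is assumed
      bool q = (cksum_tag >> q_shift) & 1;
      // power lost after v but before q => q disagrees; power lost
      // after q => the commit cksum itself won't match
      return q == perturb;
  }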

Code changes:

           code          stack
  before: 33570           2592
  after:  33598 (+0.1%)   2592 (+0.0%)
2024-06-07 19:41:47 -05:00
Christopher Haster
8a4f6fcf68 Adopted a simpler rbyd perturb scheme
The previous cksum + parity scheme worked, but needing to calculate both
cksum + parity on slightly different sets of metadata felt overly
complicated. After taking a step back, I've realized the problem is that
we're trying to force perturb effects to be implicit via the parity. If we
instead actually implement perturb effects explicitly, things get quite
a bit simpler...

This does add a bit more logic to the read path, but I don't think it's
worse than the mess we needed to parse separate cksum + parity.

Now, the perturb bit has the explicit behavior of inverting all tag
valid bits in the following commit. Which is conveniently the same as
xoring the crc32c with 00000080 before parsing each tag:

  .---+---+---+---. . . .---+---+---+---. \   \   \   \
  |v|    tag      |     |v|    tag      | |   |   |   |
  +---+---+---+---+     +---+---+---+---+ |   |   |   |
  |     commit    |     |     commit    | |   |   |   |
  |               |     |               | +-. |   |   |
  +---+---+---+---+     +---+---+---+---+ / | |   |   |
  |v|p--------------.   |v|p|  tag      |   | .   .   .
  +---+---+---+---+ |   +---+---+---+---+   | .   .   .
  |     cksum     | |   |     cksum     |   | .   .   .
  +---+---+---+---+ |   +---+---+---+---+   | .   .   .
  |    padding    | |   |    padding    |   | .   .   .
  |               | |   |               |   | .   .   .
  +---+---+---+---+ | . +---+---+---+---+   | |   |   |
  |     erased    | +-> |v------------------' |   |   |
  |               | |   +---+---+---+---+     |   |   |
  .               . |   |     commit    |     +-. |   +- rbyd
  .               . |   |               |     | | |   |  cksum
                    |   +---+---+---+---+     / | +-. /
                    '-> |v----------------------' | |
                        +---+---+---+---+         / |
                        |     cksum ----------------'
                        +---+---+---+---+
                        |    padding    |
                        |               |
                        +---+---+---+---+
                        |     erased    |
                        |               |
                        .               .
                        .               .

With this scheme, we don't need to calculate a separate parity, because
each valid bit effectively validates the current state of the perturb
bit.

We also don't need extra logic to omit valid bits from the cksum,
because flipping all valid bits effectively makes perturb=0 the
canonical metadata encoding and cksum.
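
As a rough sketch, with hypothetical names (lfs.c differs in detail):

  #include <stdbool.h>
  #include <stdint.h>

  // xoring a single bit into the running crc32c flips its parity,
  // which is the same as inverting every expected tag valid bit in
  // the following commit
  static bool expected_valid(uint32_t cksum, bool perturb) {
      if (perturb) {
          cksum ^= 0x00000080;
      }
      return __builtin_parity(cksum); // valid bit = crc32c parity
  }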

---

I also considered only inverting the first valid bit, which would have
the additional benefit of allowing entire commits to be crc32ced at
once, but since we don't actually track when we've started a commit
this turned out to be quite a bit more complicated than I thought.

We need some way to validate the first valid bit, otherwise it could be
flipped by a failed prog and we'd never notice. This is fine, we can
store a copy of the previous perturb bit in the next cksum tag, but it
does mean we need to track the perturb bit for the duration of the
commit. So we'd end up needing to track both start-of-commit and the
perturb bit state, which starts getting difficult to fit into our rbyd
struct...

It's easier and simpler to just flip every valid bit. As a plus this
means every valid bit contributes to validating the perturb bit.

---

Also renamed LFSR_TAG_PERTURB -> LFSR_TAG_NOISE just to avoid confusion.
Though not sure if this tag should stick around...

The end result is a nice bit of code/stack savings, which is what we'd
expect with a simpler scheme:

           code          stack
  before: 33746           2600
  after:  33570 (-0.5%)   2592 (-0.3%)
2024-06-07 18:24:13 -05:00
Christopher Haster
bb3ef46cdf Fixed B-tree rendering in dbgmtree.py, unexpected arg
Rbyd.btree_btree renders B-trees, not rbyds, so the rbyd arg doesn't make
sense here. This was just an accidental refactor mistake.
2024-05-31 16:36:33 -05:00
Christopher Haster
2e8012681b Tweaked dbg script headers to match the mount info log
The main difference being rendering the weight with a single letter "w"
prefix:

  $ ./scripts/dbglfs.py disk -b4096
  littlefs v0.0 4096x256 0x{1,0}.8b w2.512, rev eb7f2a0d
  ...

This lets us add valuable weight info without too much noise.

Adopting this in the dbg scripts is nice for consistency.
2024-05-24 14:56:11 -05:00
Christopher Haster
56b18dfd9a Reworked revision count logic a bit, block_cycles -> block_recycles
The original goal here was to restore all of the revision count/
wear-leveling features that were intentionally ignored during
refactoring, but over time a few other ideas to better leverage our
revision count bits crept in, so this is sort of the amalgamation of
that...

Note! None of these changes affect reading. mdir fetch strictly needs
only to look at the revision count as a big 32-bit counter to determine
which block is the most recent.

The interesting thing about the original definition of the revision
count, a simple 32-bit counter, is that it actually only needs 2-bits to
work. Well, three states really: 1. most recent, 2. less recent, 3.
future most recent. This means the remaining bits are sort of up for
grabs to other things.

Previously, we've used the extra revision count bits as a heuristic for
wear-leveling. Here we reintroduce that, a bit more rigorously, while
also carving out space for a nonce to help with commit collisions.

Here's the new revision count breakdown:

  vvvvrrrr rrrrrrnn nnnnnnnn nnnnnnnn
  '-.''----.----''---------.--------'
    '------|---------------|---------- 4-bit relocation revision
           '---------------|---------- recycle-bits recycle counter
                           '---------- pseudorandom nonce

- 4-bit relocation revision

  We technically only need 2-bits to tell which block is the most
  recent, but I've bumped it up to 4-bits just to be safe and to make
  it a bit more readable in hex form.

- recycle-bits recycle counter

  A user-configurable counter that tracks how many times a
  metadata block has been erased. When it overflows we return the block
  to the allocator to participate in block-level wear-leveling again.
  This implements our copy-on-bounded-write strategy.

- pseudorandom nonce

  The remaining bits we fill with a pseudorandom nonce derived from the
  filesystem's prng. Note this prng isn't the greatest (it's just the
  xor of all mdir cksums), but it gets the job done. It should also be
  reproducible, which can be a good thing.

  Suggested by ithinuel, the addition of a nonce should help with the
  commit collision issue caused by noop erases. It doesn't completely
  solve things, since we're only using crc32c cksums not collision
  resistant cryptographic hashes, but we still have the existing
  valid/perturb bit system to fall back on.

When we allocate a new mdir, we want to zero the recycle counter. This
is where our relocation revision is useful for indicating which block is
the most recent:

  initial state: 10101010 10101010 10101010 10101010
                 '-.'
                  +1     zero           random
                   v .----'----..---------'--------.
  lfsr_rev_init: 10110000 00000011 01110010 11101111

When we increment, we increment the recycle counter and xor in a new nonce:

  initial state: 10110000 00000011 01110010 11101111
                 '--------.----''---------.--------'
                         +1              xor <-- random
                          v               v
  lfsr_rev_init: 10110000 00000111 01010100 01000000

And when the recycle counter overflows, we relocate the mdir.

If we aren't wear-leveling, we just increment the relocation revision to
maximize the nonce.
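
A rough sketch of the resulting arithmetic, assuming 10 recycle bits
(the real recycle-bits value is user configurable, and the lfs.c
helpers differ):

  #include <stdint.h>

  #define NONCE_BITS 18
  #define NONCE_MASK ((UINT32_C(1) << NONCE_BITS) - 1)

  // on mdir allocation: +1 relocation revision, zeroed recycle
  // counter, fresh nonce
  static uint32_t rev_alloc(uint32_t rev, uint32_t prng) {
      return ((rev + (UINT32_C(1) << 28)) & UINT32_C(0xf0000000))
              | (prng & NONCE_MASK);
  }

  // on commit to a new erase: +1 recycle counter, xor in a new nonce;
  // a carry out of the recycle counter signals time to relocate
  static uint32_t rev_inc(uint32_t rev, uint32_t prng) {
      return (rev + (UINT32_C(1) << NONCE_BITS)) ^ (prng & NONCE_MASK);
  }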

---

Some other notes:

- Renamed block_cycles -> block_recycles.

  This is intended to help avoid confusing block_cycles with the actual
  physical number of erase cycles supported by the device.

  I've noticed this happening a few times, and it's unfortunately
  equivalent to disabling wear-leveling completely. This can be improved
  with better documentation, but also changing the name doesn't hurt.

- We now relocate both blocks in the mdir at the same time.

  Previously we only relocated one block in the mdir per recycle. This
  was necessary to keep our threaded linked-list in sync, but the
  threaded linked-list is now no more!

  Relocating both blocks is simpler, updates the mtree less often, is
  compatible with metadata redundancy, and avoids aliasing issues that
  were a problem when relocating one block.

  Note that block_recycles is internally multiplied by 2 so each block
  sees the correct number of erase cycles.

- block_recycles is now rounded down to a power-of-2.

  This makes the counter logic easier to work with and takes up less RAM
  in lfs_t. This is a rough heuristic anyways.

- Moved the lfs->seed updates into lfsr_mountinited + lfsr_mdir_commit.

  This avoids readonly operations affecting the seed and should help
  reproducibility.

- Changed rev count in dbg scripts to render as hex, similar to cksums.

  Now that we're using most of the bits in the revision count, the decimal
  version is, uh, not helpful...

Code changes:

           code          stack
  before: 33342           2640
  after:  33434 (+0.3%)   2640 (+0.0%)
2024-05-22 18:49:05 -05:00
Christopher Haster
11c948678f Renamed size_limit -> file_limit
This limits the maximum size of a file, which also implies the
maximum integer size required to mount.

The exact name is a bit of a toss-up. I originally went with size_limit
to avoid confusion around whether file_limit reflected the file size or
the number of files, but since this ends up mapping to lfs_off_t and _not_
lfs_size_t, I think size_limit may be a bit of a bad choice.
2024-05-18 13:00:15 -05:00
Christopher Haster
8a75a68d8b Made rbyd cksums erased-state agnostic
Long story short, rbyd checksums are now fully reproducible. If you
write the same set of tags to any block, you will end up with the same
checksum.

This is actually a bit tricky with littlefs's constraints.

---

The main problem boils down to erased-state. littlefs has a fairly
flexible model for erased-state, and this brings some challenges. In
littlefs, storage goes through 2 states:

1. Erase - Prepare storage for progging. Reads after an erase may return
   arbitrary, but consistent, values.

2. Prog - Program storage with data. Storage must be erased, with no
   progs attempted since the erase. Reads after a prog must return the
   new data.

Note in this model erased-state may not be all 0xffs, though it likely
will be for flash. This allows littlefs to support a wide range of
other storage devices: SD, RAM, NVRAM, encryption, ECC, etc.

But this model also means erased-state may be different from block to
block, and even different on later erases of the same block.

And if that wasn't enough of a challenge, _erased-state can contain
perfectly valid commits_. Usually you can expect arbitrary valid cksums
to be rare, but thanks to SD, RAM, etc, modeling erase as a noop, valid
cksums in erased-state are actually very common.

So how do we manage erased-state in our rbyds?

First we need some way to detect it, since we can't prog if we're not
erased. This is accomplished by the forward-looking erased-state cksum
(ecksum):

  .---+---+---+---.     \
  |     commit    |     |
  |               |     |
  |               |     |
  +---+---+---+---+     +-.
  |     ecksum -------. | | <-- ecksum - cksum of erased state
  +---+---+---+---+   | / |
  |     cksum --------|---' <-- cksum - cksum of commit,
  +---+---+---+---+   |                 including ecksum
  |    padding    |   |
  |               |   |
  +---+---+---+---+ \ |
  |     erased    | +-'
  |               | /
  .               .
  .               .
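
A rough sketch of the resulting pre-prog check, assuming a
crc32c(seed, buf, size) helper (the real lfs.c flow differs):

  #include <stdint.h>

  // assumed helper, continues a running crc32c over a buffer
  uint32_t crc32c(uint32_t cksum, const void *buf, uint32_t size);

  // erased/esize come from the last commit's ecksum tag
  static int check_erased(const uint8_t *erased, uint32_t esize,
          uint32_t ecksum) {
      if (crc32c(0, erased, esize) != ecksum) {
          // erased-state changed => a prog may have been attempted,
          // we must not prog here again
          return -1;
      }
      return 0;
  }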

You may have already noticed the start of our problems. The ecksum
contains the erased-state, which is different per-block, and our rbyd
cksum contains the ecksum. We need to include the ecksum so we know if
it's valid, but this means our rbyd cksum changes block to block.

Solving this is simple enough: Stop the rbyd's canonical cksum before
the ecksum, but include the ecksum in the actual cksum we write to disk.

Future commits will need to start from the canonical cksum, so the old
ecksum won't be included in new commits, but this shouldn't be a
problem:

  .---+---+---+---. . . \ . \ . . . . .---+---+---+---.     \   \
  |     commit    |     |   |         |     commit    |     |   |
  |               |     |   +- rbyd   |               |     |   |
  |               |     |   |  cksum  |               |     |   |
  +---+---+---+---+     +-. /         +---+---+---+---+     |   |
  |     ecksum -------. | |           |     ecksum    |     .   .
  +---+---+---+---+   | / |           +---+---+---+---+     .   .
  |     cksum --------|---'           |     cksum     |     .   .
  +---+---+---+---+   |               +---+---+---+---+     .   .
  |    padding    |   |               |    padding    |     .   .
  |               |   |               |               |     .   .
  +---+---+---+---+ \ | . . . . . . . +---+---+---+---+     |   |
  |     erased    | +-'               |     commit    |     |   |
  |               | /                 |               |     |   +- rbyd
  .               .                   |               |     |   |  cksum
  .               .                   +---+---+---+---+     +-. /
                                      |     ecksum -------. | |
                                      +---+---+---+---+   | / |
                                      |     cksum ------------'
                                      +---+---+---+---+   |
                                      |    padding    |   |
                                      |               |   |
                                      +---+---+---+---+ \ |
                                      |     erased    | +-'
                                      |               | /
                                      .               .
                                      .               .

The second challenge is the pesky possibility of existing valid commits.
We need some way to ensure that erased-state following a commit does not
accidentally contain a valid old commit.

This is where our tags' valid bits come into play: The valid bit of each
tag must match the parity of all preceding tags (equivalent to the
parity of the crc32c), and we can use some perturb bits in the cksum tag
to make sure any tags in our erased-state do _not_ match:

  .---+---+---+---. \ . . . . . .---+---+---+---. \   \   \
  |v|    tag      | |           |v|    tag      | |   |   |
  +---+---+---+---+ |           +---+---+---+---+ |   |   |
  |     commit    | |           |     commit    | |   |   |
  |               | |           |               | |   |   |
  +---+---+---+---+ +-----.     +---+---+---+---+ +-. |   |
  |v|p|  tag      | |     |     |v|p|  tag      | | | |   |
  +---+---+---+---+ /     |     +---+---+---+---+ / | |   |
  |     cksum     |       |     |     cksum     |   | .   .
  +---+---+---+---+       |     +---+---+---+---+   | .   .
  |    padding    |       |     |    padding    |   | .   .
  |               |       |     |               |   | .   .
  +---+---+---+---+ . . . | . . +---+---+---+---+   | |   |
  |v---------------- != --'     |v------------------' |   |
  |     erased    |             +---+---+---+---+     |   |
  .               .             |     commit    |     |   |
  .               .             |               |     |   |
                                +---+---+---+---+     +-. +-.
                                |v|p|  tag      |     | | | |
                                +---+---+---+---+     / | / |
                                |     cksum ----------------'
                                +---+---+---+---+       |
                                |    padding    |       |
                                |               |       |
                                +---+---+---+---+       |
                                |v---------------- != --'
                                |     erased    |
                                .               .
                                .               .

New problem! The rbyd cksum contains the valid bits, which contain the
perturb bits, which depend on the erased-state!

And you can't just derive the valid bits from the rbyd's canonical
cksum. This avoids erased-state poisoning, sure, but then nothing in the
new commit depends on the perturb bits! The catch-22 here is that we
need the valid bits to both depend on, and ignore, the erased-state
poisoned perturb bits.

As far as I can tell, the only way around this is to make the rbyd's
canonical cksum not include the parity bits. Which is annoying, masking
out bits is not great for bulk cksum calculation...

But this does solve our problem:

  .---+---+---+---. \ . . . . . .---+---+---+---. \   \   \   \
  |v|    tag      | |           |v|    tag      | |   |   o   o
  +---+---+---+---+ |           +---+---+---+---+ |   |   |   |
  |     commit    | |           |     commit    | |   |   |   |
  |               | |           |               | |   |   |   |
  +---+---+---+---+ +-----.     +---+---+---+---+ +-. |   |   |
  |v|p|  tag      | |     |     |v|p|  tag      | | | |   .   .
  +---+---+---+---+ /     |     +---+---+---+---+ / | |   .   .
  |     cksum     |       |     |     cksum     |   | .   .   .
  +---+---+---+---+       |     +---+---+---+---+   | .   .   .
  |    padding    |       |     |    padding    |   | .   .   .
  |               |       |     |               |   | .   .   .
  +---+---+---+---+ . . . | . . +---+---+---+---+   | |   |   |
  |v---------------- != --'     |v------------------' |   o   o
  |     erased    |             +---+---+---+---+     |   |   |
  .               .             |     commit    |     |   |   +- rbyd
  .               .             |               |     |   |   |  cksum
                                +---+---+---+---+     +-. +-. /
                                |v|p|  tag      |     | | o |
                                +---+---+---+---+     / | / |
                                |     cksum ----------------'
                                +---+---+---+---+       |
                                |    padding    |       |
                                |               |       |
                                +---+---+---+---+       |
                                |v---------------- != --'
                                |     erased    |
                                .               .
                                .               .

Note that because each commit's cksum derives from the canonical cksum,
the valid bits and commit cksums no longer contain the same data, so our
parity(m) = parity(crc32c(m)) trick no longer works.

However our crc32c still does tell us a bit about each tag's parity, so
with a couple well-placed xors we can at least avoid needing two
parallel calculations:

  cksum' = crc32c(cksum, m)
  valid' = parity(cksum' xor cksum) xor valid

This also means our commit cksums don't include any information about
the valid bits, since we mask these out before cksum calculation. Which
is a bit concerning, but as far as I can tell not a real problem.
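
As a rough sketch, with hypothetical names and an assumed crc32c
helper:

  #include <stdbool.h>
  #include <stdint.h>

  // assumed helper, continues a running crc32c over a buffer
  uint32_t crc32c(uint32_t cksum, const void *buf, uint32_t size);

  static void cksum_append(uint32_t *cksum, bool *valid,
          const void *m, uint32_t size) {
      uint32_t cksum_ = crc32c(*cksum, m, size);
      // parity(a xor b) = parity(a) xor parity(b), so this folds the
      // parity change of the running cksum into the valid bit
      *valid = __builtin_parity(cksum_ ^ *cksum) ^ *valid;
      *cksum = cksum_;
  }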

---

An alternative design would be to just keep track of two cksums: A
commit cksum and a canonical cksum.

This would be much simpler, but would also require storing two cksums in
RAM in our lfsr_rbyd_t struct. A bit annoying for our 4-byte crc32cs,
and a bit more than a bit annoying for hypothetical 32-byte sha256s.

It's also not entirely clear how you would update both crc32cs
efficiently. There is a way to xor out the initial state before each
tag, but I think it would still require O(n) cycles of crc32c
calculation...

As it is, the extra bit needed to keep track of commit parity is easy
enough to sneak into some unused sign bits in our lfsr_rbyd_t struct.

---

I've also gone ahead and mixed in the current commit parity into our
cksum's perturb bits, so the commit cksum at least contains _some_
information about the previous parity.

But it's not entirely clear this actually adds anything. Our perturb
bits aren't _required_ to reflect the commit parity, so a very unlucky
power-loss could in theory still make a cksum valid for the wrong
parity.

At least this situation will be caught by later valid bits...

I've also carved out a tag encoding, LFSR_TAG_PERTURB, solely for adding
more perturb bits to commit cksums:

  LFSR_TAG_CKSUM          0x3cpp  v-11 cccc -ppp pppp

  LFSR_TAG_CKSUM          0x30pp  v-11 ---- -ppp pppp
  LFSR_TAG_PERTURB        0x3100  v-11 ---1 ---- ----
  LFSR_TAG_ECKSUM         0x3200  v-11 --1- ---- ----
  LFSR_TAG_GCKSUMDELTA+   0x3300  v-11 --11 ---- ----

  + Planned

This allows for more than 7 perturb bits, and could even mix in the
entire previous commit cksum, if we ever think that is worth the RAM
tradeoff.

LFSR_TAG_PERTURB also has the advantage that it is validated by the
cksum tag's valid bit before being included in the commit cksum, which
indirectly includes the current commit parity. We may eventually want to
use this instead of the cksum tag's perturb bits for this reason, but
right now I'm not sure this tiny bit of extra safety is worth the
minimum 5-byte per commit overhead...

Note if you want perturb bits that are also included in the rbyd's
canonical cksum, you can just use an LFSR_TAG_SHRUBDATA tag. Or any
unreferenced shrub tag really.

---

All of these changes required a decent amount of code, I think mostly
just to keep track of the parity bit. But the isolation of rbyd cksums
from erased-state is necessary for several future-planned features:

           code          stack
  before: 33564           2816
  after:  33916 (+1.0%)   2824 (+0.3%)
2024-05-04 17:25:01 -05:00
Christopher Haster
c4fcc78814 Tweaked file types/name tag encoding to be a bit less quirky
The intention behind the quirky encoding was to leverage bit 1 to
indicate if the underlying file type would be backed by the common file
B-tree data structure. Looking forward, there may be several of these
types, compressed files, contiguous files, etc, that for all intents and
purposes are just normal files interpreted differently.

But trying to leverage too many bits like this is probably going to give
us a sparse, awkward, and confusing tag encoding, so I've reverted to a
hopefully more normal encoding:

  LFSR_TAG_NAME           0x02tt  v--- --1- -ttt tttt

  LFSR_TAG_NAME           0x0200  v--- --1- ---- ----
  LFSR_TAG_REG            0x0201  v--- --1- ---- ---1
  LFSR_TAG_DIR            0x0202  v--- --1- ---- --1-
  LFSR_TAG_SYMLINK*       0x0203  v--- --1- ---- --11
  LFSR_TAG_BOOKMARK       0x0204  v--- --1- ---- -1--
  LFSR_TAG_ORPHAN         0x0205  v--- --1- ---- -1-1
  LFSR_TAG_COMPR*         0x0206  v--- --1- ---- -11-
  LFSR_TAG_CONTIG*        0x0207  v--- --1- ---- -111

  * Hypothetical

Note the carve-out for the hypothetical symlink tag. Symlinks are
actually incredibly low in the priority list, but they are also
the only current hypothetical file type that would need to be exposed to
users. Grouping these up makes sense.

This will get a bit messy if we ever end up with a 4th user-facing type,
but there isn't any in POSIX at least (ignoring non-fs types, socket,
fifo, character, block, etc).

The gap also helps line things up so reg/orphan are a single bit flip,
and the non-user facing types all share a bit.

This had no impact on code size:

           code          stack
  before: 33564           2816
  after:  33564 (+0.0%)   2816 (+0.0%)
2024-05-04 17:24:48 -05:00
Christopher Haster
6e5d314c20 Tweaked struct tag encoding so b*/m* tags are earlier
These b*/m* struct tags have a common pattern that would be good to
emphasize in the encoding. The later struct tags get a bit more messy as
they leave space for future possible extensions.

New encoding:

  LFSR_TAG_STRUCT         0x03tt  v--- --11 -ttt ttrr

  LFSR_TAG_DATA           0x0300  v--- --11 ---- ----
  LFSR_TAG_BLOCK          0x0304  v--- --11 ---- -1rr
  LFSR_TAG_BSHRUB         0x0308  v--- --11 ---- 1---
  LFSR_TAG_BTREE          0x030c  v--- --11 ---- 11rr
  LFSR_TAG_MROOT          0x0310  v--- --11 ---1 --rr
  LFSR_TAG_MDIR           0x0314  v--- --11 ---1 -1rr
  LFSR_TAG_MSHRUB*        0x0318  v--- --11 ---1 1---
  LFSR_TAG_MTREE          0x031c  v--- --11 ---1 11rr
  LFSR_TAG_DID            0x0320  v--- --11 --1- ----
  LFSR_TAG_BRANCH         0x032c  v--- --11 --1- 11rr

  * Hypothetical

Note that all shrubs currently end with 1---, and all btrees, including
the awkward branch tag, end with 11rr.

This had no impact on code size:

           code          stack
  before: 33564           2816
  after:  33564 (+0.0%)   2816 (+0.0%)
2024-05-04 17:24:33 -05:00
Christopher Haster
5fa85583cd Dropped block-level erased-state checksums for RAM-tracked erased-state
Unfortunately block-level erased-state checksums (becksums) don't really
work as intended.

An invalid becksum _does_ signal that a prog has been attempted, but a
valid becksum does _not_ prove that a prog has _not_ been attempted.

Rbyd ecksums work, but only thanks to a combination of prioritizing
valid commits and the use of perturb bits to force erased-state changes.
It _is_ possible to end up with an ecksum collision, but only if you
1. lose power before completing a commit, and 2. end up with a
non-trivial crc32c collision. If this does happen, at the very least the
resulting commit will likely end up corrupted and thrown away later.

Block-level becksums, at least as originally designed, don't have either
of these protections. To make matters worse, the blocks these becksums
reference contain only raw user data. Write 0xffs into a file and you
will likely end up with a becksum collision!

This is a problem for a couple of reasons:

1. Progging multiple times to erased-state is likely to result in
   corrupted data, though this is also likely to get caught with
   validating writes.

   Worst case, the resulting data looks valid, but with weakened data
   retention.

2. Because becksums are stored in the copy-on-write metadata of the
   file, attempting to open a file twice for writing (or more advanced
   copy-on-write operations in the future) can lead to a situation where
   a prog is attempted on _already committed_ data.

   This is very bad and breaks copy-on-write guarantees.

---

So clearly becksums are not fit for purpose and should be dropped. What
can we replace them with?

The first option, implemented here, is RAM-tracked erased state. Give
each lfsr_file_t its own eblock/eoff fields to track the last known good
erased-state. And before each prog, clear eblock/eoff so we never
accidentally prog to the same erased-state twice.

It's interesting to note we don't currently clear eblock/eoff in all
file handles; this is ok only because we don't currently share
eblock/eoff across file handles. Each eblock/eoff is exclusive to the
lfsr_file_t and does not appear anywhere else in the system.
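
A rough sketch of the idea, with hypothetical names (the real
lfsr_file_t layout/flow differs):

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct file {
      uint32_t eblock; // last known-good erased block
      uint32_t eoff;   // first erased byte in that block
  } file_t;

  // returns true if we may prog at block/off, consuming the tracked
  // erased-state so the same erased-state is never progged twice
  static bool file_claim_erased(file_t *file,
          uint32_t block, uint32_t off) {
      if (file->eblock != block || file->eoff != off) {
          return false; // unknown erased-state, must erase first
      }
      // clear _before_ progging, so a failed/interrupted prog can't
      // be retried against the same erased-state
      file->eoff = UINT32_MAX;
      return true;
  }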

The main downside of this approach is that, well, the RAM-tracked
erased-state is only tracked in RAM. Block-level erased-state effectively
does not persist across reboots. I've considered adding some sort of
per-file erased-state tracking to the mdir that would need to be cleared
before use, but such a mechanism ends up quite complicated.

At the moment, I think the best second option is to put erased-state
tracking in the future-planned bmap. This would let you opt-in to
on-disk tracking of all erased-state in the system.

One nice thing about RAM-tracked erased-state is that it's not on disk,
so it's not really a compatibility concern and won't get in the way of
additional future erased-state tracking.

---

Benchmarking becksums vs RAM-tracking has been quite interesting. While
in theory becksums can track much more erased-state, it's quite unlikely
anything but the most recent erased-state actually ends up used. The end
result is no real measurable performance loss, and actually a minor
speedup because we don't need to calculate becksums on every block
write.

There are some pathological cases, such as multiple write heads, but
these are out-of-scope right now (note! multiple explicit file handles
currently handle this case beautifully because we don't share
eblock/eoff!)

Becksums were also relatively complicated, and needed extra scaffolding
to pass around/propagate as secondary tags alongside the primary bptr.
So trading these for RAM-tracking also gives us a nice bit of code/stack
savings, albeit at a 2-word RAM cost in lfsr_file_t:

           code          stack          structs
  before: 33888           2864             1096
  after:  33564 (-1.0%)   2816 (-1.7%)     1104 (+0.7%)

  lfsr_file_t before: 104
  lfsr_file_t after:  112 (+7.7%)
2024-05-04 17:22:56 -05:00
Christopher Haster
81ccfbccd0 Dropped -x/--device from dbg*.py scripts
This hasn't really proven useful.

At one point showing the cksums in dbgrbyd.py was useful, but this is
now possible and easier with dbgblock.py -x/--cksum.
2024-04-28 13:21:46 -05:00
Christopher Haster
86a8582445 Tweaked canonical altn to point to itself
By definition, altns should never be followed, so it doesn't really
matter where they point. But it's not like they can point literally
nowhere, so where should they point?

A couple options:

1. jump=jump - Wherever the old alt pointed
   - Easy, literally a noop
   - Unsafe, bugs could reveal outdated parts of the tree
   - Encoding size eh

2. jump=0 - Point to offset=0
   - Easier, +0 code
   - Safer, branching to 0 should assert
   - Worst possible encoding size

3. jump=itself - Point to itself
   - A bit tricky, +4 code
   - Safe, should assert, even without asserts worst case infinite loop
   - Optimal encoding size

An infinite loop isn't the best failure state, but we can catch this
with an assert, which we would need for jump=0 anyways. And this is only
a concern if there are other fs bugs. jump=0 is actually slightly worse
if asserts are disabled, since we'd end up reading the revision count as
garbage.

Adopting jump=itself gives us the optimal 4-byte encoding:

  altbn w0 = 40 00 00 00
             '-+-'  ^  ^
               '----|--|-- tag = altbn
                    '--|-- weight = 0
                       '-- jump = itself (branch - 0)
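
A rough sketch of the follow-side assert, with hypothetical names:

  #include <assert.h>
  #include <stdint.h>

  // alts encode a relative backwards jump, so delta=0 means
  // jump=itself, which only the never-followed altbn should use
  static uint32_t alt_follow(uint32_t branch, uint32_t delta) {
      // following jump=itself means some other fs bug; without
      // asserts this degrades into an infinite loop rather than
      // reading the revision count as garbage like jump=0 would
      assert(delta != 0);
      return branch - delta;
  }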

This requires tweaking the alt encoder a bit, to avoid relative encoding
jump=0s, but this is pretty cheap:

                code          stack
  jump=jump:   34068           2864
  jump=0:      34068 (+0.0%)   2864 (+0.0%)
  jump=itself: 34072 (+0.0%)   2864 (+0.0%)

I thought we may need to also tweak the decoder, so later trunk copies
don't accidentally point to the old location, but humorously our pruning
kicks in redundantly to reset altbn's jump=itself on every trunk.

Note lfsr_rbyd_lookupnext was also rearranged a bit to make it easier to
assert on infinite loops and this also added some code. Probably just
due to compiler noise:

           code          stack
  before: 34068           2864
  after:  34076 (+0.0%)   2864 (+0.0%)

Also note that we still accept all of the above altbn encoding options.
This only affects encoding and dbg scripts.
2024-04-28 13:21:46 -05:00
Christopher Haster
faf8c4b641 Tweaked alt-tag encoding to match color/dir naming order
This is mainly to avoid mistakes caused by names/encodings disagreeing:

  LFSR_TAG_ALT  0x4kkk  v1cd kkkk -kkk kkkk
                        ^ ^^ '------+-----'
                        '-||--------|------- valid bit
                          '|--------|------- color
                           '--------|------- dir
                                    '------- key

Notably, the LFSR_TAG_ALT() macro has already caused issues by being
both 1. ambiguous, and 2. not really type-checkable. It's easy to get
the order wrong and have things not really break, just behave poorly.
It's really not great!

To be honest the exact order is a bit arbitrary; the color->dir naming
appeared by accident because I guess it felt more natural. Maybe because
of English's weird implicit adjective ordering? Maybe because of how
often conditions show up as the last part of the name in other
instruction sets?

At least one plus is that this moves the dir-bit next to the key. This
makes it so all of the condition information is encoded in the lowest
13-bits of the tag, which may lead to minor optimization tricks for
implementing flips and such.
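
For example, a hypothetical sketch of one such trick (not actual
lfs.c code):

  #include <stdint.h>

  // with the dir bit adjacent to the key, flipping an alt's
  // condition (le <-> gt, same key) is a single-bit xor
  static uint16_t alt_flip(uint16_t alt) {
      return alt ^ 0x1000; // toggle d in v1cd kkkk -kkk kkkk
  }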

Code changes:

           code          stack
  before: 34080           2864
  after:  34068 (-0.0%)   2864 (+0.0%)
2024-04-28 13:21:41 -05:00
Christopher Haster
37c45e1afc Fixed coloring conflicts in rbyd tree renderers
A bit of a hack, but rather than handling conditional alt branches, our
dbg rbyd tree renderers just represent single-pointer alts as an alt
with both branches pointing to the same place.

Unfortunately, the two branches technically have different colors. This
resulted in a bit of contention when choosing how to color the tree.
Basically Python's dict ordering would determine which color won.

Which was a bit confusing when dbgrbyd.py displayed different tree
colorings for the same rbyd. dbgrbyd.py should be idempotent!

This is solved by adding another hack to check explicitly for
same-destination branches.
2024-04-09 20:04:14 -05:00
Christopher Haster
8a646d5b8e Added dbgtag.py for easy tag decoding on the command-line
Example:

  $ ./scripts/dbgtag.py 0x3001
  cksum 0x01

dbgtag.py inherits most of crc32c.py's decoding options. The most useful
probably being -x/--hex:

  $ ./scripts/dbgtag.py -x e1 00 01 8a 09
  altbgt 0x100 w1 -1162

dbgtag.py also supports reading from a block device if either
-b/--block-size or --off are provided. This is mainly for consistency
with the other dbg*.py scripts:

  $ ./scripts/dbgtag.py disk -b4096 0x2.1e4
  bookmark w1 1

This should help when debugging and finding a raw tag/alt in some
register. Manually decoding is just an unnecessary road bump when this
happens.
2024-04-01 16:29:13 -05:00
Christopher Haster
54a03cfe3b Enabled both pruning/non-pruning dbg reprs, -t/--tree and -R/--rbyd
Now that altns/altas are more important structurally, including them in
our dbg script's tree renderers is valuable for debugging. On the other
hand, they do add quite a bit of visual noise when looking at large
multi-rbyd trees topologically.

This commit gives us the best of both worlds by making both tree
renderings available under different options:

-t/--tree, a simplified rbyd tree renderer with altn/alta pruning:

          .->   0 reg w1 4
        .-+->     uattr 0x01 2
        | .->     uattr 0x02 2
    .---+-+->     uattr 0x03 2
    |     .->     uattr 0x04 2
    |   .-+->     uattr 0x05 2
    | .-+--->     uattr 0x06 2
  +-+-+-+-+->   1 reg w1 4
  |     | '->   2 reg w1 4
  |     '--->     uattr 0x01 2
  '---+-+-+->     uattr 0x02 2
      | | '->     uattr 0x03 2
      | '-+->     uattr 0x04 2
      |   '->     uattr 0x05 2
      |   .->     uattr 0x06 2
      | .-+->     uattr 0x07 2
      | | .->     uattr 0x08 2
      '-+-+->     uattr 0x09 2

-R/--rbyd, a full rbyd tree renderer:

            .--->   0 reg w1 4
        .---+-+->     uattr 0x01 2
        |   .--->     uattr 0x02 2
      .-+-+-+-+->     uattr 0x03 2
      |     .--->     uattr 0x04 2
      |   .-+-+->     uattr 0x05 2
      | .-+---+->     uattr 0x06 2
  +---+-+-+-+-+->   1 reg w1 4
  |       |   '->   2 reg w1 4
  |       '----->     uattr 0x01 2
  '-+-+-+-+-+-+->     uattr 0x02 2
    |   |   '--->     uattr 0x03 2
    |   '---+-+->     uattr 0x04 2
    |       '--->     uattr 0x05 2
    |       .--->     uattr 0x06 2
    |     .-+-+->     uattr 0x07 2
    |     |   .->     uattr 0x08 2
    '-----+---+->     uattr 0x09 2

And of course -B/--btree, a simplified B-tree renderer (more useful for
multi-rbyds):

  +->   0 reg w1 4
  |       uattr 0x01 2
  |       uattr 0x02 2
  |       uattr 0x03 2
  |       uattr 0x04 2
  |       uattr 0x05 2
  |       uattr 0x06 2
  |->   1 reg w1 4
  '->   2 reg w1 4
          uattr 0x01 2
          uattr 0x02 2
          uattr 0x03 2
          uattr 0x04 2
          uattr 0x05 2
          uattr 0x06 2
          uattr 0x07 2
          uattr 0x08 2
          uattr 0x09 2
2024-04-01 16:23:31 -05:00
Christopher Haster
abe68c0844 rbyd-rr: Reworking rbyd range removal to try to preserve rby structure
This is the start of (yet another) rework of rbyd range removals, this
time in an effort to preserve the rby structure that maps to a balanced
2-3-4 tree. Specifically, the property that all search paths have the
same number of black edges (2-3-4 nodes).

This is currently incomplete, as you can probably tell from the mess,
but this commit at least gets a working altn/alta encoding in place
necessary for representing empty 2-3-4 nodes. More on that below.

---

First the problem:

My assumption, when implementing the previous range removal algorithms,
was that we only needed to maintain the existing height of the tree.

The existing rbyd operations limit the height to strictly log n. And
while we can't _reduce_ the height to maintain perfect balance, we can
at least avoid _increasing_ the height, which means the resulting tree
should have a height <= log n. Since our rbyds are bounded by the
block_size b, this means worst case our rbyd can never exceed a height
of log b, right?

Well, not quite.

This is true the instant after the remove operation. But there is an
implicit assumption that future rbyd operations will still be able to
maintain height <= log n after the remove operation. This turns out to
not be true.

The problem is that our rbyd appends only maintain height <= log n if
our rby structure is preserved. If the rby structure is broken, rbyd
append assumes an rby structure that doesn't exist, which can lead to an
increasingly unbalanced tree.

Consider this happily balanced tree:

         .-------o-------.                    .--------o
     .---o---.       .---o---.            .---o---.    |
   .-o-.   .-o-.   .-o-.   .-o-.        .-o-.   .-o-.  |
  .o. .o. .o. .o. .o. .o. .o. .o.      .o. .o. .o. .o. |
  a b c d e f g h i j k l m n o p  =>  a b c d e f g h i
                   '------+------'
                        remove

After a range removal it looks pretty bad, but note the height is still
<= log n (the old n, not the new n). We are still <= log b.

But note what happens if we start to insert attrs into the short half of
the tree:

         .--------o
     .---o---.    |
   .-o-.   .-o-.  |
  .o. .o. .o. .o. |
  a b c d e f g h i

                  .-----o
         .--------o .-+-r
     .---o---.    | | | |
   .-o-.   .-o-.  | | | |
  .o. .o. .o. .o. | | | |
  a b c d e f g h i j'k'l'

                      .-------------o
                  .---o   .---+-----r
         .--------o .-o .-o .-o .-+-r
     .---o---.    | | | | | | | | | |
   .-o-.   .-o-.  | | | | | | | | | |
  .o. .o. .o. .o. | | | | | | | | | |
  a b c d e f g h i j'k'l'm'n'o'p'q'r'

Our right side is generating a perfectly balanced tree as expected, but
the left side is suddenly twice as far from the root! height(r')=3,
height(a)=6!

The problem is when we append l', we don't really know how tall the tree
is. We only know l' has one black edge, which assuming rby structure is
preserved, means all other attrs must have one black edge, so creating a
new root is justified.

In reality this just makes the tree grow increasingly unbalanced,
increasing the height of the tree by worst case log n every range
removal.

---

It's interesting to note this was discovered while debugging
test_fwrite_overwrite, specifically:

  test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10

It turns out the append fragments -> delete fragments -> append/carve
block + becksum loop contains the perfect sequence of attrs necessary to
turn this tree imbalance into a linked-list!

                        .->         0 data w1 1
                      .-b->         1 data w1 1
                      | .->         2 data w1 1
                    .-b-b->         3 data w1 1
                    |   .->         4 data w1 1
                    | .-b->         5 data w1 1
                    | | .->         6 data w1 1
                .---b-b-b->         7 data w1 1
                |       .->         8 data w1 1
                |     .-b->         9 data w1 1
                |     | .->        10 data w1 1
                |   .-b-b->        11 data w1 1
                | .-b----->        12 data w1 1
              .-y-y------->        13 data w1 1
              |         .->        14 data w1 1
            .-y---------y->        15 data w1 1
            |           .->        16 data w1 1
          .-y-----------y->        17 data w1 1
          |             .->        18 data w1 1
        .-y-------------y->        19 data w1 1
        |               .->        20 data w1 1
      .-y---------------y->        21 data w1 1
      |                 .->        22 data w1 1
    .-y-----------------y->        23 data w1 1
    |                   .->        24 data w1 1
  .-y-------------------y->        25 data w1 1
  |                   .--->        26 data w1 1
  |                   | .->   27-2047 block w2021 10
  b-------------------r-b->           becksum 5

Note, to reproduce this you need to step through with a breakpoint on
lfsr_bshrub_commit. This only shows up in the file's intermediary btree,
which at the time of writing ends up at block 0xb8:

  $ ./scripts/test.py \
        test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10 \
        -ddisk --gdb -f

  $ ./scripts/watch.py -Kdisk -b \
        ./scripts/dbgrbyd.py -b4096 disk 0xb8 -t

  (then b lfsr_bshrub_commit and continue a bunch)

---

So, we need to preserve the rby structure.

Note pruning red/yellow alts is not an issue. These aren't black, so we
aren't changing the number of black edges in the tree. We've just
effectively reduced a 3/4 node into a 2/3 node:

      .-> a
  .---b-> b              .-> a <- 2 black
  | .---> c            .-b-> b
  | | .-> d            | .-> c
  b-r-b-> e <- rm  =>  b-b-> d <- 2 black

The tricky bit is pruning black alts. Naively this changes the number of
black edges/2-3-4 nodes in the tree, which is bad:

    .-> a
  .-b-> b              .-> a <- 2 black
  | .-> c            .-b-> b
  b-b-> d <- rm  =>  b---> c <- 1 black

It's tempting to just make the alt red at this point, effectively
merging the sibling 2-3-4 node. This maintains balance in the subtree,
but still removes a black edge, causing problems for our parent:

      .-> a
    .-b-> b                .-> a <- 3 black
    | .-> c              .-b-> b
  .-b-b-> d              | .-> c
  |   .-> e            .-b-b-> d
  | .-b-> f            | .---> e
  | | .-> g            | | .-> f
  b-b-b-> h <- rm  =>  b-r-b-> g <- 2 black

In theory you could propagate this all the way up to the root, and this
_would_ probably give you a perfect self-balancing range removal
algorithm... but it's recursive... and littlefs can't be recursive...

               .-> s
             .-b-> t                              .-> s
             | .-> u                        .-----b-> t
           .-b-b-> v                        |     .-> u
           |   .-> w                        | .---b-> v
           | .-b-> x                        | | .---> w
  | |      | | .-> y           | | | |      | | | .-> x
  b-b- ... b-b-b-> z <- rm =>  r-b-r-b- ... r-b-r-b-> y

So instead, an alternative solution. What if we allowed black alts that
point nowhere? A sort of noop 2-3-4 node that serves only to maintain
the rby structure?

    .-> a
  .-b-> b              .-> a <- 2 black
  | .-> c            .-b-> b
  b-b-> d <- rm  =>  b-b-> c <- 2 black

I guess that would technically make this a 1-2-3-4 tree.

This does add extra overhead for writing noop alts, which are otherwise
useless, but it seems to solve most of our problems: 1. does not
increase the height of the tree, 2. maintains the rby structure, 3.
tail-recursive.

And, thanks to the preserved rby structure, we can say that in the worst
case our rbyds will never exceed a height of log b again, even with
range removals.

If we apply this strategy to our original example, you can see how the
preserved rby structure sort of "absorbs" new red alts, preventing
further unbalancing:

         .-------o-------.                    .--------o
     .---o---.       .---o---.            .---o---.    o
   .-o-.   .-o-.   .-o-.   .-o-.        .-o-.   .-o-.  o
  .o. .o. .o. .o. .o. .o. .o. .o.      .o. .o. .o. .o. o
  a b c d e f g h i j k l m n o p  =>  a b c d e f g h i
                   '------+------'
                        remove

Reinserting:

         .--------o
     .---o---.    o
   .-o-.   .-o-.  o
  .o. .o. .o. .o. o
  a b c d e f g h i

         .----------------o
     .---o---.            o
   .-o-.   .-o-.   .------o
  .o. .o. .o. .o. .o. .-+-r
  a b c d e f g h i j'k'l'm'

         .----------------------------o
     .---o---.          .-------------o
   .-o-.   .-o-.    .---o   .---+-----r
  .o. .o. .o. .o. .-o .-o .-o .-o .-+-r
  a b c d e f g h i j'k'l'm'n'o'p'q'r's'

Much better!

---

This commit makes some big steps towards this solution, mainly codifying
a now-special alt-never/alt-always (altn/alta) encoding to represent
these noop 1-nodes.

Technically, since null (0) tags are not allowed, these already exist as
altle 0/altgt 0 and don't need any extra carve-out encoding-wise:

  LFSR_TAG_ALT   0x4kkk  v1dc kkkk -kkk kkkk
  LFSR_TAG_ALTN  0x4000  v10c 0000 -000 0000
  LFSR_TAG_ALTA  0x6000  v11c 0000 -000 0000

We actually already used altas to terminate unreachable tags during
range removals, but this behavior was implicit. Now, altns have very
special treatment as a part of determining bounds during appendattr
(both unreachable gt/le alts are represented as altns). For this reason
I think the new names are warranted.
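
A rough sketch of detecting these under the encoding above
(hypothetical names, not actual lfs.c code):

  #include <stdbool.h>
  #include <stdint.h>

  // since the null (0) tag can never match a real tag, altn/alta are
  // just alts with key=0; mask off the v, d, and c bits and check
  // for an empty key
  static bool alt_isnoop(uint16_t alt) {
      return (alt & 0x4f7f) == 0x4000;
  }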

I've also added these encodings to the dbg*.py scripts for, well,
debuggability, and added a special case to dbgrbyd.py -j to avoid
unnecessary altn jump noise.

As a part of debugging, I've also extended dbgrbyd.py's tree renderer to
show trivial prunable alts. Unsure about keeping this. On one hand it's
useful to visualize the exact alt structure, on the other hand it likely
adds quite a bit of noise to the more complex dbg scripts.

The current state of things is a mess, but at least tests are passing!

Though we aren't actually reclaiming any altns yet... We're definitely
_not_ preserving the rby structure at the moment, and if you look at the
output from the tests, the resulting tree structure is hilariously bad.

But at least the path forward is clear.
2024-04-01 16:23:14 -05:00
Christopher Haster
62de865103 Eliminated null tag reachability in dbg scripts
This was throwing off tree rendering in dbglfs.py; we attempt to look
up the null tag because we just want the first tag in the tree to stitch
things together.

Null tag reachability is tricky! You only notice if the tree happens to
create a hole, which isn't that common. I think all lookup
implementations should have this max(tag, 1) pattern from now on to
avoid this.
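
A rough sketch of the pattern, with assumed types/helpers:

  #include <stdint.h>

  // assumed types/helpers, for illustration only
  typedef struct rbyd rbyd_t;
  int rbyd_lookup(rbyd_t *rbyd, uint16_t tag);

  // the null (0) tag may be reachable through holes in the tree, so
  // clamp it away before any direct lookup
  static int rbyd_lookup_nonnull(rbyd_t *rbyd, uint16_t tag) {
      return rbyd_lookup(rbyd, (tag > 1) ? tag : 1);
  }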

Note that most dbg scripts wouldn't run into this because we usually use
the traversal tag+1 pattern. Still, the inconsistency in impl between
the dbg scripts and lfs.c is bad.
2024-03-20 13:31:16 -05:00
Christopher Haster
9366674416 Replaced separate BLOCKSIZE/BLOCKCOUNT attrs with single GEOMETRY attr
This saves a bit of rbyd overhead, since these almost always come
together.

Perhaps more interesting, it carves out space for storing mroot-anchor
redundancy information. This uses the lowest two bits of the GEOMETRY
tag to indicate how many redundant blocks belong to the mroot-anchor:

  LFSR_TAG_GEOMETRY       0x0008  v--- ---- ---- 1-rr

This solves a bit of a hole in our redundancy encoding. The plan is for
this info to be stored in the lowest two bits of every pointer, but the
mroot-anchor doesn't really have a pointer.

Though this is just future plans. Right now the redundancy information
is unused. Current implementations should use the GEOMETRY tag 0x0009,
which you may notice implies redundancy level-1. This matches our
current 2-block per mdir default.

Geometry attr encoding:

  .---+---+---+---.      tag (0x0008+r): 1 be16    2 bytes
  |x0008+r| 0 |siz|      weight (0):     1 leb128  1 byte
  +---+---+---+---+      size:           1 leb128  1 byte
  | block_size    |      block_size:     1 leb128  <=4 bytes
  +---+- -+- -+- -+- -.
  | block_count       |  block_count:    1 leb128  <=5 bytes
  '---+- -+- -+- -+- -'  total:                    <=13 bytes
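
A rough sketch of the payload encoding, with a hypothetical leb128
helper (lfs.c's real encoders differ):

  #include <stddef.h>
  #include <stdint.h>

  // minimal leb128 encoder, 7 bits per byte, high bit = continuation
  static size_t toleb128(uint32_t x, uint8_t *buf) {
      size_t i = 0;
      do {
          buf[i] = (x & 0x7f) | ((x >= 0x80) ? 0x80 : 0);
          x >>= 7;
          i += 1;
      } while (x);
      return i;
  }

  // encode the geometry attr's payload per the table above; buf
  // needs <=9 bytes (4 for block_size, 5 for block_count)
  static size_t geometry_encode(uint32_t block_size,
          uint32_t block_count, uint8_t *buf) {
      size_t i = toleb128(block_size, buf);
      return i + toleb128(block_count, &buf[i]);
  }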

Code changes:

           code          stack
  before: 34092           2880
  after:  34040 (-0.2%)   2880 (+0.0%)
2024-03-19 15:02:02 -05:00
Christopher Haster
130281ac05 Reworked compat flags a bit
Now with a bit more granularity for possibly-future-optional on-disk
data structures:

  LFSR_RCOMPAT_NONSTANDARD  0x0001  ---- ---- ---- ---1 (reserved)
  LFSR_RCOMPAT_MLEAF        0x0002  ---- ---- ---- --1-
  LFSR_RCOMPAT_MSHRUB       0x0004  ---- ---- ---- -1-- (reserved)
  LFSR_RCOMPAT_MTREE        0x0008  ---- ---- ---- 1---
  LFSR_RCOMPAT_BSPROUT      0x0010  ---- ---- ---1 ----
  LFSR_RCOMPAT_BLEAF        0x0020  ---- ---- --1- ----
  LFSR_RCOMPAT_BSHRUB       0x0040  ---- ---- -1-- ----
  LFSR_RCOMPAT_BTREE        0x0080  ---- ---- 1--- ----
  LFSR_RCOMPAT_GRM          0x0100  ---- ---1 ---- ----

  LFSR_WCOMPAT_NONSTANDARD  0x0001  ---- ---- ---- ---1 (reserved)

  LFSR_OCOMPAT_NONSTANDARD  0x0001  ---- ---- ---- ---1 (reserved)

This adds a couple reserved flags:

- LFSR_*COMPAT_NONSTANDARD - This flag will never be set by a standard
  version of littlefs. The idea is to allow implementations with
  non-standard extensions a way to signal potential compatibility issues
  without worrying about future compat flag conflicts.

  This is limited to a single bit, but hey, it's not like it's possible
  to predict all future extensions.

  If a non-standard extension needs more granularity, reservations of
  standard compat flags can always be requested, even if they don't end
  up implemented in standard littlefs. (Though such reservations will
  need a strong motivation, it's not like these flags are free).

- LFSR_RCOMPAT_MSHRUB - In theory littlefs supports a shrubbed mtree,
  where the root is inlined into the mroot. But in practice this turned
  out to be more complicated than it was worth. Still, a future
  implementation may find an mshrub useful, so preserving a compat flag
  for such a case makes sense.

  That being said, I have no plans to add support for mshrubs even in
  the dbg scripts.

  I would like the expected feature-set for debug tools to be
  well-defined, but also conservative. This gets a bit tricky with
  theoretical features like the mshrubs, but until mshrubs are actually
  implemented in littlefs, I would like to consider them non-standard.

  The implication of this is that, while LFSR_RCOMPAT_MSHRUB is
  currently "reserved", it may be repurposed for some other meaning in
  the future.

These changes also rename *COMPATFLAGS -> *COMPAT, and reorder the tags
by decreasing importance. This ordering seems more valuable than the
original intention of making rcompat/wcompat a single bit flip.

Implementation-wise, it's interesting to note the internal-only
LFSR_*COMPAT_OVERFLOW flag. This gets set when out-of-range bits are set
on-disk, and allows us to detect unrepresentable compat flags without
too much extra complexity.
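
A rough sketch of how such decoding could work, with hypothetical
names/widths (the real lfs.c logic differs):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  // the internal-only overflow flag lives in the top bit of our
  // in-RAM representation and is never written to disk
  #define RCOMPAT_OVERFLOW 0x8000

  static uint16_t rcompat_decode(const uint8_t *buf, size_t size) {
      uint32_t raw = 0;
      bool overflow = false;
      for (size_t i = 0; i < size; i++) {
          if (i < 4) {
              raw |= (uint32_t)buf[i] << (8*i);
          } else if (buf[i]) {
              overflow = true;
          }
      }
      // any on-disk bits we can't represent imply some unknown,
      // possibly-incompatible feature
      if (raw >> 15) {
          overflow = true;
      }
      return (raw & 0x7fff) | (overflow ? RCOMPAT_OVERFLOW : 0);
  }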

The extra encoding/decoding overhead does add a bit of cost though:

           code          stack
  before: 33944           2880
  after:  34124 (+0.5%)   2880 (+0.0%)
2024-03-16 17:26:04 -05:00
Christopher Haster
d8d6052d90 Dropped -m/--mleaf-weight from dbg scripts
Now that we're assuming a perfect compaction algorithm, and an
infinitely compatible mleaf-bits, there really shouldn't be any reason
to support non-standard mleaf-bits in our scripts, right?

If a configurable mleaf-bits becomes necessary, we can always add this
back in the future.
2024-02-26 14:19:27 -06:00
Christopher Haster
23aab1a238 Increased mleaf-bits to account for better compaction algorithms
As defined previously, mleaf-bits depended on the attr estimate, which
depended on the details of our compaction algorithm:

      block_size
  m = ----------
          a_0

Assuming t=4, the _minimum_ tag encoding:

      block_size   block_size
  m = ---------- = ----------
        3*4 + 4        16

However, with our new compaction algorithm, our attr estimate changes:

      block_size    block_size   block_size
  m = ---------- = ----------- = ----------
          a_1      (5/2)*4 + 2       12

But tying our mleaf-bits to our attr estimate is a bit fragile. Unlike
the attr estimate, the calculated mleaf-bits MUST be the same across all
littlefs implementations, or else the filesystem may not be mountable.

We _could_ store mleaf-bits as an fs attr in the mroot, like we do with
name-limit, size-limit, block-size, etc, but I'd prefer to not add fs
attrs unless strictly required. Each fs attr adds complexity to mounting,
which comes with non-zero cost and headache.

Instead, we can assume our compaction algorithm is perfect:

      block_size   block_size   block_size
  m = ---------- = ---------- = ----------
         a_inf         2*4           8

This isn't actually achievable without unbounded RAM. But just because
our current implementation is limited to bounded RAM doesn't mean some
other implementation can't push things further with unbounded RAM.

In theory, since this is a perfect compaction algorithm, and builds
perfect rbyd trunks, this should be the maximum possible mleaf-bits
achievable in littlefs's current design, and should be compatible with
any future implementation.
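
In code, assuming a power-of-two block_size (and making up the function
name), this works out to roughly:

  // mleaf_weight = block_size / a_inf = block_size / 8
  // mleaf_bits   = log2(block_size) - 3
  static inline uint8_t lfsr_mleafbits(lfs_size_t block_size) {
      uint8_t bits = 0;
      while ((lfs_size_t)1 << (bits+1) <= block_size) {
          bits += 1;
      }
      // bits = log2(block_size) for power-of-two block sizes
      return bits - 3;
  }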

---

Worst case, we can always add mleaf-bits as an fs attr retroactively
without breaking backwards compatibility. You would just need to assume
the above block_size-dependent value if the hypothetical mleaf-bits attr
is missing.

This is one nice thing about our fs attr system: it's very flexible.
2024-02-26 14:18:04 -06:00
Christopher Haster
5128522fe2 Renamed script flag -Z/--depth -> -z/--depth
Previously, the intention of the uppercase -Z was to match -W/--width and
-H/--height, which are uppercase to avoid conflicts with -h/--help.

But -z/--depth isn't _really_ related to -W/-H.

This avoids a conflict with -Z/--lebesgue, but may conflict with
-z/--cat. Fortunately we don't currently have any conflicts with the
latter. Since -z/--depth and -Z/--lebesgue are both disk-layout related,
the risk of conflicts is probably much higher there.
2024-02-14 14:04:45 -06:00
Christopher Haster
2d2c0f19ff Renamed block-size flag in scripts from -B -> -b
So now these should be invoked like so:

  $ ./scripts/dbglfs.py -b4096x256 disk

The motivation for this change is to better match other filesystem
tooling. Some prior art:

- mkfs.btrfs
  - -n/--nodesize   => node size in bytes, power of 2 >= sector
  - -s/--sectorsize => sector size in bytes, power of 2
- zfs create
  - -b => block size in bytes
- mkfs.xfs
  - -b => block size in bytes, power of 2 >= sector
  - -s => sector size in bytes, power of 2 >= 512
- mkfs.ext[234]
  - -b => block size in bytes, power of 2 >= 1024
- mkfs.ntfs
  - -c/--cluster-size => cluster size in bytes, power of 2 >= sector
  - -s/--sector-size  => sector size in bytes, power of 2 >= 256
- mkfs.fat
  - -s => cluster size in sectors, power of 2
  - -S => sector size in bytes, power of 2 >= 512

Why care so much about the flag naming for internal scripts? The
intention is for external tooling to eventually use the same set of
flags. And maybe even create publicly consumable versions of the dbg
scripts. It's important that if/when this happens flags stay consistent.
Everyone familiar with the ssh -p/scp -P situation knows how annoying
this can be.

It's especially important for littlefs's -b/--block-size flag, since
this will likely end up used everywhere. Unlike other filesystems,
littlefs can't mount without knowing the block-size, so any tool that
mounts littlefs is going to need the -b/--block-size flag.

---

The original motivation for -B was to avoid conflicts with the -b/--by
flag that was already in use in all of the measurement scripts. But
these are internal, and not really littlefs-related, so I don't think
that's a good reason any more. Worst case we can just make the --by flag
-B, or just not have a short form (--by is only 4 letters after all).

Somehow we ended up with no scripts needing both -b/--block-size and
-b/--by so far.

Some other conflict/inconsistency tweaks were needed; here are all
the flag changes:

- -B/--block-size   -> -b/--block-size
- -M/--mleaf-weight -> -m/--mleaf-weight
- -b/--btree        -> -B/--btree
- -C/--block-cycles -> -c/--block-cycles  (in tracebd.py)
- -c/--coalesce     -> -S/--coalesce      (in tracebd.py)
- -m/--mdirs        -> -M/--mdirs         (in dbgbmap.py)
- -b/--btrees       -> -B/--btrees        (in dbgbmap.py)
- -d/--datas        -> -D/--datas         (in dbgbmap.py)
2024-02-14 12:45:30 -06:00
Christopher Haster
bea13dcf8e Use sign bit of rbyd.trunk to indicate shrubness of rbyds
Shrubness should have always been a property of lfsr_rbyd_t.

You know you've made a good design decision when things just sort of
fall into place and the code somehow becomes cleaner.

The downside of this change is accessing rbyd trunks requires a mask,
which is annoying, but the upside is we don't need to signal shrubness
via extra booleans in internal functions anymore.
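
Roughly what that mask looks like (the exact names here are a guess):

  // shrubness hides in the sign bit of the trunk
  #define LFSR_RBYD_ISSHRUB 0x80000000

  static inline bool lfsr_rbyd_isshrub(const lfsr_rbyd_t *rbyd) {
      return rbyd->trunk & LFSR_RBYD_ISSHRUB;
  }

  static inline lfs_size_t lfsr_rbyd_trunk(const lfsr_rbyd_t *rbyd) {
      return rbyd->trunk & ~LFSR_RBYD_ISSHRUB;
  }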

The funny thing is, the actual motivation for this change was just to
free up a bit in our tag encoding. Simplifying some of the internal
functions was just a nice side effect.

            code          stack
  before:  33940           2928
  after:   33928 (-0.0%)   2912 (-0.5%)
2024-02-03 18:16:45 -06:00
Christopher Haster
15593ccc49 Renamed scratch files -> orphan files
I was originally avoiding naming these orphans, as they're _technically_
not orphans. They do exist in the mtree. But the name orphan just
describes this type's purpose too well.

This does lead to some confusing terms, such as the fact that orphan
files can be non-orphaned if there are any in-device references. But I
think this makes sense?

- LFSR_TAG_SCRATCH -> LFSR_TAG_ORPHAN
- LFSR_F_UNCREAT -> LFSR_F_ORPHAN
- test_fscratch.toml -> test_forphan.toml
2024-02-03 18:15:38 -06:00
Christopher Haster
ba505c2a37 Implemented scratch file basics
"Scratch files" are a new file type added to solve the zero-sized
file problem. Though they have a few other uses that may be quite
valuable.

The "zero-sized file problem" is a common surprise for users, where what
seems like a simple file create+write operation:

  lfs_file_open(&lfs, &file, "hi",
          LFS_O_WRONLY | LFS_O_CREAT | LFS_O_EXCL);
  lfs_file_write(&lfs, &file, "hello!", strlen("hello!"));
  lfs_file_close(&lfs, &file);

Can end up creating a zero-sized file under powerloss, breaking user
assumptions and their code.

The tricky thing is that this is actually correct behavior as defined by
POSIX. `open` with O_CREAT creates a file entry immediately, which is
initially zero-sized. And the fact that power can be lost between `open`
and `close` isn't really avoidable.

But this is a common enough footgun that it's probably worth deviating
from POSIX here.

But how to avoid zero-sized files exactly? First thought: Delay the file
creation until sync/close, tracking uncreated files in-device until
then. This solves the problem and avoids any intermediary state if we
lose power, but comes with a number of headaches:

1. Since we delay file creation, we don't immediately write the filename
   to disk on open. This implies we need to keep the filename allocated
   in RAM until the first sync/close call.

   The requirement to keep the filename allocated for new files until
   first sync/close could be added to open, and with the option to call
   sync immediately to save the filename (and accept the risk of
   zero-sized files), I don't think it would be _that_ bad of an API.

   But it would still be pretty bad. Extra bad because 1. there's no
   way to warn on misuse at compile-time, 2. use-after-free bugs have a
   tendency to go unnoticed annoyingly often, 3. it's a regression from
   the previous API, and 4. who the heck reads the more-or-less same
   `open` documentation for every filesystem they adopt.

2. Without an allocated mid, tracking files internally gets a lot
   harder. The best option I could think of was to keep the opened-file
   linked-list sorted by mid + (in-device) file name.

   This did not feel like a great solution and was going to add more
   code cost.

3. Handling mdir splits containing uncreated files adds another
   headache. It complicates lfsr_mdir_estimate further, as it needs to
   decide in which mdir the uncreated files will end up, and potentially
   split on a filename that isn't even created yet.

4. Since the number of uncreated files can be potentially unbounded, you
   can't prevent an mdir from filling up with only uncreated files. On
   disk this ends up looking like an "empty" mdir, which needs special
   handling in littlefs to reclaim after powerloss.

   Support for empty mdirs -- the orphaned mdir scan -- was already
   added earlier. We already scan each mdir to build gstate, so it
   doesn't really add much cost.

Notice that last bullet point? We already scan each mdir during mount.
Why not, instead of scanning for orphaned mdirs, scan for orphaned
files?

So this leads to the idea of "scratch files". Instead of actually
delaying file creation, fake it. Create a scratch file during open, and
on the first sync/close, convert it to a regular file. If we lose power,
scan for scratch files during mount, and remove them on first write.
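
A rough sketch of the lifecycle, with made-up helper names around the
real LFSR_TAG_SCRATCH/LFSR_TAG_REG tags:

  // open with O_CREAT: commit a scratch entry immediately, the file
  // gets an mid but readers treat it as nonexistent
  err = lfsr_mdir_commitname(lfs, &file->mdir, LFSR_TAG_SCRATCH, path);

  // first sync/close: flip the tag, actually "creating" the file
  err = lfsr_mdir_commitname(lfs, &file->mdir, LFSR_TAG_REG, path);

  // mount after powerloss: any scratch files left behind are garbage,
  // removed lazily on the first write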

Some tradeoffs:

1. The orphan scan for scratch files is a bit more expensive than for
   mdirs on storage with large block sizes. We need to look at each file
   entry vs just each mdir, which pushes the runtime up to O(B log B) vs
   O(B).

   Though if you also consider large mtrees, the worst case is still
   O(n log n).

2. Creating intermediate scratch files adds another commit to file
   creation.

   This is probably not a big issue for flash, but may be more of a
   concern on devices with large prog sizes.

3. Scratch files complicate unrelated mkdir/rename/etc code a bit, since
   we need to consider what happens when the dest is a scratch file.

But the end result is simple. And simple is good. Both for
implementation headaches, and code size. Even if the on-disk state is
conceptually more complicated.

You may have noticed these scratch files are basically isomorphic to
just setting an "uncreated" flag on the file, and that's true. There may
have been a simpler route to end up with the design, but hey, as long as
it works.

As a plus, scratch files present a solution for a couple other things:

1. A removed-but-still-open file can become a scratch file until closed.

2. Scratch files can be used as temporary files. Open a file with
   O_DESYNC and never call sync and you have yourself a temporary file.

   Maybe in the future we should add O_TMPFILE to avoid the need for
   unique filenames, but that is low priority.
2024-02-03 18:15:29 -06:00
Christopher Haster
f29a4982c4 Added block-level erased-state checksums
Much like the erased-state checksums in our rbyds (ecksums), these
block-level erased-state checksums (becksums) allow us to detect failed
progs to erased parts of a block and are key to achieving efficient
incremental write performance with large blocks and frequent power
cycles/open-close cycles.

These are also key to achieving _reasonable_ write performance for
simple writes (linear, non-overwriting), since littlefs now relies
solely on becksums to efficiently append to blocks.

Though I suppose the previous block staging logic used with the CTZ
skip-list could be brought back to make becksums optional and avoid
btree lookups during simple writes (we do a _lot_ of btree
lookups)... I'll leave this open as a future optimization...

Unlike in-rbyd ecksums, becksums need to be stored out-of-band so our
data blocks only contain raw data. Since they are optional, an
additional tag in the file's btree makes sense.
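
Conceptually the append check is then cheap. Something like this, with
invented names, and eliding how the becksum is looked up:

  // checksum the supposedly-erased region after our append point
  uint32_t cksum = 0;
  int err = lfsr_bd_cksum(lfs, block, off, becksum.size, &cksum);
  if (err) {
      return err;
  }

  // mismatch? a failed prog may be hiding in the erased state, fall
  // back to copying into a freshly erased block
  if (cksum != becksum.cksum) {
      return LFS_ERR_CORRUPT;
  }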

Becksums are relatively simple, but they bring some challenges:

1. Adding becksums to file btrees is the first case we have for multiple
   struct tags per btree id.

   This isn't too complicated a problem, but requires some new internal
   btree APIs.

   Looking forward, which I probably shouldn't do so often, multiple
   struct tags will also be useful for parity and content ids as a part
   of data redundancy and data deduplication, though I think it's
   uncontroversial to consider both of these heavier-weight features...

2. Becksums only work if unfilled blocks are aligned to the prog_size.

   This is the whole point of crystal_size -- to provide temporary
   storage for unaligned writes -- but actually aligning the block
   during writes turns out to be a bit tricky without a bunch of
   unnecessary btree lookups (we already do too many btree lookups!).

   The current implementation here discards the pcache to force
   alignment, taking advantage of the requirement that
   cache_size >= prog_size, but this is corrupting our block checksums.

Code cost:

           code          stack
  before: 31248           2792
  after:  32060 (+2.5%)   2864 (+2.5%)

Also lfsr_ftree_flush needs work. I'm usually open to gotos in C when
they improve internal logic, but even for me, the multiple goto jumps
from every left-neighbor lookup into the block writing loop are a bit
much...
2023-12-14 01:05:34 -06:00
Christopher Haster
6ccd9eb598 Adopted different strategy for hypothetical future configs
Instead of writing every possible config that has the potential to be
useful in the future, stick to just writing the configs that we know are
useful, and error if we see any configs we don't understand.

This prevents unnecessary config bloat, while still allowing configs to
be introduced in a backwards compatible way in the future.

Currently unknown configs are treated as a mount error, but in theory
you could still try to read the filesystem, just with potentially
corrupted data. Maybe this could be behind some sort of "FORCE" mount
flag. littlefs must never write to the filesystem if it finds unknown
configs.
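
As a sketch, mount can iterate over the config range and bail on
anything it doesn't recognize (names and signatures here are just
guesses):

  lfsr_tag_t tag = 0;
  while (true) {
      lfsr_data_t data;
      int err = lfsr_rbyd_lookupnext(lfs, &mroot, -1, tag+1,
              &tag, &data);
      if (err && err != LFS_ERR_NOENT) {
          return err;
      }
      if (err == LFS_ERR_NOENT || !lfsr_tag_isconfig(tag)) {
          break;
      }

      // unknown config? refuse to mount rather than risk corruption
      if (!lfsr_tag_isknownconfig(tag)) {
          return LFS_ERR_INVAL;
      }
  }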

---

This also creates a curious case for the hole in our tag encoding
previously taken up by the OCOMPATFLAGS config. We can query for any
config > SIZELIMIT with lookupnext, but the OCOMPATFLAGS flag would need
an extra lookup which just isn't worth it.

Instead I'm just adding OCOMPATFLAGS back in. To support OCOMPATFLAGS
littlefs has to do literally nothing, so this is really more of a
documentation change. And who knows, maybe OCOMPATFLAGS will have some
weird use case in the future...
2023-12-08 14:03:56 -06:00
Christopher Haster
337bdf61ae Rearranged tag encodings to make space for BECKSUM, ORPHAN, etc
Also:

- Renamed GSTATE -> GDELTA for gdelta tags. GSTATE tags added as
  separate in-device flags. The GSTATE tags were already serving
  this dual purpose.

- Renamed BSHRUB* -> SHRUB when the tag is not necessarily operating
  on a file bshrub.

- Renamed TRUNK -> BSHRUB

The tag encoding space now has a couple funky holes:

- 0x0005 - Hole for aligning config tags.

  I guess this could be used for OCOMPATFLAGS in the future?

- 0x0203 - Hole so that ORPHAN can be a 1-bit difference from REG. This
  could be after BOOKMARK, but having a bit to differentiate littlefs
  specific file types (BOOKMARK, ORPHAN) from normal file types (REG,
  DIR) is nice.

  I guess this could be used for SYMLINK if we ever want symlinks in the
  future?

- 0x0314-0x0318 - Hole so that the mdir related tags (MROOT, MDIR,
  MTREE) are nicely aligned.

  This is probably a good place for file-related tags to go in the
  future (BECKSUM, CID, COMPR), but we only have two slots, so will
  probably run out pretty quickly.

- 0x3028 - Hole so that all btree related tags (BTREE, BRANCH, MTREE)
  share a common lower bit-pattern.

  I guess this could be used for MSHRUB if we ever want mshrubs in the
  future?
2023-12-08 13:28:47 -06:00
Christopher Haster
04c6b5a067 Added grm rcompat flag, dropped ocompat, tweaked compat flags a bit
I'm just not seeing a use case for optional compat flags (ocompat), so
dropping for now. It seems their *nix equivalent, feature_compat, is
used to inform fsck of things, but this doesn't really make since in
littlefs since there is no fsck. Or from a different perspective,
littlefs is always running fsck.

Ocompat flags can always be added later (since they do nothing).

Unfortunately this really ruins the alignment of the tag encoding. For
whatever reason config limits tend to come in pairs. For now the best
solution is to just leave tag 0x0006 unused. I guess you can consider it
reserved for hypothetical ocompat flags in the future.

---

This adds an rcompat flag for the grm, since in theory a filesystem
doesn't need to support grms if it never renames files (or creates
directories?). But if a filesystem doesn't support grms and a grm gets
written into the filesystem, this can lead to corruption.

I think every piece of gstate will end up with its own compat flag for
this reason.

---

Also renamed r/w/oflags -> r/w/ocompatflags to make their purpose
clearer.

---

The code impact of adding the grm rcompat flag is minimal, and will
probably be less for additional rcompat flags:

            code          stack
  before:  31528           2752
  after:   31584 (+0.2%)   2752 (+0.0%)
2023-12-07 15:05:51 -06:00
Christopher Haster
4793d2f144 Fixed new bshrub roots and related bug fixing
It turned out that by implicitly handling root allocation in
lfsr_btree_commit_, we were never allowing lfsr_bshrub_commit to
intercept new roots as new bshrubs. Fixing this required moving the
root allocation logic up into lfsr_btree_commit.

This resulted in quite a bit of small bug fixing because it turns out if
you can never create non-inlined bshrubs you never test non-inlined
bshrubs:

- Our previous rbyd.weight == btree.weight check for whether we've
  reached the root no longer works, so it was changed to an explicit
  check that the blocks match. Fortunately, now that new roots set
  trunk=0, new roots are no longer a problematic case.

- We need to only evict when we calculate an accurate estimate; the
  previous code had a bug where eviction occurred early based only on the
  progged-since-last-estimate.

- We need to manually set bshrub.block=mdir.block on new bshrubs,
  otherwise the lfsr_bshrub_isbshrub check fails in mdir commit staging.

Also updated btree/bshrub following code in the dbg scripts, which
mostly meant making them accept both BRANCH and SHRUBBRANCH tags as
btree/bshrub branches. Conveniently very little code needs to change
to extend btree read operations to support bshrubs.
2023-11-21 00:06:08 -06:00
Christopher Haster
6b82e9fb25 Fixed dbg scripts to allow explicit trunks without checksums
Note this is intentionally different from how lfsr_rbyd_fetch behaves
in lfs.c. We only call lfsr_rbyd_fetch when we need validated checksums,
otherwise we just don't fetch.

The dbg scripts, on the other hand, always go through fetch, but it is
useful to be able to inspect the state of incomplete trunks when
debugging.

This used to be how the dbg scripts behaved, but they broke because of
some recent script work.
2023-11-20 23:28:27 -06:00
Christopher Haster
4ecf4cc654 Added dbgbmap.py, tweaked tracebd.py to match
dbgbmap.py parses littlefs's mtree/btrees and displays the status of
every block in use:

  $ ./scripts/dbgbmap.py disk -B4096x256 -Z -H8 -W64
  bd 4096x256,   7.8% mdir,  10.2% btree,  78.1% data
  mmddbbddddddmmddddmmdd--bbbbddddddddddddddbbdddd--ddddddmmdddddd
  mmddddbbddbbddddddddddddddddbbddddbbddddddmmddbbdddddddddddddddd
  bbdddddddddddd--ddddddddddddddddbbddddmmmmddddddddddddmmmmdddddd
  ddddddddddbbdddddddddd--ddddddddddddddmmddddddddddddddddddddmmdd
  ddddddbbddddddddbb--ddddddddddddddddddddbb--mmmmddbbdddddddddddd
  ddddddddddddddddddddbbddbbdddddddddddddddddddddddddddddddddddddd
  dddddddddd--ddddbbddddddddmmbbdd--ddddddddddddddbbmmddddbbdddddd
  ddmmddddddddddmmddddddddmmddddbbbbdddddddd--ddbbddddddmmdd--ddbb

  (ok, it looks a bit better with colors)

dbgbmap.py matches the layout and has the same options as tracebd.py,
allowing the combination of both to provide valuable insight into what
exactly littlefs is doing.

This required a bit of tweaking of tracebd.py to get right, mostly
around conflicting order-based arguments. This also reworks the internal
Bmap class to be more resilient to out-of-window ops, and adds an
optional informative header.
2023-10-30 15:52:33 -05:00
Christopher Haster
46b78de500 Tweaked tracebd.py in a couple of ways, adopted bdgeom/--off/-n
- Tried to do the rescaling a bit better with truncating divisions, so
  there shouldn't be weird cross-pixel updates when things aren't well
  aligned.

- Adopted optional -B<block_size>x<block_count> flag for explicitly
  specifying the block-device geometry in a way that is compatible with
  other scripts. Should adopt this more places.

- Adopted optional <block>.<off> argument for start of range. This
  should match dbgblock.py.

- Adopted '-' for noop/zero-wear.

- Renamed a few internal things.

- Dropped subscript chars for wear, this didn't really add anything and
  can be accomplished by specifying the --wear-chars explicitly.

Also changed dbgblock.py to match, this mostly affects the --off/-n/--size
flags. For example, these are all the same:

  ./scripts/dbgblock.py disk -B4096 --off=10 --size=5
  ./scripts/dbgblock.py disk -B4096 --off=10 -n5
  ./scripts/dbgblock.py disk -B4096 --off=10,15
  ./scripts/dbgblock.py disk -B4096 -n10,15
  ./scripts/dbgblock.py disk -B4096 0.10 -n5

Also also adopted block-device geometry argument across scripts, where
the -B flag can optionally be a full <block_size>x<block_count> geometry:

  ./scripts/tracebd.py disk -B4096x256

Though this is mostly unused outside of tracebd.py right now. It will be
useful for anything that formats littlefs (littlefs-fuse?), and allowing
the geometry format everywhere is a bit of a nice convenience.
2023-10-30 15:52:20 -05:00
Christopher Haster
bfc8021176 Reworked config tags, adopted rflags/wflags/oflags
The biggest change here is the breaking up of the FLAGS config into
RFLAGS/WFLAGS/OFLAGS. This is directly inspired by, and honestly not
much more than a renaming, of the compat/ro_compat/incompat flags found
in Linux/Unix/POSIX filesystems.

I think these were first introduced in ext2? But I need to do a bit more
research on that.

RFLAGS/WFLAGS/OFLAGS provide a much more flexible, and extensible,
feature flag mechanism than the previous minor version bumps.

The (re)naming of these flags is intended to make their requirements
more clear. In order to do the relevant operation, you must understand
every flag set in the relevant field:

- RFLAGS / incompat flags - All flags must be understood to read the
  filesystem, if not understood the only possible behavior is to fail.

- WFLAGS / ro-compat flags - All flags must be understood to write to the
  filesystem, if not understood the filesystem may be mounted read-only.

- OFLAGS / compat flags - Optional flags, if not understood the relevant
  flag must be cleared before the filesystem can be written to, but other
  than that these flags can mostly be ignored.
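
In mount pseudocode, with entirely hypothetical names, these rules look
roughly like:

  // unknown rflags => we can't even read, fail the mount
  if (rflags & ~LFSR_RFLAGS_KNOWN) {
      return LFS_ERR_INVAL;
  }

  // unknown wflags => mount, but only read-only
  if (wflags & ~LFSR_WFLAGS_KNOWN) {
      lfs->flags |= LFS_F_RDONLY;
  }

  // unknown oflags => fine, but clear them before our first write
  if (oflags & ~LFSR_OFLAGS_KNOWN) {
      lfs->flags |= LFS_F_CLROFLAGS;
  }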

Some hypothetical littlefs examples:

- RFLAGS / incompat flags - Transparent compression

  Is this the same as a major disk-version break? Yes kinda? An
  implementation that doesn't understand compression can't read the
  filesystem.

  On the other hand, it's useful to have a filesystem that can read both
  compressed and uncompressed variants.

- WFLAGS / ro-compat flags - Closed block-map

  The idea behind a closed block-map (currently planned) is that
  littlefs maintains in global space a complete mapping of all blocks in
  use by the filesystem.

  For such a mapping to remain consistent, if you write to the
  filesystem you must understand the closed block-map. Or in other
  words, if you don't understand the closed block-map you must not write
  to the filesystem.

  Reading, on the other hand, can ignore many such write-related
  auxiliary features, so the filesystem can still be read from.

- OFLAGS / compat flags - Global checksums

  Global checksums (currently planned) are extra checksums attached to
  each mdir that when combined self-validate the filesystem.

  But if you don't understand global checksums, you can still read and
  write the filesystem without them. The only catch is that when you write
  to the filesystem, you may end up invalidating the global checksum.

  Clearing the global checksum bit in the OFLAGS is a cheap way to
  signal that the global checksum is no longer valid, allowing you to
  still write to the filesystem without this optional feature.

Other tweaks to note:

- Renamed BLOCKLIMIT/DISKLIMIT -> BLOCKSIZE/BLOCKCOUNT

  Note these are still the _actual_ block_size/block_count minus 1. The
  subtle difference here was the original reason for the name change,
  but after working with it for a bit, I just don't think new, otherwise
  unused, names are worth it.

  The minus 1 stays, however, since it avoids overflow issues at
  extreme boundaries of powers of 2.

- Introduced STAGLIMIT/SATTRLIMIT, sys-attribute parallels to
  UTAGLIMIT/UATTRLIMIT.

  These may be useful if only uattrs are supported, or vice-versa.

- Dropped UATTRLIMIT/SATTRLIMIT to 255 bytes.

  This feels extreme, but matches NAMELIMIT. These _should_ be small,
  and limiting the uattr/sattr size to a single byte leads to really
  nice packing of the utag+uattrsize in a single integer.

  This can always be expanded in the future if this limit proves to be a
  problem.

- Renamed MLEAFLIMIT -> MDIRLIMIT and (re?)introduced MTREELIMIT.

  These may be useful for limiting the mtree when needed, though the
  exact use case isn't quite clear yet.
2023-10-25 12:08:58 -05:00
Christopher Haster
6dcdf1ed61 Renamed BNAME -> NAME, CCKSUM -> CKSUM
It's probably better to have separate names for a tag category and any
specific name, but I can't think of a better name for this tag, and I
hadn't noticed that I was already ignoring the C prefix for CCKSUM tags
in many places.

NAME/CKSUM now mean both the specific tag and tag category, which is a
bit of a hack since both happen to be the 0th-subtype of their
categories.
2023-10-25 01:25:39 -05:00
Christopher Haster
240fe4efe4 Changed CKSUM suptype encoding from 0x2000 -> 0x3000
I may be overthinking things, but I'm guessing of all the possible tag
modes we may want to add in the future, we will most likely want to add
something that looks vaguely tag-like. Like the shrub tags, for example.

It's beneficial, ordering-wise, for these hypothetical future tags to
come before the cksum tags.

Current tag modes:

  0x0ttt  v--- tttt -ttt tttt  normal tags
  0x1ttt  v--1 tttt -ttt tttt  shrub tags
  0x3tpp  v-11 tttt ---- ---p  cksum tags
  0x4kkk  v1dc kkkk -kkk kkkk  alt tags
2023-10-24 23:46:11 -05:00
Christopher Haster
1fc2f672a2 Tweaked tag encoding a bit post-slice to make space for becksum tags 2023-10-24 22:34:21 -05:00
Christopher Haster
35434f8b54 Removed remnants of slice code, and cleaned things up a bit 2023-10-24 22:26:08 -05:00
Christopher Haster
865477d7e1 Changing coalesce strategy, reimplemented shrub/btree carve
Note this is already showing better code reuse, which is a good sign,
though maybe that's just the benefit of reimplementing similar logic
multiple times.

Now both reading and carving end up in the same lfsr_btree_readnext and
lfsr_btree_buildcarve functions for both btrees and shrubs. Both btrees
and shrubs are fundamentally rbyds, so we can share a lot of
functionality as long as we redirect to the correct commit function at
the last minute. This surprising opportunity for deduplication was
noticed while putting together the dbg scripts.

Planned logic (not actual function names):

  lfsr_file_readnext -> lfsr_shrub_readnext
            |                    |
            |                    v
            '---------> lfsr_btree_readnext

  lfsr_file_flushbuffer -> lfsr_shrub_carve ------------.
            .---------------------'                     |
            v                                           v
  lfsr_file_flushshrub  -> lfsr_btree_carve -> lfsr_btree_buildcarve

Though the btree part of the above statement is only a hypothetical at
the moment. Not even the shrubs can survive compaction now.

The reason is the new SLICE tag which needs low-level support in rbyd
compact. SLICE introduces indirect references to data located in the same
rbyd, which removes any copying cost associated with coalescing.
Previously, a large coalesce_size risked O(n^2) runtime when
incrementally appending small amounts of data, but with SLICEs we can defer
coalescing to compaction time, where the copy is effectively free.

This compaction-time-coalescing is also hypothetical, which is why our
tests are failing. But the theory is promising.

I was originally against this idea because of how it crosses abstraction
layers, requiring some very low-level code that absolutely cannot be
omitted in a simpler littlefs driver. But after working on the actual
file writing code for a while I've become convinced the tradeoff is
worth it.

Note coalesce_size will likely still need to be configurable. Data in
fragmenting/sparse btrees is still susceptible to coalescing, and the
impact of internal fragmentation when data sizes approach the hard
block_size/2 limit isn't clear.
2023-10-17 23:21:18 -05:00
Christopher Haster
fce1612dc0 Reverted to separate BTREE/BRANCH encodings, reordered on-disk structs
My current thinking is that these are conceptually different types, with
BTREE tags representing the entire btree, and BRANCH tags representing
only the inner btree nodes. We already have multiple btree tags anyways:
btrees attached to files, the mtree, and in the future maybe a bmaptree.

Having separate tags also makes it possible to store a btree in a btree,
though I don't think we'll ever use this functionality.

This also removes the redundant weight field from branches. The
redundant weight field is only a minor cost relative to storage, but it
also takes up a bit of RAM when encoding. Though measurements show this
isn't really significant.

New encodings:

  btree encoding:        branch encoding:
  .---+- -+- -+- -+- -.  .---+- -+- -+- -+- -.
  | weight            |  | blocks            |
  +---+- -+- -+- -+- -+  '                   '
  | blocks            |  '                   '
  '                   '  +---+- -+- -+- -+- -+
  '                   '  | trunk             |
  +---+- -+- -+- -+- -+  +---+- -+- -+- -+- -'
  | trunk             |  |     cksum     |
  +---+- -+- -+- -+- -'  '---+---+---+---'
  |     cksum     |
  '---+---+---+---'
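
Or as rough structs (illustrative only, on-disk these fields are
actually variable-length leb128s, and the block pair is a guess):

  typedef struct lfsr_btree {
      lfs_size_t weight;      // only the btree root knows the total weight
      lfs_block_t blocks[2];
      lfs_size_t trunk;
      uint32_t cksum;
  } lfsr_btree_t;

  typedef struct lfsr_branch {
      lfs_block_t blocks[2];  // branches drop the redundant weight
      lfs_size_t trunk;
      uint32_t cksum;
  } lfsr_branch_t;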

Code/RAM changes:

            code          stack
  before:  30836           2088
  after:   30944 (+0.4%)   2080 (-0.4%)

Also reordered other on-disk structs with weight/size, so such structs
always have weight/size as the first field. This may enable some
optimizations around decoding the weight/size without needing to know
the specific type in some cases.

---

This change shouldn't have affected functionality, but it revealed a bug
in a dtree test, where a did gets caught in an mdir split and the split
name makes the did unreachable.

Marking this as a TODO for now. The fix is going to be a bit involved
(fundamental changes to the opened-mdir list), and similar work is
already planned to make removed files work.
2023-10-15 14:53:07 -05:00
Christopher Haster
b936e33643 Tweaked dbg scripts to resize tag repr based on weight
This is a compromise between padding the tag repr correctly and parsing
speed.

If we don't have to traverse an rbyd (for, say, tree printing), we don't
want to, since parsing rbyds can get quite slow when things get big
(remember this is a filesystem!). This makes tag padding a bit of a hard
sell.

Previously this was hardcoded to 22 characters, but with the new file
struct printing it quickly became apparent that this would be a problematic
limit:

  12288-15711 block w3424 0x1a.0 3424  67 64 79 70 61 69 6e 71  gdypainq

It's interesting to note that this has only become an issue for large
trees, where the weight/size in the tag can be arbitrarily large.

Fortunately we already have the weight of the rbyd after fetch, so we
can use a heuristic similar to the id padding:

  tag padding = 21 + nlog10(max(weight,1)+1)

---

Also dropped extra information with the -x/--device flag. It hasn't
really been useful and was implemented inconsistently. Maybe -x/--device
should just be dropped completely...
2023-10-14 01:25:14 -05:00
Christopher Haster
c8b60f173e Extended dbglfs.py to show file data structures
You can now pass -s/--structs to dbglfs.py to show any file data
structures:

  $ ./scripts/dbglfs.py disk -B4096 -f -s -t
  littlefs v2.0 0x{0,1}.9cf, rev 3, weight 0.256
  {0000,0001}:  -1.1 hello  reg 128, trunk 0x0.993 128
    0000.0993:           .->    0-15 shrubinlined w16 16     6b 75 72 65 65 67 73 63  kureegsc
                       .-+->   16-31 shrubinlined w16 16     6b 65 6a 79 68 78 6f 77  kejyhxow
                       | .->   32-47 shrubinlined w16 16     65 6f 66 75 76 61 6a 73  eofuvajs
                     .-+-+->   48-63 shrubinlined w16 16     6e 74 73 66 67 61 74 6a  ntsfgatj
                     |   .->   64-79 shrubinlined w16 16     70 63 76 79 6c 6e 72 66  pcvylnrf
                     | .-+->   80-95 shrubinlined w16 16     70 69 73 64 76 70 6c 6f  pisdvplo
                     | | .->  96-111 shrubinlined w16 16     74 73 65 69 76 7a 69 6c  tseivzil
                     +-+-+-> 112-127 shrubinlined w16 16     7a 79 70 61 77 72 79 79  zypawryy

This supports the same -b/-t/-i options found in dbgbtree.py, with the
one exception being -z/--struct-depth which is lowercase to avoid
conflict with the -Z/--depth used to indicate the filesystem tree depth.

I think this is a surprisingly reasonable way to show the inner
structure of files without clobbering the user's console with file
contents.

Don't worry, if clobbering is desired, -T/--no-truncate still dumps all
of the file content.

Though it's still up to the user to manually apply the sprout/shrub
overlay. That step is still complex enough that it's not implemented in
this tool yet.

2023-10-14 01:25:08 -05:00
Christopher Haster
ef691d4cfe Tweaked rbyd lookup/append to use 0 lower rid bias
Previously our lower/upper bounds were initialized to -1..weight. This
made a lot of the math unintuitive and confusing, and it's not really
necessary to support -1 rids (-1 rids arise naturally in order-statistic
trees that can have weight=0).

The tweak here is to use lower/upper bounds initialized to 0..weight,
which makes the math behave as expected. -1 rids naturally arise from
rid = upper-1.
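
As a sketch:

  // binary search over rids, 0-biased bounds
  lfs_ssize_t lower = 0;
  lfs_ssize_t upper = rbyd->weight;

  // ... alts narrow lower/upper as we descend the tree ...

  // rid = upper-1 lands on -1 naturally when weight == 0
  lfs_ssize_t rid = upper - 1;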
2023-10-14 00:52:00 -05:00
Christopher Haster
3fb4350ce7 Updated dbg scripts to support shrub trees
- Added shrub tags to tagrepr
- Modified dbgrbyd.py to use last non-shrub trunk by default
- Tweaked dbgrbyd's log mode to find maximum seen weight for id padding
2023-10-13 23:35:03 -05:00
Christopher Haster
2f38822820 Still missing quite a bit, but rudimentary inlined-trees are now working
And by working, I mean you can create inlined trees, just don't
compact/split/move/etc anything. But this does outline the path files
take when writing buffers into inlined trees.

"Inlined trees" in littlefs are entire small rbyd trees embedded as
secondary trees in an mdir's main rbyd tree. When fetching, we can
indicate if a given trunk belongs to the main tree or secondary tree by
setting one of the unused mode bits in the trunk's tag, now called the
"deferred" bit. This bit doesn't need to be included in the alt's "key"
field, so there's no issue with it conflicting with the alt's mode bits.

This requires tweaking lfsr_rbyd_fetch a bit, since it needs to fall
back to the previous trunk if it discovers the most recent trunk belongs
to an inlined tree. But as a benefit we can leverage the full power of
rbyds in inlined files, including holes, partial updates, etc.
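
A sketch of the fallback (helper names invented):

  if (lfsr_tag_istrunk(tag)) {
      if (!lfsr_tag_isdeferred(tag) || inlined) {
          // a trunk for our tree, adopt it
          rbyd->trunk = off;
          rbyd->weight = weight_;
      }
      // otherwise this trunk belongs to an inlined tree, keep the
      // previously seen trunk
  }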

One downside is it looks like these inlined trees may involve more work
in maintaining their state correctly, since they need to be sort of
"brought along" when mdirs are compacted, even if they don't actually
have a reference in the mdir yet. But the sheer amount of flexibility
this gives inlined files may make this overhead worth it.
2023-10-13 23:11:35 -05:00