Commit Graph

13 Commits

Author SHA1 Message Date
Christopher Haster
f7900edc1c Updated dbg scripts with changes, adopted mbid.mrid in debug prints
This format for mids is a compromise between readability and debuggability.

For example, if our mbid weight is 256 (4KiB blocks), the 19th entry
in the second mdir would be the raw integer 275. With this mid format,
we would print it as 256.19.

The idea is to make it easy to see it's the 19th entry in the mdir while
still making it relatively easy to see that 256.19 and 275 are
equivalent when debugging.
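
A minimal sketch of this formatting, assuming the mbid weight is a power of
two so the raw mid can be split with a mask. The name fmt_mid is hypothetical
and is not taken from the dbg scripts:

```python
def fmt_mid(mid, mbid_weight):
    """Format a raw mid as "mbid.mrid" (hypothetical helper)."""
    # assumption: mbid_weight is a power of two, so a mask splits the mid
    assert mbid_weight > 0 and mbid_weight & (mbid_weight - 1) == 0
    mrid = mid & (mbid_weight - 1)  # entry index within the mdir
    mbid = mid - mrid               # raw id of the mdir's first entry
    return '%d.%d' % (mbid, mrid)

# the example from the text: raw mid 275 with an mbid weight of 256
print(fmt_mid(275, 256))
```

Adding the two printed parts back together gives the raw integer, which is
what keeps 256.19 and 275 easy to relate while debugging.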

---

The scripts also took some tweaking due to the mid change. Tried to keep
the names consistent, but I don't think it's worthwhile to change too
much of the scripts while they are working.
2023-09-05 10:10:30 -05:00
Christopher Haster
256430213d Dropped separate BTREE/BRANCH encodings
There is a bit of redundancy here, as we already know the weights of
btree's inner-branches from their parents. But in theory sharing the
same encoding for both the top level btree reference and inner-branches
should offer more chance for deduplication and hopefully less code.

This also moves some members around in the btree encoding so that the
redund blocks are at the beginning. This _might_ simplify decoding of
the variable-length redund blocks at some point.

Current btree encoding:

  .----+----+----+----.
  |       blocks    ...  redund leb128s (1-20 bytes)
  :                   :
  |----+----+----+----|
  |       trunk     ...  1 leb128 (1-5 bytes)
  |----+----+----+----|
  |       weight    ...  1 leb128 (1-5 bytes)
  |----+----+----+----|
  |       cksum       |  1 le32 (4 bytes)
  '----+----+----+----'
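
As a rough illustration of the field order in the diagram, here is a sketch
that serializes one btree reference. The helper names are hypothetical, and
only the ordering and integer encodings (leb128s followed by a le32 cksum)
come from the diagram; everything else is an assumption:

```python
import struct

def leb128(x):
    """Encode a non-negative integer as unsigned leb128."""
    out = bytearray()
    while True:
        b = x & 0x7f
        x >>= 7
        if x:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)         # last byte, high bit clear
            return bytes(out)

def encode_btree(blocks, trunk, weight, cksum):
    """Hypothetical encoder: blocks, trunk, weight, then cksum last."""
    data = b''.join(leb128(block) for block in blocks)
    data += leb128(trunk)
    data += leb128(weight)
    data += struct.pack('<I', cksum)  # le32
    return data

print(encode_btree([1], 300, 5, 0xdeadbeef).hex())
```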

This also partially reverts some tag name changes:

- BNAME -> BRANCH
- DMARK -> BOOKMARK
2023-08-22 13:20:37 -05:00
Christopher Haster
314c832588 Adopted new struct encoding scheme with redund tag bits
Struct tags, in littlefs, generally encode pointers to different on-disk
data structures. At this point, they've gotten a bit complex, with the
btree struct, for example, containing 1. a block address, 2. the trunk
offset, 3. the weight of the trunk, and 4. a checksum.

Also some future plans:

1. Block redundancy will make it so these pointers may have a variable
   number of block addresses to contend with.

2. Different checksum types may make the checksum field itself variable
   length, at least on larger builds of littlefs.

   This may also happen if we support truncated checksums in littlefs
   for storage saving reasons.

Having two variable sized fields becomes a bit of a pain. We can use the
encoded tag size to figure out the size of one of these fields, but not
both.

The change here makes it so the tag size now determines the checksum
size, requiring the redundancy amount to go somewhere else. This makes
it so checksums can be variably sized, and the explicit redundancy
amount avoids the need to parse the leb128s fully to know how many
blocks we're expecting.

But where to put the redundancy amount?

This commit carves out 2-bits from the struct tag to store the amount of
redundancy to allow up to 3 blocks of redundancy:

  v0000011 0TTTTTrr
  ^--^---^-^----^-^- valid bit
     '---|-|----|-|- 3-bit mode (0x0 for structs)
         '-|----|-|- 4-bit suptype (0x3 for structs)
           '----|-|- 0 bit (reserved for leb128)
                '-|- 5-bit subtype
                  '- 2-bit redund
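
A small sketch of pulling these fields out of a tag, assuming the 16 bits in
the diagram are read as a single integer with the valid bit as the most
significant bit. The function name is hypothetical, and how the tag is
actually laid out in bytes on disk is not covered here:

```python
def decode_struct_tag(tag):
    """Split a 16-bit tag into the fields from the diagram (sketch)."""
    return {
        'valid':   (tag >> 15) & 0x1,   # v
        'mode':    (tag >> 12) & 0x7,   # 3-bit mode
        'suptype': (tag >> 8) & 0xf,    # 4-bit suptype
        'subtype': (tag >> 2) & 0x1f,   # 5-bit subtype (TTTTT)
        'redund':  tag & 0x3,           # 2-bit redund (rr)
    }

# struct suptype 0x3, a made-up subtype 5, and 2 blocks of redundancy
tag = (0x3 << 8) | (5 << 2) | 2
print(decode_struct_tag(tag))
```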

3 blocks may sound extremely limiting, but it's a common limit for
filesystems, 1. because each redundant block adds that much more
writing/reading overhead and 2. because the fact that 2^(2^n)-1 is
always divisible by 3 makes >3 parity blocks much more complicated
mathematically.
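
The divisibility claim is easy to check numerically for n >= 1, since
2^(2^n) = 4^(2^(n-1)) and 4 leaves a remainder of 1 when divided by 3. A
quick sketch (the n = 0 case, 2^1-1 = 1, is the one exception):

```python
# remainders of 2^(2^n)-1 modulo 3 for the first few n >= 1
remainders = [(2**(2**n) - 1) % 3 for n in range(1, 7)]
print(remainders)
```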

Worst case, if we ever have >3 redundant blocks, we can create new
struct subtypes. Maybe adding extended struct types that prefix the
block addresses with a leb128 encoding the redundancy amount.

---

As a part of this, reorganized the on-disk btree and ecksum encodings to
put the checksum last.

Also split out the btree and inner btree branches as separate struct
types. The btree includes the weight, whereas the weight is implicit in
inner btree branches. This came about after realizing context-specific
prefixes are relatively easy to add thanks to the composability of our
parsers.

This led to some name collisions though:

- BRANCH   -> BNAME
- BOOKMARK -> DMARK
2023-08-11 12:55:48 -05:00
Christopher Haster
d2f2b53262 Renamed fcksum -> ecksum
This checksum is used to keep track of whether we have erased, and not yet
touched, the unused bytes trailing our current commit in the rbyd.

The working theory is that if any prog attempt is made, it will, most
likely, change the checksum of the contents, allowing littlefs to
determine if trailing erased-state is safe to use, even under powerloss.
littlefs can also perturb future data by a single bit, to force this
checksum to always be invalidated during normal operation.
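
To make the idea concrete, here is a toy model, not littlefs's actual
implementation: zlib.crc32 stands in for the real checksum, the disk is a
bytearray, and erased flash is assumed to read back as 0xff. All names are
hypothetical:

```python
import zlib

ERASED = 0xff  # assumption: erased bytes read back as 0xff

def ecksum(disk, off, size):
    """Checksum of the bytes trailing a commit (toy stand-in)."""
    return zlib.crc32(bytes(disk[off:off+size]))

def is_erased(disk, off, size, expected):
    """True if the trailing region still matches the stored ecksum."""
    return ecksum(disk, off, size) == expected

disk = bytearray([ERASED]) * 64
disk[0:16] = b'commit data 0000'     # pretend this is our current commit
stored = ecksum(disk, 16, 8)         # recorded alongside the commit

print(is_erased(disk, 16, 8, stored))  # region untouched so far

disk[16] &= 0x7f                     # a prog attempt clears one bit
print(is_erased(disk, 16, 8, stored))  # region no longer matches
```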

The original name, "forward erased-state checksums (fcksum)", came from the
idea that the checksum "looks forward" into the next commit.

But after using them for a bit, I think the name is unnecessarily
confusing. It, uh, also looks a lot like a swear word. I think
shortening the name to just "erased-state checksums (ecksum)", even
though the previous name is already in use in a release, is reasonable.

---

It's probably hard to believe but the name change from fcrc -> ecrc
really was unrelated to the crc -> cksum change. But boy is it
convenient for avoiding an awkward name. A lot of these name changes
involved sed scripts, so I didn't notice how awkward fcksum would be to
use until writing this commit message.
2023-08-07 14:34:47 -05:00
Christopher Haster
7031d6e1b3 Changed most references to crc/csum -> cksum
The reason for this is to move away from the idea that littlefs is
strictly bound to CRCs and make the code more welcoming to other
checksum types, such as SHA256, etc.

Of course, changing the name doesn't really do anything. littlefs
actually _is_ strictly bound to CRCs in a couple ways that other
filesystems aren't. These would need to have workarounds for other
checksum types:

- We leverage the parity-preserving nature of (some) CRCs to not have
  to also calculate the parity of metadata in rbyd commits.

- We leverage the linearity of CRCs to retroactively flip the
  perturb bit in the cksum tag without needing to recalculate the
  checksum. Though the fact we need to do this is because of how we
  use parity above, so this may just not be needed for non-CRC
  checksums.

- The plans for global-CRCs (not yet implemented) rely heavily on the
  mathematical properties of CRC polynomials. This doesn't mean
  global-CRCs can't work with other checksums, you would just need to
  find a different type of polynomial.
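
The linearity point can be demonstrated with any CRC. The sketch below uses
zlib.crc32 as a stand-in (not necessarily the CRC littlefs uses) and shows
that the effect of flipping a bit can be xored into an existing checksum
without rereading the message. The message and bit position are made up:

```python
import zlib

def xor_bytes(a, b):
    """Bytewise xor of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

msg = b'some rbyd commit'
flip = bytes([0x80]) + bytes(len(msg) - 1)  # flip one bit in byte 0
zeros = bytes(len(msg))

# zlib.crc32 is affine: crc(a ^ b) == crc(a) ^ crc(b) ^ crc(zeros)
# for equal-length inputs, so the flip's effect is a fixed xor delta
crc_delta = zlib.crc32(flip) ^ zlib.crc32(zeros)
patched = zlib.crc32(msg) ^ crc_delta

print(patched == zlib.crc32(xor_bytes(msg, flip)))
```

A non-CRC checksum such as SHA256 has no equivalent shortcut, which is why
this trick would need a workaround there.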
2023-08-07 14:18:37 -05:00
Christopher Haster
d77a173d5c Changed source to consistently use rid for rbyd ids
Originally it made sense to name the rbyd ids, well, ids, at least in
the internals of the rbyd functions. But this doesn't work well outside
of the rbyd code, where littlefs has to juggle several different id
types with different purposes:

- rid => rbyd-id, 31-bit index into an rbyd
- bid => btree-id, 31-bit index into a btree
- mid => mdir-id, 15-bit+15-bit index into the mtree
- did => directory-id, 31-bit unique identifier for directories

Even though context makes it clear which id the id refers to in the rbyd
internals, updating the name to rid makes it clearer that these are the
same type of id when looking at code both inside and outside the rbyd
functions.
2023-08-07 14:10:09 -05:00
Christopher Haster
64a1b46ea2 Renamed a couple directory related things
- dstart -> bookmark
- *dnamelookup -> *namelookup
2023-08-07 14:00:44 -05:00
Christopher Haster
da4e86abac Split test_dirs into test_dtree and test_dseek
- test_dtree - Pure directory creation/deletion/move functionality
  testing. This ends up testing the core of littlefs file entry
  manipulation, since directories are all we need for that.

- test_dseek - Tests more of the corner cases specific to directory
  iteration and seeking. This involves an annoying amount of
  interactions with concurrent updates to the filesystem that are
  complicated to test for.

Also generally renaming the "fstree" concept to "dtree". This only
changes dbglfs.py as far as I'm aware. It's useful to have a name for
this thing and "directory tree" fits a bit better than "filesystem tree"
which could be ambiguous when we also have the "metadata tree" as a
different concept.
2023-08-04 14:17:42 -05:00
Christopher Haster
56adc60a80 Added grms and reporting of missing/orphaned dstarts in dbglfs.py
With a bit of color, this is very useful for debugging and finding
incorrect dstart/grm situations.

This was used to find and fix the bugs in the previous commit.
2023-07-25 13:58:11 -05:00
Christopher Haster
e4ba43dd5f Extended grm to support two atomic removes
Ugh. I overlooked a weird corner case in rename's behavior that requires
changes to the grm to support.

POSIX's rename, which lfsr_rename is trying to match, supports renaming
files over existing files, effectively removing the previous file during
the rename.

This is supported, even if the files are directories, but with the
additional requirement that the previous directory is empty (matching
the behavior of lfsr_remove).

This creates a weird situation for littlefs. In order to remove
directories in littlefs, we need to atomically remove both the dstart
entry that reserves the directory's did and the directory's entry in its
parent. This is made possible by using the grm to mark one entry as
pending removal while removing the other.

But in order to rename atomically, we need to use the grm to mark the
source of the rename as removed while creating/replacing the destination
of the rename.

So we end up needing two grms simultaneously.

This is extra annoying because the niche case of renaming a directory
over another empty directory is the only case where we need two grms,
but this requirement almost doubles the grm size both in-ram and
reserved in every mdir, from 11 bytes to 21 bytes, and increases the
lfs_t size by 28 bytes.

---

Anyways, this commit extends the grm to support up to two pending removes.

Fortunately the implementation was simple since we already have a type
field that can be extended, and grm operations just needed to be
changed from if statements to for loops.
2023-07-25 13:30:04 -05:00
Christopher Haster
c928ed131f Changed all dir tests to be reentrant
To help with this, added TEST_PL, which is set to true when powerloss
testing. This way tests can check for stronger conditions (no EEXIST)
when not powerloss testing.

With TEST_PL, there's really no reason every test in t5_dirs shouldn't
be reentrant, and this gives us a huge improvement in test coverage very
cheaply.

---

The increased test coverage caught a bug, which is that gstate wasn't
being consumed properly when mtree uninlining. Humorously, this went
unnoticed because the most common form of mtree uninlining, mdir splitting,
ended up incorrectly consuming the gstate twice, which canceled itself
out since the consume operation is basically just xor.
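
A toy illustration of why the double consume hid the bug, assuming the
consume operation is a plain bytewise xor of a delta into the gstate. The
byte values here are made up:

```python
def xor_delta(gstate, delta):
    """Xor a gstate delta into the current gstate (toy model)."""
    return bytes(g ^ d for g, d in zip(gstate, delta))

gstate = bytes.fromhex('0000000000')
delta = bytes.fromhex('0103240000')  # made-up delta

once = xor_delta(gstate, delta)
twice = xor_delta(once, delta)

print(once.hex())   # the delta has been applied
print(twice.hex())  # applying it again restores the original gstate
```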

Also added support for printing dstarts to dbglfs.py, to help debugging.
2023-07-18 21:40:43 -05:00
Christopher Haster
97f867b28d Added powerloss testing over lfsr_mkdir, fixed grm bugs
The grm bugs were mostly issues with:

1. Not maintaining the on-disk grm state in RAM (lfs->grm) correctly.
   This needs to be updated after every commit or littlefs gets
   confused.

2. lfsr_fs_fixgrm got a bit confused because it was missed when changing
   the no-rm encoding from 0 to -2. Added some inline functions to help
   avoid this in the future.

3. Leaking information due to mixing fixed sized and variable sized
   encodings of the grm delta in places. This is a bit tricky to write
   an assert for as we don't parse the full grm when we see a no-rm grm.
2023-07-18 21:40:43 -05:00
Christopher Haster
b98ac119c7 Added scripts/dbglfs.py for debugging the filesystem tree
Currently this can show:

- The filesystem tree:

    $ ./scripts/dbglfs.py disk -B4096
    littlefs v2.0 0x{0,1}.bd4, rev 1, weight 41
    mdir         ids      name                        type
    {00ce,00cf}:      0.1 dir0000                     dir 0x1070c73
    {0090,0091}:     2.30 |-> child0000               dir 0x8ec7fb2
    {0042,0043}:    24.35 |   |-> grandchild0000      dir 0x32d990b
    {0009,000a}:     25.0 |   |-> grandchild0001      dir 0x1461a08
                     25.1 |   |-> grandchild0002      dir 0x216e9fc
                     25.2 |   |-> grandchild0003      dir 0x7d6aff
                     25.3 |   |-> grandchild0004      dir 0x4b70e14
                     25.4 |   |-> grandchild0005      dir 0x6dc8d17
                     25.5 |   |-> grandchild0006      dir 0x58c7ee3
                     25.6 |   '-> grandchild0007      dir 0x7e7fde0
    {0090,0091}:     2.31 |-> child0001               dir 0xa87fcb1
    {0077,0078}:     29.1 |   |-> grandchild0000      dir 0x12194f5
                     29.2 |   |-> grandchild0001      dir 0x34a17f6
    ...

- The on-disk filesystem config:

    $ ./scripts/dbglfs.py disk -B4096 -c
    littlefs v2.0 0x{0,1}.bd4, rev 1, weight 41
    mdir         ids      tag                     data (truncated)
         config: major_version 2                  02                       .
                 minor_version 0                  00                       .
                 csum_type 2                      02                       .
                 flags 0                          00                       .
                 block_size 4096                  80 20                    .
                 block_count 256                  80 02                    ..
    ...

- Any global-state on-disk:

    $ ./scripts/dbglfs.py disk -B4096 -g -d
    littlefs v2.0 0x{0,1}.bd4, rev 1, weight 41
    mdir         ids      tag                     data (truncated)
         gstate: grm none                         00 00 00 cc 05 57 ff 7f .....W..
    {0000,0001}:       -1 grm 8                   01 03 24 cc 05 57 ff 7f ..$..W..
    {00ce,00cf}:        0 grm 3                   00 2f 1b                ./.
    {00d0,00d1}:        1 grm 3                   01 04 01                ...

  Note this already reveals a bug, since grm none should be all zeros.
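
  As an aside on reading these dumps: the config values appear to be
  leb128-encoded, e.g. block_size 4096 shows up as 80 20 and block_count 256
  as 80 02. A small decoder sketch, with a hypothetical name, for checking
  such bytes by hand:

```python
def fromleb128(data):
    """Decode one unsigned leb128, returning (value, bytes consumed)."""
    word = 0
    for i, b in enumerate(data):
        word |= (b & 0x7f) << (7*i)  # low 7 bits carry the payload
        if not b & 0x80:             # high bit clear marks the last byte
            return word, i+1
    raise ValueError('truncated leb128')

print(fromleb128(bytes([0x80, 0x20])))  # block_size bytes from the dump
print(fromleb128(bytes([0x80, 0x02])))  # block_count bytes from the dump
```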

Also made some other minor tweaks to dbg scripts for consistency.
2023-07-18 21:40:41 -05:00