Commit Graph

19 Commits

Author SHA1 Message Date
Christopher Haster
d77a173d5c Changed source to consistently use rid for rbyd ids
Originally it made sense to name the rbyd ids, well, ids, at least in
the internals of the rbyd functions. But this doesn't work well outside
of the rbyd code, where littlefs has to juggle several different id
types with different purposes:

- rid => rbyd-id, 31-bit index into an rbyd
- bid => btree-id, 31-bit index into a btree
- mid => mdir-id, 15-bit+15-bit index into the mtree
- did => directory-id, 31-bit unique identifier for directories

Even though context makes it clear which id the id refers to in the rbyd
internals, updating the name to rid makes it clearer that these are the
same type of id when looking at code both inside and outside the rbyd
functions.
2023-08-07 14:10:09 -05:00
Christopher Haster
64a1b46ea2 Renamed a couple directory related things
- dstart -> bookmark
- *dnamelookup -> *namelookup
2023-08-07 14:00:44 -05:00
Christopher Haster
c928ed131f Changed all dir tests to be reentrant
To help with this, added TEST_PL, which is set to true when powerloss
testing. This way tests can check for stronger conditions (no EEXIST)
when not powerloss testing.

With TEST_PL, there's really no reason every test in t5_dirs shouldn't
be reentrant, and this gives us a huge improvement of test coverage very
cheaply.

---

The increased test coverage caught a bug, which is that gstate wasn't
being consumed properly when mtree uninlining. Humorously, this went
unnoticed because the most common form of mtree uninlining, mdir splitting,
ended up incorrectly consuming the gstate twice, which canceled itself
out since the consume operation is basically just xor.
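
A minimal Python sketch of why the double consume went unnoticed. It assumes, as described above, that consuming gstate amounts to xoring a delta into the running state. The helper name and the delta bytes are made up for illustration and are not taken from littlefs.

```python
def consume_gdelta(gstate: bytes, gdelta: bytes) -> bytes:
    """Xor a gstate delta into the running gstate (hypothetical helper)."""
    assert len(gstate) == len(gdelta)
    return bytes(a ^ b for a, b in zip(gstate, gdelta))


gstate = bytes(8)                            # all-zero starting state
gdelta = bytes.fromhex("0103240000000000")   # made-up delta, illustration only

once = consume_gdelta(gstate, gdelta)
twice = consume_gdelta(once, gdelta)

# Consuming the same delta twice looks exactly like never consuming it,
# which is how the double consume masked the missing consume.
print(once.hex(), twice.hex())
```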

Also added support for printing dstarts to dbglfs.py, to help debugging.
2023-07-18 21:40:43 -05:00
Christopher Haster
b98ac119c7 Added scripts/dbglfs.py for debugging the filesystem tree
Currently this can show:

- The filesystem tree:

    $ ./scripts/dbglfs.py disk -B4096
    littlefs v2.0 0x{0,1}.bd4, rev 1, weight 41
    mdir         ids      name                        type
    {00ce,00cf}:      0.1 dir0000                     dir 0x1070c73
    {0090,0091}:     2.30 |-> child0000               dir 0x8ec7fb2
    {0042,0043}:    24.35 |   |-> grandchild0000      dir 0x32d990b
    {0009,000a}:     25.0 |   |-> grandchild0001      dir 0x1461a08
                     25.1 |   |-> grandchild0002      dir 0x216e9fc
                     25.2 |   |-> grandchild0003      dir 0x7d6aff
                     25.3 |   |-> grandchild0004      dir 0x4b70e14
                     25.4 |   |-> grandchild0005      dir 0x6dc8d17
                     25.5 |   |-> grandchild0006      dir 0x58c7ee3
                     25.6 |   '-> grandchild0007      dir 0x7e7fde0
    {0090,0091}:     2.31 |-> child0001               dir 0xa87fcb1
    {0077,0078}:     29.1 |   |-> grandchild0000      dir 0x12194f5
                     29.2 |   |-> grandchild0001      dir 0x34a17f6
    ...

- The on-disk filesystem config:

    $ ./scripts/dbglfs.py disk -B4096 -c
    littlefs v2.0 0x{0,1}.bd4, rev 1, weight 41
    mdir         ids      tag                     data (truncated)
         config: major_version 2                  02                       .
                 minor_version 0                  00                       .
                 csum_type 2                      02                       .
                 flags 0                          00                       .
                 block_size 4096                  80 20                    .
                 block_count 256                  80 02                    ..
    ...

- Any global-state on-disk:

    $ ./scripts/dbglfs.py disk -B4096 -g -d
    littlefs v2.0 0x{0,1}.bd4, rev 1, weight 41
    mdir         ids      tag                     data (truncated)
         gstate: grm none                         00 00 00 cc 05 57 ff 7f .....W..
    {0000,0001}:       -1 grm 8                   01 03 24 cc 05 57 ff 7f ..$..W..
    {00ce,00cf}:        0 grm 3                   00 2f 1b                ./.
    {00d0,00d1}:        1 grm 3                   01 04 01                ...

  Note this already reveals a bug, since grm none should be all zeros.

Also made some other minor tweaks to dbg scripts for consistency.
2023-07-18 21:40:41 -05:00
Christopher Haster
3959545a9f Some dbg script work
- Changed how names are rendered in dbgbtree.py/dbgmtree.py to be
  consistent with non-names. The special rendering isn't really worth it
  now that names aren't just ascii/utf8.

- Changed the ordering of raw/device/human rendering of btree entries to
  be more consistent with rendering of other entries (don't attempt to
  group btree entries).

- Changed dbgmtree.py header to show information about the mtree.
2023-07-18 21:40:38 -05:00
Christopher Haster
c2d9f1b047 Implemented, but untested, global-removes
This implementation is in theory correct, but of course, being untested,
who knows?

Though this does come with remounting added to all of the directory
tests. This effectively tests that all of the directory creation tests
we have so far maintain grm=0 after each unmount-mount cycle. Which is
valuable.
2023-07-18 21:40:36 -05:00
Christopher Haster
cc0ac25b5e Implemented infrastructure necessary for global-removes
This has, in theory, global-removes (grm) being written out as a part of
directory creation, but they aren't used in any form and so may not
be written correctly.

But it did require quite a bit of problem solving to get to this point
(the interactions between mtree splits and grms are really annoying), so
it's worth a commit.
2023-07-18 21:40:30 -05:00
Christopher Haster
da810aca26 Implemented mtree path/dname lookup, rudimentary lfsr_mkdir/lfsr_dir_read
This makes it now possible to create directories in the new system.

The new system now uses a single global "mtree" to store all metadata
entries in the filesystem. In this system, a directory is simply a range
of metadata entries. This has a number of benefits, but does come with
its own problems:

1. We need to indicate which directory each file belongs to. To do this
   the file's name entry has been changed to a tuple of leb128-encoded
   directory-id + actual file name:

     01 66 69 6c 65 2e 74 78 74  .file.txt
      ^ '----------+----------'
      '------------|------------ leb128 directory-id
                   '------------ ascii/utf8 name

   If we include the directory-id as part of filename comparison, files
   should naturally be next to other files in the same directory.

2. We need a way to allocate directory-ids for new directories. This turns
   out to be a bit more tricky than I expected.

   We can't use any mid/bid/rid inherent to the mtree, because these
   change on any file creation/deletion. And since we commit the did
   into the tree, that's not acceptable.

   Initially I thought you could just find the largest did and increment,
   but this gives you no way to reclaim deleted dids. And sure, deleted
   dids have no storage consumption, but eventually you will overflow
   the did integer. Since this can suddenly happen in a filesystem
   that's been in a steady-state for years, that's pretty unacceptable.

   One solution is to do a simple linear search over the mtree for an
   unused did. But with a runtime of O(n^2 log(n)), this raises
   performance concerns.

   Sidenote: It's interesting to note that the Linux kernel's allocation
   of process-ids, a very similar problem, is surprisingly complex and
   relies on a radix-tree of bitmaps (struct idr). This suggests I'm not
   missing an obvious solution somewhere.

   The solution I settled on here is to instead treat the set of dids as
   a sort of hash table:

   1. Hash the full directory path into a did.
   2. Perform a linear search until we have no collision.

     leb128(truncate28(crc32c("dir")))
          .--------'
          v
     9e cd c8 30 66 69 6c 65 2e 74 78 74  ...0file.txt
     '----+----' '----------+----------'
          '-----------------|------------ leb128 directory-id
                            '------------ ascii/utf8 name

   This can still exhibit the worst-case O(n^2 log(n)) performance when
   the did space is nearly full. However that seems unlikely to happen
   in practice, since, unlike normal hash tables, we don't truncate our
   hashes down to a small table size. An additional 32-bit word for each
   file is a small price to pay for a low chance of collisions.

   In the current implementation, I do truncate the hash to 28-bits.
   Since we encode the hash with leb128, and hashes are statistically
   random, this gives us better usage of the leb128 encoding. However
   it does limit a 32-bit littlefs to 256 Mi directories.

   Maybe this should be a configurable limit in the future.

   But that highlights another benefit of this scheme. It's easy to
   change in the future without disk changes.

3. We need a way to know if a directory-id is allocated, even if the
   directory is empty.

   For this we just introduce a new tag: LFSR_TAG_DSTART, which
   is an empty file entry that indicates the directory at the given did
   in the mtree is allocated.

   To create/delete these atomically with the reference in our parent
   directory, we can use the GRM system for atomic renames.

   Note this isn't implemented yet.
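
Returning to points 1 and 2 above, here is a rough Python sketch of the name encoding and the hash-based did allocation. Several things are assumptions, not taken from the implementation: the helper names, the crc32c parameters (the standard Castagnoli init/xorout), truncation by masking the low 28 bits, and the `used` set, which stands in for checking the mtree for an existing did.

```python
DID_BITS = 28  # hash truncation described above


def leb128(x: int) -> bytes:
    """Encode a non-negative integer as unsigned leb128."""
    out = bytearray()
    while True:
        b = x & 0x7f
        x >>= 7
        if x:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli); standard init/xorout assumed."""
    crc = 0xffffffff
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff


def name_key(did: int, name: str) -> bytes:
    """Name entry: leb128 directory-id followed by the raw name bytes."""
    return leb128(did) + name.encode("utf-8")


def alloc_did(path: str, used: set) -> int:
    """Hash the path into a did, then probe linearly past collisions."""
    mask = (1 << DID_BITS) - 1
    if len(used) > mask:
        raise RuntimeError("out of dids")
    did = crc32c(path.encode("utf-8")) & mask  # low-bit truncation assumed
    while did in used:
        did = (did + 1) & mask
    return did


print(name_key(1, "file.txt").hex(" "))
print(leb128(alloc_did("dir", set())).hex(" "))
```

The first line printed reproduces the `01 66 69 6c ...` example from point 1. Whether the second matches the `9e cd c8 30` prefix from point 2 depends on the crc32c and truncation assumptions above.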

This is also the first time we finally get around to testing all of the
dname lookup functions, so this did find a few bugs, mostly around
reporting the root correctly.
2023-07-05 13:41:21 -05:00
Christopher Haster
cf588ac3fa Dropped alt-always as an rbyd trunk terminator
Now that tree rebalancing is implemented and needed a null terminator
anyways, I think it's clear that alt-always pointers have pretty limited
value as trunk terminators.

Now a null or other tag is needed for every trunk, which simplifies
checks for end-of-trunk.

Alt-always tags are still emitted for deletes, etc, but there their
behavior is implicit, so no special checks are needed. Alt-always tags
are naturally cleaned up as a part of rbyd pruning.
2023-06-27 00:49:31 -05:00
Christopher Haster
43dc3a5c8d Implemented tree rebalancing during rbyd compaction
This isn't actually for performance reasons, but to reduce storage
overhead of the rbyd metadata tree, which was showing signs of being
problematic for small block sizes.

Originally, the plan for compaction was to rely on the self-balancing
rbyd append algorithm and simply append each tag to a new tree.
Unfortunately, since each append requires a rewrite of the trunk
(current search path), this introduces ~n*log(n) alts but only uses ~n alts
for the final tree. This really starts to put pressure on small blocks,
where the exponential-ness of the log doesn't kick in and overhead
limits are already tight.

Measuring lfsr_mdir_commit code size, this shows a ~556 byte cost on
thumb: 16416 -> 16972 (+3.4%). Though there are still some optimizations
on the table, this implementation needs a cleanup pass.

               alt overhead  code cost
  rebalance:        <= 28*n      16972
  append:    <= 24*n*log(n)      16416

Note these all assume worst case alt overhead, but we _need_ to assume
worst case for our rbyd estimations, or else the filesystem can get
stuck in unrecoverable compaction states.
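
For a feel of how the two worst-case bounds in the table diverge, here is a small Python sketch that just evaluates them. It assumes the log is base 2 and that n is the number of tags being compacted. Neither is stated above, and the function names are hypothetical.

```python
import math


def rebalance_overhead(n: int) -> int:
    """Worst-case alt overhead when rebalancing: 28*n."""
    return 28 * n


def append_overhead(n: int) -> int:
    """Worst-case alt overhead when appending: 24*n*log(n), log2 assumed."""
    return math.ceil(24 * n * math.log2(n))


for n in (2, 3, 16, 256, 1024):
    print(f"n={n:5d}  rebalance<={rebalance_overhead(n):7d}  "
          f"append<={append_overhead(n):7d}")
```

Under these assumptions the rebalance bound is the smaller of the two from n=3 onwards, and the gap grows with log(n).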

Because of the code cost I'm not sure if rebalancing will stay, be
optional, or replace append-compaction completely yet.

Some implementation notes:

- Most tree balancing algorithms rely on true recursion. I suspect
  recursion may be a hard requirement in general, but it's hard to find
  bounded-ram algorithms.

  This solution gets around the ram requirement by leveraging the fact
  that our tags exist in a log to build up each layer in the tree
  tail-recursively. It's interesting to note that this is a special
  case of having little ram but lots of storage.

- Humorously this shouldn't result in a performance improvement. Rbyd
  trees result in a worst case 2*log(n) height, and rebalancing gives us
  a perfect worst case log(n) height, but, since we need an additional
  alt pointer for each node in our tree, things bump back up to 2*log(n).

- Originally the plan was to terminate each node with an alt-always tag,
  but during implementation I realized there was no easy way to get the
  key that splits the children without awkward tree lookups. As a
  workaround each node is terminated with an altle tag that contains the
  key followed by an unreachable null tag. This is redundant information,
  but makes the algorithm easier to implement.

  Fortunately null tags use the smallest tag encoding, which isn't that
  small, but that means this wastes at most 4*n bytes.

- Note this preserves the first-tag-always-ends-up-at-off=0x4 rule, which
  is necessary for the littlefs magic to end up in a consistent place.

- I've dropped dropping vestigial names for now, which means vestigial
  names can remain in btrees indefinitely. Need to revisit this.
2023-06-25 15:23:46 -05:00
Christopher Haster
799e7cfc81 Flipped btree encoding to allow variable redund blocks
This should have been done as a part of the earlier tag reencoding work,
since having the block at the end was what allowed us to move the
redund-count out of the tag encoding.

New encoding:

  [-- 32-bit csum   --]
  [-- leb128 weight --]
  [-- leb128 trunk  --]
  [-- leb128 block  --]

Note that since our tags have an explicit size, we can store a variable
number of blocks. The plan is to use this to eventually store redundant
copies for error correction:

  [-- 32-bit csum   --]
  [-- leb128 weight --]
  [-- leb128 trunk  --]
  [-- leb128 block  --] -.
  [-- leb128 block  --]  +- n redundant blocks
  [-- leb128 block  --]  |
           ...          -'

This does have a significant tradeoff: we need to know the checksum size
to access the btree structure. This doesn't seem like a big deal, but
with the possibility of different checksum types may be an annoying
issue.

Note that FCRC was also flipped for consistency.
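
A minimal Python sketch of the new encoding, including the variable number of trailing blocks. The function names are hypothetical, and the little-endian checksum is an assumption. The decoder relies on the tag's explicit size, here simply the length of the buffer, to know where the block list ends.

```python
import struct


def leb128(x: int) -> bytes:
    """Encode a non-negative integer as unsigned leb128."""
    out = bytearray()
    while True:
        b = x & 0x7f
        x >>= 7
        if x:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def read_leb128(data: bytes, off: int):
    """Decode one unsigned leb128 at off, returning (value, next_off)."""
    value, shift = 0, 0
    while True:
        b = data[off]
        off += 1
        value |= (b & 0x7f) << shift
        shift += 7
        if not b & 0x80:
            return value, off


def encode_branch(csum: int, weight: int, trunk: int, blocks) -> bytes:
    """csum, weight, trunk, then one or more blocks at the end."""
    return (struct.pack("<I", csum)  # little-endian is an assumption
            + leb128(weight) + leb128(trunk)
            + b"".join(leb128(b) for b in blocks))


def decode_branch(data: bytes):
    """Inverse of encode_branch; len(data) plays the role of the tag size."""
    (csum,) = struct.unpack_from("<I", data, 0)
    weight, off = read_leb128(data, 4)
    trunk, off = read_leb128(data, off)
    blocks = []
    while off < len(data):  # any remaining bytes are redundant blocks
        block, off = read_leb128(data, off)
        blocks.append(block)
    return csum, weight, trunk, blocks


print(decode_branch(encode_branch(0xdeadbeef, 41, 0xbd4, [0x12, 0x34])))
```

Note how the decoder has to skip a fixed 4 bytes up front, which is the checksum-size dependency mentioned above.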
2023-06-19 16:08:50 -05:00
Christopher Haster
e79c15b026 Implemented wide tags for both rbyd commit and lookup
Wide tags are a happy accident that fell out of the realization that we
can view all subtypes of a given tag suptype as a range in our rbyd.
Combining this with how natural it is to operate on ranges in an rbyd
allows us to perform operations on an entire range of subtypes as though
it were a single tag.

- lookup wide tag => find the smallest tag with this tag's suptype, O(log(n))
- remove wide tag => remove all tags with this tag's suptype, O(log(n))
- append wide tag => remove all tags with this tag's suptype, and then
  append our tag, O(log(n))

This is very useful for littlefs, where we've already been using tag's
subtypes to hold extra type info, and have had to rely on awkward
alternatives such as deleting existing subtypes before writing our new
subtype.

For example, when committing file metadata (not yet implemented), we can
append a wide struct tag to update the metadata while also clearing out any
lingering struct tags from previous commits, all in one rbyd append
operation.
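
The range view can be sketched in Python by modelling an rbyd as a sorted list of (suptype, subtype) pairs. This is only a model with hypothetical names: a list does not give the O(log(n)) range removal that the rbyd does, but the three operations map over directly.

```python
import bisect


def wide_range(tags, suptype):
    """Index range [lo, hi) of all tags sharing this suptype."""
    lo = bisect.bisect_left(tags, (suptype, 0))
    hi = bisect.bisect_left(tags, (suptype + 1, 0))
    return lo, hi


def lookup_wide(tags, suptype):
    """Find the smallest tag with this suptype, or None."""
    lo, hi = wide_range(tags, suptype)
    return tags[lo] if lo < hi else None


def remove_wide(tags, suptype):
    """Remove all tags with this suptype."""
    lo, hi = wide_range(tags, suptype)
    del tags[lo:hi]


def append_wide(tags, tag):
    """Remove all tags with this tag's suptype, then insert the tag."""
    remove_wide(tags, tag[0])
    bisect.insort(tags, tag)


demo = [(1, 0), (2, 1), (2, 5), (3, 0)]
append_wide(demo, (2, 7))   # replaces both existing suptype-2 tags
print(demo)
```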

This uses another mode bit in-device to change the behavior of
lfsr_rbyd_commit, of which we have a couple:

  vwgrtttt 0TTTTTTT
  ^^^^---^--------^- valid bit (currently unused, maybe errors?)
   '||---|--------|- wide bit, ignores subtype (in-device)
    '|---|--------|- grow bit, don't create new id (in-device)
     '---|--------|- rm bit, remove this tag (in-device)
         '--------|- 4-bit suptype
                  '- leb128 subtype
2023-06-19 16:08:43 -05:00
Christopher Haster
2467d2e486 Added a separate tag encoding for the mtree
This helps with debugging and can avoid weird issues if a file btree
ever accidentally ends up attached to id -1 (due to fs bug).

Though a separate encoding isn't strictly necessary, maybe this should
be reverted at some point.
2023-06-18 15:12:43 -05:00
Christopher Haster
7180b70c9c Allowed "alta" (altbgt 0) to terminate rbyd trunks, dropped rm bit
This replaces unr with null on disk, though note both the rm bit and unr
are used in-device still, they just don't get written to disk.

This removes the need for the rm bit on disk. Since we no longer need to
figure out what's been removed during fetch, we can save this bit for both
internal and future on-disk use.

Special handling of alta allows us to avoid emitting an unr tag (now null) if
the current trunk is truly unreachable. This is minor now, but important
for a theoretical rbyd rebalance operation (planned), which brings the
rbyd overhead down from ~3x to ~2x.

These changes give us two ways to terminate trunks without a tag:

1. With an alta, if the current trunk is unreachable:

     altbgt 0x403 w0 0x7b
     altbgt 0x402 w0 0x29
     alta w0 0x4

2. With a null, if the current trunk is reachable, either for
   code convenience or because emitting an alta is impossible (an empty
   rbyd for example):

     altbgt 0x403 w0 0x7b
     altbgt 0x402 w0 0x29
     altbgt 0x401 w0 0x4
     null
2023-06-17 18:11:45 -05:00
Christopher Haster
2113d877d6 Moved bits around in tag encoding to allow leb128 custom attributes
Yet another tag encoding, but hopefully narrowing in on a good long term
design. This change trades a subtype bit for the ability to extend
subtypes indefinitely via leb128 in the future.

The immediate benefit is ~unlimited custom attributes, though I'm not
sure how to make this configurable yet. Extended custom attributes may
have a significant impact on alt tag sizes, so it may be worth
defaulting to only 8-bit custom attributes still.

Tag encoding:

   vmmmtttt 0TTTTTTT 0wwwwwww 0sssssss
   ^--^---^--------^--------^--------^- valid bit
      '---|--------|--------|--------|- 3-bit mode
          '--------|--------|--------|- 4-bit suptype
                   '--------|--------|- leb128 subtype
                            '--------|- leb128 weight
                                     '- leb128 size/jump

This limits subtypes to 7-bits, but this seems very reasonable at the
moment.

This also seems to limit custom attributes to 7-bits, but we can use two
separate suptypes to bring this back up to 8-bits. I was planning to do
this anyways to have separate "user-attributes" and "system-attributes",
so this actually fits in really well.
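
A rough Python sketch of decoding a tag laid out as in the diagram above. The field names and the returned dict are hypothetical. The sketch reads subtype, weight and size/jump as general leb128s, and returns the valid bit as a raw flag without interpreting it.

```python
def read_leb128(data: bytes, off: int):
    """Decode one unsigned leb128 at off, returning (value, next_off)."""
    value, shift = 0, 0
    while True:
        b = data[off]
        off += 1
        value |= (b & 0x7f) << shift
        shift += 7
        if not b & 0x80:
            return value, off


def decode_tag(data: bytes, off: int = 0):
    """Decode vmmmtttt + leb128 subtype + leb128 weight + leb128 size."""
    head = data[off]
    tag = {
        "valid": (head >> 7) & 0x1,   # raw valid bit, not interpreted here
        "mode": (head >> 4) & 0x7,    # 3-bit mode
        "suptype": head & 0xf,        # 4-bit suptype
    }
    tag["subtype"], off = read_leb128(data, off + 1)
    tag["weight"], off = read_leb128(data, off)
    tag["size"], off = read_leb128(data, off)
    return tag, off


# made-up bytes: mode=3, suptype=5, subtype=2, weight=1, size=1032
print(decode_tag(bytes([0x35, 0x02, 0x01, 0x88, 0x08])))
```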
2023-06-16 01:51:33 -05:00
Christopher Haster
2339e9865f Tweaked dbgmtree.py -Z flag to include mroots as depth
This helps debug a corrupted mtree with cycles, which has been a problem
in the past.

Also fixed a small rendering issue with dbgmtree.py not connecting inner
tree edges to mdir roots correctly during rendering.
2023-06-01 13:52:14 -05:00
Christopher Haster
c60fa69ce1 Optimized dbg*.py tree generation/rendering by deduplicating edges
Optimizing a script? This might sound premature, but the tree rendering
was, uh, quite slow for any decently sized (>1024) btree.

The main reason is that tree generation is quite hacky in places, repeatedly
spitting out multiple copies of the inner node's rbyd trees for example.

Rather than rewrite the tree generation implementation to be smarter,
this just changes all edge representations to namedtuples (which may
reduce memory pressure a bit), and collects them into a Python set.

This has the effect of deduplicating generated edges efficiently, and
improved the rendering performance significantly.
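
The idea in miniature, as a Python sketch. The `Edge` fields and the `collect_edges` helper are hypothetical and not the actual names used in the dbg scripts.

```python
from collections import namedtuple

# namedtuples are hashable and compare by value, so a set collapses
# repeated copies of the same edge for free
Edge = namedtuple("Edge", ["src", "dst", "color"])


def collect_edges(generated):
    """Collect possibly-duplicated (src, dst, color) tuples into a set."""
    edges = set()
    for src, dst, color in generated:
        edges.add(Edge(src, dst, color))
    return edges


raw = [(0, 1, "b"), (0, 1, "b"), (1, 2, "r"), (0, 1, "b")]
print(sorted(collect_edges(raw)))
```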

---

I also considered memoizing the rbyd trees, but dropped the idea since the
current renderer performs well enough.
2023-05-30 18:17:51 -05:00
Christopher Haster
af0c3967b4 Adopted new tree renderer in dbgmtree, implemented mtree rendering
In addition to plugging in the rbyd and btree renderers in dbgbtree.py,
this required wiring in rbyd trees in the mdirs and mroots.

A bit tricky, but with a more-or-less straightforward implementation thanks
to the common edge description used for the tree renderer.

For example, a relatively small mtree:

  $ ./scripts/dbgmtree.py disk -B4096 -t -i
  mroot 0x{0,1}.45, rev 1, weight 0
  mdir                     ids   tag                     ...
  {0000,0001}: .--------->    -1 magic 8                 ...
               | .------->       config 21               ...
               +-+-+             btree 7                 ...
    0006.000a:     | .-+       0 mdir w1 2               ...
  {0002,0003}:     | | '->   0.0 inlined w1 1024         ...
    0006.000a:     '-+-+       1 mdir w1 2               ...
  {0004,0005}:         '->   1.0 inlined w1 1024         ...
2023-05-30 18:10:32 -05:00
Christopher Haster
b67fcb0ee5 Added dbgmtree.py for debugging the littlefs metadata-tree
This builds on dbgrbyd.py and dbgbtree.py by allowing for quick
debugging of the littlefs mtree, which is a btree of rbyd pairs with a
few bells and whistles.

This also comes with a number of tweaks to dbgrbyd.py and dbgbtree.py,
mostly changing rbyd addresses to support some more mdir friendly
formats.

The syntax for rbyd addresses is starting to converge into a couple
common patterns, which is nice for determining at a glance what type of
address you are looking at:

- 0x12         => An rbyd at block 0x12
- 0x12.34      => An rbyd at block 0x12 with trunk 0x34
- 0x{12,34}    => An rbyd at either block 0x12 or block 0x34 (an mdir)
- 0x{12,34}.56 => An rbyd at either block 0x12 or block 0x34 with trunk 0x56

These scripts have also been updated to support any number of blocks in
an rbyd address, for example 0x{12,34,56,78}. This is a bit of future
proofing. >2 blocks in mdirs may be explored in the future for the
increased redundancy.
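
A minimal Python sketch of parsing the address patterns listed above, including more than two blocks. The function name, the regex and the return shape are hypothetical and not copied from the dbg scripts. Blocks and trunk are both read as hex, matching the examples.

```python
import re

ADDR_RE = re.compile(
    r"0x(?:(?P<block>[0-9a-f]+)"                      # 0x12
    r"|\{(?P<blocks>[0-9a-f]+(?:,[0-9a-f]+)*)\})"     # 0x{12,34,...}
    r"(?:\.(?P<trunk>[0-9a-f]+))?",                   # optional .trunk
    re.IGNORECASE)


def parse_rbyd_addr(s: str):
    """Parse an rbyd address into (blocks, trunk); trunk may be None."""
    m = ADDR_RE.fullmatch(s)
    if not m:
        raise ValueError(f"not an rbyd address: {s!r}")
    if m.group("block") is not None:
        blocks = (int(m.group("block"), 16),)
    else:
        blocks = tuple(int(b, 16) for b in m.group("blocks").split(","))
    trunk = int(m.group("trunk"), 16) if m.group("trunk") else None
    return blocks, trunk


for addr in ("0x12", "0x12.34", "0x{12,34}", "0x{12,34}.56",
             "0x{12,34,56,78}"):
    print(addr, "=>", parse_rbyd_addr(addr))
```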
2023-05-30 18:04:54 -05:00