Commit Graph

58 Commits

Author SHA1 Message Date
Christopher Haster
cc0ac25b5e Implemented infrastructure necessary for global-removes
This has, in theory, global-removes (grm) being written out as part of
directory creation, but they aren't used in any form yet and so may not
be being written correctly.

But it did require quite a bit of problem solving to get to this point
(the interactions between mtree splits and grms are really annoying), so
it's worth a commit.
2023-07-18 21:40:30 -05:00
Christopher Haster
da810aca26 Implemented mtree path/dname lookup, rudimentary lfsr_mkdir/lfsr_dir_read
This makes it now possible to create directories in the new system.

The new system now uses a single global "mtree" to store all metadata
entries in the filesystem. In this system, a directory is simply a range
of metadata entries. This has a number of benefits, but does come with
its own problems:

1. We need to indicate which directory each file belongs to. To do this
   the file's name entry has been changed to a tuple of leb128-encoded
   directory-id + actual file name:

     01 66 69 6c 65 2e 74 78 74  .file.txt
      ^ '----------+----------'
      '------------|------------ leb128 directory-id
                   '------------ ascii/utf8 name

   If we include the directory-id as part of filename comparison, files
   should naturally be next to other files in the same directory.

2. We need a way to allocate directory-ids for new directories. This turns
   out to be a bit more tricky than I expected.

   We can't use any mid/bid/rid inherent to the mtree, because these
   change on any file creation/deletion. And since we commit the did
   into the tree, that's not acceptable.

   Initially I thought you could just find the largest did and increment,
   but this gives you no way to reclaim deleted dids. And sure, deleted
   dids have no storage consumption, but eventually you will overflow
   the did integer. Since this can suddenly happen in a filesystem
   that's been in a steady-state for years, that's pretty unacceptable.

   One solution is to do a simple linear search over the mtree for an
   unused did. But with a runtime of O(n^2 log(n)), this raises
   performance concerns.

   Sidenote: It's interesting to note that the Linux kernel's allocation
   of process-ids, a very similar problem, is surprisingly complex and
   relies on a radix-tree of bitmaps (struct idr). This suggests I'm not
   missing an obvious solution somewhere.

   The solution I settled on here is to instead treat the set of dids as
   a sort of hash table:

   1. Hash the full directory path into a did.
   2. Perform a linear search until we have no collision.

     leb128(truncate28(crc32c("dir")))
          .--------'
          v
     9e cd c8 30 66 69 6c 65 2e 74 78 74  ...0file.txt
     '----+----' '----------+----------'
          '-----------------|------------ leb128 directory-id
                            '------------ ascii/utf8 name

   In the worst case this can still exhibit O(n^2 log(n)) performance
   when the did space is nearly full. However, that seems unlikely to
   happen in practice, since, unlike normal hash tables, we don't
   truncate our hashes down to the table size. An additional 32-bit word
   for each file is a small price to pay for a low chance of collisions.
   (A rough sketch of this allocation scheme follows this list.)

   In the current implementation, I do truncate the hash to 28-bits.
   Since we encode the hash with leb128, and hashes are statistically
   random, this gives us better usage of the leb128 encoding. However
   it does limit a 32-bit littlefs to 256 Mi directories.

   Maybe this should be a configurable limit in the future.

   But that highlights another benefit of this scheme. It's easy to
   change in the future without disk changes.

3. We need a way to know if a directory-id is allocated, even if the
   directory is empty.

   For this we just introduce a new tag: LFSR_TAG_DSTART, which
   is an empty file entry that indicates the directory at the given did
   in the mtree is allocated.

   To create/delete these atomically with the reference in our parent
   directory, we can use the GRM system for atomic renames.

   Note this isn't implemented yet.
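As a rough illustration of the allocation scheme in 2., a minimal sketch
under the assumptions above (the in_use callback stands in for the real
LFSR_TAG_DSTART lookup in the mtree, and the crc32c callback for
whatever checksum helper ends up being used):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define DID_BITS 28
  #define DID_MASK ((UINT32_C(1) << DID_BITS) - 1)

  // 1. hash the full directory path into a did (truncated to 28 bits)
  // 2. linearly probe until we find an unallocated did
  uint32_t alloc_did(const char *path, size_t len,
          uint32_t (*crc32c)(const void *buf, size_t len),
          bool (*in_use)(uint32_t did)) {
      uint32_t did = crc32c(path, len) & DID_MASK;
      while (in_use(did)) {
          did = (did + 1) & DID_MASK;
      }
      return did;
  }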

This is also the first time we finally get around to testing all of the
dname lookup functions, so this did find a few bugs, mostly around
reporting the root correctly.
2023-07-05 13:41:21 -05:00
Christopher Haster
cf588ac3fa Dropped alt-always as an rbyd trunk terminator
Now that tree rebalancing is implemented and needed a null terminator
anyways, I think it's clear that alt-always pointers as trunk
terminators provide pretty limited value.

Now a null or other tag is needed for every trunk, which simplifies
checks for end-of-trunk.

Alt-always tags are still emitted for deletes, etc, but there, their
behavior is implicit, so no special checks are needed. Alt-always tags
are naturally cleaned up as a part of rbyd pruning.
2023-06-27 00:49:31 -05:00
Christopher Haster
43dc3a5c8d Implemented tree rebalancing during rbyd compaction
This isn't actually for performance reasons, but to reduce storage
overhead of the rbyd metadata tree, which was showing signs of being
problematic for small block sizes.

Originally, the plan for compaction was to rely on the self-balancing
rbyd append algorithm and simply append each tag to a new tree.
Unfortunately, since each append requires a rewrite of the trunk
(current search path), this introduces ~n*log(n) alts but only uses ~n alts
for the final tree. This really starts to put pressure on small blocks,
where the exponential-ness of the log doesn't kick in and overhead
limits are already tight.

Measuring lfsr_mdir_commit code size, this shows a ~556 byte cost on
thumb: 16416 -> 16972 (+3.4%). Though there are still some optimizations
on the table, this implementation needs a cleanup pass.

               alt overhead  code cost
  rebalance:        <= 28*n      16972
  append:    <= 24*n*log(n)      16416

Note these all assume worst case alt overhead, but we _need_ to assume
worst case for our rbyd estimations, or else the filesystem can get
stuck in unrecoverable compaction states.
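For a rough sense of scale, a back-of-envelope sketch plugging a few
attribute counts into the two bounds above (taking log as log2 here):

  #include <math.h>
  #include <stdio.h>

  int main(void) {
      for (int n = 16; n <= 1024; n *= 4) {
          double rebalance = 28.0*n;          // <= 28*n
          double append    = 24.0*n*log2(n);  // <= 24*n*log(n)
          printf("n=%4d  rebalance <= %6.0f  append <= %7.0f\n",
                  n, rebalance, append);
      }
      return 0;
  }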

Because of the code cost I'm not sure if rebalancing will stay, be
optional, or replace append-compaction completely yet.

Some implementation notes:

- Most tree balancing algorithms rely on true recursion. I suspect
  recursion may be a hard requirement in general, but it's hard to find
  bounded-RAM algorithms.

  This solution gets around the ram requirement by leveraging the fact
  that our tags exist in a log to build up each layer in the tree
  tail-recursively. It's interesting to note that this is a special
  case of having little ram but lots of storage.

- Humorously this shouldn't result in a performance improvement. Rbyd
  trees result in a worst case 2*log(n) height, and rebalancing gives us
  a perfect worst case log(n) height, but, since we need an additional
  alt pointer for each node in our tree, things bump back up to 2*log(n).

- Originally the plan was to terminate each node with an alt-always tag,
  but during implementation I realized there was no easy way to get the
  key that splits the children without awkward tree lookups. As a
  workaround, each node is terminated with an altle tag that contains the
  key followed by an unreachable null tag. This is redundant information,
  but makes the algorithm easier to implement.

  Fortunately null tags use the smallest tag encoding, which isn't that
  small, but that means this wastes at most 4*n bytes.

- Note this preserves the first-tag-always-ends-up-at-off=0x4 rule, which
  is necessary for the littlefs magic to end up in a consistent place.

- I've dropped dropping vestigial names for now, which means vestigial
  names can remain in btrees indefinitely. Need to revisit this.
2023-06-25 15:23:46 -05:00
Christopher Haster
e79c15b026 Implemented wide tags for both rbyd commit and lookup
Wide tags are a happy accident that fell out of the realization that we
can view all subtypes of a given tag suptype as a range in our rbyd.
Combining this with how natural it is to operate on ranges in an rbyd
allows us to perform operations on an entire range of subtypes as though
it were a single tag.

- lookup wide tag => find the smallest tag with this tag's suptype, O(log(n))
- remove wide tag => remove all tags with this tag's suptype, O(log(n))
- append wide tag => remove all tags with this tag's suptype, and then
  append our tag, O(log(n))

This is very useful for littlefs, where we've already been using tag
subtypes to hold extra type info, and have had to rely on awkward
alternatives such as deleting existing subtypes before writing our new
subtype.

For example, when committing file metadata (not yet implemented), we can
append a wide struct tag to update the metadata while also clearing out any
lingering struct tags from previous commits, all in one rbyd append
operation.

This uses another mode bit in-device to change the behavior of
lfsr_rbyd_commit, of which we have a couple:

  vwgrtttt 0TTTTTTT
  ^^^^---^--------^- valid bit (currently unused, maybe errors?)
   '||---|--------|- wide bit, ignores subtype (in-device)
    '|---|--------|- grow bit, don't create new id (in-device)
     '---|--------|- rm bit, remove this tag (in-device)
         '--------|- 4-bit suptype
                  '- leb128 subtype
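A minimal sketch of how these in-device mode bits might be used,
assuming the tag is held in a 16-bit word laid out as above; the
constants here are illustrative, not the actual lfs.c definitions:

  #include <stdint.h>

  // illustrative constants, not the actual lfs.c encoding
  #define LFSR_TAG_VALID   0x8000  // valid bit
  #define LFSR_TAG_WIDE    0x4000  // wide bit, ignore subtype (in-device)
  #define LFSR_TAG_GROW    0x2000  // grow bit, don't create new id (in-device)
  #define LFSR_TAG_RM      0x1000  // rm bit, remove this tag (in-device)
  #define LFSR_TAG_SUPTYPE 0x0f00  // 4-bit suptype
  #define LFSR_TAG_SUBTYPE 0x00ff  // leb128 subtype (single byte here)

  // wide tags compare only the suptype, ignoring the subtype
  static inline int lfsr_tag_matches(uint16_t a, uint16_t b) {
      uint16_t mask = (a & LFSR_TAG_WIDE)
              ? LFSR_TAG_SUPTYPE
              : (LFSR_TAG_SUPTYPE | LFSR_TAG_SUBTYPE);
      return (a & mask) == (b & mask);
  }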
2023-06-19 16:08:43 -05:00
Christopher Haster
2467d2e486 Added a separate tag encoding for the mtree
This helps with debugging and can avoid weird issues if a file btree
ever accidentally ends up attached to id -1 (due to fs bug).

Though a separate encoding isn't strictly necessary, maybe this should
be reverted at some point.
2023-06-18 15:12:43 -05:00
Christopher Haster
7180b70c9c Allowed "alta" (altbgt 0) to terminate rbyd trunks, dropped rm bit
This replaces unr with null on disk, though note both the rm bit and unr
are still used in-device; they just don't get written to disk.

This removes the need for the rm bit on disk. Since we no longer need to
figure out what's been removed during fetch, we can save this bit for both
internal and future on-disk use.

Special handling of alta allows us to avoid emitting an unr tag (now null) if
the current trunk is truly unreachable. This is minor now, but important
for a theoretical rbyd rebalance operation (planned), which brings the
rbyd overhead down from ~3x to ~2x.

These changes give us two ways to terminate trunks without a tag:

1. With an alta, if the current trunk is unreachable:

     altbgt 0x403 w0 0x7b
     altbgt 0x402 w0 0x29
     alta w0 0x4

2. With a null, if the current trunk is reachable, either for
   code convenience or because emitting an alta is impossible (an empty
   rbyd for example):

     altbgt 0x403 w0 0x7b
     altbgt 0x402 w0 0x29
     altbgt 0x401 w0 0x4
     null
2023-06-17 18:11:45 -05:00
Christopher Haster
2113d877d6 Moved bits around in tag encoding to allow leb128 custom attributes
Yet another tag encoding, but hopefully narrowing in on a good long term
design. This change trades a subtype bit for the ability to extend
subtypes indefinitely via leb128 in the future.

The immediate benefit is ~unlimited custom attributes, though I'm not
sure how to make this configurable yet. Extended custom attributes may
have a significant impact on alt tag sizes, so it may be worth
defaulting to only 8-bit custom attributes still.

Tag encoding:

   vmmmtttt 0TTTTTTT 0wwwwwww 0sssssss
   ^--^---^--------^--------^--------^- valid bit
      '---|--------|--------|--------|- 3-bit mode
          '--------|--------|--------|- 4-bit suptype
                   '--------|--------|- leb128 subtype
                            '--------|- leb128 weight
                                     '- leb128 size/jump

This limits subtypes to 7-bits, but this seems very reasonable at the
moment.

This also seems to limit custom attributes to 7-bits, but we can use two
separate suptypes to bring this back up to 8-bits. I was planning to do
this anyways to have separate "user-attributes" and "system-attributes",
so this actually fits in really well.
2023-06-16 01:51:33 -05:00
Christopher Haster
c60fa69ce1 Optimized dbg*.py tree generation/rendering by deduplicating edges
Optimizing a script? This might sound premature, but the tree rendering
was, uh, quite slow for any decently sized (>1024) btree.

The main reason is that tree generation is quite hacky in places, repeatedly
spitting out multiple copies of the inner node's rbyd trees for example.

Rather than rewrite the tree generation implementation to be smarter,
this just changes all edge representations to namedtuples (which may
reduce memory pressure a bit), and collects them into a Python set.

This has the effect of deduplicating generated edges efficiently, and
improves the rendering performance significantly.
improved the rendering performance significantly.

---

I also considered memoizing the rbyd trees, but dropped the idea since
the current renderer performs well enough.
2023-05-30 18:17:51 -05:00
Christopher Haster
9b803f9625 Reimplemented tree rendering in dbg*.py scripts
The goal here was to add the option to show the combined rbyd trees in
dbgbtree.py/dbgmtree.py.

This was quite tricky (and not really helped by the hackiness of these
scripts), but was made a bit easier by adding a general purpose tree
renderer that can render a precomputed set of branches into the tag
output.

For example, a 2-deep rendering of a simple btree with a block size of
1KiB, where you can see a bit of the emergent data-structure:

  $ ./scripts/dbgbtree.py disk -B1024 0x223 -t -Z2 -i
  btree 0x223.90, rev 46, weight 1024
  rbyd                       ids       tag                     ...
  0223.0090:     .-+             0-199 btree w200 9            ...
  00cb.0048:     | |     .->      0-39 btree w40 7             ...
                 | | .---+->     40-79 btree w40 7             ...
                 | | | .--->    80-119 btree w40 7             ...
                 | | | | .->   120-159 btree w40 7             ...
                 | '-+-+-+->   160-199 btree w40 7             ...
  0223.0090: .---+-+           200-399 btree w200 9            ...
  013e.004b: |     |     .->   200-239 btree w40 7             ...
             |     | .---+->   240-279 btree w40 8             ...
             |     | | .--->   280-319 btree w40 8             ...
             |     | | | .->   320-359 btree w40 8             ...
             |     '-+-+-+->   360-399 btree w40 8             ...
  0223.0090: | .---+           400-599 btree w200 9            ...
  01a7.004c: | |   |     .->   400-439 btree w40 8             ...
             | |   | .---+->   440-479 btree w40 8             ...
             | |   | | .--->   480-519 btree w40 8             ...
             | |   | | | .->   520-559 btree w40 8             ...
             | |   '-+-+-+->   560-599 btree w40 8             ...
  0223.0090: | | .-+           600-799 btree w200 9            ...
  021e.004c: | | | |     .->   600-639 btree w40 8             ...
             | | | | .---+->   640-679 btree w40 8             ...
             | | | | | .--->   680-719 btree w40 8             ...
             | | | | | | .->   720-759 btree w40 8             ...
             | | | '-+-+-+->   760-799 btree w40 8             ...
  0223.0090: +-+-+-+          800-1023 btree w224 10           ...
  021f.0298:       |     .->   800-839 btree w40 8             ...
                   |   .-+->   840-879 btree w40 8             ...
                   |   | .->   880-919 btree w40 8             ...
                   '---+-+->  920-1023 btree w104 9            ...

This tree renderer also replaces the ad hoc tree renderer in dbgrbyd.py
for consistency.
2023-05-30 18:04:54 -05:00
Christopher Haster
b67fcb0ee5 Added dbgmtree.py for debugging the littlefs metadata-tree
This builds on dbgrbyd.py and dbgbtree.py by allowing for quick
debugging of the littlefs mtree, which is a btree of rbyd pairs with a
few bells and whistles.

This also comes with a number of tweaks to dbgrbyd.py and dbgbtree.py,
mostly changing rbyd addresses to support some more mdir friendly
formats.

The syntax for rbyd addresses is starting to converge into a couple of
common patterns, which is nice for determining at a glance what type of
address you are looking at:

- 0x12         => An rbyd at block 0x12
- 0x12.34      => An rbyd at block 0x12 with trunk 0x34
- 0x{12,34}    => An rbyd at either block 0x12 or block 0x34 (an mdir)
- 0x{12,34}.56 => An rbyd at either block 0x12 or block 0x34 with trunk 0x56

These scripts have also been updated to support any number of blocks in
an rbyd address, for example 0x{12,34,56,78}. This is a bit of future
proofing. >2 blocks in mdirs may be explored in the future for the
increased redundancy.
2023-05-30 18:04:54 -05:00
Christopher Haster
975a98b099 Renamed a few superblock-related things
- supermdir -> mroot
- supermagic -> magic
- superconfig -> config
2023-05-30 14:46:56 -05:00
Christopher Haster
738eb52159 Tweaked tag encoding/naming for btrees/branches
LFSR_TAG_BNAME => LFSR_TAG_BRANCH
LFSR_TAG_BRANCH => LFSR_TAG_BTREE

Maybe this will be a problem in the future if our branch structure is
not the same as a standalone btree, but I don't really see that
happening.
2023-05-30 13:41:28 -05:00
Christopher Haster
85ebdd0881 Reintroduced Brent's algorithm for cycle detection in lfsr_mount 2023-05-30 13:28:07 -05:00
Christopher Haster
70a3a2b16e Rough implementation of lfsr_format/mount/unmount
This work already indicates we need more data-related helper
functions. We shouldn't need this many function calls to do "simple"
operations such as fetching the superconfig if it exists.
2023-05-30 13:16:03 -05:00
Christopher Haster
a511696bad Added ability to bypass rbyd fetch during B-tree lookups
This is an absurd optimization that stems from the observation that the
branch encoding for the inner-rbyds in a B-tree is enough information to
jump directly to the trunk of the rbyd without needing an lfsr_rbyd_fetch.

This results in a pretty ridiculous performance jump from O(m log_m(n/m))
to O(log(m) log_m(n/m)).

If the complexity analysis isn't impressive enough, look at some rough
benchmarking of read operations for 4KiB-block, 1K-entry B-trees:

   12KiB ^     ::  :. :: .: .: :. : .: :. : : .. : : . : .: : : :
         |    .:: .::.::.:: ::.::::::::::::.::::::::.::::::::::::.
         |    : :::':: ::'::'::':: :' :':: :'::::::::': ::::::': :
before   |  ::: ::' :' :' :: :' '' '  ' '' : : : '' ' ' '
         | :::            ''
         |:
      0B :'------------------------------------------------------>

  .17KiB ^               ............:::::::::::::::::::::::::::::
         |   .   .....:::::'''''''''  '         '          '
         |  .::::::::::::
after    |  :':''
         |.::
         .:'
      0B :------------------------------------------------------->
         0                                                      1K

In order for this to work, the branch encoding did need to be tweaked
slightly. Before, it stored block+off; now it stores block+trunk, where
"trunk" is the offset of the entry point into the rbyd tree. Both off
and trunk are enough info to know when to stop fetching, if necessary,
but trunk allows lookups to jump directly into the branch's rbyd tree
without a fetch.
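A sketch of what this enables, with hypothetical types standing in for
the real branch/rbyd structs:

  #include <stdint.h>

  // A hypothetical branch encoding: block + trunk instead of block + off.
  // The trunk is the offset of the rbyd's entry point, so a lookup can
  // seed the rbyd state directly from the branch and skip the O(m) fetch.
  typedef struct branch {
      uint32_t block;  // block containing the child rbyd
      uint32_t trunk;  // offset of the rbyd's entry point (its trunk)
  } branch_t;

  typedef struct rbyd {
      uint32_t block;
      uint32_t trunk;
  } rbyd_t;

  // jump straight into the child rbyd, no fetch needed
  static inline rbyd_t branch_open(const branch_t *branch) {
      return (rbyd_t){.block = branch->block, .trunk = branch->trunk};
  }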

With the change to trunk, lfsr_rbyd_fetch has also been extended to
allow fetching of any internal trunks, not just the last trunk in the
commit.
This is very useful for dbgrbyd.py, but doesn't currently have a use in
littlefs itself. But it's at least valuable to have the feature available
in case it does become useful.

Note that two cases still require the slower O(m log_m(n/m)) lookup
with lfsr_rbyd_fetch:

1. Name lookups, since we currently use a linear-search O(m) to find names.

2. Validating B-tree rbyds, which requires a linear fetch O(m) to
   validate the checksums. We will need to do this at least once
   after mount.

It's also worth mentioning this will likely have a large impact on
B-tree traversal speed, which is huge, as I am expecting B-tree
traversal to be the main bottleneck once garbage-collection (or its
replacement) is involved.
2023-04-14 00:51:34 -05:00
Christopher Haster
7eb0c4763a Reversed LFSR_ATTR id/tag argument order
I've been wanting to make this change for a while now (tag,id => id,tag).
The id,tag order matches the common lexicographic order used for sorting
tuples. Sorting tag,id tuples by their id first is less common.
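In other words, id,tag attrs now sort like ordinary tuples (a sketch of
the comparison, not actual lfs.c code):

  #include <stdint.h>

  // lexicographic (id, tag) comparison: id first, then tag
  static inline int attr_cmp(int32_t id_a, uint16_t tag_a,
          int32_t id_b, uint16_t tag_b) {
      if (id_a != id_b) {
          return (id_a < id_b) ? -1 : +1;
      }
      if (tag_a != tag_b) {
          return (tag_a < tag_b) ? -1 : +1;
      }
      return 0;
  }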

The reason for this order in the codebase is that all attrs on disk
start with their tag first, since its decoding determines the purpose of
the id field (keep in mind this includes other non-tree tags such as
crcs, alts, etc). But with the move to storing weights instead of ids
on disk, this gives us a clear point to switch from tag,w to id,tag
ordering.

I may be thinking too much about this, but it does affect a significant
amount of the codebase.
2023-04-14 00:43:33 -05:00
Christopher Haster
2142b4a09d Reworked dbgrbyd.py's tree renderer to make more sense
While the previous renderer was "technically correct", the attempt to
map rotated alts to their nearest neighbor just made the resulting tree
an unreadable mess.

Now the renderer prunes alts with unreachable edges (like they would be
during lfsr_rbyd_append) and aligns all alts with their destination
trunk. This results in a much more readable, if slightly less accurate,
rendering of the tree.

Example:

  $ ./scripts/dbgrbyd.py -B4096 disk 0 -t
  rbyd 0x0, rev 1, size 1508, weight 40
  off                     ids   tag                     data (truncated)
  0000032a:         .-+->     0 reg w1 1                73                       s
  00000026:         | '->   1-5 reg w5 1                62                       b
  00000259: .-------+--->  6-11 reg w6 1                6f                       o
  00000224: |     .-+-+-> 12-17 reg w6 1                6e                       n
  0000028e: |     | | '->    18 reg w1 1                70                       p
  00000076: |     | '---> 19-20 reg w2 1                64                       d
  0000038f: |     |   .-> 21-22 reg w2 1                75                       u
  0000041d: | .---+---+->    23 reg w1 1                78                       x
  000001f3: | |       .-> 24-27 reg w4 1                6d                       m
  00000486: | | .-----+-> 28-29 reg w2 1                7a                       z
  000004f3: | | | .-----> 30-31 reg w2 1                62                       b
  000004ba: | | | | .---> 32-35 reg w4 1                61                       a
  0000058d: | | | | | .-> 36-37 reg w2 1                65                       e
  000005c6: +-+-+-+-+-+-> 38-39 reg w2 1                66                       f
2023-04-14 00:41:55 -05:00
Christopher Haster
0ccf283321 Changed in-tree tags to store their weights
Storing weights instead of ids just has a number of benefits, suggesting
this is a better design:

- Calculating the id and delta of each rbyd trunk is surprisingly
  easier - id is now just lower+w-1 (see the sketch after this list),
  and no extra conditions are needed for unr tags, which just have a
  weight of zero.

- Removes ambiguity around which id unr tags should be assigned to,
  especially unrs that delete ids.

- No more +-1 weirdness when encoding/decoding tag ids - the weight
  can be written as-is and -1 ids are inferred from their weight and
  position in the tree (lower+w-1 = 0+0-1 = -1).

- Weights compress better under leb128 encoding, since they are usually
  quite small.
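A small sketch of the id calculation from the first point (not actual
lfs.c code):

  #include <stdint.h>

  // id is derived from the lower bound accumulated while following the
  // trunk plus the tag's weight: id = lower + w - 1. unr tags have a
  // weight of zero and so naturally land on lower - 1.
  static inline int32_t rbyd_id(uint32_t lower, uint32_t weight) {
      return (int32_t)(lower + weight) - 1;
  }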
2023-04-14 00:32:05 -05:00
Christopher Haster
13852df071 Switched back to altgt 0 for unreachable tags, made btree tests pass again
This fixed two notable bugs:

1. Using "altle 0xfff0" to terminate unreachable rbyd trunks threw off
   id calculations in lfsr_rbyd_fetch searches. We derive the tag's
   id+weight from the lower bound calculated as the sum of all "altle"s
   and an always-followed "altle 0xfff0" throws this off.

   We _could_ derive the tag's id+weight from the upper bound, inverting
   this relationship, but decided to revert back to using "altgt 0" to
   terminate unreachable rbyd trunks.

   Using the lower bound is more intuitive, and "altgt 0" has the
   benefit of supporting variable-length tags if we ever need to adopt
   those.

   To avoid the previous issues around 0-tag holes (which was the original
   motivation for altle 0xfff0), 0-tags are now automatically adjusted
   in lfsr_rbyd_lookup, and avoided in lfsr_rbyd_append.

   But note! If any implementation tries to look up 0-tags, this will
   eventually break! See previous commits for more info.

2. Unfortunately, we can't combine branch updates and weight updates in
   lfsr_btree_commit in the general case.

   If our btree contains bname tags, the weight is attached to the
   bname tag, separately from the branch tag.

   Branch updates in lfsr_btree_commit need two separate attrs for the
   weight and branch struct for this reason, which is unfortunate.

   The amount of extra conditions to make bname+branch pairs work makes
   me want to redesign the inner-nodes of the btrees, but I can't think
   of a better way to approach the problem.
2023-04-14 00:04:58 -05:00
Christopher Haster
e5ad09b380 Some btree progress, implementing rbyd-tag-weight changes 2023-04-14 00:02:51 -05:00
Christopher Haster
85bd28951c Solved rbyd grow/insert ambiguity by adding a device-only "mk" bit
This "mk" bit must not be written to disk, it would conflict with the
other non-tree tag encodings. But we can use this bit in the context of
lfsr_tag_append to disambiguate tags changing weight from inserting new
tags.

Note that in the context of rbyd compactions, this will make things a bit
weird, since it's no longer just a direct one-to-one copy of each tag.

To make compactions a bit easier, this implementation allows the "mk"
bit to be set on any tag and ignores it when the weight delta is zero.

It turns out that this scheme greatly simplifies the awkward
leaf-split-alt calculation that previously had several if statements to
handle different corner cases, with the caveat that "mk" tags need their
ids adjusted by +1. Added this adjustment directly into lfsr_rbyd_append
for now, so the upper-level interface can be a bit more intuitive.
Though this may need to change later if it is more confusing than
helpful.
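As a rough sketch of the disambiguation (illustrative bit value and
helper, not the actual lfs.c encoding):

  #include <stdbool.h>
  #include <stdint.h>

  #define TAG_MK 0x1000  // device-only "mk" bit, must never reach disk

  // does this attr insert a new id (vs grow/shrink an existing one)?
  static inline bool attr_inserts(uint16_t tag, int32_t delta) {
      // the mk bit is ignored when the weight delta is zero, which lets
      // compaction set it on every copied tag without consequence
      return (tag & TAG_MK) && delta != 0;
  }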
2023-04-13 19:00:39 -05:00
Christopher Haster
5a1c36f210 Attempting to add weight changes to every rbyd append
This does not work as is due to ambiguity with grows and insertions.

Before, these were disambiguated by separate grow and attr tags. You
effectively grew the neighboring id before claiming its weight
as yours. But now that the attr itself creates the grow/insertion,
it's ambiguous which one is intended.
2023-04-13 18:58:56 -05:00
Christopher Haster
8f26b68af2 Derived grows/shrinks from rbyd trunk, no longer needing explicit tags
I only recently noticed there is enough information in each rbyd trunk
to infer the effective grow/shrinks. This has a number of benefits:

- Cleans up the tag encoding a bit, no longer expecting tag size to
  sometimes contain a weight (though this could've been fixed other
  ways).

  0x6 in the lower nibble is now reserved exclusively for in-device tags.

- grow/shrinks can be implicit to any tag. Will attempt to leverage this
  in the future.

- The weight of an rbyd can no longer go out-of-sync with itself. While
  this _shouldn't_ happen normally, if it does I imagine it'd be very
  hard to debug.

  Now, there is only one source of knowledge about the weight of the
  rbyd: The most recent set of alt-pointers.

Note that remove/unreachable tags now behave _very_ differently when it
comes to weight calculation: remove tags require the tree to make the
tag unreachable. This is a tradeoff for the above.
2023-03-27 01:45:34 -05:00
Christopher Haster
546fff77fb Adopted full le16 tags instead of 14-bit leb128 tags
The main motivation for this was issues fitting a good tag encoding into
14-bits. The extra 2-bits (though really only 1 bit was needed) from
making this not a leb encoding opens up the space from 3 suptypes to
15 suptypes, which is nothing to shake a stick at.

The main downsides:
1. We can't rely on leb encoding for effectively-infinite extensions.
2. We can't shorten small tags (crcs, grows, shrinks) to one byte.

For 1., extending the leb encoding beyond 14-bits is already
unpalatable, because it would increase RAM costs in the tag
encoder/decoder, which must assume a worst-case tag size, and would
likely add storage cost to every alt pointer; more on this in the next
section.

The current encoding is quite generous, so I think it is unlikely we
will exceed the 16-bit encoding space. But even if we do, it's possible
to use a spare bit for an "extended" set of tags in the future.

As for 2., the lack of compression is a downside, but I've realized the
only tags that really matter storage-wise are the alt pointers. In any
rbyd there will be roughly O(m log m) alt pointers, but at most O(m) of
any other tags. What this means is that the encoding of any other tag is
in the noise of the encoding of our alt pointers.

Our alt pointers are already pretty densely packed. But because the
sparse key part of alt-pointers is stored as-is, the worst-case
encoding of in-tree tags likely ends up as the encoding of our
alt-pointers. So going up to 3-byte tags adds a surprisingly large
storage cost.

As a minor plus, le16s should be slightly cheaper to encode/decode. It
should also be slightly easier to debug tags on-disk.
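For example, tag encode/decode becomes a fixed two-byte little-endian
access (a sketch; the actual helpers in lfs.c may differ):

  #include <stdint.h>

  // no leb128 continuation bits to loop over, just two bytes
  static inline uint16_t tag_fromle16(const uint8_t buf[2]) {
      return (uint16_t)buf[0] | ((uint16_t)buf[1] << 8);
  }

  static inline void tag_tole16(uint16_t tag, uint8_t buf[2]) {
      buf[0] = (uint8_t)(tag >> 0);
      buf[1] = (uint8_t)(tag >> 8);
  }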

  tag encoding:
                     TTTTtttt ttttTTTv
                        ^--------^--^^- 4+3-bit suptype
                                 '---|- 8-bit subtype
                                     '- valid bit
  iiii iiiiiii iiiiiii iiiiiii iiiiiii
                                     ^- m-bit id/weight
  llll lllllll lllllll lllllll lllllll
                                     ^- m-bit length/jump

Also renamed the "mk" tags, since they no longer have special behavior
outside of providing names for entries:
- LFSR_TAG_MK       => LFSR_TAG_NAME
- LFSR_TAG_MKBRANCH => LFSR_TAG_BNAME
- LFSR_TAG_MKREG    => LFSR_TAG_REG
- LFSR_TAG_MKDIR    => LFSR_TAG_DIR
2023-03-25 14:36:29 -05:00
Christopher Haster
89d5a5ef80 Working implementation of B-tree name split/lookup with vestigial names
B-trees with names are now working, though this required a number of
changes to the B-tree layout:

1. B-trees no longer require name entries (LFSR_TAG_MK) on each branch.
   This is a nice optimization to the design, since these name entries
   just waste space in purely weight-based B-trees, which are probably
   going to be most B-trees in the filesystem.

   If a name entry is missing, the struct entry, which is required,
   should have the effective weight of the entry.

   The first entry in every rbyd block is expected to have no name
   entry, since this is the default path for B-tree lookups.

2. The first entry in every rbyd block _may_ have a name entry, which
   is ignored. I'm calling these "vestigial names" to make them sound
   cooler than they actually are.

   These vestigial names show up in a couple complicated B-tree
   operations:

   - During B-tree split, since pending attributes are calculated before
     the split, we need to play out pending attributes into the rbyd
     before deciding what name becomes the name of the entry in the
     parent. This creates a vestigial name which we _could_ immediately
     remove, but the remove adds additional size to the must-fit split
     operation.

   - During B-tree pop/merge, if we remove the leading no-name entry,
     the second, named entry becomes the leading entry. This creates a
     vestigial name that _looks_ easy enough to remove when making the
     pending attributes for pop/merge, but turns out to be surprisingly
     tricky if the parent undergoes a split/merge at the same time.

   It may be possible to remove all these vestigial names proactively,
   but this adds additional rbyd lookups to figure out the exact tag to
   remove, complicates things in a fragile way, and doesn't actually
   reduce storage costs until the rbyd is compacted.

   The main downside is that these B-trees may be a bit more confusing
   to debug.
2023-03-21 12:59:46 -05:00
Christopher Haster
8732904ef6 Implemented lfsr_btree_pop and btree merges
B-tree remove/merge is the most annoying part of B-trees.

The implementation here follows the same ideas implemented in push/split:
1. Defer splits/merges until compaction.
2. Assume our split/merge will succeed and play it out into the rbyd.
3. On the first sign of failure, revert any unnecessary changes by
   appending deletes.
4. Do all of this in a single commit to avoid issues with single-prog
   blocks.

Mapping this onto B-tree merge, the condition that triggers merge is
when our rbyd is <1/4 the block_size after compaction, and the condition
that aborts a merge is when our rbyd is >1/2 the block_size, since that
would trigger a split on a later compact.
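A sketch of these thresholds (helper names hypothetical):

  #include <stdbool.h>
  #include <stdint.h>

  // trigger a merge when a compacted rbyd uses <1/4 of the block_size
  static inline bool btree_should_merge(uint32_t compacted_size,
          uint32_t block_size) {
      return compacted_size < block_size/4;
  }

  // abort the merge if the merged rbyd would exceed 1/2 of the
  // block_size, since that would just trigger a split on a later compact
  static inline bool btree_abort_merge(uint32_t merged_size,
          uint32_t block_size) {
      return merged_size > block_size/2;
  }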

Weaving this into lfsr_btree_commit is a bit subtle, but relatively
straightforward all things considered.

One downside is it's not physically possible to try merging with both
siblings, so we have to choose just one to attempt a merge. We handle
the corner case of merging the last sibling in a block explicitly, and
in theory the other sibling will eventually trigger a merge during its
own compaction.

Extra annoying are the corner cases with merges in the root rbyd that
make the root rbyd degenerate. We really should avoid a compaction in
this case, as otherwise we would erase a block that we immediately
inline at a significant cost. However determining if our root rbyd is
degenerate is tricky. We can determine a degenerate root with children
by checking if our rbyd's weight matches the B-tree's weight when we
merge. But determining a degenerate root that is a leaf requires
manually looking up both children in lfsr_btree_pop to see if they will
result in a degenerate root. Ugh.

On the bright side, this does all seem to be working now. Which
completes the last of the core B-tree algorithms.
2023-03-17 14:29:02 -05:00
Christopher Haster
a897b875d3 Implemented lfsr_btree_update and added more tests
This was a rather simple exercise. lfsr_btree_commit does most of the
work already, so all this needed was setting up the pending attributes
correctly.

Also:
- Tweaked dbgrbyd.py's tree rendering to match dbgbtree.py's.
- Added a print to each B-tree test to help find the resulting B-tree
  when debugging.
2023-03-17 14:20:40 -05:00
Christopher Haster
89ab174f33 Reworked dbgrbyd.py's --lifetimes so it actually works
Changed so there is no 1-to-1 mk-tag/id assumption; any unique id
creates a simulated lifetime to render. This fixes the issue where the
left-aligned ids of grows/shrinks confused dbgrbyd.py.

As a plus, dbgrbyd.py can now actually handle multi-id grow/shrinks, and
is more robust against out-of-sync grow/shrinks. These sorts of lifetime
issues are when you'd want to run dbgrbyd.py, so it's a bit important
this is handled gracefully.
2023-03-17 14:20:40 -05:00
Christopher Haster
ce599be70d Added scripts/dbgbtree.py for debugging B-trees, tweaked dbgrbyd.py
An example:

  $ ./scripts/dbgbtree.py -B4096 disk 0xaa -t -i
  btree 0xaa.1000, rev 35, weight 278
  block            ids     name     tag                     data
  (truncated)
  00aa.1000: +-+      0-16          branch id16 3           7e d4 10                 ~..
  007e.0854: | |->       0          inlined id0 1           73                       s
             | |->       1          inlined id1 1           74                       t
             | |->       2          inlined id2 1           75                       u
             | |->       3          inlined id3 1           76                       v
             | |->       4          inlined id4 1           77                       w
             | |->       5          inlined id5 1           78                       x
             | |->       6          inlined id6 1           79                       y
             | |->       7          inlined id7 1           7a                       z
             | |->       8          inlined id8 1           61                       a
             | |->       9          inlined id9 1           62                       b
  ...

This added the idea of block+limit addresses such as 0xaa.1000. Added
this as an option to dbgrbyd.py along with a couple other tweaks:

- Added block+limit support (0x<block>.<limit>).
- Fixed in-device representation indentation when trees are present.
- Changed fromtag to implicitly fix up ids/weights off-by-one-ness; this
  is consistent with lfs.c.
2023-03-17 14:20:10 -05:00
Christopher Haster
88e3db98a9 Rough implementation of btree append
This involves many, many hacks, but is enough to test the concept
and start looking at how it interacts with different block sizes.

Note only append (lfsr_btree_push on the end) is implemented, and it
makes some assumptions about how the ids can interact when splitting
rbyds.
2023-03-17 14:20:09 -05:00
Christopher Haster
6f4704474b Changed GROW/SHRINK to always be explicit, dropped LFSR_TAG_RM
Generally, less implicit behavior => simpler systems, which is the goal
here.
2023-03-17 14:20:09 -05:00
Christopher Haster
1709aec95b Rough draft of general btree implementation, needs work
This implements a common B-tree using rbyds as inner nodes.

Since our rbyds actually map to sorted arrays, this fits together quite
well.

The main caveat/concern is that we can't rely on strict knowledge of the
on-disk size of these things. This first shows up with B-tree insertion:
we can't split in preparation to insert as we descend down the tree.

Normally, this means our B-tree would require recursion in order to keep
track of each parent as we descend down our tree. However, we can
avoid this by not storing our parent, but by looking it up again on each
step of the splitting operation.

This brute-force-ish approach makes our algorithm tail-recursive, so
bounded RAM, but raises our runtime from O(logB(n)) to O(logB(n)^2).

That being said, O(logB(n)^2) is still sublinear, and, thanks to
B-tree's extremely high branching factor, may be insignificant.
2023-03-17 14:20:09 -05:00
Christopher Haster
98532f3287 Adding sparse ids to rbyd trees
The way sparse ids interact with our flat id+attr tree is a bit wonky.

Normally, with weighted trees, one entry is associated with one weight.
But since our rbyd trees use id+attr pairs as keys, in theory each set of
id+attr pairs should share a single weight.

  +-+-+-+-> id0,attr0   -.
  | | | '-> id0,attr1    +- weight 5
  | | '-+-> id0,attr2   -'
  | |   |
  | |   '-> id5,attr0   -.
  | '-+-+-> id5,attr1    +- weight 5
  |   | '-> id5,attr2   -'
  |   |
  |   '-+-> id10,attr0  -.
  |     '-> id10,attr1   +- weight 5
  '-------> id10,attr2  -'

To make this representable, we could give a single id+attr pair the
weight, and make the other attrs have a weight of zero. In our current
scheme, attr0 (actually LFSR_TAG_MK) is the only attr required for every
id, and it has the benefit of being the first attr found during
traversal. So it is the obvious choice for storing the id's effective weight.

But there's still some trickiness. Keep in mind our ids are derived from
the weights in the rbyd tree. So if we follow intuition and implement
this naively:

  +-+-+-+-> id0,attr0   weight 5
  | | | '-> id5,attr1   weight 0
  | | '-+-> id5,attr2   weight 0
  | |   |
  | |   '-> id5,attr0   weight 5
  | '-+-+-> id10,attr1  weight 0
  |   | '-> id10,attr2  weight 0
  |   |
  |   '-+-> id10,attr0  weight 5
  |     '-> id15,attr1  weight 0
  '-------> id15,attr2  weight 0

Suddenly the ids in the attr sets don't match!

It may be possible to work around this with special cases for attr0, but
this would complicate the code and make the presence of attr0 a strict
requirement.

Instead, if we associate each attr set with not the smallest id in the
weight but the largest id in the weight, so id' = id+(weight-1), then
our requirements work out while still keeping each attr set on the same
low-level id:

  +-+-+-+-> id4,attr0   weight 5
  | | | '-> id4,attr1   weight 0
  | | '-+-> id4,attr2   weight 0
  | |   |
  | |   '-> id9,attr0   weight 5
  | '-+-+-> id9,attr1   weight 0
  |   | '-> id9,attr2   weight 0
  |   |
  |   '-+-> id14,attr0  weight 5
  |     '-> id14,attr1  weight 0
  '-------> id14,attr2  weight 0

To be blunt, this is unintuitive, and I'm worried it may be its own
source of complexity/bugs. But this representation does solve the problem
at hand, so I'm just going to see how it works out.
2023-03-17 14:19:49 -05:00
Christopher Haster
27248ad3b6 Some script tweaks around dbgrbyd.py
- Fixed off-by-one id for unknown tags.

- Allowed block_size and block to go unspecified, assumes the block
  device is one big block in that case.

- Added --buffer and --ignore-errors to watch.py, making it a bit better
  for watching slow and sometimes error scripts, such as dbgrbyd.py when
  watching a block device under test.
2023-02-14 14:59:20 -06:00
Christopher Haster
745b89d02b Fixed issue where looking up tag 0 fails after a delete id0
Well not really fixed, more just added an assert to make sure
lfsr_rbyd_lookup is not called with tag 0. Because our alt tags only
encode less-than-or-equal and greater-than, which can be flipped
trivially, it's not possible to encode removal of tag 0 during deletes.

Fortunately, this tag should already not exist for other pragmatic
reasons; it was just used as the initial value for traversals, where it
could cause this bug.
2023-02-12 17:14:57 -06:00
Christopher Haster
11e91e6612 Cleaned up dbgrbyd.py, implemented tree rendering for the new 3-leb encoding 2023-02-12 17:14:57 -06:00
Christopher Haster
588a103db7 Working through 3-leb range deletes, proving to be problematic
The separate interactions between ids and keys are new and confusing.
This was something that the previous combined weights hid.
2023-02-12 17:14:57 -06:00
Christopher Haster
08f5d9ddf4 Middle of a rewrite for 3-leb encoding, but rbyd appends and creates both work
If we combine rbyd ids and B-tree weights, we need 32-bit ids since this
will eventually need to cover the full range of a file. This simply
doesn't fit into a single word anymore, unless littlefs uses 64-bit tags.
Generally not a great idea for a filesystem targeting even 8-bit
microcontrollers.

So here is a tag encoding that uses 3 leb128 words. This will likely
have more code cost and slightly more disk usage (we can no longer fit
tags into 2 bytes), though with most tags being alt pointers (O(m log m)
vs O(m)), this may not be that significant.
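For reference, a minimal unsigned leb128 encoder of the sort these
3-word tags rely on (a sketch, not the lfs.c implementation):

  #include <stddef.h>
  #include <stdint.h>

  // 7 bits per byte, high bit set on every byte except the last;
  // values <= 14 bits fit in 2 bytes, which is why tags try to stay
  // within 14 bits
  static size_t toleb128(uint32_t word, uint8_t *buf, size_t size) {
      size_t i = 0;
      do {
          if (i >= size) {
              return 0;  // out of space
          }
          uint8_t byte = word & 0x7f;
          word >>= 7;
          buf[i++] = byte | (word ? 0x80 : 0);
      } while (word);
      return i;
  }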

Note that we try to keep tags limited to 14-bits to avoid an extra leb128 byte,
which would likely affect all alt pointers. To pull this off we do away
with the subtype/suptype distinction, limiting in-tree tag types to
10-bits encoded on a per-suptype basis:

  in-tree tags:
                       ttttttt ttt00rv
                                 ^--^^- 10-bit type
                                    '|- removed bit
                                     '- valid bit
  iiii iiiiiii iiiiiii iiiiiii iiiiiii
                                     ^- n-bit id
       lllllll lllllll lllllll lllllll
                                     ^- m-bit length

  out-of-tree tags:
                       ttttttt ttt010v
                                 ^---^- 10-bit type
                                     '- valid bit
                               0000000
       lllllll lllllll lllllll lllllll
                                     ^- m-bit length

  alt tags:
                       kkkkkkk kkk1dcv
                                 ^-^^^- 10-bit key
                                   '||- direction bit
                                    '|- color bit
                                     '- valid bit
  wwww wwwwwww wwwwwww wwwwwww wwwwwww
                                     ^- n-bit weight
       jjjjjjj jjjjjjj jjjjjjj jjjjjjj
                                     ^- m-bit jump

The real pain is that with separate integers for id and tag, it no
longer makes sense to combine these into one big weight field. This
requires a significant rewrite.
2023-02-12 17:14:44 -06:00
Christopher Haster
d08497c299 Rearranged type encoding for crcs so they mostly fit in a single byte
I'm still not sure this is the best decision, since it may add some
complexity to tag parsing, but making most crcs one byte may be valuable
since these exist in every single commit.

This gives tags three high-level encodings:

  in-tree tags:
  iiiiiii iiiiitt ttTTTTT TTT00rv
              ^----^--------^--^^- 16-bit id
                   '--------|--||- 4-bit suptype
                            '--||- 8-bit subtype
                               '|- removed bit
                                '- valid bit
  lllllll lllllll lllllll lllllll
                                ^- n-bit length

  out-of-tree tags:
  ------- -----TT TTTTTTt ttt01pv
                       ^----^--^^- 8-bit subtype
                            '--||- 4-bit suptype
                               '|- perturb bit
                                '- valid bit
  lllllll lllllll lllllll lllllll
                                ^- n-bit length

  alt tags:
  wwwwwww wwwwwww wwwwwww www1dcv
                            ^-^^^- 28-bit weight
                              '||- direction bit
                               '|- color bit
                                '- valid bit
  jjjjjjj jjjjjjj jjjjjjj jjjjjjj
                                ^- n-bit jump

Having the location of the subtype flipped for crc tags vs tree tags is
unintuitive, but it makes more crc tags fit in a single byte, while
preserving expected tag ordering for tree tags.

The only case where crc tags don't fit in a single byte is if non-crc
checksums (sha256?) are added, at which point I expect the subtype to
indicate which checksum algorithm is in use.
2023-02-12 17:14:14 -06:00
Christopher Haster
01dfd1feef Added tree rendering to dbgrbyd.py
$ ./scripts/dbgrbyd.py disk 4096 0 -t
  mdir 0x0, rev 1, size 121
  off                tag                     data (truncated)
  0000005e: +-+-+--> uattr 0x01 4            aa aa aa aa              ....
  0000000f: | | '--> uattr 0x02 4            aa aa aa aa              ....
  0000001d: | '----> uattr 0x03 4            aa aa aa aa              ....
  0000002d: | .----> uattr 0x04 4            aa aa aa aa              ....
  0000003d: | | .--> uattr 0x05 4            aa aa aa aa              ....
  0000004f: '-+-+-+> uattr 0x06 4            aa aa aa aa              ....
  00000004:       '> uattr 0x07 4            aa aa aa aa              ....

Unfortunately this tree can end up a bit confusing when alt pointers
live in unrelated search paths...
2023-02-12 17:14:12 -06:00
Christopher Haster
8581eec433 Added lfs_rbyd_rangesize (untested), some cleanup
Toying around with the idea that since rbyd trees have strict height
guarantees after compaction (2*log2(n)+1), we can proactively calculate
the maximum on-disk space required for a worst case tree+leb128
encoding.

This would _greatly_ simplify things such as metadata compaction and
splitting, and allow unstorable file metadata (too many custom
attributes) to error early.

One issue is that this calculated worst case will likely be ~4-5x worse
than the actual encoding due to leb128 compression. Though this may be an
acceptable tradeoff for the simplification and more reliable behavior.
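As a feel for the kind of calculation this enables, a back-of-envelope
sketch assuming the 2*log2(n)+1 height bound and fixed worst-case
encodings (the per-alt/per-tag byte counts here are made up purely for
illustration):

  #include <math.h>
  #include <stdint.h>

  #define ALT_SIZE 12  // illustrative worst-case alt encoding
  #define TAG_SIZE 8   // illustrative worst-case tag encoding

  // worst-case on-disk size of a compacted rbyd with n attrs: each attr
  // pays for one tag plus a trunk of at most 2*log2(n)+1 alts, all at
  // their worst-case encoding, plus the raw attr data
  static uint32_t rbyd_worstcase(uint32_t n, uint32_t data_size) {
      uint32_t height = (n > 1) ? 2*(uint32_t)ceil(log2(n)) + 1 : 1;
      return n*(TAG_SIZE + height*ALT_SIZE) + data_size;
  }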
2023-02-12 17:14:12 -06:00
Christopher Haster
4aabb8f631 Reworked tag representation so that sup/sub types have expected order
Previously the subtype was encoded above the suptype. This was an issue
if you wanted to, say, traverse all tags in a given suptype.

I'm not sure yet if this sort of functionality is needed. It may be
useful for cleaning up/replacing classes of tags, such as file struct
tags, but I'm not sure yet. At the very least it avoids unintuitive tag
ordering in the tree, which could potentially cause problems for
create/deletes.

New encoding:

  tags:
  iiiiiii iiiiitt ttTTTTT TTT0trv
              ^----^--------^-^^^- 16-bit id
                   '--------|-'||- 5-bit suptype (split)
                            '--||- 8-bit subtype
                               '|- perturb/remove bit
                                '- valid bit
  lllllll lllllll lllllll lllllll
                                ^- n-bit length

  alts:
  wwwwwww wwwwwww wwwwwww www1dcv
                            ^^^-^- 28-bit weight
                             '|-|- color bit
                              '-|- direction bit
                                '- valid bit
  jjjjjjj jjjjjjj jjjjjjj jjjjjjj
                                ^- n-bit jump

Also a large number of name changes and other cleanup.
2023-02-12 17:13:57 -06:00
Christopher Haster
d8540974d4 Significant cleanup of lfs_rbyd_append, simplified pruning rules 2023-02-12 15:17:54 -06:00
Christopher Haster
ef7ee6eb7d Added delete permutation testing (failing as expected) and some minor tweaks 2023-02-12 14:48:33 -06:00
Christopher Haster
5d9e7c8e86 Moved lifetimes in dbgrbyd.py so lifetimes and jumps can both be rendered
$ ./scripts/dbgrbyd.py disk 4096 0 -g -j
  mdir 0x0, rev 1, size 59
  off             tag                     data (truncated)
  00000004: .     createreg id1 4         aa aa aa aa              ....  <--.
  0000000c: |     altblt x80d0 x4                                        -' |
  00000010: | .   createreg id2 4         cc cc cc cc              ....  <. |
  00000018: | |   altrlt x80d0 x4                                        -|-'
  0000001c: | |   altbgt x8000 x10                                       -'
  00000020: | .\  createreg id2 4         bb bb bb bb              ....
  00000028: | | | fcrc 5                  51 53 7d 52 01           QS}R.
  0000002f: | | | crc0 7                  5f db 22 8a 1b 1b 1b     _."....
2023-02-12 14:44:28 -06:00
Christopher Haster
ca710b5a29 Initial, very, very rough implementation of rbyd range deletion
Tree deletion is such a pain. It always seems like an easy addition to
the core algorithm but always comes with problems.

The initial plan for deletes was to iterate through all tags, tombstone,
and then adjust weights as needed. This accomplishes deletes with little
change to the rbyd algorithm, but adds a complex traversal inside the
commit logic. Doable in one commit, but complex. It also risks weird
unintuitive corner cases since the cost of deletion grows with the number
of tags being deleted (O(m log n)).

But this rbyd data structure is a tree, so in theory it's possible to
delete a whole range of tags in a single O(log n) operation.

---

This is a proof-of-concept range deletion algorithm for rbyd trees.

Note, this does not preserve rbyd's balancing properties! But it is no
worse than tombstoning. This is acceptable for littlefs as any
unbalanced trees will be rebalanced during compaction.

The idea is to follow the same underlying dhara algorithm, where we
follow a search path and save any alt pointers not taken, but we follow
both search paths that form the outside of the range, and only keep
outside edges.

For example, a tree:

        .-------o-------.
        |               |
    .---o---.       .---o---.
    |       |       |       |
  .-o-.   .-o-.   .-o-.   .-o-.
  |   |   |   |   |   |   |   |
  a   b   c   d   e   f   g   h

To delete the range d-e, we would search for d, and search for e:

        ********o********
        *               *
    .---*****       *****---.
    |       *       *       |
  .-o-.   .-***   ***-.   .-o-.
  |   |   |   *   *   |   |   |
  a   b   c   d   e   f   g   h

And keep the outside edges:

    .---                 ---.
    |                       |
  .-o-.   .-         -.   .-o-.
  |   |   |           |   |   |
  a   b   c           f   g   h

But how do we combine the outside edges? The simpler option is to do
both searches separately, one after the other. This would end up with a
tree like this:

    .---------o
    |         |
  .-o-.   .---o
  |   |   |   |
  a   b   c   o---------.
              |         |
              o---.   .-o-.
              |   |   |   |
              _   f   g   h

But this horribly throws off the balance of our tree! It's worse than
tombstoning, and gets worse with more tags.

An alternative strategy, which is used here, is to alternate edges as we
descend down the tree. This unfortunately is more complex, and requires
~2x the RAM, but better preserves the balance of our tree. It isn't
perfect, because we lose color information, but we can leave that up to
compaction:

  .---------o
  |         |
.-o-.       o---------.
|   |       |         |
a   b   .---o       .-o-.
        |   |       |   |
        c   o---.   g   h
            |   |
            _   f

I also hope this can be merged into lfs_rbyd_append, deduplicating the
entire core rbyd append algorithm.
2023-02-12 13:29:06 -06:00
Christopher Haster
12edc5aee3 Added some ascii art to dbgrbyd.py to help debug how ids change over time
An example:

  $ ./scripts/dbgrbyd.py disk 4096 0 -i
  mdir 0x0, rev 1, size 59
  off       tag                     data (truncated)
  00000004: create x01 id1 4        aa aa aa aa              ....      .
  0000000c: altblt x80d0 x4                                            |
  00000010: create x01 id2 4        cc cc cc cc              ....      | .
  00000018: altrlt x80d0 x4                                            | |
  0000001c: altbgt x8000 x10                                           | |
  00000020: create x01 id2 4        bb bb bb bb              ....      | .\
  00000028: fcrc 5                  51 53 7d 52 01           QS}R.     | | |
  0000002f: crc0 7                  5f db 22 8a 1b 1b 1b     _."....   | | |
2023-02-12 13:23:56 -06:00
Christopher Haster
8d4991df6a Added the option to error on no valid commit to dbgrbyd.py
Considered adding --ignore-errors to watch.py, but it doesn't really
make sense with watch.py's implementation. watch.py would need to not update
in realtime, which conflicts with other use cases.
2023-02-12 13:19:46 -06:00
Christopher Haster
5cdda57373 Added the ability to remove rbyd tags via tombstoning
It's quite lucky a spare bit is free in the tag encoding; this means we
don't need a reserved length value as originally planned. We end up using
all of the bits that overlap the alt pointer encoding, which is nice and
unexpected.
2023-02-12 13:16:55 -06:00