Commit Graph

6 Commits

Author SHA1 Message Date
Christopher Haster
8f26b68af2 Derived grows/shrinks from rbyd trunk, no longer needing explicit tags
I only recently noticed there is enough information in each rbyd trunk
to infer the effective grow/shrinks. This has a number of benefits:

- Cleans up the tag encoding a bit, no longer expecting tag size to
  sometimes contain a weight (though this could've been fixed other
  ways).

  0x6 in the lower nibble now reserved exclusively for in-device tags.

- grow/shrinks can be implicit to any tag. Will attempt to leverage this
  in the future.

- The weight of an rbyd can no longer go out-of-sync with itself. While
  this _shouldn't_ happen normally, if it does I imagine it'd be very
  hard to debug.

  Now, there is only one source of knowledge about the weight of the
  rbyd: The most recent set of alt-pointers.

Note that remove/unreachable tags now behave _very_ differently when it
comes to weight calculation: remove tags require the tree to make the
tag unreachable. This is a tradeoff for the above.
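
To make the implicit grow/shrinks concrete, here is a rough sketch
(illustrative Python only, not how lfs.c actually walks the trunk): if
each trunk's alt-pointers imply the rbyd's total weight, the effective
grow/shrink of a commit is just the difference between the new and old
trunk weights, with no explicit grow/shrink tags needed:

  # illustrative only: pretend each trunk is a list of alt-pointer
  # weights plus the weight of its final tag
  def trunk_weight(alt_weights, tag_weight):
      return sum(alt_weights) + tag_weight

  def effective_grow(old_trunk, new_trunk):
      # positive => implicit grow, negative => implicit shrink
      return trunk_weight(*new_trunk) - trunk_weight(*old_trunk)

  # a commit that takes the weight from 4 to 6 is an implicit grow of +2
  assert effective_grow(([1, 2], 1), ([1, 2, 2], 1)) == 2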
2023-03-27 01:45:34 -05:00
Christopher Haster
546fff77fb Adopted full le16 tags instead of 14-bit leb128 tags
The main motivation for this was issues fitting a good tag encoding into
14 bits. The extra 2 bits (though really only 1 bit was needed) from
making this not a leb encoding open up the space from 3 suptypes to
15 suptypes, which is nothing to shake a stick at.

The main downsides:
1. We can't rely on leb encoding for effectively-infinite extensions.
2. We can't shorten small tags (crcs, grows, shrinks) to one byte.

For 1., extending the leb encoding beyond 14 bits is already
unpalatable, because it would increase RAM costs in the tag
encoder/decoder, which must assume a worst-case tag size, and would
likely add storage cost to every alt pointer; more on this in the next
section.
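
As a quick sanity check on this, leb128 stores 7 payload bits per byte,
so a 14-bit tag fits in 2 bytes but anything wider needs 3+ bytes,
whereas a fixed le16 is always exactly 2 bytes (rough Python, not how
lfs.c actually encodes tags):

  def leb128_size(v):
      # bytes needed to leb128-encode v (7 payload bits per byte)
      size = 1
      while v >> 7:
          v >>= 7
          size += 1
      return size

  assert leb128_size(0x3fff) == 2  # a 14-bit tag still fits in 2 bytes
  assert leb128_size(0xffff) == 3  # a full 16-bit tag would need 3 bytes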

The current encoding is quite generous, so I think it is unlikely we
will exceed the 16-bit encoding space. But even if we do, it's possible
to use a spare bit for an "extended" set of tags in the future.

As for 2., the lack of compression is a downside, but I've realized the
only tags that really matter storage-wise are the alt pointers. In any
rbyd there will be roughly O(m log m) alt pointers, but at most O(m) of
any other tag. What this means is that the encoding of any other tag is
in the noise of the encoding of our alt pointers.

Our alt pointers are already pretty densely packed. But because the
sparse key part of alt-pointers is stored as-is, the worst-case
encoding of in-tree tags likely ends up as the encoding of our
alt-pointers. So going up to 3-byte tags adds a surprisingly large
storage cost.
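
To put rough numbers on "in the noise": with m = 1024, an rbyd holds on
the order of m*log2(m) alt pointers but at most m of any other tag, so
the alt-pointer encoding dominates storage by roughly an order of
magnitude:

  import math

  m = 1024
  alts = m * math.log2(m)  # ~10240 alt pointers
  others = m               # at most ~1024 of any other tag
  print(alts / others)     # ~10x, so non-alt tag size is in the noise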

As a minor plus, le16s should be slightly cheaper to encode/decode. It
should also be slightly easier to debug tags on-disk.

  tag encoding:
                     TTTTtttt ttttTTTv
                        ^--------^--^^- 4+3-bit suptype
                                 '---|- 8-bit subtype
                                     '- valid bit
  iiii iiiiiii iiiiiii iiiiiii iiiiiii
                                     ^- m-bit id/weight
  llll lllllll lllllll lllllll lllllll
                                     ^- m-bit length/jump
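
For reference, a rough decoder for the layout above (this is my reading
of the diagram with the most-significant bit on the left; field names
and exact bit positions are not necessarily what lfs.c uses):

  import struct

  def tag_unpack(tag):
      suptype = ((tag >> 12) << 3) | ((tag >> 1) & 0x7)  # 4+3-bit suptype
      subtype = (tag >> 4) & 0xff                        # 8-bit subtype
      valid   = tag & 0x1                                # valid bit
      return suptype, subtype, valid

  def tag_frombytes(data):
      # the 16-bit tag itself is stored on disk as a little-endian word
      return tag_unpack(struct.unpack('<H', data[:2])[0])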

Also renamed the "mk" tags, since they no longer have special behavior
outside of providing names for entries:
- LFSR_TAG_MK       => LFSR_TAG_NAME
- LFSR_TAG_MKBRANCH => LFSR_TAG_BNAME
- LFSR_TAG_MKREG    => LFSR_TAG_REG
- LFSR_TAG_MKDIR    => LFSR_TAG_DIR
2023-03-25 14:36:29 -05:00
Christopher Haster
f4e2a1a9b4 Tweaked dbgbtree.py to show higher names when -i is not specified
This avoids showing vestigial names in a case where they could be really
confusing.
2023-03-21 13:29:55 -05:00
Christopher Haster
89d5a5ef80 Working implementation of B-tree name split/lookup with vestigial names
B-trees with names are now working, though this required a number of
changes to the B-tree layout:

1. B-trees no longer require name entries (LFSR_TAG_MK) on each branch.
   This is a nice optimization to the design, since these name entries
   just waste space in purely weight-based B-trees, which are probably
   going to be most B-trees in the filesystem.

   If a name entry is missing, the struct entry, which is required,
   should have the effective weight of the entry.

   The first entry in every rbyd block is expected to have no name
   entry, since this is the default path for B-tree lookups (see the
   sketch after this list).

2. The first entry in every rbyd block _may_ have a name entry, which
   is ignored. I'm calling these "vestigial names" to make them sound
   cooler than they actually are.

   These vestigial names show up in a couple complicated B-tree
   operations:

   - During B-tree split, since pending attributes are calculated before
     the split, we need to play out pending attributes into the rbyd
     before deciding what name becomes the name of the entry in the
     parent. This creates a vestigial name which we _could_ immediately
     remove, but the remove adds additional size to the must-fit split
     operation.

   - During B-tree pop/merge, if we remove the leading no-name entry,
     the second, named entry becomes the leading entry. This creates a
     vestigial name that _looks_ easy enough to remove when making the
     pending attributes for pop/merge, but turns out to be surprisingly
     tricky if the parent undergoes a split/merge at the same time.

   It may be possible to remove all these vestigial names proactively,
   but this adds additional rbyd lookups to figure out the exact tag to
   remove, complicates things in a fragile way, and doesn't actually
   reduce storage costs until the rbyd is compacted.

   The main downside is that these B-trees may be a bit more confusing
   to debug.
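
To sketch why vestigial names are harmless (illustrative Python only,
not lfs.c's actual lookup): lookups default to the first entry and only
move to a later entry if its name compares less than or equal to the
target, so whatever name the first entry carries never affects the
result:

  def block_lookup(entries, name):
      # entries: list of (name or None, weight), sorted by name;
      # entries[0] is the default path, its name (if any) is ignored
      best = 0
      for i, (ename, _) in enumerate(entries[1:], start=1):
          if ename is not None and ename <= name:
              best = i
      return best

  assert block_lookup([(None, 1), ("m", 1)], "a") == 0
  assert block_lookup([("k", 1), ("m", 1)], "a") == 0  # "k" is vestigial
  assert block_lookup([(None, 1), ("m", 1)], "z") == 1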
2023-03-21 12:59:46 -05:00
Christopher Haster
a897b875d3 Implemented lfsr_btree_update and added more tests
This was a rather simple exercise. lfsr_btree_commit does most of the
work already, so all this needed was setting up the pending attributes
correctly.

Also:
- Tweaked dbgrbyd.py's tree rendering to match dbgbtree.py's.
- Added a print to each B-tree test to help find the resulting B-tree
  when debugging.
2023-03-17 14:20:40 -05:00
Christopher Haster
ce599be70d Added scripts/dbgbtree.py for debugging B-trees, tweaked dbgrbyd.py
An example:

  $ ./scripts/dbgbtree.py -B4096 disk 0xaa -t -i
  btree 0xaa.1000, rev 35, weight 278
  block            ids     name     tag                     data
  (truncated)
  00aa.1000: +-+      0-16          branch id16 3           7e d4 10                 ~..
  007e.0854: | |->       0          inlined id0 1           73                       s
             | |->       1          inlined id1 1           74                       t
             | |->       2          inlined id2 1           75                       u
             | |->       3          inlined id3 1           76                       v
             | |->       4          inlined id4 1           77                       w
             | |->       5          inlined id5 1           78                       x
             | |->       6          inlined id6 1           79                       y
             | |->       7          inlined id7 1           7a                       z
             | |->       8          inlined id8 1           61                       a
             | |->       9          inlined id9 1           62                       b
  ...

This added the idea of block+limit addresses such as 0xaa.1000. Added
this as an option to dbgrbyd.py along with a couple other tweaks:

- Added block+limit support (0x<block>.<limit>).
- Fixed in-device representation indentation when trees are present.
- Changed fromtag to implicitly fixup ids/weights off-by-one-ness; this
  is consistent with lfs.c.
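
In case the notation is unclear, a rough sketch of parsing these
block+limit addresses (the actual parsing in the scripts may differ; the
limit is in hex, matching 0x1000 = 4096 = the block size above):

  def parse_addr(addr):
      block, sep, limit = addr.partition('.')
      return int(block, 0), int(limit, 16) if sep else None

  assert parse_addr('0xaa.1000') == (0xaa, 0x1000)
  assert parse_addr('0xaa') == (0xaa, None)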
2023-03-17 14:20:10 -05:00