To help with this, added TEST_PL, which is set to true when powerloss
testing. This way tests can check for stronger conditions (no EEXIST)
when not powerloss testing.
With TEST_PL, there's really no reason every test in t5_dirs shouldn't
be reentrant, and this gives us a huge improvement in test coverage very
cheaply.
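For example, a hedged sketch of the kind of check this enables, assuming
the usual lfs_mkdir/LFS_ERR_EXIST names and that TEST_PL is visible to
the test as a boolean define (not the actual test code):
#include <assert.h>
#include "lfs.h"
// under powerloss testing a replayed mkdir may find the directory
// already created by a previous, interrupted attempt, so EEXIST is
// tolerated; when not powerloss testing we can demand the stronger
// condition
static void test_mkdir_once(lfs_t *lfs) {
    int err = lfs_mkdir(lfs, "dir");
    if (TEST_PL) {
        assert(err == 0 || err == LFS_ERR_EXIST);
    } else {
        assert(err == 0);
    }
}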
---
The increased test coverage caught a bug, which is that gstate wasn't
being consumed properly when uninlining the mtree. Humorously, this went
unnoticed because the most common form of mtree uninlining, mdir splitting,
ended up incorrectly consuming the gstate twice, which canceled itself
out since the consume operation is basically just xor.
Also added support for printing dstarts to dbglfs.py, to help debugging.
- Changed how names are rendered in dbgbtree.py/dbgmtree.py to be
consistent with non-names. The special rendering isn't really worth it
now that names aren't just ascii/utf8.
- Changed the ordering of raw/device/human rendering of btree entries to
be more consistent with rendering of other entries (don't attempt to
group btree entries).
- Changed dbgmtree.py header to show information about the mtree.
This implementation is in theory correct, but of course, being untested,
who knows?
Though this does come with remounting added to all of the directory
tests. This effectively tests that all of the directory creation tests
we have so far maintain grm=0 after each unmount-mount cycle. Which is
valuable.
This has, in theory, global-removes (grm) being written out as a part
of directory creation, but they aren't used in any form and so may not
be written correctly.
But it did require quite a bit of problem solving to get to this point
(the interactions between mtree splits and grms are really annoying), so
it's worth a commit.
This makes it now possible to create directories in the new system.
The new system now uses a single global "mtree" to store all metadata
entries in the filesystem. In this system, a directory is simply a range
of metadata entries. This has a number of benefits, but does come with
its own problems:
1. We need to indicate which directory each file belongs to. To do this
the file's name entry has been changed to a tuple of leb128-encoded
directory-id + actual file name:
01 66 69 6c 65 2e 74 78 74  .file.txt
^  '----------+----------'
'-------------|------------ leb128 directory-id
              '------------ ascii/utf8 name
If we include the directory-id as part of filename comparison, files
should naturally be next to other files in the same directory.
2. We need a way to allocate directory-ids for new directories. This turns
out to be a bit more tricky than I expected.
We can't use any mid/bid/rid inherent to the mtree, because these
change on any file creation/deletion. And since we commit the did
into the tree, that's not acceptable.
Initially I thought you could just find the largest did and increment,
but this gives you no way to reclaim deleted dids. And sure, deleted
dids have no storage consumption, but eventually you will overflow
the did integer. Since this can suddenly happen in a filesystem
that's been in a steady-state for years, that's pretty unacceptable.
One solution is to do a simple linear search over the mtree for an
unused did. But with a runtime of O(n^2 log(n)), this raises
performance concerns.
Sidenote: It's interesting to note that the Linux kernel's allocation
of process-ids, a very similar problem, is surprisingly complex and
relies on a radix-tree of bitmaps (struct idr). This suggests I'm not
missing an obvious solution somewhere.
The solution I settled on here is to instead treat the set of dids as
a sort of hash table (see the sketch after this list):
1. Hash the full directory path into a did.
2. Perform a linear search until we have no collision.
leb128(truncate28(crc32c("dir")))
.--------'
v
9e cd c8 30 66 69 6c 65 2e 74 78 74  ...0file.txt
'----+----' '----------+----------'
     '------------------|------------ leb128 directory-id
                        '------------ ascii/utf8 name
In the worst case, this can still exhibit the O(n^2 log(n))
performance when the did space is nearly full. However that seems
unlikely to happen in practice, since we don't truncate our hashes
the way normal hash tables do. An additional 32-bit word for each file
is a small price to pay for a low chance of collisions.
In the current implementation, I do truncate the hash to 28-bits.
Since we encode the hash with leb128, and hashes are statistically
random, this gives us better usage of the leb128 encoding. However
it does limit a 32-bit littlefs to 256 Mi directories.
Maybe this should be a configurable limit in the future.
But that highlights another benefit of this scheme. It's easy to
change in the future without disk changes.
3. We need a way to know if a directory-id is allocated, even if the
directory is empty.
For this we just introduce a new tag: LFSR_TAG_DSTART, which
is an empty file entry that indicates the directory at the given did
in the mtree is allocated.
To create/delete these atomically with the reference in our parent
directory, we can use the GRM system for atomic renames.
Note this isn't implemented yet.
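To make the did allocation in item 2 concrete, here's a minimal sketch
(the crc32c/leb128 helpers are standard, but did_in_use, alloc_did, and
build_name_entry are hypothetical stand-ins for the real mtree
machinery, not the actual lfsr_* code):
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
// standard reflected crc32c (Castagnoli polynomial)
static uint32_t crc32c(uint32_t crc, const void *buf, size_t size) {
    const uint8_t *p = buf;
    crc = ~crc;
    while (size--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++) {
            crc = (crc >> 1) ^ ((crc & 1) ? 0x82f63b78 : 0);
        }
    }
    return ~crc;
}
// standard unsigned leb128 encoding
static size_t leb128_encode(uint8_t *out, uint32_t v) {
    size_t i = 0;
    do {
        out[i] = v & 0x7f;
        v >>= 7;
        if (v) {
            out[i] |= 0x80;
        }
        i += 1;
    } while (v);
    return i;
}
// hypothetical callback, in littlefs proper this would be an mtree
// lookup for a dstart/name entry with this did
typedef bool (*did_in_use_t)(void *ctx, uint32_t did);
// 1. hash the full directory path into a did, truncated to 28 bits
// 2. linearly probe until we find an unused did
static uint32_t alloc_did(const char *path, size_t path_len,
        did_in_use_t did_in_use, void *ctx) {
    uint32_t did = crc32c(0, path, path_len) & 0x0fffffff;
    while (did_in_use(ctx, did)) {
        did = (did + 1) & 0x0fffffff;
    }
    return did;
}
// the name entry is then leb128(did) followed by the raw file name;
// leb128 encodings are prefix-free, so under a simple byte-wise
// comparison all entries in the same directory end up adjacent
static size_t build_name_entry(uint8_t *out, uint32_t did,
        const char *name, size_t name_len) {
    size_t i = leb128_encode(out, did);
    memcpy(&out[i], name, name_len);
    return i + name_len;
}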
This is also the first time we finally get around to testing all of the
dname lookup functions, so this did find a few bugs, mostly around
reporting the root correctly.
Now that tree rebalancing is implemented and needed a null terminator
anyways, I think it's clear that alt-always pointers as trunk
terminators have pretty limited value.
Now a null or other tag is needed for every trunk, which simplifies
checks for end-of-trunk.
Alt-always tags are still emitted for deletes, etc, but there, their
behavior is implicit, so no special checks are needed. Alt-always tags
are naturally cleaned up as a part of rbyd pruning.
This isn't actually for performance reasons, but to reduce storage
overhead of the rbyd metadata tree, which was showing signs of being
problematic for small block sizes.
Originally, the plan for compaction was to rely on the self-balancing
rbyd append algorithm and simply append each tag to a new tree.
Unfortunately, since each append requires a rewrite of the trunk
(current search path), this introduces ~n*log(n) alts but only uses ~n alts
for the final tree. This really starts to put pressure on small blocks,
where the exponential-ness of the log doesn't kick in and overhead
limits are already tight.
Measuring lfsr_mdir_commit code size, this shows a ~556 byte cost on
thumb: 16416 -> 16972 (+3.4%). Though there are still some optimizations
on the table, this implementation needs a cleanup pass.
            alt overhead    code cost
rebalance:  <= 28*n         16972
append:     <= 24*n*log(n)  16416
Note these all assume worst case alt overhead, but we _need_ to assume
worst case for our rbyd estimations, or else the filesystem can get
stuck in unrecoverable compaction states.
Because of the code cost I'm not sure if rebalancing will stay, be
optional, or replace append-compaction completely yet.
Some implementation notes:
- Most tree balancing algorithms rely on true recursion. I suspect
recursion may be a hard requirement in general; at least it's hard to
find bounded-ram algorithms.
This solution gets around the ram requirement by leveraging the fact
that our tags exist in a log to build up each layer in the tree
tail-recursively (see the sketch after these notes). It's interesting
to note that this is a special case of having little ram but lots of
storage.
- Humorously this shouldn't result in a performance improvement. Rbyd
trees result in a worst case 2*log(n) height, and rebalancing gives us
a perfect worst case log(n) height, but, since we need an additional
alt pointer for each node in our tree, things bump back up to 2*log(n).
- Originally the plan was to terminate each node with an alt-always tag,
but during implementation I realized there was no easy way to get the
key that splits the children without awkward tree lookups. As a
workaround each node is terminated with an altle tag that contains the
key, followed by an unreachable null tag. This is redundant information,
but makes the algorithm easier to implement.
Fortunately null tags use the smallest tag encoding, which isn't that
small, but that means this wastes at most 4*n bytes.
- Note this preserves the first-tag-always-ends-up-at-off=0x4 rule, which
is necessary for the littlefs magic to end up in a consistent place.
- I've dropped the dropping of vestigial names for now, which means
vestigial names can remain in btrees indefinitely. Need to revisit this.
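As a rough sketch of the layer-by-layer idea from the first note above:
a toy bottom-up build with no recursion, appending each layer after the
previous one. It uses an in-RAM array for clarity; in littlefs the
layers would be appended to the rbyd log instead, which is what keeps
RAM bounded. All names here are hypothetical:
#include <stdint.h>
#include <stdio.h>
// toy node: in littlefs these would be alt pointers appended to the
// rbyd log, here they're just entries in a flat array
struct node {
    uint32_t key;   // smallest key reachable through this node
    int left;       // index of left child, or -1 for a leaf
    int right;      // index of right child, or -1 for a leaf
};
int main(void) {
    // leaf keys, already in sorted order as they would be when
    // compacting an rbyd
    uint32_t keys[] = {1, 3, 5, 7, 9, 11, 13};
    int n = sizeof(keys)/sizeof(keys[0]);
    struct node nodes[64];
    int count = 0;
    // emit the leaf layer
    for (int i = 0; i < n; i++) {
        nodes[count++] = (struct node){
            .key = keys[i], .left = -1, .right = -1};
    }
    // repeatedly pair up the previous layer into a new layer, appending
    // as we go -- no recursion, each pass only walks the previous layer
    int start = 0, len = n;
    while (len > 1) {
        int next_start = count;
        for (int i = 0; i < len; i += 2) {
            if (i+1 < len) {
                nodes[count++] = (struct node){
                    .key = nodes[start+i].key,
                    .left = start+i,
                    .right = start+i+1,
                };
            } else {
                // odd node out, carried up into the next layer
                nodes[count++] = nodes[start+i];
            }
        }
        start = next_start;
        len = count - next_start;
    }
    // the last node appended is the root
    printf("root at %d, key %u, %d nodes total\n",
            count-1, (unsigned)nodes[count-1].key, count);
    return 0;
}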
This should have been done as a part of the earlier tag reencoding work,
since having the block at the end was what allowed us to move the
redund-count out of the tag encoding.
New encoding:
[-- 32-bit csum   --]
[-- leb128 weight --]
[-- leb128 trunk  --]
[-- leb128 block  --]
Note that since our tags have an explicit size, we can store a variable
number of blocks. The plan is to use this to eventually store redundant
copies for error correction:
[-- 32-bit csum   --]
[-- leb128 weight --]
[-- leb128 trunk  --]
[-- leb128 block  --] -.
[-- leb128 block  --]  +- n redundant blocks
[-- leb128 block  --]  |
 ...                  -'
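A rough sketch of what parsing this struct could look like; the layout
follows the description above, but the little-endian csum, the field
names, and the cap on redundant blocks are assumptions for illustration,
not the actual lfsr_* code:
#include <stddef.h>
#include <stdint.h>
// standard unsigned leb128 decoding, returns bytes consumed or 0 on error
static size_t leb128_decode(const uint8_t *in, size_t size, uint32_t *v) {
    *v = 0;
    for (size_t i = 0; i < size && i < 5; i++) {
        *v |= (uint32_t)(in[i] & 0x7f) << (7*i);
        if (!(in[i] & 0x80)) {
            return i+1;
        }
    }
    return 0;
}
struct btree_struct {
    uint32_t csum;
    uint32_t weight;
    uint32_t trunk;
    uint32_t blocks[4]; // 1 block, plus possible redundant copies
    size_t block_count;
};
// parse a btree/branch struct from a tag's payload, the tag's explicit
// size tells us how many trailing blocks there are
static int btree_struct_parse(const uint8_t *data, size_t size,
        struct btree_struct *b) {
    if (size < 4) {
        return -1;
    }
    // assume the 32-bit csum is stored little-endian
    b->csum = (uint32_t)data[0]
            | ((uint32_t)data[1] << 8)
            | ((uint32_t)data[2] << 16)
            | ((uint32_t)data[3] << 24);
    size_t off = 4;
    size_t d = leb128_decode(&data[off], size-off, &b->weight);
    if (!d) {
        return -1;
    }
    off += d;
    d = leb128_decode(&data[off], size-off, &b->trunk);
    if (!d) {
        return -1;
    }
    off += d;
    // any remaining bytes are (possibly redundant) block addresses
    b->block_count = 0;
    while (off < size && b->block_count < 4) {
        d = leb128_decode(&data[off], size-off,
                &b->blocks[b->block_count]);
        if (!d) {
            return -1;
        }
        off += d;
        b->block_count += 1;
    }
    return 0;
}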
This does have a significant tradeoff: we need to know the checksum size
to access the btree structure. This doesn't seem like a big deal, but
with the possibility of different checksum types it may become an
annoying issue.
Note that FCRC was also flipped for consistency.
Wide tags are a happy accident that fell out of the realization that we
can view all subtypes of a given tag suptype as a range in our rbyd.
Combining this with how natural it is to operate on ranges in an rbyd
allows us to perform operations on an entire range of subtypes as though
it were a single tag.
- lookup wide tag => find the smallest tag with this tag's suptype, O(log(n))
- remove wide tag => remove all tags with this tag's suptype, O(log(n))
- append wide tag => remove all tags with this tag's suptype, and then
append our tag, O(log(n))
This is very useful for littlefs, where we've already been using tag's
subtypes to hold extra type info, and have had to rely on awkward
alternatives such as deleting existing subtypes before writing our new
subtype.
For example, when committing file metadata (not yet implemented), we can
append a wide struct tag to update the metadata while also clearing out any
lingering struct tags from previous commits, all in one rbyd append
operation.
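A hedged sketch of the range idea, using a flat sorted array of 16-bit
tags to stand in for the rbyd; the suptype-in-the-high-byte layout and
the example values are assumptions for illustration only:
#include <stdint.h>
#include <stdio.h>
// assume the suptype lives in the high byte, so sorting by tag groups
// all subtypes of a suptype into one contiguous range
#define SUPTYPE(tag) ((tag) >> 8)
// wide lookup: find the first tag with the given suptype, or -1,
// here a binary search, in the rbyd an O(log n) tree descent
static int lookup_wide(const uint16_t *tags, int n, uint16_t suptype) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (tags[mid] < (uint16_t)(suptype << 8)) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (lo < n && SUPTYPE(tags[lo]) == suptype) ? lo : -1;
}
// wide remove: drop every tag with the given suptype, returns the new
// count, here a linear filter, in the rbyd a single range operation
static int remove_wide(uint16_t *tags, int n, uint16_t suptype) {
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (SUPTYPE(tags[i]) != suptype) {
            tags[m++] = tags[i];
        }
    }
    return m;
}
int main(void) {
    // a name tag, two struct-suptype tags (0x03xx, say), and an attr
    uint16_t tags[] = {0x0101, 0x0301, 0x0302, 0x0401};
    int n = 4;
    printf("first 0x03xx tag at index %d\n", lookup_wide(tags, n, 0x03));
    // "append wide" = remove the whole suptype range, then append anew
    n = remove_wide(tags, n, 0x03);
    printf("%d tags remain\n", n);
    return 0;
}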
This uses another mode bit in-device to change the behavior of
lfsr_rbyd_commit, of which we have a couple:
vwgrtttt 0TTTTTTT
^^^^---^--------^- valid bit (currently unused, maybe errors?)
 '||---|--------|- wide bit, ignores subtype (in-device)
  '|---|--------|- grow bit, don't create new id (in-device)
   '---|--------|- rm bit, remove this tag (in-device)
       '--------|- 4-bit suptype
                '- leb128 subtype
This helps with debugging and can avoid weird issues if a file btree
ever accidentally ends up attached to id -1 (due to fs bug).
Though a separate encoding isn't strictly necessary, maybe this should
be reverted at some point.
This replaces unr with null on disk, though note both the rm bit and unr
are still used in-device; they just don't get written to disk.
This removes the need for the rm bit on disk. Since we no longer need to
figure out what's been removed during fetch, we can save this bit for both
internal and future on-disk use.
Special handling of alta allows us to avoid emitting an unr tag (now null) if
the current trunk is truly unreachable. This is minor now, but important
for a theoretical rbyd rebalance operation (planned), which brings the
rbyd overhead down from ~3x to ~2x.
These changes give us two ways to terminate trunks without a tag:
1. With an alta, if the current trunk is unreachable:
altbgt 0x403 w0 0x7b
altbgt 0x402 w0 0x29
alta w0 0x4
2. With a null, if the current trunk is reachable, either for
code convenience or because emitting an alta is impossible (an empty
rbyd for example):
altbgt 0x403 w0 0x7b
altbgt 0x402 w0 0x29
altbgt 0x401 w0 0x4
null
Yet another tag encoding, but hopefully narrowing in on a good long term
design. This change trades a subtype bit for the ability to extend
subtypes indefinitely via leb128 in the future.
The immediate benefit is ~unlimited custom attributes, though I'm not
sure how to make this configurable yet. Extended custom attributes may
have a significant impact on alt tag sizes, so it may be worth
defaulting to only 8-bit custom attributes still.
Tag encoding:
vmmmtttt 0TTTTTTT 0wwwwwww 0sssssss
^--^---^--------^--------^--------^- valid bit
   '---|--------|--------|--------|- 3-bit mode
       '--------|--------|--------|- 4-bit suptype
                '--------|--------|- leb128 subtype
                         '--------|- leb128 weight
                                  '- leb128 size/jump
This limits subtypes to 7-bits, but this seems very reasonable at the
moment.
This also seems to limit custom attributes to 7-bits, but we can use two
separate suptypes to bring this back up to 8-bits. I was planning to do
this anyways to have separate "user-attributes" and "system-attributes",
so this actually fits in really well.
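A minimal sketch of serializing this encoding; the bit packing follows
the diagram, with the valid bit assumed to be the top bit of the leading
byte, and the names are hypothetical rather than the actual lfsr_* code:
#include <stddef.h>
#include <stdint.h>
// standard unsigned leb128 encoding
static size_t leb128_encode(uint8_t *out, uint32_t v) {
    size_t i = 0;
    do {
        out[i] = v & 0x7f;
        v >>= 7;
        if (v) {
            out[i] |= 0x80;
        }
        i += 1;
    } while (v);
    return i;
}
// pack valid/mode/suptype into the leading byte, then append the
// leb128-encoded subtype, weight, and size/jump
static size_t tag_encode(uint8_t *out,
        uint8_t valid, uint8_t mode, uint8_t suptype,
        uint32_t subtype, uint32_t weight, uint32_t size) {
    size_t i = 0;
    out[i++] = (uint8_t)(((valid & 1) << 7)
            | ((mode & 0x7) << 4)
            | (suptype & 0xf));
    i += leb128_encode(&out[i], subtype);
    i += leb128_encode(&out[i], weight);
    i += leb128_encode(&out[i], size);
    return i;
}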
This helps debug a corrupted mtree with cycles, which has been a problem
in the past.
Also fixed a small rendering issue with dbgmtree.py not connecting inner
tree edges to mdir roots correctly during rendering.
Optimizing a script? This might sound premature, but the tree rendering
was, uh, quite slow for any decently sized (>1024) btree.
The main reason is that tree generation is quite hacky in places, repeatedly
spitting out multiple copies of the inner node's rbyd trees for example.
Rather than rewrite the tree generation implementation to be smarter,
this just changes all edge representations to namedtuples (which may
reduce memory pressure a bit), and collects them into a Python set.
This has the effect of deduplicating generated edges efficiently, and
improved the rendering performance significantly.
---
I also considered memoizing rbyd tree, but dropped the idea since the
current renderer performs well enough.
In addition to plugging in the rbyd and btree renderers in dbgbtree.py,
this required wiring in rbyd trees in the mdirs and mroots.
A bit tricky, but the implementation is more-or-less straightforward
thanks to the common edge description used by the tree renderer.
For example, a relatively small mtree:
$ ./scripts/dbgmtree.py disk -B4096 -t -i
mroot 0x{0,1}.45, rev 1, weight 0
mdir ids tag ...
{0000,0001}: .---------> -1 magic 8 ...
| .-------> config 21 ...
+-+-+ btree 7 ...
0006.000a: | .-+ 0 mdir w1 2 ...
{0002,0003}: | | '-> 0.0 inlined w1 1024 ...
0006.000a: '-+-+ 1 mdir w1 2 ...
{0004,0005}: '-> 1.0 inlined w1 1024 ...
This builds on dbgrbyd.py and dbgbtree.py by allowing for quick
debugging of the littlefs mtree, which is a btree of rbyd pairs with a
few bells and whistles.
This also comes with a number of tweaks to dbgrbyd.py and dbgbtree.py,
mostly changing rbyd addresses to support some more mdir friendly
formats.
The syntax for rbyd addresses is starting to converge into a couple
common patterns, which is nice for quickly determining at a glance what
type of address you are looking at:
- 0x12 => An rbyd at block 0x12
- 0x12.34 => An rbyd at block 0x12 with trunk 0x34
- 0x{12,34} => An rbyd at either block 0x12 or block 0x34 (an mdir)
- 0x{12,34}.56 => An rbyd at either block 0x12 or block 0x34 with trunk 0x56
These scripts have also been updated to support any number of blocks in
an rbyd address, for example 0x{12,34,56,78}. This is a bit of
future-proofing; >2 blocks in mdirs may be explored in the future for
increased redundancy.
LFSR_TAG_BNAME => LFSR_TAG_BRANCH
LFSR_TAG_BRANCH => LFSR_TAG_BTREE
Maybe this will be a problem in the future if our branch structure is
not the same as a standalone btree, but I don't really see that
happening.
This became surprisingly tricky.
The main issue is knowing when to split mdirs, and how to determine
this without wasting erase cycles.
Unlike splitting btree nodes, we can't salvage failed compacts here. As
soon as the salvage commit is written to disk, the commit becomes immediately
visible to the filesystem because it still exists in the mtree. This is
a problem if we lose power.
We're likely going to need to implement rbyd estimates. This is
something I hoped to avoid because it brings in quite a bit of
complexity and might lead to an annoying amount of storage waste since
our estimates will need to be conservative to avoid unrecoverable
situations.
---
Also changed the on-disk btree/branch struct to store a copy of the weight.
This was already required for the root of the btree; requiring the
weight to be stored in every btree pointer allows better code
deduplication at the cost of some redundancy on btree branches, where
the weight is already implied by the rbyd structure.
This weight is usually a single byte for most branches anyways.
This may be worth revisiting at some point to see if there's any other
unexpected tradeoffs.
This work already indicates we need more data-related helper
functions. We shouldn't need this many function calls to do "simple"
operations such as fetch the superconfig if it exists.
This is an absurd optimization that stems from the observation that the
branch encoding for the inner-rbyds in a B-tree is enough information to
jump directly to the trunk of the rbyd without needing an lfsr_rbyd_fetch.
This results in a pretty ridiculous performance jump from O(m log_m(n/m))
to O(log(m) log_m(n/m)).
If the complexity analysis isn't impressive enough, look at some rough
benchmarking of read operations for 4KiB-block, 1K-entry B-trees:
 12KiB ^ :: :. :: .: .: :. : .: :. : : .. : : . : .: : : :
       | .:: .::.::.:: ::.::::::::::::.::::::::.::::::::::::.
       | : :::':: ::'::'::':: :' :':: :'::::::::': ::::::': :
before | ::: ::' :' :' :: :' '' ' ' '' : : : '' ' ' '
       | ::: ''
       |:
    0B :'------------------------------------------------------>
.17KiB ^ ............:::::::::::::::::::::::::::::
       | . .....:::::''''''''' ' ' '
       | .::::::::::::
 after | :':''
       |.::
       .:'
    0B :------------------------------------------------------->
        0                                                     1K
In order for this to work, the branch encoding did need to be tweaked
slightly. Before, it stored block+off; now it stores block+trunk, where
"trunk" is the offset of the entry point into the rbyd tree. Both off
and trunk are enough info to know when to stop fetching, if necessary,
but trunk allows lookups to jump directly into the branch's rbyd tree
without a fetch.
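In struct form the branch info is roughly the following (hypothetical
names; on disk these fields are leb128-encoded as described earlier):
#include <stdint.h>
// enough info to jump straight into the child rbyd's alt tree at
// block+trunk, no full lfsr_rbyd_fetch of the block required
struct branch {
    uint32_t block;   // block containing the child rbyd
    uint32_t trunk;   // offset of the rbyd's entry point (replaces off)
    uint32_t weight;  // copy of the child's weight (as noted above)
};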
With the change to trunk, lfsr_rbyd_fetch has also been extended to allow
fetching of any internal trunks, not just the last trunk in the commit.
This is very useful for dbgrbyd.py, but doesn't currently have a use in
littlefs itself. But it's at least valuable to have the feature available
in case it does become useful.
Note that two cases still require the slower O(m log_m(n/m)) lookup
with lfsr_rbyd_fetch:
1. Name lookups, since we currently use a linear-search O(m) to find names.
2. Validating B-tree rbyds, which requires a linear fetch O(m) to
validate the checksums. We will need to do this at least once
after mount.
It's also worth mentioning this will likely have a large impact on B-tree
traversal speed. Which is huge as I am expecting B-tree traversal to be
the main bottleneck once garbage-collection (or its replacement) is
involved.
I've been wanting to make this change for a while now (tag,id => id,tag).
The id,tag order matches the common lexicographic order used for sorting
tuples. Sorting tag,id tuples by their id first is less common.
The reason for this order in the codebase is that all attrs on disk
start with their tag first, since its decoding determines the purpose of
the id field (keep in mind this includes other non-tree tags such as
crcs, alts, etc). But with the move to storing weights instead of tags
on disk, this gives us a clear point to switch from tag,w to id,tag
ordering.
I may be thinking too much about this, but it does affect a significant
amount of the codebase.
While the previous renderer was "technically correct", the attempt to
map rotated alts to their nearest neighbor just made the resulting tree
an unreadable mess.
Now the renderer prunes alts with unreachable edges (like they would be
during lfsr_rbyd_append), and aligns all alts with their destination
trunk. This results in a much more readable, if slightly less accurate,
rendering of the tree.
Example:
$ ./scripts/dbgrbyd.py -B4096 disk 0 -t
rbyd 0x0, rev 1, size 1508, weight 40
off ids tag data (truncated)
0000032a: .-+-> 0 reg w1 1 73 s
00000026: | '-> 1-5 reg w5 1 62 b
00000259: .-------+---> 6-11 reg w6 1 6f o
00000224: | .-+-+-> 12-17 reg w6 1 6e n
0000028e: | | | '-> 18 reg w1 1 70 p
00000076: | | '---> 19-20 reg w2 1 64 d
0000038f: | | .-> 21-22 reg w2 1 75 u
0000041d: | .---+---+-> 23 reg w1 1 78 x
000001f3: | | .-> 24-27 reg w4 1 6d m
00000486: | | .-----+-> 28-29 reg w2 1 7a z
000004f3: | | | .-----> 30-31 reg w2 1 62 b
000004ba: | | | | .---> 32-35 reg w4 1 61 a
0000058d: | | | | | .-> 36-37 reg w2 1 65 e
000005c6: +-+-+-+-+-+-> 38-39 reg w2 1 66 f
Storing weights instead of ids just had a number of benefits, suggesting
this is a better design:
- Calculating the id and delta of each rbyd trunk is surprisingly
easier - id is now just lower+w-1 (see the sketch below the list), and
no extra conditions are needed for unr tags, which just have a weight
of zero.
- Removes ambiguity around which id unr tags should be assigned to,
especially unrs that delete ids.
- No more +-1 weirdness when encoding/decoding tag ids - the weight
can be written as-is and -1 ids are inferred from their weight and
position in the tree (lower+w-1 = 0+0-1 = -1).
- Weights compress better under leb128 encoding, since they are usually
quite small.
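A tiny sketch of the first point, flattening the alt-tree descent into a
linear walk over per-entry weights (illustrative only, not the actual
lfsr_* code):
#include <stdint.h>
#include <stdio.h>
int main(void) {
    // per-entry weights as they might appear along rbyd trunks,
    // a weight of 0 models an unr tag
    uint32_t weights[] = {1, 5, 0, 6, 2};
    uint32_t lower = 0;
    for (int i = 0; i < 5; i++) {
        uint32_t w = weights[i];
        // id is just lower+w-1; an unr tag (w=0) lands on lower-1,
        // i.e. just before the range it would otherwise occupy
        int32_t id = (int32_t)(lower + w) - 1;
        printf("entry %d: weight %u -> id %d\n", i, (unsigned)w, (int)id);
        lower += w;
    }
    return 0;
}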
This does not work as is due to ambiguity with grows and insertions.
Before, these were disambiguated by separate grow and attr tags. You
effectively grew the neighboring id before claiming its weight
as yours. But now that the attr itself creates the grow/insertion,
it's ambiguous which one is intended.
Changed always-follow alts that we use to terminate grow/shrink/remove
operations to use `altle 0xfff0` instead of `altgt 0`.
`altgt 0` gets the job done as long as you make sure tag 0 never ends up
in an rbyd query. But this kept showing up as a problem, and recent
debugging revealed some erronous 0 tag lookups created vestigial alt
pointers (not necessarily a problem, but space-wasting).
Since we moved to a strict 16-bit tag, making these `altle 0xfff0`
doesn't really have a downside, and means we can expect rbyd lookups
around 0 to behave how one would normally expect.
As a (very minor) plus, the value zero usually has special encodings in
instruction sets, so being able to use it for rbyd_lookups offers a
(very minor) code size saving.
---
Sidenote: The reasons altle/altgt are the way they are, and asymmetric:
1. Flipping these alts is a single bit-flip, which only happens if they
are asymmetric (only one includes the equal case).
2. Our branches are biased to prefer the larger tag. This makes
traversal trivial. It might be possible to make this still work with
altlt/altge, but would require some increments/decrements, which
might cause problems with boundary conditions around the 16-bit tag
limit.
I only recently noticed there is enough information in each rbyd trunk
to infer the effective grow/shrinks. This has a number of benefits:
- Cleans up the tag encoding a bit, no longer expecting tag size to
sometimes contain a weight (though this could've been fixed other
ways).
0x6 in the lower nibble is now reserved exclusively for in-device tags.
- grow/shrinks can be implicit to any tag. Will attempt to leverage this
in the future.
- The weight of an rbyd can no longer go out-of-sync with itself. While
this _shouldn't_ happen normally, if it does I imagine it'd be very
hard to debug.
Now, there is only one source of knowledge about the weight of the
rbyd: The most recent set of alt-pointers.
Note that remove/unreachable tags now behave _very_ differently when it
comes to weight calculation: remove tags require the tree to make the
tag unreachable. This is a tradeoff for the above.
The main motivation for this was issues fitting a good tag encoding into
14-bits. The extra 2-bits (though really only 1 bit was needed) from
making this not a leb encoding opens up the space from 3 suptypes to
15 suptypes, which is nothing to shake a stick at.
The main downsides:
1. We can't rely on leb encoding for effectively-infinite extensions.
2. We can't shorten small tags (crcs, grows, shrinks) to one byte.
For 1., extending the leb encoding beyond 14-bits is already
unpalatable, because it would increase RAM costs in the tag
encoder/decoder, which must assume a worst-case tag size, and would likely
add storage cost to every alt pointer; more on this in the next section.
The current encoding is quite generous, so I think it is unlikely we
will exceed the 16-bit encoding space. But even if we do, it's possible
to use a spare bit for an "extended" set of tags in the future.
As for 2., the lack of compression is a downside, but I've realized the
only tags that really matter storage-wise are the alt pointers. In any
rbyd there will be roughly O(m log m) alt pointers, but at most O(m) of
any other tags. What this means is that the encoding of any other tag is
in the noise of the encoding of our alt pointers.
Our alt pointers are already pretty densely packed. But because the
sparse key part of alt-pointers is stored as-is, the worst-case
encoding of in-tree tags likely ends up as the encoding of our
alt-pointers. So going up to 3-byte tags adds a surprisingly large
storage cost.
As a minor plus, le16s should be slightly cheaper to encode/decode. It
should also be slightly easier to debug tags on-disk.
tag encoding:
TTTTtttt ttttTTTv
^--------^--^^- 4+3-bit suptype
'---|- 8-bit subtype
'- valid bit
iiii iiiiiii iiiiiii iiiiiii iiiiiii
^- m-bit id/weight
llll lllllll lllllll lllllll lllllll
^- m-bit length/jump
Also renamed the "mk" tags, since they no longer have special behavior
outside of providing names for entries:
- LFSR_TAG_MK => LFSR_TAG_NAME
- LFSR_TAG_MKBRANCH => LFSR_TAG_BNAME
- LFSR_TAG_MKREG => LFSR_TAG_REG
- LFSR_TAG_MKDIR => LFSR_TAG_DIR
B-trees with names are now working, though this required a number of
changes to the B-tree layout:
1. B-trees no longer require name entries (LFSR_TAG_MK) on each branch.
This is a nice optimization to the design, since these name entries
just waste space in purely weight-based B-trees, which are probably
going to be most B-trees in the filesystem.
If a name entry is missing, the struct entry, which is required,
should have the effective weight of the entry.
The first entry in every rbyd block is expected to have no name
entry, since this is the default path for B-tree lookups.
2. The first entry in every rbyd block _may_ have a name entry, which
is ignored. I'm calling these "vestigial names" to make them sound
cooler than they actually are.
These vestigial names show up in a couple complicated B-tree
operations:
- During B-tree split, since pending attributes are calculated before
the split, we need to play out pending attributes into the rbyd
before deciding what name becomes the name of entry in the parent.
This creates a vestigial name which we _could_ immediately remove,
but the remove adds additional size to the must-fit split operation.
- During B-tree pop/merge, if we remove the leading no-name entry,
the second, named entry becomes the leading entry. This creates a
vestigial name that _looks_ easy enough to remove when making the
pending attributes for pop/merge, but turns out to be surprisingly
tricky if the parent undergoes a split/merge at the same time.
It may be possible to remove all these vestigial names proactively,
but this adds additional rbyd lookups to figure out the exact tag to
remove, complicates things in a fragile way, and doesn't actually
reduce storage costs until the rbyd is compacted.
The main downside is that these B-trees may be a bit more confusing
to debug.
This was a rather simple exercise. lfsr_btree_commit does most of the
work already, so all this needed was setting up the pending attributes
correctly.
Also:
- Tweaked dbgrbyd.py's tree rendering to match dbgbtree.py's.
- Added a print to each B-tree test to help find the resulting B-tree
when debugging.
An example:
$ ./scripts/dbgbtree.py -B4096 disk 0xaa -t -i
btree 0xaa.1000, rev 35, weight 278
block ids name tag data (truncated)
00aa.1000: +-+ 0-16 branch id16 3 7e d4 10 ~..
007e.0854: | |-> 0 inlined id0 1 73 s
| |-> 1 inlined id1 1 74 t
| |-> 2 inlined id2 1 75 u
| |-> 3 inlined id3 1 76 v
| |-> 4 inlined id4 1 77 w
| |-> 5 inlined id5 1 78 x
| |-> 6 inlined id6 1 79 y
| |-> 7 inlined id7 1 7a z
| |-> 8 inlined id8 1 61 a
| |-> 9 inlined id9 1 62 b
...
This added the idea of block+limit addresses such as 0xaa.1000. Added
this as an option to dbgrbyd.py along with a couple other tweaks:
- Added block+limit support (0x<block>.<limit>).
- Fixed in-device representation indentation when trees are present.
- Changed fromtag to implicitly fixup ids/weights off-by-one-ness; this
is consistent with lfs.c.