This was a surprising side-effect the script rework: Realizing the
internal btree/rbyd lookup APIs were awkwardly inconsistent and could be
improved with a couple tweaks:
- Adopted lookupleaf name for functions that return leaf rbyds/mdirs.
There's an argument this should be called lookupnextleaf, since it
returns the next bid, unlike lookup, but I'm going to ignore that
argument because:
1. A non-next lookupleaf doesn't really make sense for trees where
you don't have to fetch the leaf (the mtree)
2. It would be a bit too verbose
- Adopted commitleaf name for functions that accept leaf rbyds.
This makes the lfsr_bshrub_commit -> lfsr_btree_commit__ mess a bit
more readable.
- Strictly limited lookup and lookupnext to return rattrs, even in
complex trees like the mtree.
Most use cases will probably stick to the lookupleaf variants, but at
least the behavior will be consistent.
- Strictly limited lookup to expect a known bid/rid.
This only really matters for lfsr_btree/bshrub_lookup, which as a
quirk of their implementation _can_ lookup both bid + rattr at the
same time. But I don't think we'll need this functionality, and
limited the behavior may allow for future optimizations.
Note there is no lfsr_file_lookup. File btrees currently only ever
have a single leaf rattr, so this API doesn't really make sense.
Internal API changes:
- lfsr_btree_lookupnext_ -> lfsr_btree_lookupleaf
- lfsr_btree_lookupnext -> lfsr_btree_lookupnext
- lfsr_btree_lookup -> lfsr_btree_lookup
- added lfsr_btree_namelookupleaf
- lfsr_btree_namelookup -> lfsr_btree_namelookup
- lfsr_btree_commit__ -> lfsr_btree_commit_
- lfsr_btree_commit_ -> lfsr_btree_commitleaf
- lfsr_btree_commit -> lfsr_btree_commit
- added lfsr_bshrub_lookupleaf
- lfsr_bshrub_lookupnext -> lfsr_bshrub_lookupnext
- lfsr_bshrub_lookup -> lfsr_bshrub_lookup
- lfsr_bshrub_commit_ -> lfsr_bshrub_commitleaf
- lfsr_bshrub_commit -> lfsr_bshrub_commit
- lfsr_mtree_lookup -> lfsr_mtree_lookupleaf
- added lfsr_mtree_lookupnext
- added lfsr_mtree_lookup
- added lfsr_mtree_namelookupleaf
- lfsr_mtree_namelookup -> lfsr_mtree_namelookup
- added lfsr_file_lookupleaf
- lfsr_file_lookupnext -> lfsr_file_lookupnext
- added lfsr_file_commitleaf
- lfsr_file_commit -> lfsr_file_commit
Also added lookupnext to Mdir/Mtree in the dbg scripts.
Unfortunately this did add both code and stack, but only because of the
optional mdir returns in the mtree lookups:
code stack ctx
before: 35520 2440 636
after: 35548 (+0.1%) 2472 (+1.3%) 636 (+0.0%)
So mbid=0 now implies the mdir is not inlined.
Downsides:
- A bit more work to calculate
- May lose information due to masking everything when mtree.weight==0
- Risk of confusion when in-lfs.c state doesn't match (mbid=-1 is
implied by mtree.weight==0)
Upsides:
- Includes more information about the topology of the mtree
- Avoids multiple dbgmbids for the same physical mdir
Also added lfsr_dbgmbid and lfsr_dbgmrid to help make logging
easier/more consistent.
And updated dbg scripts.
So now:
(block_size)
mbits = nlog2(----------) = nlog2(block_size) - 3
( 8 )
Instead of:
( (block_size))
mbits = nlog2(floor(----------)) = nlog2(block_size & ~0x7) - 3
( ( 8 ))
This makes the post-log - 3 formula simpler, which we probably want to
prefer as it avoids a division. And ceiling is arguably more intuitive
corner case behavior.
This may seem like a minor detail, but because mbits is purely
block_size derived and not configurable, any quirks here will become
a permanent compatibility requirement.
And hey, it saves a couple bytes (I'm not really sure why, the division
should've been optimized to a shift):
code stack ctx
before: 35528 2440 636
after: 35520 (-0.0%) 2440 (+0.0%) 636 (+0.0%)
This is limited to dbgle32.py, dbgleb128.py, and dbgtag.py for now.
This more closely matches how littlefs behaves, in that we read a
bounded number of bytes before leb128 decoding. This minimizes bugs
related to leb128 overflow and avoids reading inherently undecodable
data.
The previous unbounded behavior is still available with -w0.
Note this gives dbgle32.py much more flexibility in that it can now
decode other integer widths. Uh, ignore the name for now. At least it's
self documenting that the default is 32-bits...
---
Also fixed a bug in fromleb128 where size was reported incorrectly on
offset + truncated leb128.
Do you see the O(n^2) behavior in this loop?
j = 0
while j < len(data):
word, d = fromleb(data[j:])
j += d
The slice, data[j:], creates a O(n) copy every iteration of the loop.
A bit tricky. Or at least I found it tricky to notice. Maybe because
array indexing being cheap is baked into my brain...
Long story short, this repeated slicing resulted in O(n^2) behavior in
Rbyd.fetch and probably some other functions. Even though we don't care
_too_ much about performance in these scripts, having Rbyd.fetch run in
O(n^2) isn't great.
Tweaking all from* functions to take an optional index solves this, at
least on paper.
---
In practice I didn't actually find any measurable performance gain. I
guess array slicing in Python is optimized enough that the constant
factor takes over?
(Maybe it's being helped by us limiting Rbyd.fetch to block_size in most
scripts? I haven't tested NAND block sizes yet...)
Still, it's good to at least know this isn't a bottleneck.
This just gives dbgtag.py a few more bells and whistles that may be
useful:
- Can now parse multiple tags from hex:
$ ./scripts/dbgtag.py -x 71 01 01 01 12 02 02 02
71 01 01 01 altrgt 0x101 w1 -1
12 02 02 02 shrubdir w2 2
Note this _does_ skip attached data, which risks some confusion but
not skipping attached data will probably end up printing a bunch of
garbage for most use cases:
$ ./scripts/dbgtag.py -x 01 01 01 04 02 02 02 02 03 03 03 03
01 01 01 04 gdelta 0x01 w1 4
03 03 03 03 struct 0x03 w3 3
- Included hex in output. This is helpful for learning about the tag
encoding and also helps identify tags when parsing multiple tags.
I considered also included offsets, which might help with
understanding attached data, but decided it would be too noisy. At
some point you should probably jump to dbgrbyd.py anyways...
- Added -i/--input to read tags from a file. This is roughly the same as
-x/--hex, but allows piping from other scripts:
$ ./scripts/dbgcat.py disk -b4096 0 -n4,8 | ./scripts/dbgtag.py -i-
80 03 00 08 magic 8
Note this reads the entire file in before processing. We'd need to fit
everything into RAM anyways to figure out padding.
- Added TreeArt __bool__ and __len__.
This was causing a crash in _treeartfrommtreertree when rtree was
empty.
The code was not updated in the set -> TreeArt class transition, and
went unnoticed because it's unlikely to be hit unless the filesystem
is corrupt.
Fortunately(?) realtime rendering creates a bunch of transiently
corrupt filesystem images.
- Tweaked lookupleaf to not include mroots in their own paths.
This matches the behavior of leaf mdirs, and is intentionally
different from btree's lookupleaf which needs to lookup the leaf rattr
to terminate.
- Tweaked leaves to not remove the last path entry if it is an mdir.
This hid the previous lookupleaf inconsistency. We only remove the
last rbyd from the path because it is redundant, and for mdirs/mroots
it should never be redundant.
I ended up just replacing the corrupt check with an explicit check
that the rbyd is redundant. This should be more precise and avoid
issues like this in the future.
Also adopted explicit redundant checks in Btree.leaves and
Lfs.File.leaves.
Two new tricks:
1. Hide the cursor while redrawing the ring buffer.
2. Build up the entire redraw in RAM first, and render everything in a
single write call.
These _mostly_ get rid of the cursor flickering issues in rapidly
updating scripts.
After all, who doesn't love a good bit of flickering.
I think I was trying to be too clever, so reverting.
Printing these with no padding is the simplest solution, provides the
best information density, and worst case you can always add -s1 to limit
the update frequency if flickering is hurting readability.
This automatically minimizes the status strings without flickering, all
it took was a bit of ~*global state*~.
---
If I'm remembering correctly, this was actually how tracebd.py used to
work before dbgbmap.py was added. The idea was dropped with dbgbmap.py
since dbgbmap.py relied on watch.py for real-time rendering and couldn't
persist state.
But now dbgbmap.py has its own -k/--keep-open flag, so that's not a
problem.
This isn't true, especially for dbgbmap.py, 100% is very possible in
filesystems with small files. But by limiting padding to 99.9%, we avoid
the annoying wasted space caused by the rare but occasional 100.0%.
No one is realistically ever going to use this.
Ascii art is just too low resolution, trying to pad anything just wastes
terminal space. So we might as well not support --padding and save on
the additional corner cases.
Worst case, in the future we can always find this commit and revert
things.
Should've probably been two commits, but:
1. Cleaned up tracebd.py's header generation to be consistent with
dbgbmap.py and other scripts.
Percentage fields are now consistently floats in all scripts,
allowing user-specified precision when punescaping.
2. Moved header generation up to where we still have the disk open (in
dbgbmap[d3].py), to avoid issues with lazy Lfs attrs trying to access
the disk after it's been closed.
Found while testing with --title='cksum %(cksum)08x'. Lfs tries to
validate the gcksum last minute and things break.
This can be hit when dealing with very small maps, which is common since
we're rendering to the terminal. Not crashing here at least allows the
header/usage string to be shown.
Replacing -R/--aspect-ratio, --to-ratio now calculates the width/height
_before_ adding decoration such as headers, stack info, etc.
I toying around with generalizing -R/--aspect-ratio to include
decorations, but when Wolfram Alpha spit this mess for the post-header
formula:
header*r - sqrt(4*v*r + padding^2*r)
w = ------------------------------------
2
I decided maybe a generalized -R/--aspect-ratio is a _bit_ too
complicated for what are supposed to be small standalone Python
scripts...
---
Also fixed the scaling formula, which should've taken the sqrt _after_
multiplying by the aspect ratio:
w = sqrt(v*r)
I only noticed while trying to solve for the more complicated
post-decoration formula, the difference is pretty minor.
The notable exception being plot.py, where line-level history doesn't
really make sense.
These scripts all default to height=1, and -n/--lines can be useful for
viewing changes over time.
In theory you could achieve something similar to this with tailpipe.py,
but you would lose the header info, which is useful.
---
Note, as a point of simplicity, we do _not_ show sub-char history like
we used to in tracebd.py. That was way too complicated for what it was
worth.
This simplifies attrs a bit, and scripts can always override
__getitem__ if they want to provide lazy attr generation.
The original intention of accepting functions was to make lazy attr
generation easier, but while tinkering around with the idea I realized
the actual attr mapping/generation would be complicated enough that
you'd probably want a full class anyways.
All of our scripts are only using dict attrs anyways. And lazy attr
generation is probably a premature optimization for the same reason
everyone's ok with Python's slices being O(n).
Reading Wikipedia:
> Later terminals added the ability to directly specify the "bright"
> colors with 90–97 and 100–107.
So if we want to stick to one pattern, we should probably go with
brightness as a separate modifier.
This shouldn't noticeably change any script, unless your terminal
interprets 90-97m colors differently from 1;30-37m, in which case things
should be more consistent now.
This mirrors how -H/--height and -W/--width work, with -n-1 using the
terminal height - 1 for the output.
This is very useful for carving out space for the shell prompt and other
things, without sacrificing automatic sizing.
This allows for combining braille/dots with custom chars for specific
elements:
$ ./scripts/codemap.py lfs.o -H16 -: -.lfsr_rbyd_appendrattr=A
Note this is already how plot.py works, letting braille/dots take
priority in the new scripts/reworks was just an oversight.
So percentages now include unused blocks, instead of being derived from
only blocks in use.
This is a bit inconsistent with tracebd.py, where we show ops as
percentages of all ops, but it's more useful:
- mdir+btree+data gives you the total usage, which is useful if you want
to know how full disk is. You can't get this info from in-use
percentages.
Note that total field is sticking around, so you can show the total
usage directly if you provide your own title string:
$ ./scripts/dbgbmap.py disk \
--title="bd %(block_size)sx%(block_count)s, %(total_percent)s"
- You can derive the in-use percentages from total percentages if you
need them: in-use-mdir = mdir/(mdir+btree+data).
Maybe this should be added to the --title fields, but I can't think of
a good name at the moment...
Attempting to make tracebd.py consistent with dbgbmap.py doesn't really
make sense either: Showing op percentages of total bmap will usually be
an extremely small number.
At least dbgbmap.py is consistent with tracebd.py's --wear percentage,
which is out of all erase state in the bmap.
- Create a grid with dashes even in -%/--usage mode.
This was surprisingly annoying since it breaks the existing
1 block = 1 char assumption.
- Derive percentages from in-use blocks, not all blocks. This matches
behavior of tracebd.py's percentages (% read/prog/erase).
Though not tracebd.py's percent wear...
- Added mdir/btree/data counts/percentages to dbgbmapd3.py, for use in
custom --title strings and the newly added --title-usage.
Because why not. Unlike dbgbmap.py, performance is not a concern at
all, and the consistency between these two scripts helps
maintainability.
Case in point: also fixed a typo from copying the block_count
inference between scripts.
It's a mess but it's working. Still a number of TODOs to cleanup...
This adopts all of the changes in dbgbmap.py/dbgbmapd3.py, block
grouping, nested curves, Canvas, Attrs, etc:
- Like dbgbmap.py, we now group by block first before applying space
filling curves, using nested space filling curves to render byte-level
operations.
Python's ft.lru_cache really shines here.
The previous behavior is still available via -u/--contiguous
- Adopted most features in dbgbmap.py, so --to-scale, -t/--tiny, custom
--title strings, etc.
- Adopted Attrs so now chars/coloring can be customized with
-./--add-char, -,/--add-wear-char, -C/--add-color,
-G/--add-wear-color.
- Renamed -R/--reset -> --volatile, which is a much better name.
- Wear is now colored cyan -> white -> read, which is a bit more
visually interesting. And we're not using cyan in any scripts yet.
In addition to the new stuff, there were a few simplifications:
- We no longer support sub-char -n/--lines with -:/--dots or
-⣿/--braille. Too complicated, required Canvas state hacks to get
working, and wasn't super useful.
We probably want to avoid doing too much cleverness with -:/--dots and
-⣿/--braille since we can't color sub-chars.
- Dropped -@/--blocks byte-level range stuff. This was just not worth
the amount of complexity it added. -@/--blocks is now limited to
simple block ranges. High-level scripts should stick to high-level
options.
- No fancy/complicated Bmap class. The bmap object is just a dict of
TraceBlocks which contain RangeSets for relevant operations.
Actually the new RangeSet class deserves a mention but this commit
message is probably already too long.
RangeSet is a decently efficient set of, well, ranges, that can be
merged and queried. In a lower-level language it should be implemented
as a binary tree, but in Python we're just using a sorted list because
we're probably not going to be able to beat O(n) list operations.
- Wear is tracked at the block level, no reason to overcomplicate this.
- We no longer resize based on new info. Instead we either expect a
-b/--block-size argument or wait until first bd init call.
We can probably drop the block size in BD_TRACE statements now, but
that's a TODO item.
- Instead of one amalgamated regex, we use string searches to figure out
the bd op and then smaller regexes to parse. Lesson learned here:
Python's string search is very fast (compared to regex).
- We do _not_ support labels on blocks like we do in treemap.py/
codemap.py. It's less useful here and would just be more hassle.
I also tried to reorganize main a bit to mirror the simple two-main
approach in dbgbmap.py and other ascii-rendering scripts, but it's a bit
difficult here since trace info is very stateful. Building up main
functions in the main main function seemed to work well enough:
main -+-> main_ -> trace__ (main thread)
'-> draw_ -> draw__ (daemon thread)
---
You may note some weirdness going on with flags. That's me trying to
avoid upcoming flag conflicts.
I think we want -n/--lines in more scripts, now that it's relatively
self-contained, but this conflicts with -n/--namespace-depth in
codemap[d3].py, and risks conflict with -N/--notes in csv.py which may
end up with namespace-related functionality in the future.
I ended up hijacking -_, but this conflicted with -_/--add-line-char in
plot.py, but that's ok because we also want a common "secondary char"
flag for wear in tracebd.py... Long story short I ended up moving a
bunch of flags around:
- added -n/--lines
- -n/--namespace-depth -> -_/--namespace-depth
- -N/--notes -> -N/--notes
- -./--add-char -> -./--add-char
- -_/--add-line-char -> -,/--add-line-char
- added -,/--add-wear-char
- -C/--color -> -C/--add-color
- added -> -G/--add-wear-color
Worth it? Dunno.
By default, we don't actually do anything if we find an invalid gcksum,
so there's no reason to calculate it everytime.
Though this performance improvement may not be very noticeable:
dbgbmap.py w/ crc32c lib w/ no_ck --no-ckdata: 0m0.221s
dbgbmap.py w/ crc32c lib w/o no_ck --no-ckdata: 0m0.269s
dbgbmap.py w/o crc32c lib w/ no_ck --no-ckdata: 0m0.388s
dbgbmap.py w/o crc32c lib w/o no_ck --no-ckdata: 0m0.490s
dbgbmap.old.py: 0m0.231s
Note that there's no point in adopting this in dbgbmapd3.py: 1. svg
rendering dominates (probably, I haven't measured this), and 2. we
default to showing the littlefs mount string instead of mdir/btree/data
percentages.
Jumping from a simple Python implementation to the fully hardware
accelerated crc32c library basically deletes any crc32c related
bottlenecks:
crc32c.py disk (1MiB) w/ crc32c lib: 0m0.027s
crc32c.py disk (1MiB) w/o crc32c lib: 0m0.844s
This uses the same try-import trick we use for inotify_simple, so we get
the speed improvement without losing portability.
---
In dbgbmap.py:
dbgbmap.py w/ crc32c lib: 0m0.273s
dbgbmap.py w/o crc32c lib: 0m0.697s
dbgbmap.py w/ crc32c lib --no-ckdata: 0m0.269s
dbgbmap.py w/o crc32c lib --no-ckdata: 0m0.490s
dbgbmap.old.py: 0m0.231s
The bulk of the runtime is still in Rbyd.fetch, but this is now
dominated by leb128 decoding, which makes sense. We do ~twice as many
fetches in the new dbgbmap.py in order to calculate the gcksum (which
we then ignore...).
Checking every data block for errors really slows down dbgbmap.py, which
is unfortunate for realtime rendering.
To be fair, the real issue is our naive crc32c impl, but the mindset of
these scripts is if you want speed you really shouldn't be using Python
and should rewrite the script in Rust/C/something (see prettyasserts for
example). You _could_ speed things up with a table-based crc32c, but at
that point you should probably just find C-bindings for crc32c (maybe
optional like inotify?... actually that's not a bad idea...).
At least --no-ckmeta/--no-ckdata allow for the previous behavior of not
checking for relevant errors for a bit of speed.
---
Note that --no-ckmeta currently doesn't really do anything. I toyed with
adding a non-fetching Rbyd.fetchtrunk method, but this seems out of
scope for these scripts.
This better matches what you would expect from a function called
bd.read, at least in the context of littlefs, while also decreasing the
state (seek) we have to worry about.
Note that bd.readblock already behaved mostly like this, and is
preferred by every class except for Bptr.
So no more __getitem__, __contains__, or __iter__ for Rbyd, Btree, Mdir,
Mtree, Lfs.File, etc.
These were way too error-prone, especially when accidental unpacking
triggered unintended disk traversal and weird error states. We didn't
even use the implicit behavior because we preferred the full name for
heavy disk operations.
The motivation for this was Python not catching this bug, which is a bit
silly:
rid, rattr, *path_ = rbyd
This is a rework of dbgbmap.py to match dbgbmapd3.py, adopt the new
Rbyd/Lfs class abstractions, as well as Canvas, -k/--keep-open, etc.
Some of the main changes:
- dbgbmap.py now reports corrupt/conflict blocks, which can be useful
for debugging.
Note though that you will probably get false positives if running with
-k/--keep-open while something is writing to the disk. littlefs is
powerloss safe, not multi-write safe! Very different problem!
- dbgbmap.py now groups by blocks before mapping to the space filling
curve. This matches dbgbmapd3.py and I think is more intuitive now
that we have a bmap tiling algorithm.
-%/--usage still works, but is rendered as a second space filling
curve _inside_ the block tile. Different blocks can end up with
slightly different sizes due to rounding, but it's not the end of the
world.
I wasn't originally going to keep it around, but ended up caving, so
you can still get the original byte-level curve via -u/--contiguous.
- Like the other ascii rendering script, dbgbmap.py now supports
-k/--keep-open and friends as a thin main wrapper. This just makes it
a bit easier to watch a realtime bmap without needing to use watch.py.
- --mtree-only is supported, but filtering via --mdirs/--btrees/--data
is _not_ supported. This was too much complexity for a minor feature,
and doesn't cover other niche blocks like corrupted/conflict or parity
in the future.
- Things are more customizable thanks to the Attr class. For an example
you can now use the littlefs mount string as the title via
--title-littlefs.
- Support for --to-scale and -t/--tiny mode, if you want to scale based
on block_size.
One of the bigger differences dbgbmapd3.py -> dbgbmap.py is that
dbgbmap.py still supports -%/--usage. Should we backport -%/--usage to
dbgbmapd3.py? Uhhhh...
This ends up a funny example of raster graphics vs vector graphics. A
pixel-level space filling curve is easy with raster graphics, but with
an svg you'd need some sort of pixel -> path wrapping algorithm...
So no -%/--usage in dbgbmapd3.py for now.
Also just ripped out all of the -@/--blocks byte-level range stuff. Way
too complicated for what it was worth. -@/--blocks is limited to simple
block ranges now. High-level scripts should stick to high-level options.
One last thing to note is the adoption of "if '%' in label__" checks
before applying punescape. I wasn't sure if we should support punescape
in dbgbmap.py, since it's quite a bit less useful here, and may be
costly due to the lazy attr generation. Adding this simple check avoids
the cost and consistency question, so I adopted it in all scripts.
Like codemapd3.py this include an interactive UI for viewing the
underlying filesystem graph, including:
- mode-tree - Shows all reachable blocks from a given block
- mode-branches - Shows immediate children of a given block
- mode-references - Shows parents of a given block
- mode-redund - Shows sibling blocks in redund groups (This is
currently just mdir pairs, but the plan is to add more)
This is _not_ a full filesystem explorer, so we don't embed all block
data/metadata in the svg. That's probably a project for another time.
However we do include interesting bits such as trunk addresses,
checksums, etc.
An example:
# create an filesystem image
$ make test-runner -j
$ ./scripts/test.py -B test_files_many -a -ddisk -O- \
-DBLOCK_SIZE=1024 \
-DCHUNK=10 \
-DSIZE=2050 \
-DN=128 \
-DBLOCK_RECYCLES=1
... snip ...
done: 2/2 passed, 0/2 failed, 164pls!, in 0.16s
# generate bmap svg
$ ./scripts/dbgbmapd3.py disk -b1024 -otest.svg \
-W1400 -H750 -Z --dark
updated test.svg, littlefs v0.0 1024x1024 0x{26e,26f}.d8 w64.128, cksu
m 41ea791e
And open test.svg in a browser of your choice.
Here's what the current colors mean:
- yellow => mdirs
- blue => btree nodes
- green => data blocks
- red => corrupt/conflict issue
- gray => unused blocks
But like codemapd3.py the output is decently customizable. See -h/--help
for more info.
And, just like codemapd3.py, this is based on ideas from d3 and
brendangregg's flamegraphs:
- d3 - https://d3js.org
- brendangregg's flamegraphs - https://github.com/brendangregg/FlameGraph
Note we don't actually use d3... the name might be a bit confusing...
---
One interesting change from the previous dbgbmap.py is the addition of
"corrupt" (bad checksum) and "conflict" (multiple parents) blocks, which
can help find bugs.
You may find the "conflict" block reporting a bit strange. Yes it's
useful for finding block allocation failures, but won't naturally formed
dags in file btrees also be reported as "conflicts"?
Yes, but the long-term plan is to move away from dags and make littlefs
a pure tree (for block allocator and error correction reasons). This
hasn't been implemented yet, so for now dags will result in false
positives.
---
Implementation wise, this script was pretty straightforward given prior
dbglfs.py and codemapd3.py work.
However there was an interesting case of https://xkcd.com/1425:
- Traverse the filesystem and build a graph - easy
- Tile a rectangle with n nice looking rectangles - uhhh
I toyed around with an analytical approach (something like block width =
sqrt(canvas_width*canvas_height/n) * block_aspect_ratio), but ended up
settling on an algorithm that divides the number of columns by 2 until
we hit our target aspect ratio.
This algorithm seems to work quite well, runs in only O(log n), and
perfectly tiles the grid for powers-of-two. Honestly the result is
better than I was expecting.
Also tweaked how we fetch shrubs, adding Rbyd.fetchshrub and
Btree.fetchshrub instead of overloading the bd argument.
Oh, and also added --trunk to dbgmtree.py and dbglfs.py. Actually
_using_ --trunk isn't advised, since it will probably just result in a
corrupted filesystem, but these scripts are for accessing things that
aren't normally allowed anyways.
The reason for dropping the list/tuple distinction is because it was a
big ugly hack, unpythonic, and likely to catch users (and myself) by
surprise. Now, Rbyd.fetch and friends always require separate
block/trunk arguments, and the exercise of deciding which trunk to use
is left up to the caller.
Just some minor tweaks:
- rbydaddr: Return list instead of tuple, note we rely on the type
distinction in Rbyd.fetch now.
- tagrepr: Rename w -> weight.
So:
- before: ./scripts/dbgbmap.py disk -b4096 -@0 -n16,32
- after: ./scripts/dbgbmap.py disk -b4096 -@'0 -n16,32'
This is mainly to avoid the naming conflict between -n/--size and
-n/--lines, while also separating out the namespaces a bit.
It's probably not the most intuitive CLI UI, but --off and -n/--size are
probably infrequent arguments at this level of script anyways.
Mostly to move away from unnecessary shortform flags. Using shortform
flags for what is roughly an unbounded enum just causes too many flag
conflicts as scripts grow:
- -r/--read -> --reads
- -p/--prog -> --progs
- -e/--erase -> --erases
- -w/--wear -> --wear
- -i/--in-use -> -%/--usage
- -M/--mdirs -> --mdirs
- -B/--btrees -> --btress
- -D/--datas -> --data/--datas
I may have had too much fun forcing argparse to make -%/--usage to work.
The percent sign caused a lot of problems for argparse internally.
--no-header doesn't really deserve a shortform, and this risks conflicts
with -N/--notes in the future, not to mention any other number of flags
that can start with --no-*.
This was quite a puzzle.
The problem: How do we detect corrupt mdirs?
Seems like a simple question, but we can't just rely on mdir cksums. Our
mdirs are independently updateable logs, and logs have this annoying
tendency to "rollback" to previously valid states when corrupted.
Rollback issues aren't littlefs-specific, but what _is_ littlefs-
specific is that when one mdir rolls back, it can disagree with other
mdirs, resulting in wildly incorrect filesystem state.
To solve this, or at least protect against disagreeable mdirs, we need
to somehow include the state of all other mdirs in each mdir commit.
---
The first thought: Why not use gstate?
We already have a system for storing distributed state. If we add the
xor of all of our mdir cksums, we can rebuild it during mount and verify
that nothing changed:
.--------. .--------. .--------. .--------.
.| mdir 0 | .| mdir 1 | .| mdir 2 | .| mdir 3 |
|| | || | || | || |
|| gdelta | || gdelta | || gdelta | || gdelta |
|'-----|--' |'-----|--' |'-----|--' |'-----|--'
'------|-' '------|-' '------|-' '------|-'
'--.------' '--.------' '--.------' '--.------'
cksum | cksum | cksum | cksum |
| | v | v | v |
'---------> xor -------> xor -------> xor -------> gcksum
| v v v =?
'---------> xor -------> xor -------> xor ---> gcksum
Unfortunately it's not that easy. Consider what this looks like
mathematically (g is our gcksum, c_i is an mdir cksum, d_i is a
gcksumdelta, and +/-/sum is xor):
g = sum(c_i) = sum(d_i)
If we solve for a new gcksumdelta, d_i:
d_i = g' - g
d_i = g + c_i - g
d_i = c_i
The gcksum cancels itself out! We're left with an equation that depends
only on the current mdir, which doesn't help us at all.
Next thought: What if we permute the gcksum with a function t before
distributing it over our gcksumdeltas?
.--------. .--------. .--------. .--------.
.| mdir 0 | .| mdir 1 | .| mdir 2 | .| mdir 3 |
|| | || | || | || |
|| gdelta | || gdelta | || gdelta | || gdelta |
|'-----|--' |'-----|--' |'-----|--' |'-----|--'
'------|-' '------|-' '------|-' '------|-'
'--.------' '--.------' '--.------' '--.------'
cksum | cksum | cksum | cksum |
| | v | v | v |
'---------> xor -------> xor -------> xor -------> gcksum
| | | | .--t--'
| | | | '-> t(gcksum)
| v v v =?
'---------> xor -------> xor -------> xor ---> t(gcksum)
In math terms:
t(g) = t(sum(c_i)) = sum(d_i)
In order for this to work, t needs to be non-linear. If t is linear, the
same thing happens:
d_i = t(g') - t(g)
d_i = t(g + c_i) - t(g)
d_i = t(g) + t(c_i) - t(g)
d_i = t(c_i)
This was quite funny/frustrating (funnistrating?) during development,
because it means a lot of seemingly obvious functions don't work!
- t(g) = g - Doesn't work
- t(g) = crc32c(g) - Doesn't work because crc32cs are linear
- t(g) = g^2 in GF(2^n) - g^2 is linear in GF(2^n)!?
Fortunately, powers coprime with 2 finally give us a non-linear function
in GF(2^n), so t(g) = g^3 works:
d_i = g'^3 - g^3
d_i = (g + c_i)^3 - g^3
d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3
d_i = (g^2 + c_i^2)(g + c_i) - g^3
d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3
d_i = gc_i^2 + g^2c_i + c_i^3
---
Bleh, now we need to implement finite-field operations? Well, not
entirely!
Note that our algorithm never uses division. This means we don't need a
full finite-field (+, -, *, /), but can get away with a finite-ring (+,
-, *). And conveniently for us, our crc32c polynomial defines a ring
epimorphic to a 31-bit finite-field.
All we need to do is define crc32c multiplication as polynomial
multiplication mod our crc32c polynomial:
crc32cmul(a, b) = pmod(pmul(a, b), P)
And since crc32c is more-or-less just pmod(x, P), this lets us take
advantage of any crc32c hardware/tables that may be available.
---
Bunch of notes:
- Our 2^n-bit crc-ring maps to a 2^n-1-bit finite-field because our crc
polynomial is defined as P(x) = Q(x)(x + 1), where Q(x) is a 2^n-1-bit
irreducible polynomial.
This is a common crc construction as it provides optimal odd-bit/2-bit
error detection, so it shouldn't be too difficult to adapt to other
crc sizes.
- t(g) = g^3 is not the only function that works, but it turns out to be
a pretty good one:
- 3 and 2^(2^n-1)-1 are coprime, which means our function t(g) = g^3
provides a one-to-one mapping in the underlying fields of all crc
rings of size 2^(2^n).
We know 3 and 2^(2^n-1)-1 are coprime because 2^(2^n-1)-1 =
2^(2^n)-1 (a Fermat number) - 2^(2^n-1) (a power-of-2), and 3
divides Fermat numbers >=3 (A023394) and is not 2.
- Our delta, when viewed as a polynomial in g: d(g) = gc^2 + g^2c +
c^3, has degree 2, which implies there are at most 2 solutions or
1-bit of information loss in the underlying field.
This is optimal since the original definition already had 2
solutions before we even chose a function:
d(g) = t(g + c) - t(g)
d(g) = t(g + c) - t((g + c) - c)
d(g) = t((g + c) + c) - t(g + c)
d(g) = d(g + c)
Though note the mapping of our crc-ring to the underlying field
already represents 1-bit of information loss.
- If you're using a cryptographic hash or other non-crc, you should
probably just use an equal sized finite-field.
Though note changing from a 2^n-1-bit field to a 2^n-bit field does
change the math a bit, with t(g) = g^7 being a better non-linear
function:
- 7 is the smallest odd-number coprime with 2^n-1, a Fermat number,
which makes t(g) = g^7 a one-to-one mapping.
3 humorously divides all 2^n-1 Fermat numbers.
- Expanding delta with t(g) = g^7 gives us a 6 degree polynomial,
which implies at most 6 solutions or ~3-bits of information loss.
This isn't actually the best you can do, some exhaustive searching
over small fields (<=2^16) suggests t(g) = g^(2^(n-1)-1) _might_ be
optimal, but that's a heck of a lot more multiplications.
- Because our crc32cs preserve parity/are epimorphic to parity bits,
addition (xor) and multiplication (crc32cmul) also preserve parity,
which can be used to show our entire gcksum system preserves parity.
This is quite neat, and means we are guaranteed to detect any odd
number of bit-errors across the entire filesystem.
- Another idea was to use two different addition operations: xor and
overflowing addition (or mod a prime).
This probably would have worked, but lacks the rigor of the above
solution.
- You might think an RS-like construction would help here, where g =
sum(c_ia^i), but this suffers from the same problem:
d_i = g' - g
d_i = g + c_ia^i - g
d_i = c_ia^i
Nothing here depends on anything outside of the current mdir.
- Another question is should we be using an RS-like construction anyways
to include location information in our gcksum?
Maybe in another system, but I don't think it's necessary in littlefs.
While our mdir are independently updateable, they aren't _entirely_
independent. The location of each mdir is stored in either the mtree
or a parent mdir, so it always gets mixed into the gcksum somewhere.
The only exception being the mrootanchor which is always at the fixed
blocks 0x{0,1}.
- This does _not_ catch "global-rollback" issues, where the most recent
commit in the entire filesystem is corrupted, revealing an older, but
still valid, filesystem state.
But as far as I am aware this is just a fundamental limitation of
powerloss-resilient filesystems, short of doing destructive
operations.
At the very least, exposing the gcksum would allow the user to store
it externally and prevent this issue.
---
Implementation details:
- Our gcksumdelta depends on the rbyd's cksum, so there's a catch-22 if
we include it in the rbyd itself.
We can avoid this by including it in the commit tags (actually the
separate canonical cksum makes this easier than it would have been
earlier), but this does mean LFSR_TAG_GCKSUMDELTA is not an
LFSR_TAG_GDELTA subtype. Unfortunate but not a dealbreaker.
- Reading/writing the gcksumdelta gets a bit annoying with it not being
in the rbyd. For now I've extended the low-level lfsr_rbyd_fetch_/
lfsr_rbyd_appendcksum_ to accept an optional gcksumdelta pointer,
which is a bit awkward, but I don't know of a better solution.
- Unlike the grm, _every_ mdir commit involves the gcksum, which means
we either need to propagate the gcksumdelta up the mroot chain
correctly, or somehow keep track of partially flushed gcksumdeltas.
To make this work I modified the low-level lfsr_mdir_commit__
functions to accept start_rid=-2 to indicate when gcksumdeltas should
be flushed.
It's a bit of a hack, but I think it might make sense to extend this
to all gdeltas eventually.
The gcksum cost both code and RAM, but I think it's well worth it for
removing an entire category of filesystem corruption:
code stack ctx
before: 37796 2608 620
after: 38428 (+1.7%) 2640 (+1.2%) 644 (+3.9%)
Now that we perturb commit cksums with the odd-parity zero, the q-bit no
longer serves a purpose other than extra debug info. But this is a
double-edged sword, because redundant info just means another thing that
can go wrong.
For example, should we assert? If the q-bit doesn't reflect the
previous-perturb state it's a bug, but the only thing that would break
would be the q-bit itself. And if we don't assert what's the point of
keeping the q-bit around?
Dropping the q-bit avoids answering this question and saves a bit of
code:
code stack ctx
before: 37772 2608 620
after: 37768 (-0.0%) 2608 (+0.0%) 620 (+0.0%)
I've been unhappy with LFSR_TAG_ORPHAN for a while now. While it's true
these represent orphaned files, they also represent zombied files. And
as long as a reference to the file exists in-RAM, I find it hard to say
these files are truely "orphaned".
We're also just using the term "orphan" for too many things.
Really this tag just represents an mid reservation. The term stickynote
works well enough for this, and fits in with the other internal tag,
LFSR_TAG_BOOKMARK.
Moved local import hack behind if __name__ == "__main__"
These scripts aren't really intended to be used as python libraries.
Still, it's useful to import them for debugging and to get access to
their juicy internals.
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.
Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:
# prevent local imports
__import__('sys').path.pop(0)
These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
This extends Rbyd.fetch to accept another rbyd, in which case we inherit
the RAM-backed block without rereading it from disk. This avoids an
issue where shrubs can become corrupted if the disk is being
simultaneously written and debugged.
Normally we can detect the checksum mismatch and toss out the rbyd
during fetch, but shrub pointers don't include a checksum since they
assume the containing rbyd has already been checksummed.
It's interesting to note this even avoids the memory copy thanks to
Python's reference counting.
If we're fetching branches anyways, we might as well check that the
checksums match. This helps protect against infinite loops in B-tree
branches.
Also fixed an issue where we weren't xoring perturb state on finding an
explicit trunk.
Note this is equivalent to LFS_M_CKFETCHES in lfs.c.
---
This doesn't mean we always need LFS_M_CKFETCHES. Our dbg scripts just
need to be a little bit tougher because 1. running tests with -j creates
wildly corrupted and entangled littlefs images, and 2. Rbyd.fetch is
almost too forgiving in choosing the nearest trunk.
These work by keeping a set of all seen mroots as we descend down the
mroot chain. Simple, but it works.
The downside of this approach is that the mroot set grows unbounded, but
it's unlikely we'll ever have enough mroots in a system for this to
really matter.
This fixes scripts like dbgbmap.py getting stuck on intentional mroot
cycles created for testing. It's not a problem for a foreground script
to get stuck in an infinite loop, since you can just kill it, but a
background script getting stuck at 100% CPU is a bit more annoying.
This matches the style used in C, which is good for consistency:
a_really_long_function_name(
double_indent_after_first_newline(
single_indent_nested_newlines))
We were already doing this for multiline control-flow statements, simply
because I'm not sure how else you could indent this without making
things really confusing:
if a_really_long_function_name(
double_indent_after_first_newline(
single_indent_nested_newlines)):
do_the_thing()
This was the only real difference style-wise between the Python code and
C code, so now both should be following roughly the same style (80 cols,
double-indent multiline exprs, prefix multiline binary ops, etc).
Mainly to avoid conflicts with match results m, this frees up the single
letter variables m for other purposes.
Choosing a two letter alias was surprisingly difficult, but mt is nice
in that it somewhat matches it (for itertools) and ft (for functools).