This better matches what you would expect from a function called
bd.read, at least in the context of littlefs, while also decreasing the
state (seek) we have to worry about.
Note that bd.readblock already behaved mostly like this, and is
preferred by every class except for Bptr.
So no more __getitem__, __contains__, or __iter__ for Rbyd, Btree, Mdir,
Mtree, Lfs.File, etc.
These were way too error-prone, especially when accidental unpacking
triggered unintended disk traversal and weird error states. We didn't
even use the implicit behavior because we preferred the full name for
heavy disk operations.
The motivation for this was Python not catching this bug, which is a bit
silly:
rid, rattr, *path_ = rbyd
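A minimal sketch of how this slips through, using a hypothetical FakeRbyd
stand-in (not the real Rbyd class). Because the object is iterable,
star-unpacking it is perfectly legal Python and quietly walks every entry:

```python
class FakeRbyd:
    """Hypothetical stand-in for an iterable disk structure."""
    def __init__(self, attrs):
        self.attrs = attrs
        self.reads = 0  # counts simulated disk reads

    def __iter__(self):
        # implicit iteration touches every entry
        for attr in self.attrs:
            self.reads += 1
            yield attr

    def lookup(self, rid):
        # the call we actually meant to make
        self.reads += 1
        return rid, self.attrs[rid], 'path0', 'path1'

rbyd = FakeRbyd(['a', 'b', 'c', 'd'])

# intended: rid, rattr, *path_ = rbyd.lookup(1)
# the typo below still runs, and traverses everything
rid, rattr, *path_ = rbyd
```

Without __iter__ the same typo raises a TypeError immediately.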
And made it slightly darker to match arrows in light mode.
Just trying to make the separator look a bit nicer, but it's tricky
since this is the only non-tile non-text element.
This is a rework of dbgbmap.py to match dbgbmapd3.py, adopt the new
Rbyd/Lfs class abstractions, as well as Canvas, -k/--keep-open, etc.
Some of the main changes:
- dbgbmap.py now reports corrupt/conflict blocks, which can be useful
for debugging.
Note though that you will probably get false positives if running with
-k/--keep-open while something is writing to the disk. littlefs is
powerloss safe, not multi-write safe! Very different problem!
- dbgbmap.py now groups by blocks before mapping to the space filling
curve. This matches dbgbmapd3.py and I think is more intuitive now
that we have a bmap tiling algorithm.
-%/--usage still works, but is rendered as a second space filling
curve _inside_ the block tile. Different blocks can end up with
slightly different sizes due to rounding, but it's not the end of the
world.
I wasn't originally going to keep it around, but ended up caving, so
you can still get the original byte-level curve via -u/--contiguous.
- Like the other ascii rendering script, dbgbmap.py now supports
-k/--keep-open and friends as a thin main wrapper. This just makes it
a bit easier to watch a realtime bmap without needing to use watch.py.
- --mtree-only is supported, but filtering via --mdirs/--btrees/--data
is _not_ supported. This was too much complexity for a minor feature,
and doesn't cover other niche blocks like corrupted/conflict or parity
in the future.
- Things are more customizable thanks to the Attr class. For example,
you can now use the littlefs mount string as the title via
--title-littlefs.
- Support for --to-scale and -t/--tiny mode, if you want to scale based
on block_size.
One of the bigger differences dbgbmapd3.py -> dbgbmap.py is that
dbgbmap.py still supports -%/--usage. Should we backport -%/--usage to
dbgbmapd3.py? Uhhhh...
This ends up a funny example of raster graphics vs vector graphics. A
pixel-level space filling curve is easy with raster graphics, but with
an svg you'd need some sort of pixel -> path wrapping algorithm...
So no -%/--usage in dbgbmapd3.py for now.
Also just ripped out all of the -@/--blocks byte-level range stuff. Way
too complicated for what it was worth. -@/--blocks is limited to simple
block ranges now. High-level scripts should stick to high-level options.
One last thing to note is the adoption of "if '%' in label__" checks
before applying punescape. I wasn't sure if we should support punescape
in dbgbmap.py, since it's quite a bit less useful here, and may be
costly due to the lazy attr generation. Adding this simple check avoids
the cost and consistency question, so I adopted it in all scripts.
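A rough sketch of the check, assuming attrs are generated lazily through a
callable. The render_label, attrs_fn, and punescape_ names here are
hypothetical stand-ins, and the stand-in escape function just uses Python's
%-formatting, which is not necessarily what the real punescape does:

```python
def render_label(label, attrs_fn, punescape_):
    """Only pay for attr generation when the label may contain escapes."""
    if '%' in label:
        return punescape_(label, attrs_fn())
    return label

# hypothetical stand-ins for illustration
calls = []
def attrs_fn():
    # pretend this is expensive
    calls.append(1)
    return {'block': 0x26e}

def punescape_(s, attrs):
    return s % attrs

plain = render_label('mdir', attrs_fn, punescape_)
fancy = render_label('block %(block)x', attrs_fn, punescape_)
```

Labels without a '%' never trigger attr generation at all.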
This matches the coloring in dbglfs.py for other erroneous conditions,
and also matches how we color hidden items when shown.
Also fixed some minor bugs in grm printing.
This can be useful when you just want to check for errors.
The only exception being dbgblock.py/dbgcat.py, since these don't really
have a concept of an error.
For more aggressive checking of filesystem state. These should match the
behavior of LFS_M_CKMETA/CKDATA in lfs.c.
Also tweaked dbgbmapd3.py (and eventually dbgbmap.py) to match, though we
don't need new flags there since we're already checking every block in
the filesystem.
These were hard to read, especially in light mode (which I use the
least). They're still hard to read, but hopefully a bit less so:
- Decreased opacity of unfocused tiles 0.7 -> 0.5
- Don't unfocus unused blocks in dbgbmapd3.py
- Softened arrow color in light mode #000000 -> #555555
- Added Lfs.traverse for full filesystem traversal
- Added Rbyd.shrub flag so we can tell if an Rbyd is a shrub
- Removed redundant leaves from paths in leaf iters
Like codemapd3.py, this includes an interactive UI for viewing the
underlying filesystem graph, including:
- mode-tree - Shows all reachable blocks from a given block
- mode-branches - Shows immediate children of a given block
- mode-references - Shows parents of a given block
- mode-redund - Shows sibling blocks in redund groups (This is
currently just mdir pairs, but the plan is to add more)
This is _not_ a full filesystem explorer, so we don't embed all block
data/metadata in the svg. That's probably a project for another time.
However we do include interesting bits such as trunk addresses,
checksums, etc.
An example:
# create a filesystem image
$ make test-runner -j
$ ./scripts/test.py -B test_files_many -a -ddisk -O- \
-DBLOCK_SIZE=1024 \
-DCHUNK=10 \
-DSIZE=2050 \
-DN=128 \
-DBLOCK_RECYCLES=1
... snip ...
done: 2/2 passed, 0/2 failed, 164pls!, in 0.16s
# generate bmap svg
$ ./scripts/dbgbmapd3.py disk -b1024 -otest.svg \
-W1400 -H750 -Z --dark
updated test.svg, littlefs v0.0 1024x1024 0x{26e,26f}.d8 w64.128, cksum 41ea791e
And open test.svg in a browser of your choice.
Here's what the current colors mean:
- yellow => mdirs
- blue => btree nodes
- green => data blocks
- red => corrupt/conflict issue
- gray => unused blocks
But like codemapd3.py the output is decently customizable. See -h/--help
for more info.
And, just like codemapd3.py, this is based on ideas from d3 and
brendangregg's flamegraphs:
- d3 - https://d3js.org
- brendangregg's flamegraphs - https://github.com/brendangregg/FlameGraph
Note we don't actually use d3... the name might be a bit confusing...
---
One interesting change from the previous dbgbmap.py is the addition of
"corrupt" (bad checksum) and "conflict" (multiple parents) blocks, which
can help find bugs.
You may find the "conflict" block reporting a bit strange. Yes it's
useful for finding block allocation failures, but won't naturally formed
dags in file btrees also be reported as "conflicts"?
Yes, but the long-term plan is to move away from dags and make littlefs
a pure tree (for block allocator and error correction reasons). This
hasn't been implemented yet, so for now dags will result in false
positives.
---
Implementation wise, this script was pretty straightforward given prior
dbglfs.py and codemapd3.py work.
However there was an interesting case of https://xkcd.com/1425:
- Traverse the filesystem and build a graph - easy
- Tile a rectangle with n nice looking rectangles - uhhh
I toyed around with an analytical approach (something like block width =
sqrt(canvas_width*canvas_height/n) * block_aspect_ratio), but ended up
settling on an algorithm that divides the number of columns by 2 until
we hit our target aspect ratio.
This algorithm seems to work quite well, runs in only O(log n), and
perfectly tiles the grid for powers-of-two. Honestly the result is
better than I was expecting.
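For the curious, here is a rough sketch of the halving idea. This is not
the code in dbgbmapd3.py; the tile_grid name, the starting point of one
row of n columns, and the rounding are all assumptions:

```python
import math

def tile_grid(n, width, height, target_aspect=1.0):
    """Return (cols, rows) for tiling n blocks in a width x height canvas."""
    # start with a single row of n very thin tiles
    cols, rows = max(n, 1), 1
    # halve the column count while tiles are narrower than the target
    while cols > 1:
        tile_aspect = (width / cols) / (height / rows)
        if tile_aspect >= target_aspect:
            break
        cols = math.ceil(cols / 2)
        rows = math.ceil(n / cols)
    return cols, rows

grid = tile_grid(1024, 1024, 1024)
```

Each pass halves cols, so the loop runs at most about log2(n) times, and
when n is a power of two cols*rows stays exactly n.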
This fixes an issue where shrub trunks were never printed even with
-i/--internal.
While only showing mdir/shrub/btree/bptr addresses on block changes is
nice in theory, it results in shrub trunks never being printed because
the mdir -> shrub block doesn't change.
Also checking for changes in block type avoids this.
I'm trying to avoid having classes with different implementations across
scripts, as it makes updating things error-prone, but at the same time
copying all the tree renderers to all dbg scripts would be a bit much.
Monkey-patching the TreeArt class in relevant scripts seems like a
reasonable compromise.
These are pretty script specific, so probably shouldn't be in the
abstract littlefs classes. This also avoids the tree renderers getting
copied into scripts that don't need them (mtree -> dbglfs.py, dbgbmap.py
in the future, etc).
This also makes TreeArt consistent with JumpArt and LifetimeArt.
This just organizes things a bit better and makes dbg_log less of a
monolith:
- JumpArt - Encapsulates ascii jump rendering (-j/--jumps)
- LifetimeArt - Encapsulates ascii lifetime rendering (-g/--lifetimes)
So, instead of trying to be clever with python's tuple globbing, just
rely on lazy tuple unpacking and a whole bunch of if statements.
This is more verbose, but less magical. And generally, the less magic
there is, the easier things are to read.
This also drops the always-tupled lookup_ variants, which were
cluttering up the various namespaces.
Also tweaked how we fetch shrubs, adding Rbyd.fetchshrub and
Btree.fetchshrub instead of overloading the bd argument.
Oh, and also added --trunk to dbgmtree.py and dbglfs.py. Actually
_using_ --trunk isn't advised, since it will probably just result in a
corrupted filesystem, but these scripts are for accessing things that
aren't normally allowed anyways.
The reason for dropping the list/tuple distinction is because it was a
big ugly hack, unpythonic, and likely to catch users (and myself) by
surprise. Now, Rbyd.fetch and friends always require separate
block/trunk arguments, and the exercise of deciding which trunk to use
is left up to the caller.
Why not, -e/--exec seems useful/general purpose enough to deserve a
shortform flag. Especially since much of our testing involves emulation.
The only risk of conflicts is with -e/--error-* in other scripts, but
the _whole point_ of test.py is to error on failure, so I don't think
this will be an issue.
Note that -E may be more useful for environment variables in the future.
I feel like -e/--exec was more common in other programs, but I've only
found sed -e and perl -e so far. Most programs stick to -c/--command
(bash, python) which would conflict with -c/--compile here.
So:
$ ./scripts/dbgflags.py -l LFS_I
Is equivalent to:
$ ./scripts/dbgflags.py -l I
This matches some of the implicit prefixing during name lookup:
$ ./scripts/dbgflags.py LFS_I_SYNC
$ ./scripts/dbgflags.py I_SYNC
$ ./scripts/dbgflags.py SYNC
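One way this sort of implicit prefixing could work, as a sketch only. The
flag values below are made up, and the lookup/list_prefix helpers are
hypothetical, not dbgflags.py's actual functions:

```python
# flag values here are made up for illustration
FLAGS = {
    'LFS_I_SYNC':   0x1,
    'LFS_M_CKMETA': 0x2,
}

def lookup(name):
    """Match a full name or any '_'-delimited suffix of a flag name."""
    return sorted(k for k in FLAGS
            if k == name or k.endswith('_' + name))

def list_prefix(prefix):
    """List flags under a prefix, with or without the leading LFS_."""
    if not prefix.startswith('LFS_'):
        prefix = 'LFS_' + prefix
    return sorted(k for k in FLAGS if k.startswith(prefix + '_'))
```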
So:
all_ = all; del all
Instead of:
import builtins
all_, all = all, builtins.all
The del exposes the globally scoped builtin we accidentally shadow.
This requires less magic, and no module imports, though tbh I'm
surprised it works.
It also works in the case where you change a builtin globally, but
that's a bit too crazy even for me...
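A tiny demonstration, assuming the shadowing name lives at module scope
(this sketch only exercises that case). Once the global is deleted, name
lookup falls through to the builtins again:

```python
# a module-level name that accidentally shadows the builtin
all = ['mdirs', 'btrees']

# keep the value under a new name, then drop the shadowing global
all_ = all; del all

# name lookup now falls through to the builtin again
result = all(len(x) > 0 for x in all_)
```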
The inconsistency between inner/non-inner (-i/--inner) views was a bit
too confusing.
At least now the bptr rendering in dbglfs.py matches behavior, showing
the bptr tag -> bptr jump even when not showing inner nodes.
If the point of these renderers is to show all jumps necessary to reach
a given piece of data, hiding bptr jumps only sometimes is somewhat
counterproductive...
I'm starting to regret these reworks. They've been a big time sink. But
at least these should be much easier to extend with the future planned
auxiliary trees?
New classes:
- Bptr - A representation of littlefs's data-only block pointers.
Extra fun is the lazily checked Bptr.__bool__ method, which should
prevent slowing down scripts that don't actually verify checksums.
- Config - The set of littlefs config entries.
- Gstate - The set of littlefs gstate.
I may have had too much fun with Config and Gstate. Not only do these
provide lookup functions for config/gstate, but known config/gstate
get lazily parsed classes that can provide easy access to the relevant
metadata.
These even abuse Python's __subclasses__, so all you need to do to add
a new known config/gstate is extend the relevant Config.Config/
Gstate.Gstate class.
The __subclasses__ API is a weird but powerful one.
- Lfs - The big one, a high-level abstraction of littlefs itself.
Contains subclasses for known files: Lfs.Reg, Lfs.Dir, Lfs.Stickynote,
etc, which can be accessed by path, did+name, mid, etc. It even
supports iterating over orphaned files, though it's expensive (but
incredibly valuable for debugging!).
Note that all file types can currently have attached bshrubs/btrees.
In the existing implementation only reg files should actually end up
with bshrubs/btrees, but the whole point of these scripts is to debug
things that _shouldn't_ happen.
I intentionally gave up on providing depth bounds in Lfs. Too
complicated for something so high-level.
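To illustrate the __subclasses__ trick mentioned above, here is a minimal
sketch. The KnownConfig/Magic names and the tag value are hypothetical,
and the real Config.Config/Gstate.Gstate classes may look quite different:

```python
import functools

class KnownConfig:
    """Hypothetical base class; subclasses register just by existing."""
    tag = None

    def __init__(self, data):
        self.data = data

    @classmethod
    def parse(cls, tag, data):
        # note __subclasses__ only lists direct subclasses
        for sub in cls.__subclasses__():
            if sub.tag == tag:
                return sub(data)
        # unknown config, fall back to the generic class
        return cls(data)

class Magic(KnownConfig):
    tag = 0x31  # made-up tag value

    @functools.cached_property
    def magic(self):
        # parsed lazily, on first access
        return self.data.decode('ascii')

known = KnownConfig.parse(0x31, b'littlefs')
unknown = KnownConfig.parse(0x99, b'')
```

Adding a new known config is then just a matter of declaring another
subclass; no registry needs updating.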
One noteworthy change is not recursing into directories by default. This
hopefully avoids overloading new users and matches the behavior of most
other Linux/Unix tools.
This adopts -r/--recurse/--file-depth for controlling how far to recurse
down directories, and -z/--depth/--tree-depth for controlling how far to
recurse down tree structures (mostly files). I like this API. It's
consistent with -z/--depth in the other dbg scripts, and -r/--recurse is
probably intuitive for most Linux/Unix users.
To make this work we did need to change -r/--raw -> -x/--raw. But --raw
is already a bit of a weird name for a flag that really means "include a hex
dump".
Note that -z/--depth/--tree-depth does _not_ imply --files. Right now
only files can contain tree structures, but this will change when we get
around to adding the auxiliary trees.
This also adds the ability to specify a file path to use as the root
directory, though we need the leading slash to disambiguate file paths
and mroot addresses.
---
Also tagrepr has been tweaked to include the global/delta names,
toggleable with the optional global_ kwarg.
Rattr now has its own lazy parsers for did + name. A more organized
codebase would probably have a separate Name type, but it just wasn't
worth the hassle.
And the abstraction classes have all been tweaked to require the
explicit Rbyd.repr() function for a CLI-friendly representation. Relying
on __str__ hurt readability and debugging, especially since Python
prefers __str__ over __repr__ when printing things.
The main difference between -t/--tree and -R/--tree-rbyd is that only
the latter shows all internal jumps (unconditional alt->alt), so it
makes sense to also hide internal branches (rbyd->rbyd).
Note that we already hide the rbyd->block branches in dbglfs.py.
Also added color-ignoring comparison operators to our internal
TreeBranch struct. This fixes an issue where our non-inner branch
merging logic could end up with identical branches with different
colors, resulting in different colorings per run. Not the end of the
world, but something we want to avoid.
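A sketch of color-ignoring comparisons, here using a dataclass with
compare=False on the color field. The field names are hypothetical, and
the actual TreeBranch struct may be implemented differently:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True, order=True)
class TreeBranch:
    # hypothetical fields, only these take part in ==, <, and hash
    a: int
    b: int
    depth: int = 0
    # excluded from comparisons and hashing
    color: str = field(default='b', compare=False)

red = TreeBranch(1, 2, 0, 'r')
black = TreeBranch(1, 2, 0, 'b')

# branches that differ only in color collapse to a single entry
merged = {red, black}
```

With color out of the comparison, merging and sorting no longer depend on
which of two otherwise-identical branches happened to come first.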
This requires an additional traversal of the mtree just to precalculate
the mrid width (mbits provides an upper-bound, but the actual number of
mrids in any given mdir may be much less), but it makes the output look
nicer.
This is where the high-level structure of littlefs starts to reveal
itself.
This is also where a lot of really annoying Mtree vs Btree API questions
come to a head, like should Mtree.lookup return an Mdir or an Rattr?
What about Btree.lookup? What gets included in the returned path in all
of these? Well, at least this is an interesting exercise in rethinking
littlefs's internal APIs...
New classes:
- Mid - A representation of littlefs's metadata ids. I've just gone
ahead and included the block_size-dependent mbits as a field in every
Mid instance to try to make Mid operations easier.
It's not like we care about one extra word of storage in Python.
- Mdir - Again, we intentionally _don't_ inherit Rbyd to try to reduce
type errors, though Mdirs really are just Rbyds in this design.
- Mtree - The skeleton of littlefs. Tricky bits include traversing the
mroot chain and handling mroot-inlined mdirs. Note mroots are included
in the mdir/mid iteration methods.
Getting the tree renderers all working again was a real pain in the ass.
Now that these are contained in the Rattr class, including the
tag/weight just clutters these APIs and makes things more confusing.
To make this more convenient, I've added __iter__ methods that allow
unpacking both the Rattr and Ralt classes. These more-or-less represent
tag+weight+data tuples anyways.
Like the Rbyd class, Btree serves as an abstraction for littlefs's
btrees in Python.
New classes:
- Btree - btree abstraction, note this does _not_ inherit from Rbyd. I
find that sort of inheritance too error-prone. Instead Btree
_contains_ the root rbyd, which can always be accessed via Btree.rbyd.
If you want low-level root-rbyd details, just access Btree.rbyd.
Though most fields that are relevant to the Btree are also forwarded
via Python's @property properties.
- Bd - This just serves as a handle for the disk file that includes
block_size/block_count metadata.
One important change to note is the adoption of required vestigial names
in all btree nodes (yes this script was written... checks notes...
2 years ago... even the same month huh). This means we don't need the
parent name mapping, so the non-inner btree printing code no longer
needs to be extremely confusing at all times.
Also adopted the Rbyd class and friends, and backported Bd to
dbgrbyd.py.
Also tried to give a couple useful algorithms their own self-contained
functions, mainly:
- pathdelta - for emulating a traversal over exhaustive paths
- treerepr - for the common ascii tree rendering code
Just some minor tweaks:
- rbydaddr: Return list instead of tuple, note we rely on the type
distinction in Rbyd.fetch now.
- tagrepr: Rename w -> weight.
This reworks dbgrbyd.py to use the Rbyd class (well, a rewrite of the
Rbyd class) as an abstraction of littlefs's rbyd disk structure in
Python.
Duplicating common classes/functions across these scripts has proven
useful for sharing code without preventing these scripts from being
standalone (a problem for _actual_ code sharing, relative imports, etc).
And, because of how these scripts were written, dbgrbyd.py humorously
ended up the only script not sharing the Rbyd class.
I'm also trying to make the actual Rbyd abstraction a bit more concrete
now that the filesystem's design has had some time to mature. This means
more classes for things like Rattrs that reduce the sheer number of
tuples that were flying around.
New classes:
- Rattr - rbyd attrs, tag + weight + data, this includes all relevant
offsets which is useful for rendering hexdumps/etc.
- Ralt - rbyd alt pointers, useful for building tree representations.
- Rbyd - rbyd abstraction, including lookup/traversal methods
Note also that while the Rbyd class replaces most of the dbg_tree logic,
dbg_log is still pretty low-level and abstractionless.
---
Eventually I hope to have well defined classes for Btrees, Mdirs, Files,
etc, to make it easier to write more interesting debug scripts such as
dbgbmap.py.
Separating Btree, Mdirs, etc also means we shouldn't need the hacky
btree_lookup/tree_lookup methods in every script anymore. Having those
in dbgrbyd.py would've been a bit weird.
Might as well, since we already need to find this to calculate stack
info.
I've been considering adding -z/--depth to these scripts as well, but
that would require quite a bit more work. It's probably not worth the
added complexity/headache. Depth termination would need to happen on the
javascript side, and we'd still need cycle detection anyways.
But an error code is easy to add.
This drops the option to read tags from a disk file. I don't think I've
ever used this, and it requires quite a bit of circuitry to implement.
Also dropped -s/--string, because most tags can't be represented as
strings?
And tweaked -x/--hex flags to correctly parse spaces in arguments, so
now these are equivalent:
- ./scripts/dbgtag.py -x 00 03 00 08
- ./scripts/dbgtag.py -x "00 03 00 08"
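The equivalence boils down to splitting each argument on whitespace before
parsing. A small sketch, where parse_hex_args is a hypothetical helper
rather than the function in dbgtag.py:

```python
def parse_hex_args(args):
    """Flatten hex arguments, allowing spaces inside a single argument."""
    return bytes(
            int(tok, 16)
            for arg in args
            for tok in arg.split())

separate = parse_hex_args(['00', '03', '00', '08'])
quoted = parse_hex_args(['00 03 00 08'])
```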
I mean, why not. dbgblock.py is already a bit special compared to the
other dbg scripts:
$ ./scripts/dbgblock.py disk -b4096 0 1 -n16
block 0x0, size 16, cksum a90f45b6
00000000: 68 69 21 0e 00 03 00 08 6c 69 74 74 6c 65 66 73 hi!.....littlefs
block 0x1, size 16, cksum 01e5f5e4
00000000: 68 69 21 0c 80 03 00 08 6c 69 74 74 6c 65 66 73 hi!.....littlefs
This matches dbgcat.py, which is useful when switching between the two
for debugging pipelines, etc.
We want dbgblock.py/dbgcat.py to be as identical as possible, and if you
removed the multiple blocks from dbgcat.py you'd have to really start
asking why it's named dbgCAT.py.
Mainly fixing unbounded ranges, which required a bit of tweaking of when
we flatten block arguments.
This adopts the trick of using slice as the representation of, well,
slices in arguments instead of tuples. This avoids type confusion with
rbydaddr also returning tuples (of tuples!).
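A sketch of the slice trick. The "start,stop" syntax and the parse_range
name are assumptions for illustration, not necessarily what the scripts
accept:

```python
def parse_range(s):
    """Parse 'N' into an int and 'A,B' into a slice.

    Either bound may be left empty for an unbounded range.
    """
    if ',' not in s:
        return int(s, 0)
    start, stop = s.split(',', 1)
    return slice(
            int(start, 0) if start.strip() else None,
            int(stop, 0) if stop.strip() else None)

single = parse_range('0x10')
bounded = parse_range('16,32')
unbounded = parse_range('4,')
```

Since a slice is its own type, an isinstance check cleanly separates
ranges from tuples, and an unbounded end is just a None stop.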
This finally solves the how-do-I-make-space-for-shell-prompts problem:
- plot.py -H0 => use full terminal height
- plot.py -H-1 => use height-1, making space for shell prompts
- plot.py -H => automatic based on other flags
While also allowing other carveouts in case your prompt takes up more
than 1 line.
Unfortunately this does make -H (no arg) subtly different from -H0, but
sometimes you can't have everything.
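The rules above can be sketched as a small helper. This assumes -H with
no argument arrives as None, and resolve_height is a hypothetical name,
not plot.py's actual code:

```python
def resolve_height(arg, term_height, auto_height):
    """Map a -H argument to a concrete height in rows."""
    if arg is None:
        # -H with no argument, let the other flags decide
        return auto_height
    if arg <= 0:
        # 0 => full terminal, -1 => leave one row for the prompt, etc
        return max(term_height + arg, 1)
    # explicit height
    return arg
```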
This simplifies plot.py's -k/--keep-open logic into a self-contained
loop that just calls main_ on an update.
This is a compromise on getting rid of -k/--keep-open completely, since
we _could_ just rely on watch.py. But plot.py knowing which argument is
the file to watch is convenient.
The eventual plan is to adopt this small bit of copy-pastable-code in
the other ascii-art scripts (treemap.py, dbgbmap.py, etc).
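Roughly what such a loop could look like, as a sketch. This version polls
the file's mtime, which is an assumption (the real loop may watch for
changes differently), and max_updates only exists to make the sketch
testable:

```python
import os
import time

def keep_open(path, main_, *, poll=0.1, max_updates=None):
    """Call main_(path) once, then again whenever path's mtime changes."""
    updates = 0
    last = None
    while max_updates is None or updates < max_updates:
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            mtime = None
        if updates == 0 or mtime != last:
            # first render, or the file changed, so rerender
            last = mtime
            main_(path)
            updates += 1
        else:
            time.sleep(poll)
```

Since the loop only needs a path and a main_ callable, it should be easy
to copy into other scripts.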
So:
- before: ./scripts/dbgbmap.py disk -b4096 -@0 -n16,32
- after: ./scripts/dbgbmap.py disk -b4096 -@'0 -n16,32'
This is mainly to avoid the naming conflict between -n/--size and
-n/--lines, while also separating out the namespaces a bit.
It's probably not the most intuitive CLI UI, but --off and -n/--size are
probably infrequent arguments at this level of script anyways.
Mostly to move away from unnecessary shortform flags. Using shortform
flags for what is roughly an unbounded enum just causes too many flag
conflicts as scripts grow:
- -r/--read -> --reads
- -p/--prog -> --progs
- -e/--erase -> --erases
- -w/--wear -> --wear
- -i/--in-use -> -%/--usage
- -M/--mdirs -> --mdirs
- -B/--btrees -> --btrees
- -D/--datas -> --data/--datas
I may have had too much fun forcing argparse to make -%/--usage work.
The percent sign caused a lot of problems for argparse internally.
--no-header doesn't really deserve a shortform, and this risks conflicts
with -N/--notes in the future, not to mention any other number of flags
that can start with --no-*.
- Fixed a NameError in watch.py caused by an outdated variable name
(renamed paths -> keep_open_paths). Yay for dynamic typing.
- Fixed fieldnames is None issue when csv file is empty.
For the same reason we output all field fields by default: Because
machines can process more information than humans can.
Worst case, by fields can still be limited via explicit -b/--by flags.
This should have no noticeable impact on plot.py, but shared classes
have proven helpful for maintaining these scripts.
Unfortunately, this did require some tweaking of the Canvas class to get
things working.
Now, instead of storing things in an internal high-resolution grid,
the Canvas class only keeps track of the most recent character, with
bitmasked ints storing sub-char info.
This makes it so sub-char draws overwrite full characters, which is
necessary for plot.py's axis/data overlap to work.
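A stripped-down sketch of the idea, assuming braille characters for the
sub-char rendering. MiniCanvas is a hypothetical toy, not the real Canvas
class; each cell holds either a full character (str) or a bitmask (int):

```python
# braille dot bits indexed by [dy][dx] inside a 2x4 cell
BRAILLE_BITS = [
    [0x01, 0x08],
    [0x02, 0x10],
    [0x04, 0x20],
    [0x40, 0x80],
]

class MiniCanvas:
    def __init__(self, width, height):
        # one entry per character cell, str or int bitmask
        self.grid = [[' ']*width for _ in range(height)]

    def char(self, x, y, c):
        # full-char draws replace whatever was there
        self.grid[y][x] = c

    def point(self, px, py):
        x, y = px // 2, py // 4
        cell = self.grid[y][x]
        if not isinstance(cell, int):
            # sub-char draws overwrite full characters
            cell = 0
        self.grid[y][x] = cell | BRAILLE_BITS[py % 4][px % 2]

    def render(self):
        return [''.join(
                    chr(0x2800 + c) if isinstance(c, int) else c
                    for c in row)
                for row in self.grid]

canvas = MiniCanvas(2, 1)
canvas.char(0, 0, '|')   # an axis character
canvas.point(0, 0)       # data lands on the same cell
canvas.point(1, 3)
```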
This only failed if "-" was used as an argument (for stdin/stdout), so
the issue was pretty hard to spot.
openio is a heavily copy-pasted function, so it makes sense to just add
the import os to openio directly. Otherwise this mistake will likely
happen again in the future.
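For reference, a sketch of what a self-contained openio might look like
with the import moved inside the function. This is reconstructed from the
description above, not copied from the scripts:

```python
import sys

def openio(path, mode='r', buffering=-1):
    # import here so copy-pasting the function brings its dependency along
    import os
    if path == '-':
        # dup the fd so closing the result doesn't close stdin/stdout
        if 'r' in mode:
            return os.fdopen(os.dup(sys.stdin.fileno()), mode, buffering)
        else:
            return os.fdopen(os.dup(sys.stdout.fileno()), mode, buffering)
    else:
        return open(path, mode, buffering)
```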
- -*/--add-char/--chars -> -./--add-char/--chars
- -./--points -> -p/--points
- -!/--points-and-lines -> -P/--points-and-lines
Also fixed an issue in plot.py/Attr where non-list defaults were failing
to concatenate.
And added the optional --no-label to explicitly opt out.
This is a bit more consistent with treemapd3.py/codemapd3.py's handling
of labels, while still keeping the no-label default. It also makes it
easier to temporarily hide labels when editing commands.