Before this, the only option for ordering the legend was by specifying
explicit -L/--add-label labels. This works for the most part, but
doesn't cover the case where you don't know the parameterization of the
input data.
And we already have -s/-S flags in other csv scripts, so it makes sense
to adopt them in plot.py/plotmpl.py to allow sorting by one or more
explicit fields.
Note that -s/-S can be combined with explicit -L/--add-label to order
datasets with the same sort field:
$ ./scripts/plot.py bench.csv \
-bBLOCK_SIZE \
-xn \
-ybench_readed \
-ybench_proged \
-ybench_erased \
--legend \
-sBLOCK_SIZE \
-L'*,bench_readed=bs=%(BLOCK_SIZE)s' \
-L'*,bench_proged=' \
-L'*,bench_erased='
---
Unfortunately this conflicted with -s/--sleep, which is a common flag in
the ascii-art scripts. This was bound to conflict with -s/--sort
eventually, so I came up with some alternatives:
- -s/--sleep -> -~/--sleep
- -S/--coalesce -> -+/--coalesce
But I'll admit I'm not the happiest about these...
Globs in CLI attrs (-L'*=bs=%(bs)s' for example) have been remarkably
useful. It makes sense to extend this to the other flags that match
against CSV fields, though this does add complexity to a large number of
smaller scripts.
- -D/--define can now use globs when filtering:
$ ./scripts/code.py lfs.o -Dfunction='lfsr_file_*'
-D/--define already accepted a comma-separated list of options, so
extending this to globs makes sense.
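A minimal sketch of what glob-aware filtering might look like, using Python's fnmatch module; the matches_define helper here is hypothetical, not the actual implementation:

```python
# a minimal sketch of glob-aware -D/--define filtering; matches_define is
# a hypothetical helper, but fnmatch is what makes globbing cheap
import fnmatch

def matches_define(value, patterns):
    # -D/--define accepts a comma-separated list of options, so a value
    # matches if any of the comma-separated globs match
    return any(fnmatch.fnmatch(value, p) for p in patterns.split(','))
```

So matches_define('lfsr_file_read', 'lfsr_file_*') would be truthy, while non-glob options degrade to simple string comparison.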
Note this differs from test.py/bench.py's -D/--define. Globbing in
test.py/bench.py wouldn't really work since -D/--define is generative,
not matching. But there's already other differences such as integer
parsing, range, etc. It's not worth making these perfectly consistent
as they are really two different tools that just happen to look the
same.
- -c/--compare now matches with globs when finding the compare entry:
$ ./scripts/code.py lfs.o -c'lfs*_file_sync'
This is quite a bit less useful than -D/--define, but makes sense for
consistency.
Note -c/--compare just chooses the first match. It doesn't really make
sense to compare against multiple entries.
This raised the question of globs in the field specifiers themselves
(-f'bench_*' for example), but I'm rejecting this for now as I need to
draw the complexity/scope _somewhere_, and I'm worried it's already way
over on the too-complex side.
So, for now, field names must always be specified explicitly. Globbing
field names would add too much complexity, especially considering how
many flags accept field names in these scripts.
This was broken:
$ ./scripts/plotmpl.py -L'*=bs=%(bs)s'
There may be a better way to organize this logic, but spamming if
statements works well enough.
Two new tricks:
1. Hide the cursor while redrawing the ring buffer.
2. Build up the entire redraw in RAM first, and render everything in a
single write call.
These _mostly_ get rid of the cursor flickering issues in rapidly
updating scripts.
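The two tricks can be sketched with standard ANSI escapes; the redraw helper and its arguments are hypothetical, and the surrounding ring-buffer bookkeeping is elided:

```python
import sys

def redraw(lines):
    # trick 2: build up the entire redraw in RAM first
    out = []
    out.append('\x1b[?25l')              # trick 1: hide the cursor
    out.append('\x1b[%dA' % len(lines))  # move back up over the ring buffer
    for line in lines:
        out.append('\x1b[K%s\n' % line)  # clear and rewrite each line
    out.append('\x1b[?25h')              # show the cursor again
    # render everything in a single write call
    sys.stdout.write(''.join(out))
    sys.stdout.flush()
```

The single write matters because a flicker-free frame depends on the terminal receiving the whole update at once, not on any one escape sequence.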
No one is realistically ever going to use this.
Ascii art is just too low resolution, trying to pad anything just wastes
terminal space. So we might as well not support --padding and save on
the additional corner cases.
Worst case, in the future we can always find this commit and revert
things.
Replacing -R/--aspect-ratio, --to-ratio now calculates the width/height
_before_ adding decoration such as headers, stack info, etc.
I toyed around with generalizing -R/--aspect-ratio to include
decorations, but when Wolfram Alpha spat out this mess for the
post-header formula:
    header*r - sqrt(4*v*r + padding^2*r)
w = ------------------------------------
                     2
I decided maybe a generalized -R/--aspect-ratio is a _bit_ too
complicated for what are supposed to be small standalone Python
scripts...
---
Also fixed the scaling formula, which should've taken the sqrt _after_
multiplying by the aspect ratio:
w = sqrt(v*r)
I only noticed while trying to solve for the more complicated
post-decoration formula; the difference is pretty minor.
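As a quick sanity check, with v the target area and r the aspect ratio, taking the sqrt after multiplying preserves both properties:

```python
import math

# given target area v and aspect ratio r = w/h, w = sqrt(v*r) and
# h = w/r recover both the area and the ratio exactly
v, r = 1200, 4/3
w = math.sqrt(v*r)
h = w/r
assert math.isclose(w*h, v)  # area preserved
assert math.isclose(w/h, r)  # aspect ratio preserved
```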
Crashing on invalid input isn't the _worst_ behavior, but with a few
tweaks we can make these scripts more-or-less noop in such cases. This
is useful when running with -k/--keep-open since intermediate file
states often contain garbage.
(Ironically one of the precise problems littlefs is trying to solve.)
Also added a special case to treemap.py/codemap.py to not output the
canvas if there's nothing to show and height is implicit. Otherwise the
history mode with -n/--lines ends up filled with blank lines.
Note this makes -H1 subtly different from no -H/--height, with -H1
printing a blank line if there is nothing to show. The -H1 behavior may
also be useful in niche cases where you want that part of the screen
cleared.
---
This was found while trying to run codemap.py -k -n5 during compilation.
GCC writes object files incrementally, and this was breaking our script.
The notable exception being plot.py, where line-level history doesn't
really make sense.
These scripts all default to height=1, and -n/--lines can be useful for
viewing changes over time.
In theory you could achieve something similar to this with tailpipe.py,
but you would lose the header info, which is useful.
---
Note, as a point of simplicity, we do _not_ show sub-char history like
we used to in tracebd.py. That was way too complicated for what it was
worth.
This simplifies attrs a bit, and scripts can always override
__getitem__ if they want to provide lazy attr generation.
The original intention of accepting functions was to make lazy attr
generation easier, but while tinkering around with the idea I realized
the actual attr mapping/generation would be complicated enough that
you'd probably want a full class anyways.
All of our scripts are only using dict attrs anyways. And lazy attr
generation is probably a premature optimization for the same reason
everyone's ok with Python's slices being O(n).
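To illustrate, a script that really wanted lazy attrs could still pass a mapping-like class; this LazyAttrs example is hypothetical:

```python
# a hypothetical mapping-like class providing lazy attr generation by
# overriding __getitem__, instead of attrs accepting functions
class LazyAttrs:
    def __init__(self, fields):
        self.fields = fields

    def __getitem__(self, k):
        # compute attrs on demand, only when actually referenced
        if k == 'label':
            return ','.join(str(f) for f in self.fields)
        raise KeyError(k)
```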
This mirrors how -H/--height and -W/--width work, with -n-1 using the
terminal height - 1 for the output.
This is very useful for carving out space for the shell prompt and other
things, without sacrificing automatic sizing.
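A sketch of how a negative value might resolve against the terminal size; the helper name is hypothetical:

```python
import shutil

def resolve_lines(lines):
    # negative -n/--lines values are relative to the terminal height,
    # so -n-1 leaves one row free for the shell prompt
    if lines < 0:
        return max(shutil.get_terminal_size().lines + lines, 0)
    return lines
```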
This allows for combining braille/dots with custom chars for specific
elements:
$ ./scripts/codemap.py lfs.o -H16 -: -.lfsr_rbyd_appendrattr=A
Note this is already how plot.py works, letting braille/dots take
priority in the new scripts/reworks was just an oversight.
This is a rework of dbgbmap.py to match dbgbmapd3.py, adopt the new
Rbyd/Lfs class abstractions, as well as Canvas, -k/--keep-open, etc.
Some of the main changes:
- dbgbmap.py now reports corrupt/conflict blocks, which can be useful
for debugging.
Note though that you will probably get false positives if running with
-k/--keep-open while something is writing to the disk. littlefs is
powerloss safe, not multi-write safe! Very different problem!
- dbgbmap.py now groups by blocks before mapping to the space filling
curve. This matches dbgbmapd3.py and I think is more intuitive now
that we have a bmap tiling algorithm.
-%/--usage still works, but is rendered as a second space filling
curve _inside_ the block tile. Different blocks can end up with
slightly different sizes due to rounding, but it's not the end of the
world.
I wasn't originally going to keep it around, but ended up caving, so
you can still get the original byte-level curve via -u/--contiguous.
- Like the other ascii rendering scripts, dbgbmap.py now supports
-k/--keep-open and friends as a thin main wrapper. This just makes it
a bit easier to watch a realtime bmap without needing to use watch.py.
- --mtree-only is supported, but filtering via --mdirs/--btrees/--data
is _not_ supported. This was too much complexity for a minor feature,
and doesn't cover other niche blocks like corrupted/conflict or parity
in the future.
- Things are more customizable thanks to the Attr class. For an example
you can now use the littlefs mount string as the title via
--title-littlefs.
- Support for --to-scale and -t/--tiny mode, if you want to scale based
on block_size.
One of the bigger differences dbgbmapd3.py -> dbgbmap.py is that
dbgbmap.py still supports -%/--usage. Should we backport -%/--usage to
dbgbmapd3.py? Uhhhh...
This ends up being a funny example of raster graphics vs vector graphics. A
pixel-level space filling curve is easy with raster graphics, but with
an svg you'd need some sort of pixel -> path wrapping algorithm...
So no -%/--usage in dbgbmapd3.py for now.
Also just ripped out all of the -@/--blocks byte-level range stuff. Way
too complicated for what it was worth. -@/--blocks is limited to simple
block ranges now. High-level scripts should stick to high-level options.
One last thing to note is the adoption of "if '%' in label__" checks
before applying punescape. I wasn't sure if we should support punescape
in dbgbmap.py, since it's quite a bit less useful here, and may be
costly due to the lazy attr generation. Adding this simple check avoids
the cost and consistency question, so I adopted it in all scripts.
Like codemapd3.py, this includes an interactive UI for viewing the
underlying filesystem graph, including:
- mode-tree - Shows all reachable blocks from a given block
- mode-branches - Shows immediate children of a given block
- mode-references - Shows parents of a given block
- mode-redund - Shows sibling blocks in redund groups (This is
currently just mdir pairs, but the plan is to add more)
This is _not_ a full filesystem explorer, so we don't embed all block
data/metadata in the svg. That's probably a project for another time.
However we do include interesting bits such as trunk addresses,
checksums, etc.
An example:
# create a filesystem image
$ make test-runner -j
$ ./scripts/test.py -B test_files_many -a -ddisk -O- \
-DBLOCK_SIZE=1024 \
-DCHUNK=10 \
-DSIZE=2050 \
-DN=128 \
-DBLOCK_RECYCLES=1
... snip ...
done: 2/2 passed, 0/2 failed, 164pls!, in 0.16s
# generate bmap svg
$ ./scripts/dbgbmapd3.py disk -b1024 -otest.svg \
-W1400 -H750 -Z --dark
updated test.svg, littlefs v0.0 1024x1024 0x{26e,26f}.d8 w64.128, cksum 41ea791e
And open test.svg in a browser of your choice.
Here's what the current colors mean:
- yellow => mdirs
- blue => btree nodes
- green => data blocks
- red => corrupt/conflict issue
- gray => unused blocks
But like codemapd3.py the output is decently customizable. See -h/--help
for more info.
And, just like codemapd3.py, this is based on ideas from d3 and
brendangregg's flamegraphs:
- d3 - https://d3js.org
- brendangregg's flamegraphs - https://github.com/brendangregg/FlameGraph
Note we don't actually use d3... the name might be a bit confusing...
---
One interesting change from the previous dbgbmap.py is the addition of
"corrupt" (bad checksum) and "conflict" (multiple parents) blocks, which
can help find bugs.
You may find the "conflict" block reporting a bit strange. Yes it's
useful for finding block allocation failures, but won't naturally formed
dags in file btrees also be reported as "conflicts"?
Yes, but the long-term plan is to move away from dags and make littlefs
a pure tree (for block allocator and error correction reasons). This
hasn't been implemented yet, so for now dags will result in false
positives.
---
Implementation wise, this script was pretty straightforward given prior
dbglfs.py and codemapd3.py work.
However there was an interesting case of https://xkcd.com/1425:
- Traverse the filesystem and build a graph - easy
- Tile a rectangle with n nice looking rectangles - uhhh
I toyed around with an analytical approach (something like block width =
sqrt(canvas_width*canvas_height/n) * block_aspect_ratio), but ended up
settling on an algorithm that divides the number of columns by 2 until
we hit our target aspect ratio.
This algorithm seems to work quite well, runs in only O(log n), and
perfectly tiles the grid for powers-of-two. Honestly the result is
better than I was expecting.
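The halving approach can be sketched roughly like this (a sketch, not the exact implementation; names are hypothetical):

```python
import math

def tile_grid(width, height, n, target=1.0):
    # start with a single row of n columns and halve the column count
    # until each tile's width/height ratio reaches the target; each
    # halving quadruples the per-tile ratio, so this is O(log n)
    cols = n
    while cols > 1:
        rows = math.ceil(n/cols)
        r = (width/cols) / (height/rows)
        if r >= target:
            break
        cols = math.ceil(cols/2)
    return cols, math.ceil(n/cols)
```

For powers-of-two on a square canvas this lands on a perfect grid, e.g. tile_grid(100, 100, 16) gives (4, 4).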
--no-header doesn't really deserve a shortform, and this risks conflicts
with -N/--notes in the future, not to mention any number of other flags
that could start with --no-*.
- Fixed a NameError in watch.py caused by an outdated variable name
(renamed paths -> keep_open_paths). Yay for dynamic typing.
- Fixed fieldnames is None issue when csv file is empty.
This should have no noticeable impact on plot.py, but shared classes
have proven helpful for maintaining these scripts.
Unfortunately, this did require some tweaking of the Canvas class to get
things working.
Now, instead of storing things in an internal high-resolution grid,
the Canvas class only keeps track of the most recent character, with
bitmasked ints storing sub-char info.
This makes it so sub-char draws overwrite full characters, which is
necessary for plot.py's axis/data overlap to work.
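The per-cell representation can be sketched with braille's 2x4 bit layout; the Cell class and method names here are hypothetical, not the actual Canvas internals:

```python
# braille chars U+2800..U+28ff pack a 2x4 pixel grid into 8 bits;
# BRAILLE_BITS[y][x] gives the bit for the pixel at (x, y)
BRAILLE_BITS = [[0x01, 0x08],
                [0x02, 0x10],
                [0x04, 0x20],
                [0x40, 0x80]]

class Cell:
    def __init__(self):
        self.mask, self.char = 0, None

    def plot(self, x, y):
        # sub-char draws overwrite any earlier full character
        if self.char is not None:
            self.mask, self.char = 0, None
        self.mask |= BRAILLE_BITS[y][x]

    def draw(self, c):
        # full-char draws overwrite any earlier sub-char pixels
        self.mask, self.char = 0, c

    def render(self):
        return self.char if self.char is not None else chr(0x2800 + self.mask)
```

Either way the most recent draw wins, which is what lets plot.py's data overwrite its axes.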
This only failed if "-" was used as an argument (for stdin/stdout), so
the issue was pretty hard to spot.
openio is a heavily copy-pasted function, so it makes sense to just add
the import os to openio directly. Otherwise this mistake will likely
happen again in the future.
- -*/--add-char/--chars -> -./--add-char/--chars
- -./--points -> -p/--points
- -!/--points-and-lines -> -P/--points-and-lines
Also fixed an issue in plot.py/Attr where non-list defaults were failing
to concatenate.
And added the optional --no-label to explicitly opt out.
This is a bit more consistent with treemapd3.py/codemapd3.py's handling
of labels, while still keeping the no-label default. It also makes it
easier to temporarily hide labels when editing commands.
So by default, instead of just using "." for tiles, we use interesting
parts of the tile's name:
- For treemap.py, we use the first character of the last by-field (so
"lfs.c,lfsr_file_write,1234" -> "1").
- For codemap.py, we use the first character of the non-subsystem part
of the function name (so "lfsr_file_write" -> "w").
The nice thing about this is that the resulting treemap is somewhat
understandable even without colors:
$ ./scripts/codemap.py lfs.o lfs_util.o lfs.ci lfs_util.ci -W60 -H8
code 35528 stack 2440 ctx 636
ffffffoooffaaaaaaaaaaaacccccccccttttccccrrrrpgffmmrraifmmcss
ffffffwwwttaaaaaaaaaaaacccccccccttttccccrprrpcscmmoommrrcepp
ffffffwwwttaaaaaaaaalllcccccccccttttccccrpppccscmmsrmmrrrrss
ccccssrrfclaaaaanneeasscccccccccgpppccccrpppsgsummstmmrrlfgf
ccccssrrfccaaaaanneeaaaccccccsaagpppcccccrrrfrrcccrrfiiilucs
ccccssrrtfcfffffaapplcccccccclssgnnllllcrrffrrrccccifssscmcm
ccccssrrtrdfffffaapppapcccfffllsgnnllllcrrrffrrcccorfsssicnu
Ok, so maybe the word "somewhat" is doing a lot of heavy lifting...
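The default chars described above can be sketched as a pair of helpers; these names are hypothetical, and the underscore-separated-namespace assumption is the same one codemap.py already makes:

```python
# hypothetical helpers mirroring the default tile chars described above
def treemap_char(key):
    # first character of the last by-field:
    # "lfs.c,lfsr_file_write,1234" -> "1"
    return key.split(',')[-1][:1] or '.'

def codemap_char(name):
    # first character of the non-subsystem part of the function name,
    # assuming underscore-separated namespaces: "lfsr_file_write" -> "w"
    return name.split('_')[-1][:1] or '.'
```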
Like codemapd3.py, but with an ascii renderer.
This is basically just codemapd3.py and treemap.py smooshed together.
It's not the cleanest, but it gets the job done. codemap.py is not
the most critical of scripts.
Unfortunately callgraph and stack/ctx info are difficult (impossible?)
to render usefully in ascii, but we can at least do the script calling,
parsing, namespacing, etc, necessary to create the code cost tilemap.
This turns out to be extremely useful for one purpose in particular:
being able to specify colors/formats/etc in csv fields (-C'%(fields)s'
for example, or -C'#%(field)06x' for a cooler example).
This is a bit tricky for --chars, but doable with a psplit helper
function.
Also fixed a bug in plot.py where we weren't using dataattrs_ correctly.
Even though I think this makes less sense for the ascii-rendering
scripts, it's useful to have this flag around when jumping between
treemap.py and treemapd3.py.
And it might actually make sense sometimes now that -t/--tiny does not
override --to-scale.
This just makes dat behave similarly to Python's getattr, etc:
- dat("bogus") -> raises ValueError
- dat("bogus", 1234) -> returns 1234
This replaces try_dat, which is easy to forget about when copy-pasting
between scripts.
Though all of this wouldn't be necessary if only we could catch
exceptions in expressions...
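The behavior can be sketched with a sentinel default, getattr-style; this standalone function is only an approximation, since the real dat is a method on the scripts' result classes:

```python
# a sentinel distinguishes "no default given" from "default is None"
_SENTINEL = object()

def dat(results, k, default=_SENTINEL):
    # like getattr: raise on a missing field, unless a default is given
    try:
        return results[k]
    except KeyError:
        if default is _SENTINEL:
            raise ValueError('no field %r' % k)
        return default
```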
Inspired heavily by d3 and brendangregg's flamegraphs, codemapd3.py is
intended to be a powerful high-level code exploring tool.
It's a visual tool, so probably best explained visually:
$ CFLAGS='-DLFS_NO_LOG -DLFS_NO_ASSERT' make -j
$ ./scripts/codemapd3.py \
lfs.o lfs_util.o \
lfs.ci lfs_util.ci \
-otest.svg -W1500 -H700 --dark
updated test.svg, code 35528 stack 2440 ctx 636
And open test.svg in a browser of your choice.
(TODO add a make rule for this)
---
Features include:
- Rendering of code cost in a treemap organized by subsystem (based on
underscore-separated namespaces), making it relatively easy to see
where the bulk of our code cost comes from.
- Rendering of the deepest stack/ctx cost as a set of tiles, making it
relatively easy to see where the bulk of our stack cost comes from.
- Interactive (on mouseover) rendering of callgraph info, showing
dependencies and relevant stack/ctx costs per-function.
This currently includes 4 modes:
1. mode-callgraph - This shows the full callgraph, including all
children's children, which is effectively all dependencies of that
function, i.e. the total code cost necessary for that _specific_
function to work.
2. mode-deepest - This shows the deepest/hot path of calls from that
function, which is every child that contributes to the function's
stack cost.
3. mode-callees - This shows all functions the current function
immediately calls.
4. mode-callers - This shows all functions that call the current
function.
And yes, cycles are handled correctly: We show the deepest
non-cyclical path, but display the measured stack usage as infinite.
For more details see ./scripts/codemapd3.py --help.
---
One particularly neat feature I'm happy about is -t/--tiny, which scales
the resulting image such that 1 pixel ~= 1 byte. This should be useful
for comparing littlefs to other filesystems in a way that is visually
interesting.
- d3 - https://d3js.org
- brendangregg's flamegraphs - https://github.com/brendangregg/FlameGraph
The previous behavior of -N/--no-header still rendering a header when
--title is also provided was confusing. I think this is a better API,
at the minor cost of needing to pass one more flag if you don't want
stats in the header.
I guess in addition to its other utilities, csv.py is now also turning
into a sort of man database for some of the more complicated APIs in the
scripts:
./csv.py --help
./csv.py --help-exprs
./csv.py --help-mods
It's a bit minimal, but better than nothing.
Also dropped the %c modifier because this never actually worked.
This adopts the Attr rework for the --add-xticklabel and
--add-yticklabel flags.
Sort of.
These require a bit of special behavior to make work, but should at
least be externally consistent with the other Attr flags.
Instead of assigning to by-field groups, --add-xticklabel/yticklabel
assign to the relevant x/y coord:
$ ./scripts/plotmpl.py \
--add-xticklabel='0=zero' \
--add-yticklabel='100=one-hundred'
The real power comes from our % modifiers. As a special case,
--add-xticklabel/yticklabel can reference the special x/y field, which
represents the current x/y coord:
$ ./scripts/plotmpl.py --y2 --yticks=5 --add-yticklabel='%(y)d KiB'
Combined with format specifiers, this allows for quite a bit:
$ ./scripts/plotmpl.py --y2 --yticks=5 --add-yticklabel='0x%(y)04x'
---
Note that plot.py only shows the min/max x/yticks, so plot.py only
accepts indexed --add-xticklabel/yticklabels, and will error if the
assigning variant is used.
Unifying these complicated attr-assigning flags across all the scripts
is the main benefit of the new internal Attr system.
The only tricky bit is we need to somehow keep track of all input fields
in case % modifiers reference fields, when we could previously discard
non-data fields.
Tricky but doable.
Updated flags:
- -L/--label -> -L/--add-label
- --colors -> -C/--add-color
- --formats -> -F/--add-format
- --chars -> -*/--add-char/--chars
- --line-chars -> -_/--add-line-char/--line-chars
I've also tweaked Attr to accept glob matches when figuring out group
assignments. This is useful for matching slightly different, but
similarly named results in our benchmark scripts.
There's probably a clever way to do this by injecting new by fields with
csv.py, but just adding globbing is simpler and makes attr assignment
even more flexible.
No more special indexed attrs at the top-level, now all attrs are
indexed, even if assigned to a specific group.
This just makes it so group-specific cycles are possible:
$ ./scripts/treemap.py -Clfs.c=red -Clfs.c=green
Now, instead of specifying a specific field or comma-separated set of
order-defined constants, -L/--add-label, -C/--add-color, and
-./--add-char/--chars accept a by-field group assignment similar to
-L/--label in plotmpl.py.
I also reworked our % modifiers to behave a bit more like printf
modifiers with optional field targets.
It gets a bit complicated, but this ends up extremely flexible:
- Assign to a specific group:
$ ./scripts/treemap.py -Clfs.c,lfsr_format=orange
- Note this is hierarchical, with more specific groups taking priority:
$ ./scripts/treemap.py -Clfs.c=blue -Clfs.c,lfsr_format=orange
- We can still get the order-assigned behavior by specifying multiple
options, but note there is no longer a comma ambiguity! This is useful
if you want to specify a palette and don't care which dataset gets
which attr:
$ ./scripts/treemap.py -Cred -Cgreen -Cblue
- Mix and match:
$ ./scripts/treemap.py -Cred -Cgreen -Cblue -Clfsr_format=orange
- And with the new % modifiers, we can still use labels stored in a
field:
$ ./scripts/treemap.py -L'%(label_field)s'
- -./--add-char/--chars in treemap.py is a bit of a special case. Since
it only accepts single characters, we can still accept multiple
options with a single flag without having to worry about ambiguities:
$ ./scripts/treemap.py -.asdf
Well, unless you want to include a literal '='. This is possible, but
a bit messy:
$ ./scripts/treemap.py -.as -.=== -.df
Yes that is 3 equal signs... One for argparse, one for the assignment,
one for the '=' literal.
This one is minor, but nice for terseness.
A painful lesson learned from plot[mpl].py: we should never implicitly
sum results in a late-stage rendering script. It just makes it way too
easy to accidentally render incorrect/misleading data, while being
difficult to notice.
We should always render redundant results as redundant results.
If the redundant results are an error, this hopefully makes the problem
more obvious to the user. And if the user really does want summed
results, they can always use csv.py as an intermediate step:
$ ./scripts/treemap.py \
<(./scripts/csv.py lfs.code.csv -bfile -fsize -q -o-) \
-fsize
This adds --rectify for a parent-aspect-ratio-preserving --squarify
variant, reverting squarify to try to match the aspect ratio of a
square (1:1).
I can see arguments for both of these. On one hand --squarify makes the
squarest squares, which according to Mark Bruls et al's paper on the
topic are easier to compare visually. On the other hand --rectify may be
more visually pleasing and fit into parent tiles better.
d3 allows for any ratio, but at the moment I'm not seeing a strong
reason for the extra parameter.
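The difference shows up in the row-scoring step of the squarify algorithm, where --squarify targets a 1:1 ratio and --rectify would target the parent tile's ratio. A sketch under those assumptions (names hypothetical):

```python
def worst_ratio(areas, side, target=1.0):
    # lay tiles in one row along a side; the row's thickness is fixed by
    # the total area, and each tile's width by its own area
    thickness = sum(areas) / side

    def score(a):
        r = (a/thickness) / thickness   # tile width/height ratio
        return max(r/target, target/r)  # distance from the target ratio

    # squarify greedily grows the row while this score keeps improving
    return max(score(a) for a in areas)
```

d3 exposes this target as a parameter; here it's just 1.0 vs the parent's ratio.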
Like treemap.py, but outputting an svg file, which is quite a bit more
useful.
Things svg is _not_:
- A simple vector graphics format
Things svg _is_:
- A surprisingly powerful high-level graphics language.
I might have to use svgs as an output format more often. It's
surprisingly easy to generate graphics without worrying about low-level
rendering details.
---
Aside from the extra flags for svg details like font, padding,
background colors, etc, the main difference between treemap.py and
treemapd3.py is the addition of the --nested mode, which renders a
containing tile for each recursive group (each -b/--by field).
There's no way --nested would've worked in treemap.py. The main benefit
is the extra labels per subgroup, which are already hard enough to read
in treemap.py.
Other than that, treemapd3.py is mostly the same as treemap.py, but with
a resolution that's actually readable.
Based on the d3 javascript library (https://d3js.org), treemap.py
renders hierarchical data as ascii art:
$ ./scripts/treemap.py lfs.code.csv \
-bfunction -fsize --chars=asdf -W60 -H8
total 65454, avg 369 +-366.8σ, min 3, max 4990
aaaassssddddddaaaadddddssddfffaaadfffaassaassfasssdfdfsddfad
aaaassssddddddaaaadddddssddfffaaadfffaassdfaafasssdfdfsddfsf
aaaassssddddddaaaafffffssddfffsssdaaaddffdfaadfaaasdfafaasfa
aaaassssddddddaaaafffffaaaddddsssaassddffdfaaffssfssfsfadffa
aaaassssffffffssssfffffaaaddddsssaassssffddffffssfdffsadfsad
aaaassssffffffssssaaaaasssffffddfaassssaaassdaaddadffsadadad
aaaassssffffffssssaaaaasssffffddfddffddssassdfassadffsadaffa
aaaassssffffffssssaaaaasssffffddfddffddssassdfaddsdadasfsada
(Normally this is also colored, but you know.)
I've been playing around with d3 to try to better visualize code costs
in littlefs, and it's been quite neat. I figured it would be useful to
directly integrate a similar treemap renderer into our result scripts.
That being said, this ascii rendering is probably too difficult to parse
for any non-trivial data. I'm also working on an svg-based renderer, so
treemap.py is really just for in-terminal previews and an exercise to
understand the underlying algorithms, similar to plot.py/plotmpl.py.