Without this, naming a column i/children/notes in csv.py could cause
things to break. Unlikely for children/notes, but very likely for i,
especially when benchmarking.
Unfortunately namedtuple makes this tricky. I _want_ to just rename
these to _i/_children/_notes and call the problem solved, but namedtuple
reserves all underscore-prefixed fields for its own use.
As a workaround, the table renderer now looks for _i/_children/_notes at
the _class_ level, as an optional name of which namedtuple field to use.
This way Result types can stay lightweight namedtuples while including
extra table rendering info without risk of conflicts.
This also makes the HotResult type a bit more funky, but that's not a
big deal.
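A minimal sketch of the class-level lookup, assuming a hypothetical
CtxResult type and a hypothetical result_children helper (neither name
is taken from the scripts):

```python
import collections as co

# hypothetical Result type; the "i" column is ordinary data here
class CtxResult(co.namedtuple('CtxResult', [
        'function', 'i', 'size', 'kids'])):
    __slots__ = ()
    # class-level hint: which namedtuple field holds children, if any
    _children = 'kids'

def result_children(r):
    # look up the field name on the class, not the instance, so a data
    # column that happens to be named "children" is never mistaken for it
    name = getattr(type(r), '_children', None)
    return getattr(r, name) if name is not None else []

leaf = CtxResult('lfsr_bd_prog', 7, 16, [])
root = CtxResult('lfsr_bd_sync', 3, 8, [leaf])
```

Result types that don't define _children simply have no children as far
as this helper is concerned.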
This extends the recursive part of the table renderer to sort children
by the optional "i" field, if available.
Note this only affects children entries. The top-level entries are
strictly ordered by the relevant "by" fields. I just haven't seen a use
case for this yet, and not sorting "i" at the top-level reduces the
number of things that can go wrong for scripts without children.
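Roughly, the children ordering could look like the following sketch. The
Node type, its field names, and sorted_children are placeholders for
illustration, not the actual renderer code:

```python
import collections as co

# hypothetical Result type; "order" stands in for the optional "i" field
class Node(co.namedtuple('Node', ['name', 'order', 'children'])):
    __slots__ = ()
    _i = 'order'
    _children = 'children'

def sorted_children(r):
    children = getattr(r, type(r)._children)
    # only sort if the class names an "i" field, otherwise keep as-is
    i_name = getattr(type(r), '_i', None)
    if i_name is None:
        return list(children)
    return sorted(children, key=lambda c: getattr(c, i_name))

root = Node('lfsr_mount', 0, [
        Node('lfsr_bd_sync', 2, []),
        Node('lfsr_bd_prog', 1, [])])
```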
---
This also rewrites -t/--hot to take advantage of children ordering by
injecting a totally-not-hacky HotResult subclass.
Now -t/--hot should be strictly ordered by the call depth! Though note
entries that share "by" fields are still merged...
This also gives us a way to introduce the "cycle detected" note and
respect -z/--depth, so overall a big improvement for -t/--hot.
We don't really need padding for the notes on the last column of tables,
which is where row-level notes end up.
This may seem minor, but not padding here avoids quite a bit of
unnecessary line wrapping in small terminals.
- Adopted higher-level collect data structures:
  - high-level DwarfEntry/DwarfInfo class
  - high-level SymInfo class
  - high-level LineInfo class
  Note these had to be moved out of function scope due to pickling
  issues in perf.py/perfbd.py. These were only function-local to
  minimize scope leak so this fortunately was an easy change.
- Adopted better list-default patterns in Result types:
    def __new__(..., children=None):
        return Result(..., children if children is not None else [])
  A classic python footgun.
- Adopted notes rendering, though this is only used by ctx.py at the
  moment.
- Reverted to sorting children entries, for now.
  Unfortunately there's no easy way to sort the result entries in
  perf.py/perfbd.py before folding. Folding is going to make a mess
  of more complicated children anyways, so another solution is
  needed...
And some other shared miscellany.
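For anyone who hasn't been bitten by the list-default footgun yet, here
is a small sketch. bad_append and this two-field Result are made up for
illustration; the real Result types carry more fields:

```python
import collections as co

def bad_append(x, acc=[]):
    # the default list is created once, at def time, and shared by
    # every call that doesn't pass acc explicitly
    acc.append(x)
    return acc

class Result(co.namedtuple('Result', ['name', 'children'])):
    __slots__ = ()
    def __new__(cls, name, children=None):
        # None as the sentinel gives every Result its own fresh list
        return super().__new__(cls, name,
                children if children is not None else [])
```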
It looks like the failure case in our scripts' subprocess stderr
handling was not tested well during a fix to stderr blocking (a735bcd).
This code was attempting to print stderr only if an error occurred, but
with stderr=None this just results in a NoneType TypeError.
In retrospect, completely hiding stderr is kind of shitty if a
subprocess fails, but it doesn't seem possible to read from both stdout
and stderr with Python's APIs without getting stuck when the stderr's
buffer is full.
It might be possible to work around this with either multithreading,
select calls, or a temp file, but I'm not sure slightly less verbose
scripts are worth the added complexity in every single subprocess call.
For now just reverting to unconditionally forwarding stderr from the
child process. This is the simplest/most robust option.
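A sketch of the unconditional-forwarding pattern, assuming a
hypothetical run_lines helper (the scripts inline this per call, and the
details there may differ):

```python
import subprocess as sp
import sys

def run_lines(cmd):
    # stderr=None means the child inherits our stderr, so nothing
    # accumulates in a pipe and the child can't block on a full buffer
    with sp.Popen(cmd,
            stdout=sp.PIPE,
            stderr=None,
            universal_newlines=True,
            errors='replace') as proc:
        lines = [line.rstrip('\n') for line in proc.stdout]
    # leaving the with block waits for the child, so returncode is set
    if proc.returncode != 0:
        raise sp.CalledProcessError(proc.returncode, cmd)
    return lines
```

Only stdout is piped here, so there is exactly one stream to drain.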
This breaks the collect function down into collect_dwarf_files,
collect_dwarf_info, and collect_sizes. This makes the dwarf-info parser
a bit easier to share with structs.py, etc.
Sharing easily copy-pastable chunks of code in scripts like this has
allowed for better code reuse without intricately tying script
dependencies together. Being able to run each of these scripts
standalone is a goal.
The fact that our scripts' table renderer was slightly different for
recursive scripts (stack.py, perf.py) and non-recursive scripts
(code.py, structs.py) was a ticking time bomb, one innocent edit away
from breaking half the scripts.
This makes the table renderer consistent across all scripts, allowing for
easy copy-pasting when editing at the cost of some unused code in
scripts.
One hiccup with this though is the difference in cycle detection
behavior between scripts:
- stack.py:
    lfsr_bd_sync
    '-> lfsr_bd_prog
        '-> lfsr_bd_sync  <-- cycle!
- structs.py:
    lfsr_bshrub_t
    '-> u
        '-> bsprout
            '-> u  <-- not a cycle!
To solve this the table renderer now accepts a simple detect_cycles
flag, which can be set per-script.
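As a rough sketch of what such a flag does, here is a toy recursive
renderer. render_tree, its arguments, and the dict-of-lists input are
assumptions for illustration and not the real table renderer:

```python
def render_tree(name, children, detect_cycles=True, depth=4, path=()):
    """Return indented lines for name and its descendants."""
    lines = ['  '*len(path) + name]
    # a cycle is a name that already appears on the path to this entry
    if detect_cycles and name in path:
        lines[-1] += ' <-- cycle!'
        return lines
    # depth bounds the recursion when cycle detection is disabled
    if len(path)+1 >= depth:
        return lines
    for child in children.get(name, []):
        lines.extend(render_tree(child, children, detect_cycles, depth,
                path + (name,)))
    return lines

calls = {
    'lfsr_bd_sync': ['lfsr_bd_prog'],
    'lfsr_bd_prog': ['lfsr_bd_sync'],
}
fields = {
    'lfsr_bshrub_t': ['u'],
    'u': ['bsprout'],
    'bsprout': ['u'],
}
```

stack.py-style callers would keep detect_cycles=True, while
structs.py-style callers would pass detect_cycles=False, since repeated
field names there are not recursion.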
This makes the -p/--percent flag a bit more consistent with -d/--diff
and -c/--compare, both of which change the printing strategy based on
additional context.
This showcases the sort of high-level result printing where -c/--compare
is useful:
  $ make summary-diff
          code            data       stack          structs
  BEFORE  57057           0          3056           1476
  AFTER   68864 (+20.7%)  0 (+0.0%)  3744 (+22.5%)  1520 (+3.0%)
There was one hiccup though: how to hide the name of the first field.
It may seem minor, but the missing field name really does help
readability when you're staring at a wall of CLI output.
It's a bit of a hack, but this can now be controlled with -Y/--summary,
which has the sole purpose of disabling the first field name if mixed
with -c/--compare.
-c/--compare is already a weird case for the summary row anyways...
Example:
  $ ./scripts/csv.py lfs.code.csv \
          -bfunction -fsize \
          -clfsr_rbyd_appendrattr
  function                         size
  lfsr_rbyd_appendrattr            3598
  lfsr_mdir_commit                 5176 (+43.9%)
  lfsr_btree_commit__.constprop.0  3955 (+9.9%)
  lfsr_file_flush_                 2729 (-24.2%)
  lfsr_file_carve                  2503 (-30.4%)
  lfsr_mountinited                 2357 (-34.5%)
  ... snip ...
I don't think this is immediately useful for our code/stack/etc
measurement scripts, but it's certainly useful in csv.py for comparing
results at a high level.
And by useful I mean it replaces a 40-line long awk script that has
outgrown its original purpose...
This may make some mathematician mad, but these are informative scripts.
Returning +-inf is much more useful than erroring when dealing with
several hundred rows of results.
And hey, if it's good enough for IEEE 754, it's good enough for us :)
Also fixed a division operator mismatch in RFrac that was causing
problems.
Not sure if this is an old habit from Python 2, or just because it looks
nicer next to __mul__, __mod__, etc, but in Python 3 this should be
__truediv__ (or __floordiv__), not __div__.
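A sketch of both points, using a cut-down stand-in for the float result
type (this RFloat is not the real class, and returning NaN for 0/0 is my
assumption, borrowed from IEEE 754):

```python
import math

class RFloat:
    """Hypothetical stand-in for a result type wrapping a float."""
    def __init__(self, x):
        self.x = float(x)

    # Python 3 dispatches a/b to __truediv__, __div__ is never called
    def __truediv__(self, other):
        if other.x == 0.0:
            if self.x == 0.0:
                # assumption: 0/0 gives NaN, as in IEEE 754
                return RFloat(math.nan)
            # +-inf instead of ZeroDivisionError, sign from the numerator
            return RFloat(math.copysign(math.inf, self.x))
        return RFloat(self.x / other.x)
```

Note this ignores the sign of a negative-zero divisor, which is probably
fine for measurement results.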
I still think the 24 (23+1) char minimum is a good default for 2 column
output such as help text, especially if you don't have automatic width
detection. But our result scripts need to be a bit more flexible.
Consider:
  $ make summary
                          code   data   stack  structs
  TOTAL                   68864  0      3744   1520
Vs:
  $ make summary
         code   data   stack  structs
  TOTAL  68864  0      3744   1520
Up until now we were just kind of working around this with cut -c 25- in
our Makefile, but now that our result scripts automatically scale the
table widths, they should really just default to whatever is the most
useful.
- RInt/RFloat now accepts implicitly castable types (mainly
RInt(RFloat(x)) and RFloat(RInt(x))).
- RInt/RFloat/RFrac are now "truthy", implementing __bool__.
- More operator support for RInt/RFloat/RFrac:
- __pos__ => +a
- __neg__ => -a
- __abs__ => abs(a)
- __div__ => a/b
- __mod__ => a%b
These work in Python, but are mainly used to implement expr eval in
csv.py.
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.
Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:
# prevent local imports
__import__('sys').path.pop(0)
These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
This matches the style used in C, which is good for consistency:
    a_really_long_function_name(
            double_indent_after_first_newline(
                single_indent_nested_newlines))
We were already doing this for multiline control-flow statements, simply
because I'm not sure how else you could indent this without making
things really confusing:
    if a_really_long_function_name(
            double_indent_after_first_newline(
                single_indent_nested_newlines)):
        do_the_thing()
This was the only real difference style-wise between the Python code and
C code, so now both should be following roughly the same style (80 cols,
double-indent multiline exprs, prefix multiline binary ops, etc).
Mainly to avoid conflicts with match results m, this frees up the
single-letter variable m for other purposes.
Choosing a two letter alias was surprisingly difficult, but mt is nice
in that it somewhat matches it (for itertools) and ft (for functools).
This fixes an issue where mixing recursive renderers (-t/--hot or
-z/--depth) with defines (-Dfunction=lfsr_mount) would not account for
children entry widths. An unexpected side-effect of no longer filtering
the children entries.
We could continue to try to estimate the width without table rendering,
but it would basically need two full recursive passes at this point...
Instead, I've just moved the recursive stuff before table rendering,
which should remove any issues with width calculation while also
deduplicating the recursive passes.
It's invasive for a small change, but probably worthwhile long term.
The downside is this does mean our recursive scripts now build the full
table (including all recursive calls!) before they start printing. When
mixed with unbounded recursive depth (-z0 or --depth=0) this can get
quite large and cause quite a slow start.
But I guess that was the tradeoff in adopting this sort of intermediate
table rendering... At least it does make the code simpler and less bug
prone...
As a convenience, -d/--diff in our measurement scripts hides entries
that are unchanged by default.
Unfortunately this was broken during a recent refactor that ended up
filtering the line info but not the actual names.
Instead of reverting the broken part of the refactor, I've just moved the
filtering up to where we calculate the names. Hopefully this fixes the
bug while also simplifying this messy chunk of logic a bit.
code.py, specifically, was getting messed up by inconsequential GCC
objdump errors on Clang -g3 generated binaries.
Now stderr from child processes is just redirected to /dev/null when
-v/--verbose is not provided.
If we actually depended on redirecting stderr->stdout these scripts
would have been broken when -v/--verbose was provided anyways. Not
really sure what the original code was trying to do...
The original idea was to allow merging a whole bunch of different csv
results into a single lfs.csv file, but this never really happened. It's
much easier to operate on smaller context-specific csv files, where the
field prefix:
- Doesn't really add much information
- Requires more typing
- Is confusing in how it doesn't match the table field names.
We can always use summary.py -fcode_size=size to add prefixes when
necessary anyways.
We already rely on this symbol in these scripts, so we might as well use
it to display the mathematically correct ratio for new entries.
This has the added benefit of ordering new entries vs extremely big
changes correctly:
  $ ./scripts/code.py -u test.after.csv -d test.before.csv
  function (1 added, 0 removed)  osize  nsize  dsize
  test_a                         -      49     +49 (+∞%)
  test_b                         19     719    +700 (+3684.2%)
  test_c                         91     191    +100 (+109.9%)
  TOTAL                          110    959    +849 (+771.8%)
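The ordering falls out naturally once the ratio is a real float
infinity. A small sketch, where ratio and fmt_percent are hypothetical
helpers and the rows are copied from the table above:

```python
import math

def ratio(old, new):
    # new entries (old == 0) get a signed infinity, not a special case
    if old == 0:
        if new == 0:
            return 0.0
        return math.copysign(math.inf, new)
    return (new - old) / old

def fmt_percent(r):
    if math.isinf(r):
        return '%s∞%%' % ('+' if r > 0 else '-')
    return '%+.1f%%' % (100*r)

# (name, old size, new size)
rows = [('test_c', 91, 191), ('test_a', 0, 49), ('test_b', 19, 719)]
# inf compares greater than any finite ratio, so new entries sort first
rows.sort(key=lambda r: ratio(r[1], r[2]), reverse=True)
```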
This is a bit more complicated, but make testmarks really showed how
confusing this could get.
Now, instead of:
  suite             passed         time
  test_alloc        304/304        1.6 (100.0%)
  test_badblocks    6880/6880      1323.3 (100.0%)
  ... snip ...
  test_rbyd         385878/385878  592.7 (100.0%)
  test_relocations  7899/7899      318.8 (100.0%)
  TOTAL             548206/548206  6229.7 (100.0%)
Percents/notes are interspersed next to their relevant fields:
  suite             passed                  time
  test_alloc        304/304 (100.0%)        1.6
  test_badblocks    6880/6880 (100.0%)      1323.3
  ... snip ...
  test_rbyd         385878/385878 (100.0%)  592.7
  test_relocations  7899/7899 (100.0%)      318.8
  TOTAL             548206/548206 (100.0%)  6229.7
Note this has no effect on scripts with only a single field (code.py, etc).
But it does make multi-field diffs a bit more readable:
  $ ./scripts/stack.py -u after.stack.csv -d before.stack.csv -p
  function          frame         limit
  lfsr_bd_sync      8 (+100.0%)   216 (+100.0%)
  lfsr_bd_flush     40 (+25.0%)   208 (+4.0%)
  ... snip ...
  lfsr_file_flush   32 (+0.0%)    2424 (-0.3%)
  lfsr_file_flush_  216 (-3.6%)   2392 (-0.3%)
  TOTAL             9008 (+0.4%)  2600 (-0.3%)
This matches how diff percentages are rendered, and simplifies the
internal table rendering by making Frac less of a special case. It also
allows for other type notes in the future.
One concern is how all the notes are shoved to the side, which may make
it a bit harder to find related percentages. If this becomes annoying we
should probably look into interspersing all notes (including diff
percentages) between the relevant columns.
Before:
  function                    lines          branches
  lfsr_rbyd_appendattr        230/231 99.6%  172/192 89.6%
  lfsr_rbyd_p_recolor         33/34 97.1%    11/12 91.7%
  lfs_alloc                   40/42 95.2%    21/24 87.5%
  lfsr_rbyd_appendcompaction  54/57 94.7%    39/42 92.9%
  ...
After:
  function                    lines    branches
  lfsr_rbyd_appendattr        230/231  172/192 (99.6%, 89.6%)
  lfsr_rbyd_p_recolor         33/34    11/12 (97.1%, 91.7%)
  lfs_alloc                   40/42    21/24 (95.2%, 87.5%)
  lfsr_rbyd_appendcompaction  54/57    39/42 (94.7%, 92.9%)
  ...
Previously, with -d/--diff, we would only show non-zero percentages. But
this was ambiguous/confusing when dealing with multiple results
(stack.py, summary.py, etc).
To help with this, I've switched to showing all percentages unless all
percentages are zero (no change). This matches the -d/--diff row-hiding
logic, so by default all rows should show all percentages.
Note -p/--percent did not change, as it already showed all percentages
all of the time.
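The rule is small enough to sketch in a few lines. diff_notes is a
hypothetical helper name, and ratios are assumed to be plain floats with
one entry per field:

```python
def diff_notes(ratios):
    # hide the notes only if nothing changed in any field...
    if not any(ratios):
        return []
    # ...otherwise show every percentage, including the zero ones, so
    # it's unambiguous which percentage belongs to which field
    return ['%+.1f%%' % (100*r) for r in ratios]
```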
Note there's a bit of subtlety here, field _types_ are still inferred,
but the intention of the fields, i.e. if the field contains data vs
row name/other properties, must be unambiguous in the scripts.
There is still a _tiny_ bit of inference. For most scripts only one
of --by or --fields is strictly needed, since this makes the purpose of
the other fields unambiguous.
The reason for this change is so the scripts are a bit more reliable,
but also because this simplifies the data parsing/inference a bit.
Oh, and this also changes field inference to use the csv.DictReader's
fieldnames field instead of only inspecting the returned dicts. This
should also save a bit of O(n) overhead when parsing CSV files.
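For reference, a sketch of reading the field names straight from the
reader. collect_fields is a made-up helper, and csv here is the standard
library module:

```python
import csv
import io

def collect_fields(f):
    reader = csv.DictReader(f, restval='')
    # fieldnames comes from the header row, so it is available without
    # scanning the keys of every returned dict, and even with zero rows
    fields = list(reader.fieldnames or [])
    rows = list(reader)
    return fields, rows

fields, rows = collect_fields(io.StringIO('function,size\n'))
```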
The whitespace sensitivity of field args was starting to be a problem,
mostly for advanced plotmpl.py usage (which tbf might be appropriately
described as "super hacky" in how it uses CLI parameters):
./scripts/plotmpl.py \
-Dcase=" \
bench_rbyd_attr_append, \
bench_rbyd_attr_remove, \
bench_rbyd_attr_fetch, \
..."
This may present problems when parsing CSV files with whitespace, in
theory, maybe. But given the scope of these scripts for littlefs...
just don't do that. Thanks.
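In other words, something along these lines, where parse_define is a
hypothetical name and the exact splitting rules in the scripts may
differ:

```python
def parse_define(arg):
    # split key=v1,v2,... and strip whitespace around every piece, so
    # values can be spread over several lines inside one quoted arg
    k, vs = arg.split('=', 1)
    return k.strip(), [v.strip() for v in vs.split(',') if v.strip()]

k, vs = parse_define('case= \n  bench_rbyd_attr_append, \n'
        '  bench_rbyd_attr_remove, \n  bench_rbyd_attr_fetch')
```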
With the quantity of data being output by bench.py now, filtering ASAP
while parsing CSV files is a valuable optimization. And thanks to how
CSV files are structured, we can even avoid ever loading the full
contents into RAM.
This does end up with us filtering for defines redundantly in a few
places, but this is well worth the saved overhead from early filtering.
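A sketch of the idea, assuming defines is a dict mapping field names to
sets of accepted string values (the collect name and this signature are
illustrative only):

```python
import csv
import io

def collect(f, defines):
    reader = csv.DictReader(f, restval='')
    # rows are read and filtered one at a time, so rows that don't
    # match are dropped without the whole file ever sitting in RAM
    for row in reader:
        if all(row.get(k, '') in vs for k, vs in defines.items()):
            yield row

data = io.StringIO('case,n\na,1\nb,2\na,3\n')
kept = list(collect(data, {'case': {'a'}}))
```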
Also tried to clean up the plot.py/plotmpl.py's data folding path,
though that may have been wasted effort.
The previous state machine would happily pick up random names if the
struct had no name of its own. This was picking up typedefs of random
structs and making things really confusing.
Now the rule is that unnamed structs are not printed. Unnamed structs
are usually implementation details so their size is not really useful.
Also made the parsing state machine for objdump outputs more resilient
to these sort of issues.
Also changed structs.py to also report unions if they have a name.
- Renamed struct_.py -> structs.py again.
- Removed lfs.csv, instead preferring script specific csv files.
- Added *-diff make rules for quick comparison against a previous
result, results are now implicitly written on each run.
For example, `make code` creates lfs.code.csv and prints the summary, which
can be followed by `make code-diff` to compare changes against the saved
lfs.code.csv without overwriting.
- Added nargs=? support for -s and -S, now uses a per-result _sort
attribute to decide sort if fields are unspecified.
- Fixed added/removed count in scripts when an entry has no field in
the expected results
- Fixed a python-sort-type issue when by-field is missing in a result
- Changed --(tool)-tool to --(tool)-path in scripts, this seems to be
a more common name for this sort of flag.
- Changed BUILDDIR to not have implicit slash, makes Makefile internals
a bit more readable.
- Fixed some outdated names hidden in less-often used ifdefs.
Based loosely on Linux's perf tool, perfbd.py uses trace output with
backtraces to aggregate and show the block device usage of all functions
in a program, propagating block device operation cost up the backtrace
for each operation.
This combined with --trace-period and --trace-freq for
sampling/filtering trace events allows the bench-runner to very
efficiently record the general cost of block device operations with very
little overhead.
Adopted this as the default side-effect of make bench, replacing
cycle-based performance measurements which are less important for
littlefs.
This adds -P/--propagate and -Z/--depth to perf.py for showing recursive
results, making it easy to narrow down on where spikes in performance
come from.
This ended up being a bit different from stack.py's recursive results,
as we end up with different (diminishing) numbers as we descend.
This provides 2 things:
1. perf integration with the bench/test runners - This is a bit tricky
   with perf as it doesn't have its own way to combine perf measurements
   across multiple processes. perf.py works around this by writing
   everything to a zip file, using flock to synchronize. As a plus, free
   compression!
2. Parsing and presentation of perf results in a format consistent with
   the other CSV-based tools. This actually ran into a surprising number
   of issues:
   - We need to process raw events to get the information we want, this
     ends up being a lot of data (~16MiB at 100Hz uncompressed), so we
     parallelize the parsing of each decompressed perf file.
   - perf reports raw addresses post-ASLR. It does provide sym+off which
     is very useful, but to find the source of static functions we need
     to reverse the ASLR by finding the delta that produces the best
     symbol<->addr matches.
   - This isn't related to perf, but decoding dwarf line-numbers is
     really complicated. You basically need to write a tiny VM.
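The zip+flock trick from point 1 can be sketched roughly as follows.
append_to_zip and the member names are hypothetical, this relies on
POSIX flock via the fcntl module, and perf.py's actual bookkeeping is
likely more involved:

```python
import fcntl
import zipfile

def append_to_zip(zip_path, name, data):
    # open (or create) the archive just to have something to lock
    with open(zip_path, 'ab') as lock:
        # exclusive lock, other processes block here until we're done
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            # mode 'a' adds a member to an existing archive, or starts
            # a new archive if the file is still empty
            with zipfile.ZipFile(zip_path, 'a',
                    compression=zipfile.ZIP_DEFLATED) as z:
                z.writestr(name, data)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```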
This also turns on perf measurement by default for the bench-runner, but at a
low frequency (100 Hz). This can be decreased or removed in the future
if it causes any slowdown.
The main change is requiring field names for -b/-f/-s/-S, this
is a bit more powerful, and supports hidden extra fields, but
can require a bit more typing in some cases.
- Added the littlefs license note to the scripts.
- Adopted parse_intermixed_args everywhere for more consistent arg
handling.
- Removed argparse's implicit help text formatting as it does not
work with parse_intermixed_args and breaks sometimes.
- Used string concatenation for argparse everywhere; backslashed
line continuations only work with argparse because it strips
redundant whitespace.
- Consistent argparse formatting.
- Consistent openio mode handling.
- Consistent color argument handling.
- Adopted functools.lru_cache in tracebd.py.
- Moved unicode printing behind --subscripts in tracebd.py, making all
scripts ascii by default.
- Renamed pretty_asserts.py -> prettyasserts.py.
- Renamed struct.py -> struct_.py, the original name conflicts with
Python's built in struct module in horrible ways.
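For context, a minimal sketch of what parse_intermixed_args buys us. The
flags and file names here are made up for illustration:

```python
import argparse

parser = argparse.ArgumentParser(allow_abbrev=False)
parser.add_argument('paths', nargs='*')
parser.add_argument('-v', '--verbose', action='store_true')
parser.add_argument('-o', '--output')

def parse(argv):
    # positionals may be freely interleaved with optional flags
    return parser.parse_intermixed_args(argv)

args = parse(['a.csv', '-o', 'out.csv', 'b.csv', '-v'])
```

Both a.csv and b.csv end up in args.paths even though a flag sits
between them.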
With more scripts generating CSV files this moves most CSV manipulation
into summary.py, which can now handle more or less any arbitrary CSV
file with arbitrary names and fields.
This also includes a bunch of additional, probably unnecessary, tweaks:
- summary.py/coverage.py use a custom fractional type for encoding
fractions, this will also be used for test counts.
- Added a smaller diff output for size scripts with the --percent flag.
- Added line and hit info to coverage.py's CSV files.
- Added --tree flag to stack.py to show only the call tree without
other noise.
- Renamed structs.py to struct.py.
- Changed a few flags around for consistency between size/summary scripts.
- Added `make sizes` alias.
- Added `make lfs.code.csv` rules
These scripts can't easily share the common logic, but separating
field details from the print/merge/csv logic should make the common
part of these scripts much easier to create/modify going forward.
This also tweaked the behavior of summary.py slightly.
A small mistake in test.py's control flow meant the failing test job
would successfully kill all other test jobs, but then humorously start
up a new process to continue testing.
Using errors=replace in python utf-8 decoding makes these scripts more
resilient to underlying errors, rather than just throwing an unhelpfully
generic decode error.
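A tiny sketch of the difference, with decode_line as a hypothetical
helper:

```python
def decode_line(raw):
    # invalid bytes become U+FFFD instead of raising UnicodeDecodeError
    return raw.decode('utf-8', errors='replace')

line = decode_line(b'ok \xff')
```

The same errors='replace' argument can also be passed to open() when
reading files in text mode.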
A full summary of static measurements (code size, stack usage, etc) can now
be found with:
make summary
This is done through the combination of a new ./scripts/summary.py
script and the ability of existing scripts to merge into existing csv
files, allowing multiple results to be merged either in a pipeline, or
in parallel with a single ./scripts/summary.py call.
The ./scripts/summary.py script can also be used to quickly compare
different builds or configurations. This is a proper implementation
of a similar but hacky shell script that has already been very useful
for making optimization decisions:
  $ ./scripts/structs.py new.csv -d old.csv --summary
  name (2 added, 0 removed)  code           stack  structs
  TOTAL                      28648 (-2.7%)  2448   1012
Also some other small tweaks to scripts:
- Removed state saving diff rules. This isn't the most useful way to
handle comparing changes.
- Added short flags for --summary (-Y) and --files (-F), since these
are quite often used.
- Added -L/--depth argument to show dependencies for scripts/stack.py,
this replaces calls.py
- Additional internal restructuring to avoid repeated code
- Removed incorrect diff percentage when there is no actual size
- Consistent percentage rendering in test.py