This makes it easier to read the output, at a cost of these scripts not
terminating if the underlying call sctucture contains loops.
Previously these scripts would not terminate, but at least output the
call tree as they visit each function. This was hard to read, and wasn't
really that useful? If you hit a case with infinite recursion, you can
limit the output size explicitly with -Z.
Note this also drops --tree in stack.py. Since we get more readable
output, this flag is less useful. This simplifies the script a bit.
- Renamed struct_.py -> structs.py again.
- Removed lfs.csv, instead prefering script specific csv files.
- Added *-diff make rules for quick comparison against a previous
result, results are now implicitly written on each run.
For example, `make code` creates lfs.code.csv and prints the summary, which
can be followed by `make code-diff` to compare changes against the saved
lfs.code.csv without overwriting.
- Added nargs=? support for -s and -S, now uses a per-result _sort
attribute to decide sort if fields are unspecified.
- Fixed added/removed count in scripts when an entry has no field in
the expected results
- Fixed a python-sort-type issue when by-field is missing in a result
- Changed --(tool)-tool to --(tool)-path in scripts, this seems to be
a more common name for this sort of flag.
- Changed BUILDDIR to not have implicit slash, makes Makefile internals
a bit more readable.
- Fixed some outdated names hidden in less-often used ifdefs.
Based loosely on Linux's perf tool, perfbd.py uses trace output with
backtraces to aggregate and show the block device usage of all functions
in a program, propagating block devices operation cost up the backtrace
for each operation.
This combined with --trace-period and --trace-freq for
sampling/filtering trace events allow the bench-runner to very
efficiently record the general cost of block device operations with very
little overhead.
Adopted this as the default side-effect of make bench, replacing
cycle-based performance measurements which are less important for
littlefs.
This adds -P/--propagate and -Z/--depth to perf.py for showing recursive
results, making it easy to narrow down on where spikes in performance
come from.
This ended up being a bit different from stack.py's recursive results,
as we end up with different (diminishing) numbers as we descend.
This provides 2 things:
1. perf integration with the bench/test runners - This is a bit tricky
with perf as it doesn't have its own way to combine perf measurements
across multiple processes. perf.py works around this by writing
everything to a zip file, using flock to synchronize. As a plus, free
compression!
2. Parsing and presentation of perf results in a format consistent with
the other CSV-based tools. This actually ran into a surprising number of
issues:
- We need to process raw events to get the information we want, this
ends up being a lot of data (~16MiB at 100Hz uncompressed), so we
paralellize the parsing of each decompressed perf file.
- perf reports raw addresses post-ASLR. It does provide sym+off which
is very useful, but to find the source of static functions we need to
reverse the ASLR by finding the delta the produces the best
symbol<->addr matches.
- This isn't related to perf, but decoding dwarf line-numbers is
really complicated. You basically need to write a tiny VM.
This also turns on perf measurement by default for the bench-runner, but at a
low frequency (100 Hz). This can be decreased or removed in the future
if it causes any slowdown.