Commit Graph

66 Commits

Author SHA1 Message Date
Christopher Haster
3e03c2ee7f scripts: Adopted better input file handling in result scripts
- Error on no/insufficient files.

  Instead of just returning no results. This is more useful when
  debugging complicated bash scripts.

- Use elf magic to allow any file order in perfbd.py/stack.py.

  This was already implemented in stack.py, now also adopted in
  perfbd.py.

  Elf files always start with the magic string "\x7fELF", so we can use
  this to figure out the types of input files without needing to rely on
  argument order.

  This is just one less thing to worry about when invoking these
  scripts.
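
A minimal sketch of the magic-based detection described above. This is
not the code from perfbd.py/stack.py; the helper names is_elf and
split_inputs are hypothetical:

```python
ELF_MAGIC = b'\x7fELF'

def is_elf(path):
    """Return True if the file at path starts with the ELF magic bytes."""
    with open(path, 'rb') as f:
        return f.read(len(ELF_MAGIC)) == ELF_MAGIC

def split_inputs(paths):
    """Partition paths into (elf_paths, other_paths), in any input order."""
    elfs, others = [], []
    for path in paths:
        (elfs if is_elf(path) else others).append(path)
    # error on no/insufficient files instead of silently returning nothing
    if not elfs:
        raise ValueError('no ELF files found in %r' % (paths,))
    return elfs, others
```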
2024-12-16 19:13:22 -06:00
Christopher Haster
4325a06277 scripts: Fixed incorrect files on recursive results
It's been a while since I've been hurt by Python's late-binding
variables. In this case the scope-creep of the "file" variable hid that
we didn't actually know which recursive result belonged to which file.
Instead we were just assigning whatever the most recent top-level result
was.

This is fixed by looking up the correct file in childrenof. Though this
unfortunately does add quite a bit of noise.
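
For illustration, here is a small reproduction of this class of bug.
The data and function names are made up, and this is not the actual
script code, just a sketch of how a leaked loop variable can end up
attached to the wrong result:

```python
# hypothetical data: which file each function lives in, and who calls whom
files = {'lfsr_mount': 'lfs.c', 'lfs_crc': 'lfs_util.c'}
calls = {'lfsr_mount': ['lfs_crc'], 'lfs_crc': []}

def collect_buggy():
    results = []
    def children(name):
        # BUG: "file" is not a parameter here, it resolves to the enclosing
        # loop variable, i.e. whatever the most recent top-level file was
        return [(file, callee) for callee in calls[name]]
    for name, file in files.items():
        results.append((file, name, children(name)))
    return results

def collect_fixed():
    results = []
    def children(name):
        # look up the correct file for each child explicitly
        return [(files[callee], callee) for callee in calls[name]]
    for name, file in files.items():
        results.append((file, name, children(name)))
    return results
```

In collect_buggy, lfs_crc is reported as living in lfs.c, because that
is the file of the top-level result being processed at the time.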
2024-12-16 19:12:46 -06:00
Christopher Haster
c8c12ffae8 scripts: Reverted stack.py to use -fcallgraph-info=su again
See previous commit for the issues with stack.py's current approach. I'm
convinced dwarf-info simply does not contain enough info to figure out
stack usage.

There is one last idea, which is to parse the disassembly. In theory
you only need to understand calls, branches (for control-flow), and
push/pop instructions to figure out the worst-case stack usage. But this
would be ISA-specific and error-prone, so it probably shouldn't
_replace_ the -fcallgraph-info=su based stack.py.

So, out of ideas, reverting.

---

It's worth noting this isn't a trivial revert. There are a couple of
interesting changes in stack.py:

- We now use .o files to map callgraph nodes to relevant symbol names.

  This should be a bit more robust than relying only on the names in the
  .ci files, and guarantees function names line up with other
  symbol-based scripts (code.py, ctx.py, etc).

  This also lets us warn on missing callgraph nodes, in case the
  callgraph info is incomplete.

- Callgraph parsing should be quite a bit more robust now. Added a small
  (and reusable?) Parser class.

- Moved cycle detection into result collection.

  This should let us drop cycle detection from the table renderer
  eventually.
2024-12-16 18:10:23 -06:00
Christopher Haster
0e658b8246 scripts: Attempted reimp of stack.py using dwarf variable tags
Problem: I misunderstood the purpose of .debug_frame (objdump
--dwarf=frames).

The purpose of .debug_frame is not to record the size of function
stack frames, but only to tell a debugger how to access the previous
function's stack frame. It just so happens that this _coincidentally_
tells you the stack frame size when compiling with -fomit-frame-pointer.

With -fno-omit-frame-pointer (common on some archs), .debug_frame just
says "hey here's the frame pointer" (DW_CFA_def_cfa_register), which
tells us nothing about the function's actual stack usage.

So unfortunately .debug_frame does not provide enough info on its
own...

---

This commit was an attempt to find the actual stack usage by looking at
the relevant variable info (DW_TAG_variable, etc) in each function's dwarf
info, but this approach is also not looking very good...

1. The numbers do not appear correct:

     before -fcallgraph-info=su:         2720
     after --dwarf=info:                 3558
     after --dwarf=info --no-shrinkwrap: 3922

     (this is with -fno-omit-frame-pointer)

   In hindsight, this approach is fundamentally flawed. While the
   variable tags do give us a lower bound on stack usage, they don't
   tell us about implicit compiler variables and various stack push/pops
   as a part of expression evaluation.

   As far as I can tell there's simply not enough info in dwarf info to
   find an accurate upper bound on stack usage.

2. This approach is quite a bit more complicated, since we need:

   1. Dwarf info (--dwarf=info) to find variable tags.
   2. Location info (--dwarf=loc) to map var allocations to address
      ranges.
   3. Range info (--dwarf=Ranges) to map lexical blocks to address
      ranges when var allocation is implicit (not implemented).
   4. And we still need frame info (--dwarf=frames)! since var
      allocations are frame-relative.

3. Also dwarf info is not guaranteed to contain the whole callgraph.

   It seems callgraph info is actually _omitted_ with -O0??

   I guess this is because the callgraph info is a side-effect of some
   compiler pass? This seems a bit backwards.

   Dwarf does have a flag (DW_AT_call_all_calls) to indicate when
   callgraph info is complete, but it doesn't seem to be set reliably?

   Even with optimizations, lfsr_bd_sync, _and only lfsr_bd_sync_, is
   missing the DW_AT_call_all_calls flag. I have no idea why. The flag
   is still present in lfsr_bd_erase, lfsr_bd_read, and other functions
   with function pointers...

So I think this will probably be reverted.
2024-12-16 18:01:47 -06:00
Christopher Haster
14e7501e5c scripts: Minor stack.py fixes
- Always interpret DW_AT_low_pc/high_pc/call_return_pc as hex.

  Clang populates these with hex digits without a 0x prefix.

- Don't ignore callees with no name.

  Now that DW_AT_abstract_origin is fixed, a callee with no name should
  be an error.
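
As a sketch of the first fix above: Python's int() with an explicit
base of 16 accepts hex digits both with and without a 0x prefix, so
always parsing these attributes as hex covers both cases. The helper
name parse_pc is hypothetical:

```python
def parse_pc(value):
    """Parse a DW_AT_low_pc-style address, always interpreting it as hex."""
    return int(value.strip(), 16)

# both spellings give the same address
with_prefix = parse_pc('0x1a2b')
without_prefix = parse_pc('1a2b')

# note an all-decimal-looking value must still be treated as hex
looks_decimal = parse_pc('1000')
```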
2024-12-16 18:01:46 -06:00
Christopher Haster
ac79c88c6f scripts: Improved cycle detection notes in scripts
- Prevented childrenof memoization from hiding the source of a
  detected cycle.

- Deduplicated multiple cycle detected notes.

- Fixed note rendering when last column does not have a notes list.
  Currently this only happens when entry is None (no results).
2024-12-16 18:01:46 -06:00
Christopher Haster
56d888933f scripts: Reworked stack.py to use dwarf, dropped -fcallgraph-info=su
There were a lot of small challenges (see previous commits), but this
commit reworks stack.py to rely only on dwarf-info and symbols to build
stack + callgraph info.

Not only does this remove an annoying dependency on a GCC-specific flag,
but it also should give us more correct stack measurements by only
penalizing calls for the stack usage at the call site. This should
better account for things like shrinkwrapping, which make the
-fcallgraph-info=su results look worse than they actually are.

To make this work required jumping through a couple hoops:

1. Map symbols -> dwarf entries by address (DW_AT_low_pc).

   We use symbols here to make sure function names line up with other
   scripts.

   Note that there can be multiple dwarf entries with the same name due
   to optimization passes. Apparently the optimized name is not included
   because that would be too useful.

2. Find each function's frame info.

   This is stored in the .debug_frame section (objdump --dwarf=frames),
   and requires _yet another state machine_ to parse, but gives us the
   stack frame info for each function at the instruction level, so
   that's nice.

3. Find call sites (DW_TAG_call_site).

   The hierarchical nesting of DW_TAG_lexical_blocks gets a bit annoying
   here, but ultimately we can find all DW_TAG_call_sites by looking at
   the DW_TAG_subprogram's children tags.

4. Map call sites to frame info.

   This gets funky.

   Finding the target function is simple enough, DW_AT_call_origin
   contains its dwarf offset (but why is this the _origin_?). But we
   don't actually know what address the call originated from.

   Fortunately we do know the return address, DW_AT_call_return_pc?

   The instruction before DW_AT_call_return_pc should be the call
   instruction. Subtracting 1 will awkwardly put us in the middle of the
   instruction, but it should at least map to the correct stack frame?
   And without ISA-specific info it's the best we can do.

It's messy, but this should be all the info we need.
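
Step 4 can be sketched roughly as follows. The frame table here is
made-up data and the helper names are hypothetical; the only idea taken
from above is looking up the frame in effect at DW_AT_call_return_pc
minus 1:

```python
import bisect

# hypothetical frame table for one function: (start address, frame size),
# sorted by address, each entry holds until the next entry's start address
frames = [(0x1000, 0), (0x1004, 16), (0x1020, 48), (0x1040, 16)]

def frame_at(frames, addr):
    """Return the frame size in effect at addr."""
    starts = [start for start, _ in frames]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise KeyError('address 0x%x precedes the frame table' % addr)
    return frames[i][1]

def frame_at_call(frames, call_return_pc):
    """Return the frame size at the call that returns to call_return_pc."""
    # return_pc-1 lands somewhere inside the call instruction, whatever
    # its encoding length, which is enough to pick the right table entry
    return frame_at(frames, call_return_pc - 1)
```

With this table, a call returning to 0x1040 is charged the 48-byte
frame that was live during the call, not the 16-byte frame that takes
effect at the return address itself.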

---

To build confidence in the new script, I included the --no-shrinkwrap
flag, which reverts to penalizing each call site for the function's
worst-case stack frame. This makes it easy to compare against the
-fcallgraph-info=su approach:

  with -fcallgraph-info=su:          2624
  with --dwarf=info --no-shrinkwrap: 2624

I was hoping that accounting for shrinkwrap-like optimizations would
reveal a lower stack cost, but for better or worse it seems that
worst-case stack usage is unchanged:

  with --dwarf=info --no-shrinkwrap: 2624
  with --dwarf=info:                 2624

Still, it's good to know that our stack measurement is correct.
2024-12-16 18:01:46 -06:00
Christopher Haster
faf4d09c34 scripts: Added __repr__ to RInt and friends
Just a minor quality of life feature to help debugging these scripts.
2024-12-16 18:01:46 -06:00
Christopher Haster
8526cd9cf1 scripts: Prevented i/children/notes result field collisions
Without this, naming a column i/children/notes in csv.py could cause
things to break. Unlikely for children/notes, but very likely for i,
especially when benchmarking.

Unfortunately namedtuple makes this tricky. I _want_ to just rename
these to _i/_children/_notes and call the problem solved, but namedtuple
reserves all underscore-prefixed fields for its own use.

As a workaround, the table renderer now looks for _i/_children/_notes at
the _class_ level, as an optional name of which namedtuple field to use.
This way Result types can stay lightweight namedtuples while including
extra table rendering info without risk of conflicts.

This also makes the HotResult type a bit more funky, but that's not a
big deal.
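
A rough sketch of the workaround, with hypothetical type and field
names (StackResult, z, kids); the real Result types and renderer are
more involved:

```python
import collections as co

# namedtuple rejects underscore-prefixed field names outright
try:
    co.namedtuple('Bad', ['name', '_i'])
    rejected = False
except ValueError:
    rejected = True

class StackResult(co.namedtuple('StackResult',
        ['name', 'frame', 'z', 'kids'])):
    # class-level hints telling the renderer which fields to use
    _i = 'z'
    _children = 'kids'

def children_of(result):
    """Return a result's children, if its class names a children field."""
    field = getattr(type(result), '_children', None)
    return getattr(result, field) if field is not None else []

# a plain result with a data column that happens to be named "children"
Plain = co.namedtuple('Plain', ['name', 'children'])
```

Since the renderer only trusts the class-level name, a user column
called children in Plain is treated as ordinary data.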
2024-12-15 16:36:14 -06:00
Christopher Haster
183ede1b83 scripts: Option for result scripts to force children ordering
This extends the recursive part of the table renderer to sort children
by the optional "i" field, if available.

Note this only affects children entries. The top-level entries are
strictly ordered by the relevant "by" fields. I just haven't seen a use
case for this yet, and not sorting "i" at the top-level reduces the
number of things that can go wrong for scripts without children.
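
A minimal sketch of the child ordering, assuming a hypothetical Result
type with an optional "i" field. Leaving the order unchanged when "i"
is missing is an assumption made for this example, not necessarily what
the renderer does:

```python
import collections as co

Result = co.namedtuple('Result', ['name', 'size', 'i', 'children'])

def sort_children(children):
    """Order children by "i" if every child has one, else leave as-is."""
    if children and all(c.i is not None for c in children):
        return sorted(children, key=lambda c: c.i)
    return list(children)

parent = Result('lfsr_mount', 96, None, [
    Result('lfsr_mdir_commit', 512, 2, []),
    Result('lfsr_fs_gc', 80, 0, []),
    Result('lfsr_mtree_gc', 176, 1, []),
])
ordered = [c.name for c in sort_children(parent.children)]
```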

---

This also rewrites -t/--hot to take advantage of children ordering by
injecting a totally-not-hacky HotResult subclass.

Now -t/--hot should be strictly ordered by the call depth! Though note
entries that share "by" fields are still merged...

This also gives us a way to introduce the "cycle detected" note and
respect -z/--depth, so overall a big improvement for -t/--hot.
2024-12-15 16:35:52 -06:00
Christopher Haster
e6ed785a27 scripts: Removed padding from tail notes in tables
We don't really need padding for the notes on the last column of tables,
which is where row-level notes end up.

This may seem minor, but not padding here avoids quite a bit of
unnecessary line wrapping in small terminals.
2024-12-15 16:35:29 -06:00
Christopher Haster
512cf5ad4b scripts: Adopted ctx.py-related changes in other result scripts
- Adopted higher-level collect data structures:

  - high-level DwarfEntry/DwarfInfo class
  - high-level SymInfo class
  - high-level LineInfo class

  Note these had to be moved out of function scope due to pickling
  issues in perf.py/perfbd.py. These were only function-local to
  minimize scope leak so this fortunately was an easy change.

- Adopted better list-default patterns in Result types:

    def __new__(..., children=None):
        return Result(..., children if children is not None else [])

  A classic python footgun.

- Adopted notes rendering, though this is only used by ctx.py at the
  moment.

- Reverted to sorting children entries, for now.

  Unfortunately there's no easy way to sort the result entries in
  perf.py/perfbd.py before folding. Folding is going to make a mess
  of more complicated children anyways, so another solution is
  needed...
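
For anyone who hasn't hit the list-default footgun mentioned above,
here is a small standalone demonstration (the function names are made
up for the example):

```python
def append_buggy(x, acc=[]):
    # the default list is created once, at def time, and is shared by
    # every call that doesn't pass acc explicitly
    acc.append(x)
    return acc

def append_fixed(x, acc=None):
    # create a fresh list per call instead
    acc = acc if acc is not None else []
    acc.append(x)
    return acc

buggy = (list(append_buggy(1)), list(append_buggy(2)))
fixed = (append_fixed(1), append_fixed(2))
```

Here buggy ends up as ([1], [1, 2]) while fixed is ([1], [2]).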

And some other shared miscellany.
2024-12-15 15:41:11 -06:00
Christopher Haster
e00db216c1 scripts: Consistent table renderer, cycle detection optional
The fact that our scripts' table renderer was slightly different for
recursive scripts (stack.py, perf.py) and non-recursive scripts
(code.py, structs.py) was a ticking time bomb, one innocent edit away
from breaking half the scripts.

This makes the table renderer consistent across all scripts, allowing for
easy copy-pasting when editing at the cost of some unused code in
scripts.

One hiccup with this though is the difference in cycle detection
behavior between scripts:

- stack.py:

    lfsr_bd_sync
    '-> lfsr_bd_prog
        '-> lfsr_bd_sync  <-- cycle!

- structs.py:

    lfsr_bshrub_t
    '-> u
        '-> bsprout
            '-> u  <-- not a cycle!

To solve this the table renderer now accepts a simple detect_cycles
flag, which can be set per-script.
2024-12-14 12:25:15 -06:00
Christopher Haster
ef3accc07c scripts: Tweaked -p/--percent to accept the csv file for diffing
This makes the -p/--percent flag a bit more consistent with -d/--diff
and -c/--compare, both of which change the printing strategy based on
additional context.
2024-11-16 18:01:27 -06:00
Christopher Haster
9a2b561a76 scripts: Adopted -c/--compare in make summary-diff
This showcases the sort of high-level result printing where -c/--compare
is useful:

  $ make summary-diff
              code             data           stack          structs
  BEFORE     57057                0            3056             1476
  AFTER      68864 (+20.7%)       0 (+0.0%)    3744 (+22.5%)    1520 (+3.0%)

There was one hiccup though: how to hide the name of the first field.

It may seem minor, but the missing field name really does help
readability when you're staring at a wall of CLI output.

It's a bit of a hack, but this can now be controlled with -Y/--summary,
which has the sole purpose of disabling the first field name if mixed
with -c/--compare.

-c/--compare is already a weird case for the summary row anyways...
2024-11-16 18:01:15 -06:00
Christopher Haster
29eff6f3e8 scripts: Added -c/--compare for comparing specific result rows
Example:

  $ ./scripts/csv.py lfs.code.csv \
          -bfunction -fsize \
          -clfsr_rbyd_appendrattr
  function                                size
  lfsr_rbyd_appendrattr                   3598
  lfsr_mdir_commit                        5176 (+43.9%)
  lfsr_btree_commit__.constprop.0         3955 (+9.9%)
  lfsr_file_flush_                        2729 (-24.2%)
  lfsr_file_carve                         2503 (-30.4%)
  lfsr_mountinited                        2357 (-34.5%)
  ... snip ...

I don't think this is immediately useful for our code/stack/etc
measurement scripts, but it's certainly useful in csv.py for comparing
results at a high level.

And by useful I mean it replaces a 40-line long awk script that has
outgrown its original purpose...
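
The percentages in the example are each row's change relative to the
compared row. A sketch of that arithmetic, using the sizes from the
table above (the compare helper is hypothetical, not csv.py's actual
code):

```python
def compare(rows, baseline):
    """Return (name, size, note) rows, with notes relative to baseline."""
    base = rows[baseline]
    out = [(baseline, base, '')]
    for name, size in rows.items():
        if name != baseline:
            ratio = (size - base) / base
            out.append((name, size, '(%+.1f%%)' % (100 * ratio)))
    return out

rows = {
    'lfsr_rbyd_appendrattr': 3598,
    'lfsr_mdir_commit': 5176,
    'lfsr_btree_commit__.constprop.0': 3955,
    'lfsr_file_flush_': 2729,
}
table = compare(rows, 'lfsr_rbyd_appendrattr')
```

For example, lfsr_mdir_commit comes out as (5176-3598)/3598 = +43.9%.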
2024-11-16 17:59:22 -06:00
Christopher Haster
2fa968dd3f scripts: csv.py: Fixed divide-by-zero, return +-inf
This may make some mathematician mad, but these are informative scripts.
Returning +-inf is much more useful than erroring when dealing with
several hundred rows of results.

And hey, if it's good enough for IEEE 754, it's good enough for us :)

Also fixed a division operator mismatch in RFrac that was causing
problems.
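
A sketch of the idea, using a hypothetical safe_div helper rather than
the actual RInt/RFloat/RFrac operators. Returning nan for 0/0 is an
assumption borrowed from IEEE 754, not something stated above:

```python
import math

def safe_div(a, b):
    """Divide a by b, returning +-inf instead of raising on b == 0."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0:
            # assumption: follow IEEE 754 and treat 0/0 as nan
            return math.nan
        return math.copysign(math.inf, a)
```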
2024-11-16 16:47:48 -06:00
Christopher Haster
5dc9eabbf7 scripts: csv.py: Fixed use of __div__ vs __truediv__
Not sure if this is an old habit from Python 2, or just because it looks
nicer next to __mul__, __mod__, etc, but in Python 3 this should be
__truediv__ (or __floordiv__), not __div__.
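
A small demonstration with two made-up classes: in Python 3 the /
operator only ever looks for __truediv__ (and // for __floordiv__), so
a method named __div__ is silently ignored:

```python
class OldStyle:
    def __init__(self, x):
        self.x = x

    # Python 2 name, never called by the / operator in Python 3
    def __div__(self, other):
        return OldStyle(self.x / other.x)

class NewStyle:
    def __init__(self, x):
        self.x = x

    def __truediv__(self, other):
        return NewStyle(self.x / other.x)

    def __floordiv__(self, other):
        return NewStyle(self.x // other.x)

try:
    OldStyle(6) / OldStyle(3)
    old_works = True
except TypeError:
    old_works = False

quotient = (NewStyle(7) / NewStyle(2)).x
floored = (NewStyle(7) // NewStyle(2)).x
```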
2024-11-16 16:38:36 -06:00
Christopher Haster
0ac326d9cb scripts: Reduced table name widths to 8 chars minimum
I still think the 24 (23+1) char minimum is a good default for 2 column
output such as help text, especially if you don't have automatic width
detection. But our result scripts need to be a bit more flexible.

Consider:

  $ make summary
                              code     data    stack  structs
  TOTAL                      68864        0     3744     1520

Vs:

  $ make summary
              code     data    stack  structs
  TOTAL      68864        0     3744     1520

Up until now we were just kind of working around this with cut -c 25- in
our Makefile, but now that our result scripts automatically scale the
table widths, they should really just default to whatever is the most
useful.
2024-11-16 13:39:42 -06:00
Christopher Haster
434479f101 scripts: Adopted csv.py-related result-type tweaks in all scripts
- RInt/RFloat now accepts implicitly castable types (mainly
  RInt(RFloat(x)) and RFloat(RInt(x))).

- RInt/RFloat/RFrac are now "truthy", implementing __bool__.

- More operator support for RInt/RFloat/RFrac:

  - __pos__ => +a
  - __neg__ => -a
  - __abs__ => abs(a)
  - __div__ => a/b
  - __mod__ => a%b

  These work in Python, but are mainly used to implement expr eval in
  csv.py.
2024-11-16 13:37:15 -06:00
Christopher Haster
7cfcc1af1d scripts: Renamed summary.py -> csv.py
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.

Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:

  # prevent local imports
  __import__('sys').path.pop(0)

These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
2024-11-09 12:31:16 -06:00
Christopher Haster
007ac97bec scripts: Adopted double-indent on multiline expressions
This matches the style used in C, which is good for consistency:

  a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines))

We were already doing this for multiline control-flow statements, simply
because I'm not sure how else you could indent this without making
things really confusing:

  if a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines)):
      do_the_thing()

This was the only real difference style-wise between the Python code and
C code, so now both should be following roughly the same style (80 cols,
double-indent multiline exprs, prefix multiline binary ops, etc).
2024-11-06 15:31:17 -06:00
Christopher Haster
48c2e7784b scripts: Renamed import math alias m -> mt
Mainly to avoid conflicts with match results m, this frees up the
single-letter variable m for other purposes.

Choosing a two letter alias was surprisingly difficult, but mt is nice
in that it somewhat matches it (for itertools) and ft (for functools).
2024-11-05 01:58:40 -06:00
Christopher Haster
96ddc72481 scripts: Moved hot path calculation before recursive rendering
So now the hot path participates in sorting, folding, etc:

  $ ./scripts/stack.py ./lfs.ci ./lfs_util.ci \
      -Dfunction=lfsr_mount -t -sframe
  function                               frame    limit
  lfsr_mount                                96     2736
  |-> lfsr_mdir_commit                     512     2368
  |-> lfsr_btree_commit__.constprop        336     1648
  |-> lfs_alloc                            272     1296
  |-> lfsr_btree_commit                    208     1856
  |-> lfsr_btree_lookupnext_               208      720
  |-> lfsr_mtree_gc                        192     2560
  |-> lfsr_mtree_traverse                  176     1024
  |-> lfsr_rbyd_lookupnext                 160      448
  |-> lfsr_bd_readtag.constprop            128      288
  |-> lfsr_mtree_lookup                    128      848
  |-> lfsr_bd_read                          80      160
  |-> lfsr_bd_read__                        80       80
  |-> lfsr_fs_gc                            80     2640
  |-> lfsr_rbyd_sublookup                   64      512
  '-> lfsr_rbyd_alloc                       16     1312
  TOTAL                                     96     2736

This risks some rather unintuitive behavior now that the hot path
rendering no longer matches the call stack, but in theory the extra
sorting features are more useful?

This is a bit of an experiment, if this is more confusing than useful,
we can always revert to the strict call-order ordering.

Note that you can _usually_ get the call-order ordering by sorting by
limit, but this trick breaks if any call frames are zero sized...
2024-11-05 01:23:01 -06:00
Christopher Haster
ade563cc24 scripts: Removed outdated non-terminating warning from scripts
All of these scripts have cycle detectors now, so this warning should
no longer be valid.
2024-11-04 18:26:22 -06:00
Christopher Haster
c0a9af1e9a scripts: Moved recursive entry generation before table rendering
This fixes an issue where mixing recursive renderers (-t/--hot or
-z/--depth) with defines (-Dfunction=lfsr_mount) would not account for
children entry widths. An unexpected side-effect of no longer filtering
the children entries.

We could continue to try to estimate the width without table rendering,
but it would basically need two full recursive passes at this point...
Instead, I've just moved the recursive stuff before table rendering,
which should remove any issues with width calculation while also
deduplicating the recursive passes.

It's invasive for a small change, but probably worthwhile long term.

The downside is this does mean our recursive scripts now build the full
table (including all recursive calls!) before they start printing. When
mixed with unbounded recursive depth (-z0 or --depth=0) this can get
quite large and cause quite a slow start.

But I guess that was the tradeoff in adopting this sort of intermediate
table rendering... At least it does make the code simpler and less bug
prone...
2024-11-04 18:18:58 -06:00
Christopher Haster
0c3868f92c scripts: Fully connected graph in stack.py, no more recursive folding
This makes -D/--define more useful in stack.py/perf.py/perfbd.py by no
longer hiding undefined children entries.

For example:

  $ ./scripts/stack.py lfs.ci lfs_util.ci -Dfunction=lfsr_mount -t
  function                       frame    limit
  lfsr_mount                        96     2816
  |-> lfsr_fs_gc                    80     2720
  |-> lfsr_mtree_gc                176     2640
  |-> lfsr_mdir_commit             576     2464
  ... snip ...

Now shows all functions in the hot path of lfsr_mount, where before it
would only show functions in the hot path of lfsr_mount that were also
_named_ lfsr_mount.

The previous behavior was technically not wrong... but not very useful
(and confusing).

---

This was actually quite annoying to get working because of the
possibility of function call cycles.

I ended up turning stack.py's result type into a fully connected graph,
which only works because Python has a cycle detector. (Actually this
script is so short-lived we probably wouldn't care if this leaked
memory.)

A nice side effect of this is now all the recursive scripts (stack.py,
perf.py, and perfbd.py) share the same internal result representation
and recursive printing logic, which is probably a good thing.
2024-11-04 18:12:57 -06:00
Christopher Haster
711cebfcf3 scripts: Simplified memoization of stack.py's limit calculation
While a decorator does a good job of separating concerns here, it's a
bit overkill for a single function.
2024-11-04 18:09:33 -06:00
Christopher Haster
d324333903 scripts: Fixed names/lines falling out of sync in diff table renderers
As a convenience, -d/--diff in our measurement scripts hides entries
that are unchanged by default.

Unfortunately this was broken during a recent refactor that ended up
filtering the line info but not the actual names.

Instead of reverting the broken part of the refactor, I've just moved the
filtering up to where we calculate the names. Hopefully this fixes the
bug while also simplifying this messy chunk of logic a bit.
2024-11-04 18:04:58 -06:00
Christopher Haster
e32af5cd8a scripts: Added -t/--hot to recursive scripts, stack.py, etc
This is mainly useful for stack.py, where -t/--hot lets you quickly see
everything that contributes to the stack limit for each function.

This was (and still is) possible with -s + -z, but it was pretty
annoying to use:

- The stack trace rendered _diagonally_ as a consequence of -z, which is
  probably the worst use of screen real estate.

- This trick only really worked with -s, which was the opposite order of
  what you usually want on the command line: -S.

Adding a special for-purpose -t/--hot flag makes looking at the hot path
much easier, at the cost of more hacky python code (and I _mean_ hacky,
making the hot path selection useful while following existing sort rules
was annoyingly complicated).

Also added -t/--hot to perf.py and perfbd.py for consistency, though it
makes a bit less sense there.

Also also reworked related code in all three scripts: stack.py, perf.py,
perfbd.py. The logic should be a bit more equivalent, and
perf.py/perfbd.py detect cycles now.
2024-11-04 18:03:59 -06:00
Christopher Haster
904c2eddd7 scripts: Memoized stack.py's limit calculation
This is a pretty classic case for memoization. We don't really need to
recalculate every stack limit at every call site.

Cuts the runtime in half:

  before: 0.335s
  after:  0.139s (-58.5%)

---

Unfortunately functools.cache was not fit for purpose. It's stuck using
all parameters as the key, which breaks on the "seen" parameter we use
for cycle detection that otherwise has no impact on results.

Fortunately decorators aren't too difficult in Python, so I just rolled
my own (cache1).
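
Something along these lines, as a sketch only. The name cache1 comes
from above, but this body, the toy call graph, and the limit function
are assumptions for illustration:

```python
import functools as ft

def cache1(f):
    """Memoize f on its first argument only, ignoring all others."""
    memo = {}
    @ft.wraps(f)
    def wrapper(key, *args, **kwargs):
        if key not in memo:
            memo[key] = f(key, *args, **kwargs)
        return memo[key]
    return wrapper

# hypothetical call graph and frame sizes
calls = {'a': ['b', 'c'], 'b': ['c'], 'c': []}
frames = {'a': 16, 'b': 32, 'c': 8}
evals = []

@cache1
def limit(name, seen=frozenset()):
    evals.append(name)
    if name in seen:
        # cycle, stack usage is unbounded
        return float('inf')
    seen = seen | {name}
    return frames[name] + max(
            (limit(callee, seen) for callee in calls[name]),
            default=0)

total = limit('a')
```

Here c is reachable through both a and b, but is only evaluated once
since the "seen" set is not part of the cache key.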
2024-11-04 17:54:42 -06:00
Christopher Haster
fc45af3f6e scripts: Added better cycles detection to stack.py
stack.py actually already had a simple cycle detector, since we needed
one to calculate stack limits without getting stuck.

Copying this simple cycle detector into the actual table rendering code
lets us print a nice little "cycle detected" message, instead of just
vomiting to stdout forever:

    $ ./scripts/stack.py lfs.ci lfs_util.ci -z -s
    function                       frame    limit
    lfsr_format                      320        ∞
    |-> lfsr_mountinited             304        ∞
    |   |-> lfsr_mountmroot           80        ∞
    |   |   |-> lfsr_mountmroot       80        ∞ (cycle detected)
    |   |   |-> lfsr_mdir_lookup      48      576
    ... snip ...

The cycle detector is a bit naive, just building a new set each step,
but it gets the job done.
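
Roughly, the detector looks like this sketch. The render function and
the call graph are made up for illustration, loosely based on the
output above:

```python
def render(calls, name, depth=0, seen=frozenset()):
    """Return indented lines for name's call tree, marking cycles."""
    indent = '  ' * depth
    if name in seen:
        return ['%s%s (cycle detected)' % (indent, name)]
    lines = ['%s%s' % (indent, name)]
    for callee in calls.get(name, []):
        # naive but simple, build a new set for each step down the tree
        lines.extend(render(calls, callee, depth + 1, seen | {name}))
    return lines

calls = {
    'lfsr_format': ['lfsr_mountinited'],
    'lfsr_mountinited': ['lfsr_mountmroot'],
    'lfsr_mountmroot': ['lfsr_mountmroot', 'lfsr_mdir_lookup'],
}
lines = render(calls, 'lfsr_format')
```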

As for perf.py and perfbd.py, it turns out they can't actually create
cycles, so no need for a cycle detector. This is good because I didn't
really want to test these scripts again :)
2024-11-04 17:54:11 -06:00
Christopher Haster
54d77da2f5 Dropped csv field prefixes in scripts
The original idea was to allow merging a whole bunch of different csv
results into a single lfs.csv file, but this never really happened. It's
much easier to operate on smaller context-specific csv files, where the
field prefix:

- Doesn't really add much information
- Requires more typing
- Is confusing in how it doesn't match the table field names.

We can always use summary.py -fcode_size=size to add prefixes when
necessary anyways.
2024-06-02 19:19:46 -05:00
Christopher Haster
169952dec0 Tweaked scripts to render new entry ratios as +∞%
We already rely on this symbol in these scripts, so we might as well use
it to display the mathematically correct ratio for new entries.

This has the added benefit of ordering new entries vs extremely big
changes correctly:

  $ ./scripts/code.py -u test.after.csv -d test.before.csv
  function (1 added, 0 removed)      osize    nsize    dsize
  test_a                                 -       49      +49 (+∞%)
  test_b                                19      719     +700 (+3684.2%)
  test_c                                91      191     +100 (+109.9%)
  TOTAL                                110      959     +849 (+771.8%)
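
A sketch of the ratio and its rendering, using the numbers from the
table above. The helper names ratio and fmt are hypothetical, and
treating a missing old size as 0 is an assumption of this example:

```python
import math

def ratio(old, new):
    """Relative change from old to new, new entries are +inf."""
    if not old:
        return math.inf if new else 0.0
    return (new - old) / old

def fmt(r):
    """Render a ratio as a signed percentage, with ∞ for infinities."""
    if math.isinf(r):
        return '%s∞%%' % ('+' if r > 0 else '-')
    return '%+.1f%%' % (100 * r)

sizes = {'test_c': (91, 191), 'test_a': (0, 49), 'test_b': (19, 719)}
# sorting by ratio puts new entries ahead of even very large changes
ordered = sorted(sizes, key=lambda k: ratio(*sizes[k]), reverse=True)
notes = [fmt(ratio(*sizes[k])) for k in ordered]
```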
2024-06-02 19:19:46 -05:00
Christopher Haster
06bfed7a8b Interspersed percent/notes in measurement scripts
This is a bit more complicated, but make testmarks really showed how
confusing this could get.

Now, instead of:

  suite                             passed    time
  test_alloc                       304/304     1.6 (100.0%)
  test_badblocks                 6880/6880  1323.3 (100.0%)
  ... snip ...
  test_rbyd                  385878/385878   592.7 (100.0%)
  test_relocations               7899/7899   318.8 (100.0%)
  TOTAL                      548206/548206  6229.7 (100.0%)

Percents/notes are interspersed next to their relevant fields:

  suite                             passed             time
  test_alloc                       304/304 (100.0%)     1.6
  test_badblocks                 6880/6880 (100.0%)  1323.3
  ... snip ...
  test_rbyd                  385878/385878 (100.0%)   592.7
  test_relocations               7899/7899 (100.0%)   318.8
  TOTAL                      548206/548206 (100.0%)  6229.7

Note this has no effect on scripts with only a single field (code.py, etc).

But it does make multi-field diffs a bit more readable:

  $ ./scripts/stack.py -u after.stack.csv -d before.stack.csv -p
  function                       frame             limit
  lfsr_bd_sync                       8 (+100.0%)     216 (+100.0%)
  lfsr_bd_flush                     40 (+25.0%)      208 (+4.0%)
  ... snip ...
  lfsr_file_flush                   32 (+0.0%)      2424 (-0.3%)
  lfsr_file_flush_                 216 (-3.6%)      2392 (-0.3%)
  TOTAL                           9008 (+0.4%)      2600 (-0.3%)
2024-06-02 19:19:38 -05:00
Christopher Haster
a9f6b6e903 Renamed internal script result types * -> R*
So Int -> RInt, Frac -> RFrac, etc. This just helps distinguish these
types from builtin types, which could be confusing.
2024-05-18 13:00:15 -05:00
Christopher Haster
03ea2e6ac5 Tweaked cov.py, summary.py, to render fraction percents as notes
This matches how diff percentages are rendered, and simplifies the
internal table rendering by making Frac less of a special case. It also
allows for other type notes in the future.

One concern is how all the notes are shoved to the side, which may make
it a bit harder to find related percentages. If this becomes annoying we
should probably look into interspersing all notes (including diff
percentages) between the relevant columns.

Before:

  function                                   lines            branches
  lfsr_rbyd_appendattr             230/231   99.6%     172/192   89.6%
  lfsr_rbyd_p_recolor                33/34   97.1%       11/12   91.7%
  lfs_alloc                          40/42   95.2%       21/24   87.5%
  lfsr_rbyd_appendcompaction         54/57   94.7%       39/42   92.9%
  ...

After:

  function                           lines    branches
  lfsr_rbyd_appendattr             230/231     172/192 (99.6%, 89.6%)
  lfsr_rbyd_p_recolor                33/34       11/12 (97.1%, 91.7%)
  lfs_alloc                          40/42       21/24 (95.2%, 87.5%)
  lfsr_rbyd_appendcompaction         54/57       39/42 (94.7%, 92.9%)
  ...
2024-05-18 13:00:15 -05:00
Christopher Haster
1d88fa9864 In scripts -d/--diff, show either all percentages or none
Previously, with -d/--diff, we would only show non-zero percentages. But
this was ambiguous/confusing when dealing with multiple results
(stack.py, summary.py, etc).

To help with this, I've switched to showing all percentages unless all
percentages are zero (no change). This matches the -d/--diff row-hiding
logic, so by default all rows should show all percentages.

Note -p/--percent did not change, as it already showed all percentages
all of the time.
2024-05-18 13:00:15 -05:00
Christopher Haster
5128522fe2 Renamed script flag -Z/--depth -> -z/--depth
Previously, the intention of upper case -Z was to match -W/--width and
-H/--height, which are uppercase to avoid conflicts with -h/--help.

But -z/--depth isn't _really_ related to -W/-H.

This avoids a conflict with -Z/--lebesgue, but may conflict with
-z/--cat. Fortunately we don't currently have any conflicts with the
latter. Since -z/--depth and -Z/--lebesgue are both disk-layout related,
the risk of conflicts is probably much higher there.
2024-02-14 14:04:45 -06:00
Christopher Haster
1e4d4cfdcf Tried to write errors to stderr consistently in scripts 2023-11-05 15:55:07 -06:00
Christopher Haster
d0a6ef0c89 Changed scripts to not infer field purposes from CSV values
Note there's a bit of subtlety here: field _types_ are still inferred,
but the intention of the fields, i.e. if the field contains data vs
row name/other properties, must be unambiguous in the scripts.

There is still a _tiny_ bit of inference. For most scripts only one
of --by or --fields is strictly needed, since this makes the purpose of
the other fields unambiguous.

The reason for this change is so the scripts are a bit more reliable,
but also because this simplifies the data parsing/inference a bit.

Oh, and this also changes field inference to use the csv.DictReader's
fieldnames field instead of only inspecting the returned dicts. This
should also save a bit of O(n) overhead when parsing CSV files.
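
A rough sketch of reading field names from the header, assuming a plain CSV
file with a header row. `csv.DictReader.fieldnames` is the standard library
attribute; the helper name and the column names below are made up for
illustration:

```python
import csv
import io

def csv_fields(f):
    """Return (fieldnames, reader) without consuming any data rows."""
    reader = csv.DictReader(f, restval='')
    # fieldnames is populated from the header row on first access,
    # so there is no need to scan every returned dict for keys
    return list(reader.fieldnames or []), reader

# hypothetical input, the column names are assumptions
data = io.StringIO('function,code_size\nlfs_alloc,42\n')
fields, reader = csv_fields(data)
rows = list(reader)
```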
2023-11-04 15:24:18 -05:00
Christopher Haster
0f93fa3057 Tweaked script field arg parsing to strip whitespace almost everywhere
The whitespace sensitivity of field args was starting to be a problem,
mostly for advanced plotmpl.py usage (which tbf might be appropriately
described as "super hacky" in how it uses CLI parameters):

  ./scripts/plotmpl.py \
      -Dcase=" \
          bench_rbyd_attr_append, \
          bench_rbyd_attr_remove, \
          bench_rbyd_attr_fetch, \
          ..."

This may present problems when parsing CSV files with whitespace, in
theory, maybe. But given the scope of these scripts for littlefs...
just don't do that. Thanks.
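
A minimal sketch of whitespace-insensitive parsing for an argument shaped
like the -D example above. The function name `parse_define` and the exact
splitting rules are assumptions, not the scripts' actual parser:

```python
def parse_define(arg):
    """Parse a 'key=v1,v2,...' argument, ignoring surrounding whitespace.

    Hypothetical helper: empty values (e.g. from a trailing comma)
    are dropped.
    """
    key, _, values = arg.partition('=')
    return key.strip(), [v.strip() for v in values.split(',') if v.strip()]
```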
2023-11-03 15:03:46 -05:00
Christopher Haster
616b4e1c9e Tweaked scripts that consume .csv files to filter defines early
With the quantity of data being output by bench.py now, filtering ASAP
while parsing CSV files is a valuable optimization. And thanks to how
CSV files are structured, we can even avoid ever loading the full
contents into RAM.

This does end up with us filtering for defines redundantly in a few
places, but this is well worth the saved overhead from early filtering.

Also tried to clean up the plot.py/plotmpl.py's data folding path,
though that may have been wasted effort.
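
A rough sketch of filtering while parsing, assuming defines are given as a
mapping from field name to a set of accepted string values. The generator
name and this signature are hypothetical:

```python
import csv
import io

def filtered_rows(f, defines):
    """Yield only the rows whose fields match every define.

    defines maps a field name to the set of accepted values; rows
    are streamed one at a time, so rejected rows are never stored.
    """
    reader = csv.DictReader(f)
    for row in reader:
        if all(row.get(k) in vs for k, vs in defines.items()):
            yield row
```

Because this is a generator over the reader, only the rows that pass the
filter need to be kept by the caller.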
2023-11-03 14:30:22 -05:00
Christopher Haster
e7bf5ad82f Added scripts/crc32c.py
This seems like a useful script to have.
2023-09-15 18:42:48 -05:00
Christopher Haster
61c51b699a In scripts, adopted aggressive width-finding for unbounded recursion
This makes it easier to read the output, at a cost of these scripts not
terminating if the underlying call structure contains loops.

Previously these scripts would also not terminate, but would at least output
the call tree as they visited each function. This was hard to read, and wasn't
really that useful? If you hit a case with infinite recursion, you can
limit the output size explicitly with -Z.

Note this also drops --tree in stack.py. Since we get more readable
output, this flag is less useful. This simplifies the script a bit.
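
A sketch of why width-finding needs an explicit depth limit when the call
graph has loops. The call graph shape, the 2-space indent per level, and the
function name are all assumptions for illustration:

```python
def name_width(calls, roots, depth):
    """Find the widest indented name within depth levels of roots.

    calls maps a function name to the names it calls. The whole
    tree must be walked before anything can be printed, so only
    the depth limit guarantees termination on cyclic graphs.
    """
    width = 0
    stack = [(r, 0) for r in roots]
    while stack:
        name, level = stack.pop()
        width = max(width, 2*level + len(name))
        # stop descending once the depth limit is reached
        if level+1 < depth:
            stack.extend((c, level+1) for c in calls.get(name, []))
    return width
```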
2023-07-18 21:40:39 -05:00
Christopher Haster
c4b3e9d826 A couple of script changes after CI integration
- Renamed struct_.py -> structs.py again.

- Removed lfs.csv, instead preferring script-specific csv files.

- Added *-diff make rules for quick comparison against a previous
  result, results are now implicitly written on each run.

  For example, `make code` creates lfs.code.csv and prints the summary, which
  can be followed by `make code-diff` to compare changes against the saved
  lfs.code.csv without overwriting.

- Added nargs=? support for -s and -S, now uses a per-result _sort
  attribute to decide sort if fields are unspecified.
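
A minimal sketch of an optional-argument sort flag using argparse's
`nargs='?'`. The long option name `--sort`, the `const=True` marker, and the
field name in the usage are assumptions, not the scripts' actual definitions:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='?' lets -s appear with or without a field name; const is
# the value stored when -s is given without one
parser.add_argument('-s', '--sort', nargs='?', const=True, default=None)

no_sort = parser.parse_args([]).sort
default_sort = parser.parse_args(['-s']).sort
field_sort = parser.parse_args(['-s', 'code_size']).sort
```

A caller could then fall back to a per-result default ordering when the
value is `True` rather than a field name.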
2022-12-06 23:09:07 -06:00
Christopher Haster
387cf6f6e0 Fixed a couple corner cases in scripts when fields are empty
- Fixed added/removed count in scripts when an entry has no field in
  the expected results

- Fixed a python-sort-type issue when by-field is missing in a result
2022-11-28 12:51:18 -06:00
Christopher Haster
b2a2cc9a19 Added teepipe.py and watch.py 2022-11-15 13:38:13 -06:00
Christopher Haster
3a33c3795b Added perfbd.py and block device performance sampling in bench-runner
Based loosely on Linux's perf tool, perfbd.py uses trace output with
backtraces to aggregate and show the block device usage of all functions
in a program, propagating block device operation costs up the backtrace
for each operation.

This, combined with --trace-period and --trace-freq for
sampling/filtering trace events, allows the bench-runner to very
efficiently record the general cost of block device operations with very
little overhead.

Adopted this as the default side-effect of make bench, replacing
cycle-based performance measurements which are less important for
littlefs.
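
A rough sketch of propagating costs up a backtrace, assuming each trace event
has already been parsed into a (backtrace, cost) pair. This event shape and
the function name are assumptions, not perfbd.py's actual trace format:

```python
import collections

def propagate(events):
    """Sum block device costs per function, including callers.

    events is an iterable of (backtrace, cost) pairs, where
    backtrace lists function names from innermost to outermost.
    """
    totals = collections.Counter()
    for backtrace, cost in events:
        # charge each function once per event, even if it recurses
        for func in set(backtrace):
            totals[func] += cost
    return totals
```

Every function on the backtrace is charged the full cost of the operation, so
callers accumulate the cost of everything beneath them.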
2022-11-15 13:38:13 -06:00
Christopher Haster
df283aeb48 Added recursive results to perf.py
This adds -P/--propagate and -Z/--depth to perf.py for showing recursive
results, making it easy to narrow down on where spikes in performance
come from.

This ended up being a bit different from stack.py's recursive results,
as we end up with different (diminishing) numbers as we descend.
2022-11-15 13:38:13 -06:00