Commit Graph

1897 Commits

Author SHA1 Message Date
Christopher Haster
1d8d0785fc scripts: More flags to control table renderer, -Q/--small-table, etc
Instead of trying to be too clever, this just adds a bunch of small
flags to control parts of table rendering:

- --no-header - Don't show the header.
- --small-header - Don't show the "by" field names.
- --no-total - Don't show the total.
- -Q/--small-table - Equivalent to --small-header + --no-total.

Note that -Q/--small-table replaces the previous -Y/--summary +
-c/--compare hack, while also allowing a similar table style for
non-compare results.
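
For example (csv.py is just used as a stand-in here for any script that
shares the table renderer), these two invocations should be equivalent:

  $ ./scripts/csv.py lfs.code.csv -bfunction -fsize -Q
  $ ./scripts/csv.py lfs.code.csv -bfunction -fsize --small-header --no-total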
2024-12-18 14:03:35 -06:00
Christopher Haster
a3ac512cc1 scripts: Adopted Parser class in prettyasserts.py
This ended up being a pretty in-depth rework of prettyasserts.py to
adopt the shared Parser class. But now prettyasserts.py should be both
more robust and faster.

The tricky parts:

- The Parser class eagerly munches whitespace by default. This is
  usually a good thing, but for prettyasserts.py we need to keep track
  of the whitespace somehow in order to write it to the output file.

  The solution here is a little bit hacky. Instead of complicating the
  Parser class, we implicitly add a regex group for whitespace when
  compiling our lexer (see the sketch after this list).

  Unfortunately this does make last-minute patching of the lexer a bit
  messy (for things like -p/--prefix, etc), thanks to Python's
  re.Pattern class not being extendable. To work around this, the Lexer
  class keeps track of the original patterns to allow recompilation.

- Since we no longer tokenize in a separate pass, we can't use the
  None token to match any unmatched tokens.

  Fortunately this can be worked around with sufficiently ugly regex.
  See the 'STUFF' rule.

  It's a good thing Python has negative lookaheads.

  On the flip side, this means we no longer need to explicitly specify
  all possible tokens when multiple tokens overlap.

- Unlike stack.py/csv.py, prettyasserts.py needs multi-token lookahead.

  Fortunately this has a pretty straightforward solution with the
  addition of an optional stack to the Parser class.

  We can even have a bit of fun with Python's with statements (though I
  do wish with statements could have else clauses, so we wouldn't need
  double nesting to catch parser exceptions).
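
To make the whitespace trick and the 'STUFF' rule a bit more concrete,
here's a rough sketch (the Lexer class and rules are illustrative, not
the actual implementation):

  import re

  class Lexer:
      def __init__(self, rules):
          # keep the original patterns around so we can recompile after
          # last-minute patching (-p/--prefix, etc)
          self.rules = list(rules)
          self.recompile()

      def recompile(self):
          # implicitly add a group for whitespace so it isn't lost when
          # writing the output file
          self.pattern = re.compile('|'.join(
              '(?P<%s>%s)' % (name, pattern)
              for name, pattern in self.rules + [('WS', r'\s+')]))

      def lex(self, text):
          for m in self.pattern.finditer(text):
              yield m.lastgroup, m.group()

  # a catch-all rule needs a negative lookahead so it stops before
  # anything another rule could match
  rules = [
      ('ASSERT', r'\bassert\b'),
      ('STUFF',  r'(?:(?!\bassert\b|\s).)+'),
  ]

  for tok, text in Lexer(rules).lex('assert(a == b); foo();'):
      print(tok, repr(text))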

---

In addition to adopting the new Parser class, I also made sure to
eliminate intermediate string allocation through heavy use of Python's
io.StringIO class.

This, plus Parser's cheap shallow chomp/slice operations, gives
prettyasserts.py a much needed speed boost.

(Honestly, the original prettyasserts.py was pretty naive, with the
assumption that it wouldn't be the bottleneck during compilation. This
turned out to be wrong.)

These changes cut total compile time in ~half:

                                          real      user      sys
  before (time make test-runner -j): 0m56.202s 2m31.853s 0m2.827s
  after  (time make test-runner -j): 0m26.836s 1m51.213s 0m2.338s

Keep in mind this includes both prettyasserts.py and gcc -Os (and other
Makefile stuff).
2024-12-17 15:34:44 -06:00
Christopher Haster
eeab0c41e8 scripts: Reverted to lh type preference in prettyasserts.py
This was flipped in b5e264b.

Inferring the type from the right-hand side is tempting here, but the
right-hand side is often a constant, which gets a bit funky in C.

Consider:

  assert(lfs->cfg->read != NULL);

  gcc: warning: ISO C forbids initialization between function pointer
  and ‘void *’ [-Wpedantic]

  assert(err < 0ULL);

  gcc: warning: comparison of unsigned expression in ‘< 0’ is always
  false [-Wtype-limits]

Preferring the left-hand type should hopefully avoid these issues most of
the time.
2024-12-17 15:34:44 -06:00
Christopher Haster
4c87d59c7b scripts: Simplified result->file mapping, dropped collect_dwarf_files
This reverts per-result source file mapping, and tears out a bunch of
messy dwarf parsing code. Results from the same .o file are now mapped
to the same source file.

This was just way too much complexity for slightly better result->file
mapping, which risked losing results accidentally mapped to the wrong
file.

---

I was originally going to revert all the way back to relying strictly on
the .o name and --build-dir (490e1c4) (this is the simplest solution),
but after poking around in dwarf-info a bit, I realized we do have
access to the original source file in DW_TAG_compile_unit's
DW_AT_comp_dir + DW_AT_name.

This is much simpler/more robust than parsing objdump --dwarf=rawline,
and avoids needing --build-dir in a bunch of scripts.
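
The actual join is then trivial; a minimal sketch (the paths here are
made up):

  import os

  # DW_AT_name may be relative to DW_AT_comp_dir or already absolute,
  # os.path.join handles both cases
  def comp_unit_path(comp_dir, name):
      return os.path.normpath(os.path.join(comp_dir, name))

  print(comp_unit_path('/home/user/littlefs', 'lfs.c'))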

---

This also reverts stack.py to rely only on the .ci files. These seem as
reliable as DW_TAG_compile_unit while simplifying things significantly.

Symbol mapping used to be a problem, but this was fixed by using the
symbol in the title field instead of the label field (which strips some
optimization suffixes?)
2024-12-17 15:34:39 -06:00
Christopher Haster
dad3367e9e scripts: Adopted Parser in csv.py
It's a bit funny, the motivation for a new Parser class came from the
success of simple regex + space munching in csv.py, but adopting Parser
in csv.py makes sense for a couple reasons:

- Consistency and better code sharing with other scripts that need to
  parse things (stack.py, prettyasserts.py?).

- Should be more efficient, since we avoid copying the entire string
  every time we chomp/slice.

  Though I don't think this really matters for the size of csv.py's
  exprs...

- No need to write every regex twice! Since Parser remembers the last
  match.
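
For a rough idea of what "no need to write every regex twice" means,
here's a sketch with a guessed-at Parser API (match/chomp and the m
attribute are illustrative, not the real class):

  import re

  class Parser:
      def __init__(self, text):
          self.text, self.off, self.m = text, 0, None

      def match(self, pattern):
          # remember the last match so callers don't need to repeat the
          # regex just to get at its groups
          self.m = re.compile(pattern).match(self.text, self.off)
          return self.m

      def chomp(self):
          # cheap shallow "slice", just advance an offset, no copying
          self.off = self.m.end()
          return self.m

  p = Parser('size = 1234')
  if p.match(r'(\w+)\s*=\s*(\d+)'):
      p.chomp()
      print(p.m.group(1), int(p.m.group(2)))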
2024-12-16 19:27:31 -06:00
Christopher Haster
5d777f84ad scripts: Use depth to limit recursive result collection
If we're not using these results, no reason to collect all of the
children.

Note that we still need to recurse for other measurements (limit, struct
size, etc).

This has a measurable, but small, impact on runtime:

  stack.py -z0 -Y: 0.202s
  stack.py -z1 -Y: 0.162s (~-19.8%)

  ctx.py -z0 -Y: 0.112s
  ctx.py -z1 -Y: 0.098s (~-12.5%)
2024-12-16 19:27:12 -06:00
Christopher Haster
6a6ed0f741 scripts: Dropped cycle detection from table renderer
Now that cycle detection is always done at result collection time, we
don't need this in the table renderer itself.

This had a tendency to cause problems for non-function scripts (ctx.py,
structs.py).
2024-12-16 19:26:21 -06:00
Christopher Haster
dd389f23ee scripts: Switched to sorted sets for result notes
God, I wish Python had an OrderedSet.

This is a fix for duplicate "cycle detected" notes when using -t/--hot.
Merging both _hot_notes and _notes in the HotResult class gets tricky
when the underlying container is a list.

The order is unlikely to be guaranteed anyways, when different results
with different notes are folded.

And if we ever want more control over the order of notes in result
scripts we can always change this back later.
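
In lieu of an OrderedSet, the dedup + deterministic order just comes
from plain sorted sets, roughly (the note strings here are
illustrative):

  # merge notes from different sources without duplicates; sorting gives
  # a deterministic order, even if it's not insertion order
  hot_notes = ['cycle detected', 'cycle detected']
  notes = ['cycle detected', 'exceeds limit']
  print(sorted(set(hot_notes) | set(notes)))
  # ['cycle detected', 'exceeds limit']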
2024-12-16 19:22:14 -06:00
Christopher Haster
3e03c2ee7f scripts: Adopted better input file handling in result scripts
- Error on no/insufficient files.

  Instead of just returning no results. This is more useful when
  debugging complicated bash scripts.

- Use elf magic to allow any file order in perfbd.py/stack.py.

  This was already implemented in stack.py, now also adopted in
  perfbd.py.

  Elf files always start with the magic string "\x7fELF", so we can use
  this to figure out the types of input files without needing to rely on
  argument order.

  This is just one less thing to worry about when invoking these
  scripts.
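
The magic check itself is tiny:

  # elf files always start with the 4-byte magic "\x7fELF", so input
  # files can be sorted by content instead of by argument order
  def is_elf(path):
      with open(path, 'rb') as f:
          return f.read(4) == b'\x7fELF'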
2024-12-16 19:13:22 -06:00
Christopher Haster
4325a06277 scripts: Fixed incorrect files on recursive results
It's been a while since I've been hurt by Python's late-binding
variables. In this case the scope-creep of the "file" variable hid that
we didn't actually know which recursive result belonged to which file.
Instead we were just assigning whatever the most recent top-level result
was.

This is fixed by looking up the correct file in childrenof. Though this
unfortunately does add quite a bit of noise.
2024-12-16 19:12:46 -06:00
Christopher Haster
c8c12ffae8 scripts: Reverted stack.py to use -fcallgraph-info=su again
See previous commit for the issues with stack.py's current approach. I'm
convinced dwarf-info simply does not contain enough info to figure out
stack usage.

There is one last idea, which is to parse the disassembly. In theory
you only need to understand calls, branches (for control-flow), and
push/pop instructions to figure out the worst-case stack usage. But this
would be ISA-specific and error-prone, so it probably shouldn't
_replace_ the -fcallgraph-info=su based stack.py.

So, out of ideas, reverting.

---

It's worth noting this isn't a trivial revert. There's a couple
interesting changes in stack.py:

- We now use .o files to map callgraph nodes to relevant symbol names.

  This should be a bit more robust than relying only on the names in the
  .ci files, and guarantees function names line up with other
  symbol-based scripts (code.py, ctx.py, etc).

  This also lets us warn on missing callgraph nodes, in case the
  callgraph info is incomplete.

- Callgraph parsing should be quite a bit more robust now. Added a small
  (and reusable?) Parser class.

- Moved cycle detection into result collection.

  This should let us drop cycle detection from the table renderer
  eventually.
2024-12-16 18:10:23 -06:00
Christopher Haster
0e658b8246 scripts: Attempted reimp of stack.py using dwarf variable tags
Problem: I misunderstood the purpose of .debug_frames (objdump
--dwarf=frames).

The purpose of .debug_frames is not to record the size of function
stack frames, but to only tell a debugger how to access the previous
function's stack frame. It just so happens that this _coincidentally_
tells you the stack frame size when compiling with -fomit-frame-pointer.

With -fno-omit-frame-pointer (common on some archs), .debug_frames just
says "hey here's the frame pointer" (DW_CFA_def_cfa_register), which
tells us nothing about the function's actual stack usage.

So unfortunately .debug_frames does not provide enough info on its
own...

---

This commit was an attempt to find the actual stack usage by looking at
the relevant variable info (DW_TAG_variable, etc) in the function's dwarf
info, but this approach is also not looking very good...

1. The numbers do not appear correct:

     before -fcallgraph-info=su:         2720
     after --dwarf=info:                 3558
     after --dwarf=info --no-shrinkwrap: 3922

     (this is with -fno-omit-frame-pointer)

   In hindsight, this approach is fundamentally flawed. While the
   variable tags do give us a lower bound on stack usage, they don't
   tell us about implicit compiler variables and various stack push/pops
   as a part of expression evaluation.

   As far as I can tell there's simply not enough info in dwarf info to
   find an accurate upper bound on stack usage.

2. This approach is quite a bit more complicated, since we need:

   1. Dwarf info (--dwarf=info) to find variable tags.
   2. Location info (--dwarf=loc) to map var allocations to address
      ranges.
   3. Range info (--dwarf=Ranges) to map lexical blocks to address
      ranges when var allocation is implicit (not implemented).
   4. And we still need frame info (--dwarf=frames)! since var
      allocations are frame-relative.

3. Also dwarf info is not guaranteed to contain the whole callgraph.

   It seems callgraph info is actually _omitted_ with -O0??

   I guess this is because the callgraph info is a side-effect of some
   compiler pass? This seems a bit backwards.

   Dwarf does have a flag (DW_AT_call_all_calls) to indicate when
   callgraph info is complete, but it doesn't seem to be set reliably?

   Even with optimizations, lfsr_bd_sync, _and only lfsr_bd_sync_, is
   missing the DW_AT_call_all_calls flag. I have no idea why. The flag
   is still present in lfsr_bd_erase, lfsr_bd_read, and other functions
   with function pointers...

So I think this will probably be reverted.
2024-12-16 18:01:47 -06:00
Christopher Haster
14e7501e5c scripts: Minor stack.py fixes
- Always interpret DW_AT_low_pc/high_pc/call_return_pc as hex.

  Clang populates these with hex digits without a 0x prefix.

- Don't ignore callees with no name.

  Now that DW_AT_abstract_origin is fixed, a callee with no name should
  be an error.
2024-12-16 18:01:46 -06:00
Christopher Haster
ac79c88c6f scripts: Improved cycle detection notes in scripts
- Prevented childrenof memoization from hiding the source of a
  detected cycle.

- Deduplicated multiple cycle detected notes.

- Fixed note rendering when last column does not have a notes list.
  Currently this only happens when entry is None (no results).
2024-12-16 18:01:46 -06:00
Christopher Haster
56d888933f scripts: Reworked stack.py to use dwarf, dropped -fcallgraph-info=su
There were a lot of small challenges (see previous commits), but this
commit reworks stack.py to rely only on dwarf-info and symbols to build
stack + callgraph info.

Not only does this remove an annoying dependency on a GCC-specific flag,
but it also should give us more correct stack measurements by only
penalizing calls for the stack usage at the call site. This should
better account for things like shrinkwrapping, which make the
-fcallgraph-info=su results look worse than they actually are.

To make this work required jumping through a couple hoops:

1. Map symbols -> dwarf entries by address (DW_AT_low_pc).

   We use symbols here to make sure function names line up with other
   scripts.

   Note that there can be multiple dwarf entries with the same name due
   to optimization passes. Apparently the optimized name is not included
   because that would be too useful.

2. Find each function's frame info.

   This is stored in the .debug_frames section (objdump --dwarf=frames),
   and requires _yet another state machine_ to parse, but gives us the
   stack frame info for each function at the instruction level, so
   that's nice.

3. Find call sites (DW_TAG_call_site).

   The hierarchical nesting of DW_TAG_lexical_blocks gets a bit annoying
   here, but ultimately we can find all DW_TAG_call_sites by looking at
   the DW_TAG_subprogram's children tags.

4. Map call sites to frame info.

   This gets funky.

   Finding the target function is simple enough, DW_AT_call_origin
   contains its dwarf offset (but why is this the _origin_?). But we
   don't actually know what address the call originated from.

   Fortunately we do know the return address, DW_AT_call_return_pc?

   The instruction before DW_AT_call_return_pc should be the call
   instruction. Subtracting 1 will awkwardly put us in the middle of the
   instruction, but it should at least map to the correct stack frame?
   And without ISA-specific info it's the best we can do.

It's messy, but this should be all the info we need.
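
A rough sketch of step 4 (the frame-info representation here is a
simplified stand-in for the real thing):

  import bisect

  # frame info as (start_addr, frame_size) rows, sorted by address
  frames = [(0x8000, 0), (0x8004, 16), (0x8010, 32)]

  def frame_at(addr):
      # find the last row that starts at or before addr
      i = bisect.bisect_right(frames, (addr, float('inf'))) - 1
      return frames[i][1] if i >= 0 else 0

  # the instruction before the return address should be the call
  # instruction; addr-1 lands mid-instruction, but still maps to the
  # correct frame row
  call_return_pc = 0x8012
  print(frame_at(call_return_pc - 1))  # 32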

---

To build confidence in the new script, I included the --no-shrinkwrap
flag, which reverts to penalizing each call site for the function's
worst-case stack frame. This makes it easy to compare against the
-fcallgraph-info=su approach:

  with -fcallgraph-info=su:          2624
  with --dwarf=info --no-shrinkwrap: 2624

I was hoping that accounting for shrinkwrap-like optimizations would
reveal a lower stack cost, but for better or worse it seems that
worst-case stack usage is unchanged:

  with --dwarf=info --no-shrinkwrap: 2624
  with --dwarf=info:                 2624

Still, it's good to know that our stack measurement is correct.
2024-12-16 18:01:46 -06:00
Christopher Haster
02ccbdfed2 scripts: Enabled symbol->dwarf mapping via address
We have symbol->addr info and dwarf->addr info (DW_AT_low_pc), so why
not use this to map symbols to dwarf entries?

This should hopefully be more reliable than the current name based
heuristic, but only works for functions (DW_TAG_subprogram).

Note that we still have to fuzzy match due to thumb-bit weirdness (small
rant below).

---

Ok. Why in Thumb does the symbol table include the thumb bit, but the
dwarf info does not?? Would it really have been that hard to add the
thumb bit to DW_AT_low_pc so symbols and dwarf entries match?

So, because of Thumb, we can't expect either the address or name to
match exactly. The best we can do is binary search and expect the symbol
to point somewhere _within_ the dwarf's DW_AT_low_pc/DW_AT_high_pc
range.

Also why does DW_AT_high_pc store the _size_ of the function?? Why isn't
it, idunno, the _high_pc_? I get that the size takes up less space when
leb128 encoding, but surely there could have been a better name?
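
A rough sketch of the resulting fuzzy match (assuming DW_AT_high_pc has
already been decoded as a size, and with made-up addresses):

  import bisect

  # dwarf entries as (DW_AT_low_pc, size, name), sorted by low_pc
  entries = [
      (0x8000, 0x40, 'lfsr_bd_read'),
      (0x8040, 0x20, 'lfsr_bd_prog'),
  ]

  def entry_for_sym(sym_addr):
      # the symbol may carry the thumb bit, so don't expect an exact
      # match, just expect it to land within [low_pc, low_pc+size)
      i = bisect.bisect_right(entries, (sym_addr, float('inf'), '')) - 1
      if i >= 0:
          low, size, name = entries[i]
          if low <= sym_addr < low + size:
              return name
      return None

  print(entry_for_sym(0x8041))  # thumb bit set -> 'lfsr_bd_prog'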
2024-12-16 18:01:46 -06:00
Christopher Haster
eb09865868 scripts: Resolve DW_AT_abstract_origin during dwarf collection
Sometimes I feel like dwarf-info is designed to be as error-prone as
possible.

In this case, DW_AT_abstract_origin indicates that one dwarf entry
should inherit the attributes of another. If you don't know this, it's
easy to miss relevant dwarf entries due to missing name fields, etc.

Expanding DW_AT_abstract_origin lazily would be tricky due to how our
DwarfInfo class is structured, so instead I am just expanding
DW_AT_abstract_origins during collect_dwarf_info.

Note this doesn't handle recursive DW_AT_abstract_origins, but there is
at least an assert.
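
Roughly, the expansion looks something like this (the {offset: attrs}
dicts are simplified stand-ins for the real DwarfEntry structure):

  entries = {
      0x20: {'DW_AT_name': 'lfsr_bd_sync', 'DW_AT_low_pc': 0x8000},
      0x80: {'DW_AT_abstract_origin': 0x20, 'DW_AT_low_pc': 0x9000},
  }

  for attrs in entries.values():
      origin = attrs.get('DW_AT_abstract_origin')
      if origin is not None:
          inherited = entries[origin]
          # no handling of recursive origins, but at least assert
          assert 'DW_AT_abstract_origin' not in inherited
          # inherit the origin's attributes, our own take precedence
          for k, v in inherited.items():
              attrs.setdefault(k, v)

  print(entries[0x80]['DW_AT_name'])  # 'lfsr_bd_sync'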

---

It does seem like DW_AT_abstract_origin is intended to be limited to
"Inline instances of inline subprograms" and "Out-of-line instances of
inline subprograms" according to the DWARF5 spec, but it's unclear if
this is a rule or suggestion...

This hasn't been an issue for existing scripts, but is needed for some
ongoing stack.py rework. Otherwise we don't find "out-of-line instances
of inline subprograms" (optimized functions?) correctly.
2024-12-16 18:01:46 -06:00
Christopher Haster
19cd428a3c scripts: Added DwarfEntry.info to help find recursive tags
Long story short: DW_TAG_lexical_blocks are annoying.

In order to search the full tree of children of a given dwarf entry, we
need a recursive function somewhere. We might as well make this function
a part of the DwarfEntry class so we can share it with other scripts.

Note this is roughly the same as collect_dwarf_info, but limited to
the children of a given dwarf entry.

This is useful for ongoing stack.py rework.
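
The shape of it is basically just a recursive walk; a minimal sketch
(the children attribute is an assumption about the class layout):

  class DwarfEntry:
      def __init__(self, tag, children=()):
          self.tag = tag
          self.children = list(children)

      def info(self):
          # yield this entry and, recursively, all of its children, so
          # DW_TAG_variables hiding in nested DW_TAG_lexical_blocks are
          # still found
          yield self
          for child in self.children:
              yield from child.info()

  f = DwarfEntry('DW_TAG_subprogram', [
      DwarfEntry('DW_TAG_lexical_block', [
          DwarfEntry('DW_TAG_variable')])])
  print([e.tag for e in f.info()])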
2024-12-16 18:01:46 -06:00
Christopher Haster
faf4d09c34 scripts: Added __repr__ to RInt and friends
Just a minor quality of life feature to help debugging these scripts.
2024-12-16 18:01:46 -06:00
Christopher Haster
e4ff9a1701 scripts: Added Line class for collect_dwarf_lines
A good rule of thumb seems to be that every XInfo class should be
paired with at least a small X class wrapper:

- Easier to extend without breaking tuple unpacking everywhere
- Better code readability
- Better memory reuse in _by_addr caches (less tuple repacking)
2024-12-16 18:01:46 -06:00
Christopher Haster
b4038e3c27 scripts: Include global/section info in collect_syms, added Sym
We have this info, might as well expose it for scripts to use.

Unfortunately this extra info did make tuple unpacking a bit of a mess,
especially in scripts that don't use this extra info, so I've added a
small Sym class similar to DwarfEntry in collect_dwarf_info.

This is useful for some ongoing stack.py rework.
2024-12-16 18:01:46 -06:00
Christopher Haster
eb7fff8843 scripts: Include all entries in collect_dwarf_info
Note this only affects the top-level entries. Dwarf-info contains a
hierarchical structure, but for some scripts we just don't care (finding
DW_TAG_variables in nested DW_TAG_lexical_blocks, for example).

This is useful for ongoing stack.py rework.
2024-12-16 18:01:46 -06:00
Christopher Haster
bd7004a4f3 scripts: Prefer objdump --syms over -t in scripts
objdump --syms is a bit more self-documenting.

The other uses of objdump already use the long forms (--dwarf=rawline,
--dwarf=info).
2024-12-16 18:01:46 -06:00
Christopher Haster
308b4b6080 scripts: Made dwarf tags explicit in ctx.py/structs.py
This will make ctx.py/structs.py more likely to error on unknown tags,
which is preferable to silently reporting incorrect numbers.
2024-12-16 18:01:46 -06:00
Christopher Haster
b90b2953ea scripts: Some minor regex cleanup
Just trying to make regex in scripts a bit more consistent. Though regex
being regex this may be fruitless.
2024-12-16 18:01:46 -06:00
Christopher Haster
28d89eb009 scripts: Adopted simpler+faster heuristic for symbol->dwarf mapping
After tinkering around with the scripts for a bit, I've started to
realize difflib is kinda... really slow...

I don't think this is strictly difflib's fault. It's a pure python
library (proof of concept?), may be prioritizing quality over speed, and
I may be throwing too much data at it.

difflib does have quick_ratio() and real_quick_ratio() for faster
comparisons, but while looking into these for correctness, I realized
there's a simpler heuristic we can use since GCC's optimized names seem
strictly additive: Choose the name that matches with the smallest prefix
and suffix.

So comparing, say, lfsr_rbyd_lookup to __lfsr_rbyd_lookup.constprop.0:

    lfsr_rbyd_lookup
  __lfsr_rbyd_lookup.constprop.0
   |'------.-------''----.-----'
   '-------|-----.   .---'
           v     v   v
  key: (matches, 2, 12)

Note we prioritize the prefix, since it seems GCC's optimized names are
strictly suffixes. We also now fail to match if the dwarf name is not
a substring, instead of just finding the most similar looking symbol.
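
A rough sketch of the heuristic (the key layout is illustrative):

  def best_sym(dwarf_name, syms):
      # only consider symbols that actually contain the dwarf name,
      # otherwise fail to match
      candidates = [s for s in syms if dwarf_name in s]
      if not candidates:
          return None
      # choose the symbol with the smallest prefix, then suffix, since
      # gcc's optimized names seem to be strictly additive
      return min(candidates, key=lambda s: (
          s.index(dwarf_name),
          len(s) - s.index(dwarf_name) - len(dwarf_name)))

  print(best_sym('lfsr_rbyd_lookup', [
      '__lfsr_rbyd_lookup.constprop.0',   # key (2, 12)
      'lfsr_rbyd_lookup']))               # key (0, 0), wins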

This results in both faster and more robust symbol->dwarf mapping:

  before: time code.py -Y: 0.393s
  after:  time code.py -Y: 0.152s

  (this is WITH the fast dict lookup on exact matches!)

This also drops difflib from the scripts. So one less dependency to
worry about.
2024-12-16 18:01:33 -06:00
Christopher Haster
e77010265e scripts: Replaced nm with objdump in code.py/data.py
There is an argument for preferring nm for code size measurements due to
portability. But I'm not sure this really holds up these days with
objdump being so prevalent.

We already depend on objdump for ctx/structs/perf and other dwarf info,
so using objdump -t to get symbol information means one less tool to
depend on/pass around when cross-compiling.

As a minor benefit this also gives us more control over which sections
to include, instead of relying on nm's predefined t/r/d/b section types.

---

Note code.py/data.py did _not_ require objdump before this. They did use
objdump to map symbols to source files, but would just guess if
objdump wasn't available.
2024-12-15 16:39:04 -06:00
Christopher Haster
8526cd9cf1 scripts: Prevented i/children/notes result field collisions
Without this, naming a column i/children/notes in csv.py could cause
things to break. Unlikely for children/notes, but very likely for i,
especially when benchmarking.

Unfortunately namedtuple makes this tricky. I _want_ to just rename
these to _i/_children/_notes and call the problem solved, but namedtuple
reserves all underscore-prefixed fields for its own use.

As a workaround, the table renderer now looks for _i/_children/_notes at
the _class_ level, as an optional name of which namedtuple field to use.
This way Result types can stay lightweight namedtuples while including
extra table rendering info without risk of conflicts.
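
A rough sketch of the workaround (the field names here are made up for
illustration):

  import collections as co

  # namedtuple reserves underscore-prefixed field names, so the class
  # level _i just names which namedtuple field the renderer should use
  class HotResult(co.namedtuple('HotResult', ['name', 'size', 'hot_i'])):
      _i = 'hot_i'

  def render_i(r):
      # fall back gracefully for Result types without an _i
      return getattr(r, getattr(type(r), '_i', ''), None)

  print(render_i(HotResult('lfsr_bd_sync', 16, 3)))  # 3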

This also makes the HotResult type a bit more funky, but that's not a
big deal.
2024-12-15 16:36:14 -06:00
Christopher Haster
183ede1b83 scripts: Option for result scripts to force children ordering
This extends the recursive part of the table renderer to sort children
by the optional "i" field, if available.

Note this only affects children entries. The top-level entries are
strictly ordered by the relevant "by" fields. I just haven't seen a use
case for this yet, and not sorting "i" at the top-level reduces the
number of things that can go wrong for scripts without children.

---

This also rewrites -t/--hot to take advantage of children ordering by
injecting a totally-not-hacky HotResult subclass.

Now -t/--hot should be strictly ordered by the call depth! Though note
entries that share "by" fields are still merged...

This also gives us a way to introduce the "cycle detected" note and
respect -z/--depth, so overall a big improvement for -t/--hot.
2024-12-15 16:35:52 -06:00
Christopher Haster
e6ed785a27 scripts: Removed padding from tail notes in tables
We don't really need padding for the notes on the last column of tables,
which is where row-level notes end up.

This may seem minor, but not padding here avoids quite a bit of
unnecessary line wrapping in small terminals.
2024-12-15 16:35:29 -06:00
Christopher Haster
94df6d47d4 scripts: Added make ctx, adopted ctx.py in the Makefile
make ctx now does what you expect it to, and ctx.py now replaces
structs.py in the summary rules (make funcs, make summary):

  $ make summary
  ... blablabla ...
              code     data    stack      ctx
  TOTAL      38100        0     2624      752

Also finally cleaned up SUMMARYFLAGS in make funcs. This should have
been cleaned up when cleaning up make summary...
2024-12-15 16:34:32 -06:00
Christopher Haster
512cf5ad4b scripts: Adopted ctx.py-related changes in other result scripts
- Adopted higher-level collect data structures:

  - high-level DwarfEntry/DwarfInfo class
  - high-level SymInfo class
  - high-level LineInfo class

  Note these had to be moved out of function scope due to pickling
  issues in perf.py/perfbd.py. These were only function-local to
  minimize scope leak so this fortunately was an easy change.

- Adopted better list-default patterns in Result types:

    def __new__(..., children=None):
        return Result(..., children if children is not None else [])

  A classic python footgun.

- Adopted notes rendering, though this is only used by ctx.py at the
  moment.

- Reverted to sorting children entries, for now.

  Unfortunately there's no easy way to sort the result entries in
  perf.py/perfbd.py before folding. Folding is going to make a mess
  of more complicated children anyways, so another solution is
  needed...

And some other shared miscellany.
2024-12-15 15:41:11 -06:00
Christopher Haster
b4c79c53d2 scripts: csv.py: Fixed NoneType issues with default sort
$ ./scripts/csv.py lfs.code.csv -bfunction -fsize -S
  ... blablabla ...
  TypeError: cannot unpack non-iterable NoneType object

The issue was argparse's const defaults bypassing the type callback, so
the sort field ends up with None when it expects a tuple (well
technically a tuple tuple).
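
A minimal reproduction of the argparse behavior (the option and callback
here are arbitrary):

  import argparse

  def sort_field(s):
      # parse "field" into a canonical tuple form
      return (s, False)

  p = argparse.ArgumentParser()
  # with nargs='?', argparse uses const verbatim when the flag is given
  # without an argument; the type callback is never applied to it
  p.add_argument('-S', nargs='?', type=sort_field, const=None)
  print(p.parse_args(['-Ssize']).S)  # ('size', False)
  print(p.parse_args(['-S']).S)      # None, not a tuple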

This is only an issue for csv.py because csv.py's sort fields can
contain exprs.
2024-12-15 15:39:04 -06:00
Christopher Haster
55d01f69f9 scripts: Adopted ctx.py-related changes in structs.py
- Dropped --internal flag, structs.py includes all structs now.

  No reason to limit structs.py to public structs if ctx.py exists.

- Added struct/union/enum prefixes to results (enums were missing in
  ctx.py).

- Only sort children layers if explicitly requested. This should
  preserve field order, which is nice.

- Adopt more advanced FileInfo/DwarfInfo classes.

- Adopted table renderer changes (notes rendering).
2024-12-15 15:10:49 -06:00
Christopher Haster
c8a4ee91a6 scripts: ctx.py: Only sort children layers if explicitly requested
- Sorting struct fields by name? Eh, that's not a big deal.
- Sorting function params by name? Okay, that's really annoying.

This compromises by sorting only the top-level results by name, and
leaving recursive results in the order returned by collect by default.
Recursive results should usually have a well-defined order.

This should be extendable to the other result scripts as well.
2024-12-15 15:04:11 -06:00
Christopher Haster
3a0a58369a scripts: ctx.py: Added struct/union namespace prefix to results
This is a bit more readable and better matches the names used in the C
code (lfs_config vs struct lfs_config).

The downside is we now have fields with spaces in them, which may cause
problems for naive parsers.
2024-12-15 14:56:53 -06:00
Christopher Haster
2df97cd858 scripts: Added ctx.py for finding function contexts
ctx.py reports functions' "contexts", i.e. the sum of the size of all
function parameters and indirect structs, recursively dereferencing
pointers when possible.

The idea is this should give us a rough lower bound on the amount of
state that needs to be allocated to call the function:

  $ ./scripts/ctx.py lfs.o lfs_util.o -Dfunction=lfsr_file_write -z3 -s
  function                size
  lfsr_file_write          596
  |-> lfs                  436
  |   '-> lfs_t            432
  |-> file                 152
  |   '-> lfsr_file_t      148
  |-> buffer                 4
  '-> size                   4
  TOTAL                    596
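
A very rough sketch of the recursive walk (the toy type model below is a
stand-in for the real dwarf chasing, which is much messier):

  # toy model: parameter/type as (name, byte size, [pointed-to types])
  lfs_t = ('lfs_t', 432, [])
  lfs = ('lfs', 4, [lfs_t])      # pointer parameter
  buffer = ('buffer', 4, [])     # opaque pointer, buffer not counted
  size = ('size', 4, [])

  def ctx(params, seen=None):
      # sum parameter sizes, recursively dereferencing pointers, but
      # only count each type once so cycles terminate
      seen = set() if seen is None else seen
      total = 0
      for name, nbytes, targets in params:
          if name in seen:
              continue
          seen.add(name)
          total += nbytes + ctx(targets, seen)
      return total

  print(ctx([lfs, buffer, size]))  # 4+432 + 4 + 4 = 444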

---

The long story short is that structs.py, while very useful for
introspection, has not been useful as a general metric.

Sure it can give you a rough idea of the impact of small changes to
struct sizes, but it's not uncommon for larger changes to add/remove
structs that have no real impact on the user facing RAM usage. There are
some structs we care about (lfs_t) and some we don't (lfsr_data_t).
Internal-only structs should already be measured by stack.py.

Which raises the question, how do we know which structs we care about?

The idea here is to look at function parameters and chase pointers. This
gives a complicated, but I think reasonable, heuristic. Fortunately
dwarf-info gives us all the necessary info.

Some notes:

- This does _not_ include buffer sizes. Buffer sizes are user
  configurable, so it's sort of up to the user to account for these.

- We include structs once if we find a cycle (lfsr_file_t.o for
  example). Can't really do any better and this at least provides a
  lower bound for complex data-structures.

- We sum all params/fields, but find the max of all functions. Note this
  prevents common types (lfs_t for example) from being counted more than
  once.

- We only include global functions (based on the symbol flag). In theory
  the context of all internal functions should end up in stack.py.

  This can be overridden with --everything.

Note this doesn't replace structs.py. structs.py is still useful for
looking at all structs in the system. ctx.py should just be more useful
for comparing builds at a high level.
2024-12-15 13:24:31 -06:00
Christopher Haster
25814ed5cb scripts: Fixed failed subprocess stderr, unconditionally forward
It looks like the failure case in our scripts' subprocess stderr
handling was not tested well during a fix to stderr blocking (a735bcd).

This code was attempting to print stderr only if an error occurred, but
with stderr=None this just results in a NoneType TypeError.

In retrospect, completely hiding stderr is kind of shitty if a
subprocess fails, but it doesn't seem possible to read from both stdout
and stderr with Python's APIs without getting stuck when the stderr's
buffer is full.

It might be possible to work around this with either multithreading,
select calls, or a temp file, but I'm not sure slightly less verbose
scripts are worth the added complexity in every single subprocess call.

For now just reverting to unconditionally forwarding stderr from the
child process. This is the simplest/most robust option.
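
For reference, the simple/robust option just boils down to not capturing
stderr at all, roughly:

  import subprocess as sp

  # leave stderr=None so the child inherits our stderr; we only read
  # stdout, so there's no risk of deadlocking on a full stderr pipe
  cmd = ['objdump', '--syms', 'lfs.o']
  proc = sp.Popen(cmd, stdout=sp.PIPE, stderr=None,
      universal_newlines=True)
  lines = list(proc.stdout)
  proc.wait()
  if proc.returncode != 0:
      raise sp.CalledProcessError(proc.returncode, cmd)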
2024-12-14 15:08:39 -06:00
Christopher Haster
b58266c3b0 scripts: Small refactor to adopt collect_thing pattern everywhere
- stack.py:collect -> collect + collect_cov
- perf.py:collect_syms_and_lines -> collect_syms + collect_dwarf_lines
- perfbd.py:collect_syms_and_lines -> collect_syms + collect_dwarf_lines

This should hopefully lead to both better readability and better code
reuse.

Note collect_dwarf_lines is a bit different than collect_dwarf_files in
code.py/data.py/etc, but the extra complexity of collect_dwarf_lines is
probably not worth sharing here.
2024-12-14 15:08:04 -06:00
Christopher Haster
26ba7bdebc scripts: Adopted new dwarf-info parser in code.py/data.py
This breaks the collect function down into collect_dwarf_files,
collect_dwarf_info, and collect_sizes. This makes the dwarf-info parser
a bit easier to share with structs.py, etc.

Sharing easily copy-pastable chunks of code in scripts like this has
allowed for better code reuse without intricately tying script
dependencies together. Being able to run each of these scripts
standalone is a goal.
2024-12-14 12:37:43 -06:00
Christopher Haster
e00db216c1 scripts: Consistent table renderer, cycle detection optional
The fact that our scripts' table renderer was slightly different for
recursive scripts (stack.py, perf.py) and non-recursive scripts
(code.py, structs.py) was a ticking time bomb, one innocent edit away
from breaking half the scripts.

This makes the table renderer consistent across all scripts, allowing for
easy copy-pasting when editing at the cost of some unused code in
scripts.

One hiccup with this though is the difference in cycle detection
behavior between scripts:

- stack.py:

    lfsr_bd_sync
    '-> lfsr_bd_prog
        '-> lfsr_bd_sync  <-- cycle!

- structs.py:

    lfsr_bshrub_t
    '-> u
        '-> bsprout
            '-> u  <-- not a cycle!

To solve this the table renderer now accepts a simple detect_cycles
flag, which can be set per-script.
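
Conceptually the flag just gates the name-based cycle check during the
recursive render, something like (heavily simplified):

  def render(entry, children_of, depth, seen=(), detect_cycles=True):
      print('    '*len(seen) + entry)
      if detect_cycles and entry in seen:
          return  # cycle! stop recursing
      if depth > 1:
          for child in children_of.get(entry, []):
              render(child, children_of, depth-1, seen + (entry,),
                  detect_cycles)

  calls = {
      'lfsr_bd_sync': ['lfsr_bd_prog'],
      'lfsr_bd_prog': ['lfsr_bd_sync'],
  }
  render('lfsr_bd_sync', calls, depth=5)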
2024-12-14 12:25:15 -06:00
Christopher Haster
7c8afd26cf scripts: Added alignment info to structs.py
Dwarf-info doesn't actually provide alignment info with the current
tools I'm using (but it does look like DW_AT_alignment was added in a
recent version), so for now this is just a heuristic based on the
largest base/pointer type.

This heuristic is still useful info and probably correct for the types
littlefs cares about (no SIMD here!).

This is also another field that folds using max, so that's fun.
2024-12-03 10:52:23 -06:00
Christopher Haster
35f68a733c scripts: Reworked structs.py to include field info
This reworks structs.py's internal dwarf-info parser to be a bit more
flexible. The eventual plan is to adopt this parser in other scripts.

The main difference is we now parse the dwarf-info into a full tree,
with optional filtering, before extracting the fields we care about.
This is both more flexible and gives us more confidence the parser is
not misparsing something.

(Unrelated but apparently misparsing is a real word.)

This also extends structs.py to include field info for structs and
unions. This is quite useful for understanding the size of things:

  $ ./scripts/structs.py thumb/lfs.o -Dstruct=lfsr_bptr_t -z
  struct                      size
  lfsr_bptr_t                   20
  |-> cksize                     4
  |-> cksum                      4
  '-> data                      12
      |-> size                   4
      '-> u                      8
          |-> buffer             4
          '-> disk               8
              |-> block          4
              '-> off            4
  TOTAL                         20

The field info uses the same -z/--depth flag from stack.py/perf.py/
perfbd.py, though the cycle detector needed a bit of tweaking. Detecting
cycles purely by name doesn't quite work with structs:

  file->o.o.flags
        ^ |
        '-' not a cycle!

Unfortunately, we do lose the field order in structs. But this info is
still useful.

Oh, we also prefer typedef names over struct/union names now. These are
a bit easier to read since they are more common in the codebase.
2024-12-03 10:52:13 -06:00
Christopher Haster
51b8cdb1f0 scripts: Added -q/--quiet to test.py/bench.py
This will probably only have niche uses, but may be useful for small
test sets or for running specific tests with -O-.

Though it is a bit funny that -q -O- turns test.py/bench.py into more or
less just a complicated way to run a C program.
2024-11-17 23:50:32 -06:00
Christopher Haster
0b450b1184 scripts: Reverted full C exprs in test/bench define ranges
A couple problems:

1. We should probably also support negative ranges, but this is a bit
   annoying since we can't tell if the range is negative or positive
   until expr evaluation.

2. Evaluating the range exprs at compile-time is inconsistent from other
   C exprs in our tests/benches (normal defines, if filters, etc), and
   severely limiting since we can't use other defines before the define
   system is initialized.

3. Attempting to move these range exprs into their own lazily evaluated
   functions does not seem tractable...

   We'd need to evaluate defines to know how many permutations there
   are, but how can we evaluate defines before knowing which permutation
   we're on?

   I think this circular dependency would make the permutation count
   undecidable?

Even if we could move these exprs to their own lazily evaluated
functions (which would solve the inconsistency issue), the complexity
risks outweighing the benefit. Keep in mind it's useful if external
tools can parse our tests. So reverting for now.

Though I am keeping some of the refactoring in test.py/bench.py. Having
a special DRange type is useful if we ever want to add more define
functions in the future.
2024-11-17 23:36:57 -06:00
Christopher Haster
608d8a2bc1 scripts: Enabled full C exprs in test/bench define ranges
This enables full C exprs in test/bench define ranges by simply passing
them on to the C compiler.

So this:

  defines.N = 'range(1,20+1)'

Becomes this, in N's define function:

  if (i < 0 + ((((20+1)-1-(1))/(1) + 1))) return ((i-(0))*(1) + (1));

Which is a bit of a mess, but generates the correct range at runtime.

This allows for much more flexible exprs in range defines without
needing a full expr parser in Python.

Note though that we need to evaluate the range length at compile time.
This is notably before the test/bench define system is initialized, so
all three range args (start, stop, step) are really limited to simple
C literals and exprs.
2024-11-17 14:36:47 -06:00
Christopher Haster
ef3accc07c scripts: Tweaked -p/--percent to accept the csv file for diffing
This makes the -p/--percent flag a bit more consistent with -d/--diff
and -c/--compare, both of which change the printing strategy based on
additional context.
2024-11-16 18:01:27 -06:00
Christopher Haster
9a2b561a76 scripts: Adopted -c/--compare in make summary-diff
This showcases the sort of high-level result printing where -c/--compare
is useful:

  $ make summary-diff
              code             data           stack          structs
  BEFORE     57057                0            3056             1476
  AFTER      68864 (+20.7%)       0 (+0.0%)    3744 (+22.5%)    1520 (+3.0%)

There was one hiccup though: how to hide the name of the first field.

It may seem minor, but the missing field name really does help
readability when you're staring at a wall of CLI output.

It's a bit of a hack, but this can now be controlled with -Y/--summary,
which has the sole purpose of disabling the first field name if mixed
with -c/--compare.

-c/--compare is already a weird case for the summary row anyways...
2024-11-16 18:01:15 -06:00
Christopher Haster
29eff6f3e8 scripts: Added -c/--compare for comparing specific result rows
Example:

  $ ./scripts/csv.py lfs.code.csv \
          -bfunction -fsize \
          -clfsr_rbyd_appendrattr
  function                                size
  lfsr_rbyd_appendrattr                   3598
  lfsr_mdir_commit                        5176 (+43.9%)
  lfsr_btree_commit__.constprop.0         3955 (+9.9%)
  lfsr_file_flush_                        2729 (-24.2%)
  lfsr_file_carve                         2503 (-30.4%)
  lfsr_mountinited                        2357 (-34.5%)
  ... snip ...

I don't think this is immediately useful for our code/stack/etc
measurement scripts, but it's certainly useful in csv.py for comparing
results at a high level.

And by useful I mean it replaces a 40-line long awk script that has
outgrown its original purpose...
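
For what it's worth, the percentages are just relative to the compared
row:

  # diff relative to the -c/--compare row, matching the output above
  base = 3598   # lfsr_rbyd_appendrattr
  other = 5176  # lfsr_mdir_commit
  print('%+.1f%%' % (100*(other-base)/base))  # +43.9%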
2024-11-16 17:59:22 -06:00
Christopher Haster
14687a20bf scripts: csv.py: Implicitly convert during string concatenation
This may be a (very javascript-esque) mistake, but implicit conversion
to strings is useful when mixing fields and strings in -b/--by field
exprs:

  $ ./scripts/csv.py input.csv -bcase='"test"+n' -fn

Note that this now (mostly) matches the behavior when the n field is
unspecified:

  $ ./scripts/csv.py input.csv -bcase='"test"+n'

Er... well... mostly. When we specify n as a field, csv.py does
typecheck and parse the field, which ends up sort of canonicalizing the
field, unlike omitting n which leaves n as a string... But at least if
the field was already canonicalized the behavior matches...

It may also be better to force all -b/--by expr inputs to strings first,
but this would require us to know which expr came from where. It also
wouldn't solve the canonicalization problem.
2024-11-16 17:39:39 -06:00