Commit Graph

20 Commits

Author SHA1 Message Date
Christopher Haster
ac79c88c6f scripts: Improved cycle detection notes in scripts
- Prevented childrenof memoization from hiding the source of a
  detected cycle.

- Deduplicated multiple cycle detected notes.

- Fixed note rendering when last column does not have a notes list.
  Currently this only happens when entry is None (no results).
2024-12-16 18:01:46 -06:00
Christopher Haster
02ccbdfed2 scripts: Enabled symbol->dwarf mapping via address
We have symbol->addr info and dwarf->addr info (DW_AT_low_pc), so why
not use this to map symbols to dwarf entries?

This should hopefully be more reliable than the current name based
heuristic, but only works for functions (DW_TAG_subprogram).

Note that we still have to fuzzy match due to thumb-bit weirdness (small
rant below).

---

Ok. Why in Thumb does the symbol table include the thumb bit, but the
dwarf info does not?? Would it really have been that hard to add the
thumb bit to DW_AT_low_pc so symbols and dwarf entries match?

So, because of Thumb, we can't expect either the address or name to
match exactly. The best we can do is binary search and expect the symbol
to point somewhere _within_ the dwarf's DW_AT_low_pc/DW_AT_high_pc
range.

Also why does DW_AT_high_pc store the _size_ of the function?? Why isn't
it, idunno, the _high_pc_? I get that the size takes up less space when
leb128 encoding, but surely there could have been a better name?
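
A minimal sketch of the range lookup described above, assuming the dwarf entries have already been reduced to a sorted list of (low_pc, size, name) tuples. The function name, tuple layout, and addresses are hypothetical and not taken from the scripts; DW_AT_high_pc is treated as a size, as noted above:

```python
import bisect

def find_dwarf_by_addr(addr, ranges):
    """Find the entry whose [low_pc, low_pc+size) range contains addr.

    ranges is a list of (low_pc, size, name) tuples sorted by low_pc.
    This layout is an assumption for illustration only.
    """
    lows = [low for low, _, _ in ranges]
    # find the last range that starts at or before addr
    i = bisect.bisect_right(lows, addr) - 1
    if i < 0:
        return None
    low, size, name = ranges[i]
    # high_pc is treated as a size here, so the range ends at low+size
    if addr < low + size:
        return name
    return None

# hypothetical data: the symbol table says 0x101 (thumb bit set), while
# DW_AT_low_pc says 0x100, so an exact address match would fail
ranges = [(0x100, 0x20, 'lfs_a'), (0x120, 0x10, 'lfs_b')]
print(find_dwarf_by_addr(0x101, ranges))
```

Matching on the range rather than the exact address sidesteps the thumb bit without needing to know whether the target is Thumb at all.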
2024-12-16 18:01:46 -06:00
Christopher Haster
eb09865868 scripts: Resolve DW_AT_abstract_origin during dwarf collection
Sometimes I feel like dwarf-info is designed to be as error-prone as
possible.

In this case, DW_AT_abstract_origin indicates that one dwarf entry
should inherit the attributes of another. If you don't know this, it's
easy to miss relevant dwarf entries due to missing name fields, etc.

Expanding DW_AT_abstract_origin lazily would be tricky due to how our
DwarfInfo class is structured, so instead I am just expanding
DW_AT_abstract_origins during collect_dwarf_info.

Note this doesn't handle recursive DW_AT_abstract_origins, but there is
at least an assert.
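
Roughly, the eager expansion could look like the sketch below. This assumes entries are plain dicts keyed by a hypothetical offset, which is not how DwarfInfo actually stores them, and resolve_abstract_origins is an illustrative name:

```python
def resolve_abstract_origins(entries):
    """Copy inherited attributes into entries with DW_AT_abstract_origin.

    entries maps offset -> dict of attributes (a simplified, assumed
    layout, not the real DwarfInfo structure).
    """
    for entry in entries.values():
        origin_off = entry.get('DW_AT_abstract_origin')
        if origin_off is None:
            continue
        origin = entries[origin_off]
        # recursive origins are not handled, only detected
        assert 'DW_AT_abstract_origin' not in origin
        for k, v in origin.items():
            # attributes on the entry itself take precedence
            entry.setdefault(k, v)
    return entries

# hypothetical entries: the out-of-line instance at 0x40 has no name of
# its own and inherits one from the abstract entry at 0x10
entries = {
    0x10: {'DW_AT_name': 'lfs_min', 'DW_AT_inline': 1},
    0x40: {'DW_AT_abstract_origin': 0x10, 'DW_AT_low_pc': 0x100},
}
resolve_abstract_origins(entries)
print(entries[0x40]['DW_AT_name'])
```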

---

It does seem like DW_AT_abstract_origin is intended to be limited to
"Inline instances of inline subprograms" and "Out-of-line instances of
inline subprograms" according to the DWARF5 spec, but it's unclear if
this is a rule or suggestion...

This hasn't been an issue for existing scripts, but is needed for some
ongoing stack.py rework. Otherwise we don't find "out-of-line instances
of inline subprograms" (optimized functions?) correctly.
2024-12-16 18:01:46 -06:00
Christopher Haster
19cd428a3c scripts: Added DwarfEntry.info to help find recursive tags
Long story short: DW_TAG_lexical_blocks are annoying.

In order to search the full tree of children of a given dwarf entry, we
need a recursive function somewhere. We might as well make this function
a part of the DwarfEntry class so we can share it with other scripts.

Note this is roughly the same as collect_dwarf_info, but limited to
the children of a given dwarf entry.

This is useful for ongoing stack.py rework.
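
As a rough illustration of the recursive search, here is a stripped-down stand-in for DwarfEntry. The class below and its walk method are hypothetical; the real class carries more state and exposes this differently:

```python
class DwarfEntry:
    """Simplified stand-in for the scripts' DwarfEntry (assumed shape)."""
    def __init__(self, tag, name=None, children=None):
        self.tag = tag
        self.name = name
        self.children = children if children is not None else []

    def walk(self, tags=None):
        """Yield all descendants, optionally filtered by tag."""
        for child in self.children:
            if tags is None or child.tag in tags:
                yield child
            # recurse so entries nested in lexical blocks are found too
            yield from child.walk(tags)

# hypothetical function with a variable hidden inside a lexical block
func = DwarfEntry('DW_TAG_subprogram', 'f', [
    DwarfEntry('DW_TAG_formal_parameter', 'x'),
    DwarfEntry('DW_TAG_lexical_block', None, [
        DwarfEntry('DW_TAG_variable', 'tmp'),
    ]),
])
print([e.name for e in func.walk({'DW_TAG_variable'})])
```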
2024-12-16 18:01:46 -06:00
Christopher Haster
faf4d09c34 scripts: Added __repr__ to RInt and friends
Just a minor quality of life feature to help debugging these scripts.
2024-12-16 18:01:46 -06:00
Christopher Haster
b4038e3c27 scripts: Include global/section info in collect_syms, added Sym
We have this info, might as well expose it for scripts to use.

Unfortunately this extra info did make tuple unpacking a bit of a mess,
especially in scripts that don't use this extra info, so I've added a
small Sym class similar to DwarfEntry in collect_dwarf_info.

This is useful for some ongoing stack.py rework.
2024-12-16 18:01:46 -06:00
Christopher Haster
eb7fff8843 scripts: Include all entries in collect_dwarf_info
Note this only affects the top-level entries. Dwarf-info contains a
hierarchical structure, but for some scripts we just don't care. Finding
DW_TAG_variables in nested DW_TAG_lexical_blocks for example.

This is useful for ongoing stack.py rework.
2024-12-16 18:01:46 -06:00
Christopher Haster
bd7004a4f3 scripts: Prefer objdump --syms over -t in scripts
objdump --syms is a bit more self-documenting.

The other uses of objdump already use the long forms (--dwarf=rawline,
--dwarf=info).
2024-12-16 18:01:46 -06:00
Christopher Haster
308b4b6080 scripts: Made dwarf tags explicit in ctx.py/structs.py
This will make ctx.py/structs.py more likely to error on unknown tags,
which is preferable to silently reporting incorrect numbers.
2024-12-16 18:01:46 -06:00
Christopher Haster
b90b2953ea scripts: Some minor regex cleanup
Just trying to make regex in scripts a bit more consistent. Though regex
being regex this may be fruitless.
2024-12-16 18:01:46 -06:00
Christopher Haster
28d89eb009 scripts: Adopted simpler+faster heuristic for symbol->dwarf mapping
After tinkering around with the scripts for a bit, I've started to
realize difflib is kinda... really slow...

I don't think this is strictly difflib's fault. It's a pure python
library (proof of concept?), may be prioritizing quality over speed, and
I may be throwing too much data at it.

difflib does have quick_ratio() and real_quick_ratio() for faster
comparisons, but while looking into these for correctness, I realized
there's a simpler heuristic we can use since GCC's optimized names seem
strictly additive: Choose the name that matches with the smallest prefix
and suffix.

So comparing, say, lfsr_rbyd_lookup to __lfsr_rbyd_lookup.constprop.0:

    lfsr_rbyd_lookup
  __lfsr_rbyd_lookup.constprop.0
   |'------.-------''----.-----'
   '-------|-----.   .---'
           v     v   v
  key: (matches, 2, 12)

Note we prioritize the prefix, since it seems GCC's optimized names are
strictly suffixes. We also now fail to match if the dwarf name is not a
substring, instead of just finding the most similar looking symbol.

This results in both faster and more robust symbol->dwarf mapping:

  before: time code.py -Y: 0.393s
  after:  time code.py -Y: 0.152s

  (this is WITH the fast dict lookup on exact matches!)

This also drops difflib from the scripts. So one less dependency to
worry about.
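
A minimal sketch of the heuristic, assuming we are given one symbol name and a list of candidate dwarf names. Both function names are illustrative, not the ones used in the scripts:

```python
def match_key(dwarf_name, sym_name):
    """Return (prefix, suffix) lengths, or None if not a substring."""
    i = sym_name.find(dwarf_name)
    if i < 0:
        return None
    prefix = i
    suffix = len(sym_name) - (i + len(dwarf_name))
    return (prefix, suffix)

def best_dwarf_name(sym_name, dwarf_names):
    """Choose the dwarf name with the smallest prefix, then suffix."""
    candidates = []
    for name in dwarf_names:
        key = match_key(name, sym_name)
        # names that are not substrings never match
        if key is not None:
            candidates.append((key, name))
    if not candidates:
        return None
    # tuples compare prefix first, then suffix
    return min(candidates)[1]

print(match_key('lfsr_rbyd_lookup', '__lfsr_rbyd_lookup.constprop.0'))
```

The key for the example above works out to (2, 12), matching the diagram. A shorter name such as lookup is also a substring, but loses because of its longer prefix.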
2024-12-16 18:01:33 -06:00
Christopher Haster
e77010265e scripts: Replaced nm with objdump in code.py/data.py
There is an argument for preferring nm for code size measurements due to
portability. But I'm not sure this really holds up these days with
objdump being so prevalent.

We already depend on objdump for ctx/structs/perf and other dwarf info,
so using objdump -t to get symbol information means one less tool to
depend on/pass around when cross-compiling.

As a minor benefit this also gives us more control over which sections
to include, instead of relying on nm's predefined t/r/d/b section types.

---

Note code.py/data.py did _not_ require objdump before this. They did use
objdump to map symbols to source files, but would just guess if
objdump wasn't available.
2024-12-15 16:39:04 -06:00
Christopher Haster
8526cd9cf1 scripts: Prevented i/children/notes result field collisions
Without this, naming a column i/children/notes in csv.py could cause
things to break. Unlikely for children/notes, but very likely for i,
especially when benchmarking.

Unfortunately namedtuple makes this tricky. I _want_ to just rename
these to _i/_children/_notes and call the problem solved, but namedtuple
reserves all underscore-prefixed fields for its own use.

As a workaround, the table renderer now looks for _i/_children/_notes at
the _class_ level, as an optional name of which namedtuple field to use.
This way Result types can stay lightweight namedtuples while including
extra table rendering info without risk of conflicts.

This also makes the HotResult type a bit more funky, but that's not a
big deal.
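
To illustrate the workaround, here is a sketch where the renderer looks up the field name via a class attribute. The Result fields and the children_of helper are hypothetical; only the _i/_children/_notes attribute names come from the description above:

```python
import collections

class Result(collections.namedtuple('Result',
        ['name', 'i', 'size', 'kids'])):
    __slots__ = ()
    # class-level hints naming which namedtuple field the renderer
    # should use; these are class attributes, not namedtuple fields
    _i = 'i'
    _children = 'kids'

def children_of(result):
    """Look up children via the optional class-level _children name."""
    field = getattr(type(result), '_children', None)
    return getattr(result, field) if field is not None else []

# a csv-style result where the user happened to name columns
# i/children; without the class-level hint these are just data
CsvResult = collections.namedtuple('CsvResult', ['name', 'i', 'children'])

r = Result('lfs_mount', 0, 4, [Result('lfs_init', 0, 2, [])])
print(len(children_of(r)))
print(children_of(CsvResult('bench', 3, 7)))
```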
2024-12-15 16:36:14 -06:00
Christopher Haster
183ede1b83 scripts: Option for result scripts to force children ordering
This extends the recursive part of the table renderer to sort children
by the optional "i" field, if available.

Note this only affects children entries. The top-level entries are
strictly ordered by the relevant "by" fields. I just haven't seen a use
case for this yet, and not sorting "i" at the top-level reduces the
number of things that can go wrong for scripts without children.

---

This also rewrites -t/--hot to take advantage of children ordering by
injecting a totally-not-hacky HotResult subclass.

Now -t/--hot should be strictly ordered by the call depth! Though note
entries that share "by" fields are still merged...

This also gives us a way to introduce the "cycle detected" note and
respect -z/--depth, so overall a big improvement for -t/--hot.
2024-12-15 16:35:52 -06:00
Christopher Haster
e6ed785a27 scripts: Removed padding from tail notes in tables
We don't really need padding for the notes on the last column of tables,
which is where row-level notes end up.

This may seem minor, but not padding here avoids quite a bit of
unnecessary line wrapping in small terminals.
2024-12-15 16:35:29 -06:00
Christopher Haster
512cf5ad4b scripts: Adopted ctx.py-related changes in other result scripts
- Adopted higher-level collect data structures:

  - high-level DwarfEntry/DwarfInfo class
  - high-level SymInfo class
  - high-level LineInfo class

  Note these had to be moved out of function scope due to pickling
  issues in perf.py/perfbd.py. These were only function-local to
  minimize scope leak so this fortunately was an easy change.

- Adopted better list-default patterns in Result types:

    def __new__(..., children=None):
        return Result(..., children if children is not None else [])

  A classic python footgun.

- Adopted notes rendering, though this is only used by ctx.py at the
  moment.

- Reverted to sorting children entries, for now.

  Unfortunately there's no easy way to sort the result entries in
  perf.py/perfbd.py before folding. Folding is going to make a mess
  of more complicated children anyways, so another solution is
  needed...

And some other shared miscellany.
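
For anyone unfamiliar with the list-default footgun mentioned above, a small self-contained demonstration. BadResult and Result here are toy types with assumed fields, not the scripts' actual Result types:

```python
import collections

class BadResult(collections.namedtuple('BadResult',
        ['name', 'children'])):
    __slots__ = ()
    # the default list is created once and shared by every instance
    def __new__(cls, name, children=[]):
        return super().__new__(cls, name, children)

class Result(collections.namedtuple('Result', ['name', 'children'])):
    __slots__ = ()
    # None as a sentinel gives each instance its own fresh list
    def __new__(cls, name, children=None):
        return super().__new__(cls, name,
                children if children is not None else [])

a, b = BadResult('a'), BadResult('b')
a.children.append('x')
print(b.children)   # b sees a's child because the list is shared

c, d = Result('c'), Result('d')
c.children.append('x')
print(d.children)   # d is unaffected
```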
2024-12-15 15:41:11 -06:00
Christopher Haster
55d01f69f9 scripts: Adopted ctx.py-related changes in structs.py
- Dropped --internal flag, structs.py includes all structs now.

  No reason to limit structs.py to public structs if ctx.py exists.

- Added struct/union/enum prefixes to results (enums were missing in
  ctx.py).

- Only sort children layers if explicitly requested. This should
  preserve field order, which is nice.

- Adopted more advanced FileInfo/DwarfInfo classes.

- Adopted table renderer changes (notes rendering).
2024-12-15 15:10:49 -06:00
Christopher Haster
c8a4ee91a6 scripts: ctx.py: Only sort children layers if explicitly requested
- Sorting struct fields by name? Eh, that's not a big deal.
- Sorting function params by name? Okay, that's really annoying.

This compromises by sorting only the top-level results by name, and
leaving recursive results in the order returned by collect by default.
Recursive results should usually have a well-defined order.

This should be extendable to the other result scripts as well.
2024-12-15 15:04:11 -06:00
Christopher Haster
3a0a58369a scripts: ctx.py: Added struct/union namespace prefix to results
This is a bit more readable and better matches the names used in the C
code (lfs_config vs struct lfs_config).

The downside is we now have fields with spaces in them, which may cause
problems for naive parsers.
2024-12-15 14:56:53 -06:00
Christopher Haster
2df97cd858 scripts: Added ctx.py for finding function contexts
ctx.py reports functions' "contexts", i.e. the sum of the size of all
function parameters and indirect structs, recursively dereferencing
pointers when possible.

The idea is this should give us a rough lower bound on the amount of
state that needs to be allocated to call the function:

  $ ./scripts/ctx.py lfs.o lfs_util.o -Dfunction=lfsr_file_write -z3 -s
  function                size
  lfsr_file_write          596
  |-> lfs                  436
  |   '-> lfs_t            432
  |-> file                 152
  |   '-> lfsr_file_t      148
  |-> buffer                 4
  '-> size                   4
  TOTAL                    596

---

The long story short is that structs.py, while very useful for
introspection, has not been useful as a general metric.

Sure it can give you a rough idea of the impact of small changes to
struct sizes, but it's not uncommon for larger changes to add/remove
structs that have no real impact on the user facing RAM usage. There are
some structs we care about (lfs_t) and some we don't (lfsr_data_t).
Internal-only structs should already be measured by stack.py.

Which raises the question, how do we know which structs we care about?

The idea here is to look at function parameters and chase pointers. This
gives a complicated, but I think reasonable, heuristic. Fortunately
dwarf-info gives us all the necessary info.

Some notes:

- This does _not_ include buffer sizes. Buffer sizes are user
  configurable, so it's sort of up to the user to account for these.

- We include structs once if we find a cycle (lfsr_file_t.o for
  example). Can't really do any better and this at least provides a
  lower bound for complex data-structures.

- We sum all params/fields, but find the max of all functions. Note this
  prevents common types (lfs_t for example) from being counted more than
  once.

- We only include global functions (based on the symbol flag). In theory
  the context of all internal functions should end up in stack.py.

  This can be overridden with --everything.

Note this doesn't replace structs.py. structs.py is still useful for
looking at all structs in the system. ctx.py should just be more useful
for comparing builds at a high level.
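
A very rough sketch of the pointer-chasing idea, assuming types have been flattened into a dict of base/pointer/struct descriptions. This layout, the ctx_size name, and the field lists are all invented for illustration; only the sizes are borrowed from the example output above, and the real ctx.py works from dwarf entries and handles far more cases:

```python
def ctx_size(name, types, path=()):
    """Size of a type plus everything reachable through its pointers."""
    t = types[name]
    size = t['size']
    if t['kind'] == 'pointer':
        target = t['to']
        # opaque pointers can't be followed, and a struct already on
        # the current path (a cycle) is only counted once
        if target is not None and target not in path:
            size += ctx_size(target, types, path)
    elif t['kind'] == 'struct':
        for field in t['fields']:
            # the struct's size already covers its fields, so only add
            # what each field points to
            size += (ctx_size(field, types, path + (name,))
                    - types[field]['size'])
    return size

# hypothetical type table, 32-bit pointers assumed
types = {
    'uint32':       {'kind': 'base', 'size': 4},
    'void*':        {'kind': 'pointer', 'size': 4, 'to': None},
    'lfs_t':        {'kind': 'struct', 'size': 432, 'fields': []},
    'lfs_t*':       {'kind': 'pointer', 'size': 4, 'to': 'lfs_t'},
    'lfsr_file_t':  {'kind': 'struct', 'size': 148,
                     'fields': ['lfsr_file_t*']},
    'lfsr_file_t*': {'kind': 'pointer', 'size': 4, 'to': 'lfsr_file_t'},
}
params = ['lfs_t*', 'lfsr_file_t*', 'void*', 'uint32']
print(sum(ctx_size(p, types) for p in params))
```

With these made-up field lists the per-parameter sizes come out to 436, 152, 4, and 4, the same shape as the lfsr_file_write output above.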
2024-12-15 13:24:31 -06:00