Commit Graph

42 Commits

Christopher Haster
ac30a20d12 scripts: Reworked to support optional json input/output
Guh

This may have been more work than I expected. The goal was to allow
passing recursive results (callgraph info, structs, etc) between
scripts, which is simply not possible with csv files.

Unfortunately, this raised a number of questions: What happens if a
script receives recursive results? -d/--diff with recursive results?
How to prevent folding of ordered results (structs, hot, etc) in piped
scripts? etc.

And I ended up with a significant rewrite of most of the result scripts'
internals.

Key changes:

- Most result scripts now support -O/--output-json in addition to
  -o/--json, with -O/--output-json including any recursive results in
  the "children" field.

- Most result scripts now support both csv and json as input to relevant
  flags: -u/--use, -d/--diff, -p/--percent. This is accomplished by
  looking for a '[' as the first character to decide if an input file is
  json or csv (see the sketch after this list).

  Technically this breaks if your json has leading whitespace, but why
  would you ever keep whitespace around in json? The human-editability
  of json was already ruined the moment comments were disallowed.

- csv.py requires all fields to be explicitly defined, so I added
  -i/--enumerate, -Z/--children, and -N/--notes. At least we can provide
  some reasonable defaults so you shouldn't usually need to type out the
  whole field.

- Notably, the rendering scripts (plot.py, treemapd3.py, etc) and
  test/bench scripts do _not_ support json. csv.py can always convert
  to/from json when needed.

- The table renderer now supports diffing recursive results, which is
  nice for seeing how the hot path changed in stack.py/perf.py/etc.

- Moved the -r/--hot logic up into main, so it also affects the
  outputted results. Note it is impossible for -z/--depth to _not_
  affect the outputted results.

- We now sort in one pass, which is in theory more efficient.

- Renamed -t/--hot -> -r/--hot and -R/--reverse-hot, matching -s/-S.

- Fixed an issue with -S/--reverse-sort where only the short form was
  actually reversed (I misunderstood what argparse passes to Action
  classes).

- csv.py now supports json input/output, which is funny.
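
The json sniffing mentioned above is roughly just this (is_json is a
hypothetical name, not necessarily what the scripts call it):

  def is_json(path):
      # json results start with a '[', anything else is treated as csv
      with open(path) as f:
          return f.read(1) == '['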
2025-03-12 19:09:43 -05:00
Christopher Haster
dcbc195b41 scripts: csv.py: Replaced -b/--by exprs with % modifiers
In addition to providing more functionality for creating -b/--by fields,
this lets us remove strings from the expr parser. Strings had no
well-defined operations and could best be described as an "ugly wart".

Maybe we'll reintroduce string exprs in the future, but for now csv.py's
-f/--field fields will be limited to numeric values.

As an extra plus, no more excessive quoting when injecting new -b/--by
fields.

---

This also fixed sorting on non-field fields, which was apparently
broken. Or at least mostly useless since it was defaulting to string
sorting.
2025-03-11 18:48:27 -05:00
Christopher Haster
0adec7f15c scripts: Replaced __builtins__ with builtins
Apparently __builtins__ is a CPython implementation detail, and behaves
differently when executed vs imported???

import builtins is the correct way to go about this.
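
For reference, the difference looks roughly like this (the lookups here
are just illustrative):

  # in CPython, __builtins__ is the builtins module when a file runs as
  # __main__, but a plain dict when the file is imported; the builtins
  # module behaves the same either way
  import builtins
  getattr(builtins, 'max')        # always works
  # getattr(__builtins__, 'max')  # works executed, breaks when imported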
2025-01-28 14:41:45 -06:00
Christopher Haster
62cc4dbb14 scripts: Disabled local import hack on import
Moved local import hack behind if __name__ == "__main__"

These scripts aren't really intended to be used as python libraries.
Still, it's useful to import them for debugging and to get access to
their juicy internals.
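
So the hack now lives roughly here:

  if __name__ == "__main__":
      # prevent local imports (so our csv.py doesn't shadow Python's csv)
      __import__('sys').path.pop(0)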
2025-01-28 14:41:30 -06:00
Christopher Haster
1d8d0785fc scripts: More flags to control table renderer, -Q/--small-table, etc
Instead of trying to be too clever, this just adds a bunch of small
flags to control parts of table rendering:

- --no-header - Don't show the header.
- --small-header - Don't show the by-field names.
- --no-total - Don't show the total.
- -Q/--small-table - Equivalent to --small-header + --no-total.

Note that -Q/--small-table replaces the previous -Y/--summary +
-c/--compare hack, while also allowing a similar table style for
non-compare results.
2024-12-18 14:03:35 -06:00
Christopher Haster
a3ac512cc1 scripts: Adopted Parser class in prettyasserts.py
This ended up being a pretty in-depth rework of prettyasserts.py to
adopt the shared Parser class. But now prettyasserts.py should be both
more robust and faster.

The tricky parts:

- The Parser class eagerly munches whitespace by default. This is
  usually a good thing, but for prettyasserts.py we need to keep track
  of the whitespace somehow in order to write it to the output file.

  The solution here is a little bit hacky. Instead of complicating the
  Parser class, we implicitly add a regex group for whitespace when
  compiling our lexer (see the sketch after this list).

  Unfortunately this does make last-minute patching of the lexer a bit
  messy (for things like -p/--prefix, etc), thanks to Python's
  re.Pattern class not being extendable. To work around this, the Lexer
  class keeps track of the original patterns to allow recompilation.

- Since we no longer tokenize in a separate pass, we can't use the
  None token to match any unmatched tokens.

  Fortunately this can be worked around with sufficiently ugly regex.
  See the 'STUFF' rule.

  It's a good thing Python has negative lookaheads.

  On the flip side, this means we no longer need to explicitly specify
  all possible tokens when multiple tokens overlap.

- Unlike stack.py/csv.py, prettyasserts.py needs multi-token lookahead.

  Fortunately this has a pretty straightforward solution with the
  addition of an optional stack to the Parser class.

  We can even have a bit of fun with Python's with statements (though I
  do wish with statements could have else clauses, so we wouldn't need
  double nesting to catch parser exceptions).
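
Roughly the shape of the whitespace/recompilation workaround, with the
class and names here illustrative rather than the actual Lexer code:

  import re

  class Lexer:
      def __init__(self, rules):
          # keep the original patterns so -p/--prefix-style patching can
          # recompile the lexer later
          self.rules = list(rules)
          self.recompile()

      def recompile(self, extra=()):
          # the implicit leading group captures whitespace so it can be
          # written back out verbatim
          self.pattern = re.compile(r'(\s*)(?:%s)' % '|'.join(
              '(?P<%s>%s)' % (name, pat)
              for name, pat in [*extra, *self.rules]))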

---

In addition to adopting the new Parser class, I also made sure to
eliminate intermediate string allocation through heavy use of Python's
io.StringIO class.

This, plus Parser's cheap shallow chomp/slice operations, gives
prettyasserts.py a much needed speed boost.
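
The general pattern, as a rough sketch (names hypothetical):

  import io

  def emit(tokens, output):
      # build the output in one growable buffer instead of accumulating
      # intermediate strings with +
      out = io.StringIO()
      for ws, text in tokens:
          out.write(ws)
          out.write(text)
      output.write(out.getvalue())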

(Honestly, the original prettyasserts.py was pretty naive, with the
assumption that it wouldn't be the bottleneck during compilation. This
turned out to be wrong.)

These changes cut total compile time in ~half:

                                          real      user      sys
  before (time make test-runner -j): 0m56.202s 2m31.853s 0m2.827s
  after  (time make test-runner -j): 0m26.836s 1m51.213s 0m2.338s

Keep in mind this includes both prettyasserts.py and gcc -Os (and other
Makefile stuff).
2024-12-17 15:34:44 -06:00
Christopher Haster
dad3367e9e scripts: Adopted Parser in csv.py
It's a bit funny: the motivation for a new Parser class came from the
success of simple regex + space munching in csv.py, but adopting Parser
in csv.py makes sense for a couple reasons:

- Consistency and better code sharing with other scripts that need to
  parse things (stack.py, prettyasserts.py?).

- Should be more efficient, since we avoid copying the entire string
  every time we chomp/slice.

  Though I don't think this really matters for the size of csv.py's
  exprs...

- No need to write every regex twice! Since Parser remembers the last
  match.
2024-12-16 19:27:31 -06:00
Christopher Haster
6a6ed0f741 scripts: Dropped cycle detection from table renderer
Now that cycle detection is always done at result collection time, we
don't need this in the table renderer itself.

This had a tendency to cause problems for non-function scripts (ctx.py,
structs.py).
2024-12-16 19:26:21 -06:00
Christopher Haster
dd389f23ee scripts: Switched to sorted sets for result notes
God, I wish Python had an OrderedSet.

This is a fix for duplicate "cycle detected" notes when using -t/--hot.
Merging both _hot_notes and _notes in the HotResult class is tricky
when the underlying container is a list.

The order is unlikely to be guaranteed anyways, when different results
with different notes are folded.

And if we ever want more control over the order of notes in result
scripts we can always change this back later.
2024-12-16 19:22:14 -06:00
Christopher Haster
3e03c2ee7f scripts: Adopted better input file handling in result scripts
- Error on no/insufficient files.

  Instead of just returning no results. This is more useful when
  debugging complicated bash scripts.

- Use elf magic to allow any file order in perfbd.py/stack.py.

  This was already implemented in stack.py, now also adopted in
  perfbd.py.

  Elf files always start with the magic string "\x7fELF", so we can use
  this to figure out the types of input files without needing to rely on
  argument order.

  This is just one less thing to worry about when invoking these
  scripts.
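
  Roughly (is_elf here is a hypothetical name):

    def is_elf(path):
        # elf files always start with the magic "\x7fELF"
        with open(path, 'rb') as f:
            return f.read(4) == b'\x7fELF'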
2024-12-16 19:13:22 -06:00
Christopher Haster
ac79c88c6f scripts: Improved cycle detection notes in scripts
- Prevented childrenof memoization from hiding the source of a
  detected cycle.

- Deduplicated multiple cycle detected notes.

- Fixed note rendering when last column does not have a notes list.
  Currently this only happens when entry is None (no results).
2024-12-16 18:01:46 -06:00
Christopher Haster
faf4d09c34 scripts: Added __repr__ to RInt and friends
Just a minor quality of life feature to help debugging these scripts.
2024-12-16 18:01:46 -06:00
Christopher Haster
8526cd9cf1 scripts: Prevented i/children/notes result field collisions
Without this, naming a column i/children/notes in csv.py could cause
things to break. Unlikely for children/notes, but very likely for i,
especially when benchmarking.

Unfortunately namedtuple makes this tricky. I _want_ to just rename
these to _i/_children/_notes and call the problem solved, but namedtuple
reserves all underscore-prefixed fields for its own use.

As a workaround, the table renderer now looks for _i/_children/_notes at
the _class_ level, as an optional name of which namedtuple field to use.
This way Result types can stay lightweight namedtuples while including
extra table rendering info without risk of conflicts.
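
A rough sketch of the shape of this, with the Result type and field
names illustrative:

  import collections as co

  class StackResult(co.namedtuple('StackResult',
          ['function', 'frame', 'limit', 'stack_children'])):
      __slots__ = ()
      # class-level hint telling the table renderer which namedtuple
      # field holds the children
      _children = 'stack_children'

  # renderer side, roughly:
  #   field = getattr(type(r), '_children', 'children')
  #   children = getattr(r, field, [])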

This also makes the HotResult type a bit more funky, but that's not a
big deal.
2024-12-15 16:36:14 -06:00
Christopher Haster
183ede1b83 scripts: Option for result scripts to force children ordering
This extends the recursive part of the table renderer to sort children
by the optional "i" field, if available.

Note this only affects children entries. The top-level entries are
strictly ordered by the relevant "by" fields. I just haven't seen a use
case for this yet, and not sorting "i" at the top-level reduces the
number of things that can go wrong for scripts without children.

---

This also rewrites -t/--hot to take advantage of children ordering by
injecting a totally-not-hacky HotResult subclass.

Now -t/--hot should be strictly ordered by the call depth! Though note
entries that share "by" fields are still merged...

This also gives us a way to introduce the "cycle detected" note and
respect -z/--depth, so overall a big improvement for -t/--hot.
2024-12-15 16:35:52 -06:00
Christopher Haster
e6ed785a27 scripts: Removed padding from tail notes in tables
We don't really need padding for the notes on the last column of tables,
which is where row-level notes end up.

This may seem minor, but not padding here avoids quite a bit of
unnecessary line wrapping in small terminals.
2024-12-15 16:35:29 -06:00
Christopher Haster
512cf5ad4b scripts: Adopted ctx.py-related changes in other result scripts
- Adopted higher-level collect data structures:

  - high-level DwarfEntry/DwarfInfo class
  - high-level SymInfo class
  - high-level LineInfo class

  Note these had to be moved out of function scope due to pickling
  issues in perf.py/perfbd.py. These were only function-local to
  minimize scope leak so this fortunately was an easy change.

- Adopted better list-default patterns in Result types:

    def __new__(..., children=None):
        return Result(..., children if children is not None else [])

  A classic python footgun (see the sketch below).

- Adopted notes rendering, though this is only used by ctx.py at the
  moment.

- Reverted to sorting children entries, for now.

  Unfortunately there's no easy way to sort the result entries in
  perf.py/perfbd.py before folding. Folding is going to make a mess
  of more complicated children anyways, so another solution is
  needed...

And some other shared miscellany.
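
The footgun being avoided, for reference:

  def bad(children=[]):
      # the default list is created once and shared across every call
      children.append('oops')
      return children

  bad()  # ['oops']
  bad()  # ['oops', 'oops'] -- same list!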
2024-12-15 15:41:11 -06:00
Christopher Haster
b4c79c53d2 scripts: csv.py: Fixed NoneType issues with default sort
$ ./scripts/csv.py lfs.code.csv -bfunction -fsize -S
  ... blablabla ...
  TypeError: cannot unpack non-iterable NoneType object

The issue was argparse's const defaults bypassing the type callback, so
the sort field ends up with None when it expects a tuple (well
technically a tuple tuple).
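
A toy illustration of the argparse behavior (the flag and type here are
illustrative, not csv.py's actual sort flags):

  import argparse

  p = argparse.ArgumentParser()
  p.add_argument('-s', '--sort', nargs='?',
      type=lambda x: (x, False),  # applied to command-line strings...
      const=None)                 # ...but a bare -s uses const as-is
  print(p.parse_args(['-s', 'size']).sort)  # ('size', False)
  print(p.parse_args(['-s']).sort)          # None, type never called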

This is only an issue for csv.py because csv.py's sort fields can
contain exprs.
2024-12-15 15:39:04 -06:00
Christopher Haster
e00db216c1 scripts: Consistent table renderer, cycle detection optional
The fact that our scripts' table renderer was slightly different for
recursive scripts (stack.py, perf.py) and non-recursive scripts
(code.py, structs.py) was a ticking time bomb, one innocent edit away
from breaking half the scripts.

This makes the table renderer consistent across all scripts, allowing
for easy copy-pasting when editing, at the cost of some unused code in
scripts.

One hiccup with this though is the difference in cycle detection
behavior between scripts:

- stack.py:

    lfsr_bd_sync
    '-> lfsr_bd_prog
        '-> lfsr_bd_sync  <-- cycle!

- structs.py:

    lfsr_bshrub_t
    '-> u
        '-> bsprout
            '-> u  <-- not a cycle!

To solve this the table renderer now accepts a simple detect_cycles
flag, which can be set per-script.
2024-12-14 12:25:15 -06:00
Christopher Haster
ef3accc07c scripts: Tweaked -p/--percent to accept the csv file for diffing
This makes the -p/--percent flag a bit more consistent with -d/--diff
and -c/--compare, both of which change the printing strategy based on
additional context.
2024-11-16 18:01:27 -06:00
Christopher Haster
9a2b561a76 scripts: Adopted -c/--compare in make summary-diff
This showcases the sort of high-level result printing where -c/--compare
is useful:

  $ make summary-diff
              code             data           stack          structs
  BEFORE     57057                0            3056             1476
  AFTER      68864 (+20.7%)       0 (+0.0%)    3744 (+22.5%)    1520 (+3.0%)

There was one hiccup though: how to hide the name of the first field.

It may seem minor, but the missing field name really does help
readability when you're staring at a wall of CLI output.

It's a bit of a hack, but this can now be controlled with -Y/--summary,
which has the sole purpose of disabling the first field name if mixed
with -c/--compare.

-c/--compare is already a weird case for the summary row anyways...
2024-11-16 18:01:15 -06:00
Christopher Haster
29eff6f3e8 scripts: Added -c/--compare for comparing specific result rows
Example:

  $ ./scripts/csv.py lfs.code.csv \
          -bfunction -fsize \
          -clfsr_rbyd_appendrattr
  function                                size
  lfsr_rbyd_appendrattr                   3598
  lfsr_mdir_commit                        5176 (+43.9%)
  lfsr_btree_commit__.constprop.0         3955 (+9.9%)
  lfsr_file_flush_                        2729 (-24.2%)
  lfsr_file_carve                         2503 (-30.4%)
  lfsr_mountinited                        2357 (-34.5%)
  ... snip ...

I don't think this is immediately useful for our code/stack/etc
measurement scripts, but it's certainly useful in csv.py for comparing
results at a high level.

And by useful I mean it replaces a 40-line long awk script that has
outgrown its original purpose...
2024-11-16 17:59:22 -06:00
Christopher Haster
14687a20bf scripts: csv.py: Implicitly convert during string concatenation
This may be a (very javascript-esque) mistake, but implicit conversion
to strings is useful when mixing fields and strings in -b/--by field
exprs:

  $ ./scripts/csv.py input.csv -bcase='"test"+n' -fn

Note that this now (mostly) matches the behavior when the n field is
unspecified:

  $ ./scripts/csv.py input.csv -bcase='"test"+n'

Er... well... mostly. When we specify n as a field, csv.py does
typecheck and parse the field, which ends up sort of canonicalizing the
field, unlike omitting n which leaves n as a string... But at least if
the field was already canonicalized the behavior matches...

It may also be better to force all -b/--by expr inputs to strings first,
but this would require us to know which expr came from where. It also
wouldn't solve the canonicalization problem.
2024-11-16 17:39:39 -06:00
Christopher Haster
47f28946f6 scripts: csv.py: Enforced matching types in ternary branches
So in:

  $ ./scripts/csv.py input.csv -fa='b?c:d'

c and d must have matching types or else an error is raised.

This requires an explicit definition for the ternary operator since it's
a special case in that the type of b does not matter.

Compare to a 3-arg max call:

  $ ./scripts/csv.py input.csv -fa='int(b)?float(c):float(d)'      # ok
  $ ./scripts/csv.py input.csv -fa='max(int(b),float(c),float(d))' # error
2024-11-16 17:30:37 -06:00
Christopher Haster
9e7e79390a scripts: csv.py: Extended -s/-S to support exprs and hidden fields
The main benefit of this is allowing the sort order to be controlled by
fields that don't necessarily need to be printed:

  ./scripts/csv.py input.csv -ba -sb -fc

By default this sorts lexicographically, but this can be changed by
providing an expression:

  ./scripts/csv.py input.csv -ba -sb='int(b)' -fc

Note that sort fields do _not_ change the inferred by fields; this
allows sort flags to be added to existing queries without changing the
results too much:

  ./scripts/csv.py input.csv -fc
  ./scripts/csv.py input.csv -sb -fc
2024-11-16 17:30:13 -06:00
Christopher Haster
8911d44073 scripts: csv.py: Fixed field defines hiding field renames
The issue here is quite nuanced, but becomes a problem when you want to
both:

1. Filter results by a given field: -Dmeas=write
2. Output a new value for that field: -bmeas='"write+amor"'

If you didn't guess from the example, this comes up often in scripts
dealing with bench results, where we often find ourselves wanting to
append/merge modified results based on the raw measurements.

Fortunately the fix is relatively easy: We already filter by defines
in our collect function, so we don't really need to filter by defines
again when folding.

Folding occurs after expr evaluation, but collect occurs before, so this
limits filtering to the input fields _before_ expr evaluation.

This does mean we no longer filter on the output of exprs, but I don't
know if such behavior was ever intentionally desired. Worst case it can
be emulated by stacking multiple csv.py calls, which may be annoying,
but is at least well-intentioned and well-defined.

---

Note that the other result scripts, code.py, stack.py, etc, are a bit
different in that they rely on fold-time filtering for filtering
generated results. This may deserve a refactor at some point, but since
these scripts don't also evaluate exprs, it's not an immediate problem.
2024-11-16 17:25:21 -06:00
Christopher Haster
2fa968dd3f scripts: csv.py: Fixed divide-by-zero, return +-inf
This may make some mathematician mad, but these are informative scripts.
Returning +-inf is much more useful than erroring when dealing with
several hundred rows of results.

And hey, if it's good enough for IEEE 754, it's good enough for us :)
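
Roughly the intent (a sketch, not the actual RInt/RFloat operators):

  import math

  def safediv(a, b):
      # divide-by-zero maps to +-inf instead of raising
      if b == 0:
          return math.inf if a >= 0 else -math.inf
      return a / b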

Also fixed a division operator mismatch in RFrac that was causing
problems.
2024-11-16 16:47:48 -06:00
Christopher Haster
5dc9eabbf7 scripts: csv.py: Fixed use of __div__ vs __truediv__
Not sure if this is an old habit from Python 2, or just because it looks
nicer next to __mul__, __mod__, etc, but in Python 3 this should be
__truediv__ (or __floordiv__), not __div__.
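
For reference, Python 3 only dispatches the / operator to __truediv__
(and // to __floordiv__); a __div__ method is silently ignored:

  class RFrac:
      def __truediv__(self, other):   # called for a / b
          ...
      def __floordiv__(self, other):  # called for a // b
          ...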
2024-11-16 16:38:36 -06:00
Christopher Haster
6714e2869f scripts: csv.py: Made RFloats independent from RInts
The only reason RFloats reused RInt's operator definitions was to save a
few keystrokes. But this dependency is unnecessary and will get in the
way if we ever add a script that only uses RFloats.
2024-11-16 16:10:59 -06:00
Christopher Haster
298441ae74 scripts: csv.py: Added help text over available field exprs
So now the available field exprs can be queried with --help-exprs:

  $ ./scripts/csv.py --help-exprs
  uops:
    +a                    Non-negation
    -a                    Negation
    !a                    1 if a is zero, otherwise 0
  bops:
    a * b                 Multiplication
    a / b                 Division
  ... snip ...

I was a bit torn on if this should be named --help-exprs or --list-exprs
to match test.py/bench.py, but decided on --help-exprs since it's
querying something "inside" the script, whereas test.py/bench.py's
--list-cases is querying something "outside" the script.

Internally this uses Python's docstrings, which is a nice language
feature to lean on.
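
Roughly the trick (names illustrative):

  def help_exprs(cls):
      # each expr's docstring doubles as its --help-exprs description
      for name, attr in vars(cls).items():
          if callable(attr) and attr.__doc__:
              print('  %-20s %s' % (name, attr.__doc__.strip()))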
2024-11-16 15:59:01 -06:00
Christopher Haster
690251c130 scripts: csv.py: Added float mod support
Mainly for consistency with int operators, though it's unclear if either
mod is useful in the context of csv.py and related scripts.

This may be worth reverting at some point.
2024-11-16 15:54:45 -06:00
Christopher Haster
effc959ea9 scripts: csv.py: Improved default typechecking in RExpr
Now, by default, an error is raised if any branch of an expr has an
inconsistent type.

This isn't always what we want. The ternary operator, for example,
doesn't really care if the condition's type doesn't match the branch
arms. But it's a good default, and special cases can always override the
type function with their own explicit typechecking.
2024-11-16 15:45:14 -06:00
Christopher Haster
f31f3fdd68 scripts: csv.py: Fixed missing fields going undetected
There's a bit of a push and pull when it comes to typechecking CSV
fields in our scripts. On one hand, we want the flexibility to accept
scripts with various mismatched fields; on the other hand, we _really_
want to know if a typo caused a field to be quietly replaced with all
zeros...

I _think_ it's safe to say: if no fields across _all_ input files match
a requested field, we should error.

But I may end up wrong about this. Worst case we can always revert in
the future, maybe with an explicit flag to ignore missing fields.
2024-11-16 14:16:20 -06:00
Christopher Haster
103b251ad8 scripts: csv.py: Various tweaks/cleanup
- Updated the example in the header comment.

  The previous example was way old, from back when fields were separated
  by commas! Introduced in 20ec0be87 in 2022 according to git blame.

- Renamed a couple internal RExpr classes:

  - Not -> NotNot
  - And -> AndAnd
  - Or  -> OrOr
  - Ife -> IfElse

  This is mainly to leave room for bitwise operators in case we ever
  want to add them.

- Added isinf, isnan, isint, etc:

  - isint(a)
  - isfloat(a)
  - isfrac(a)
  - isinf(a)
  - isnan(a)

  In theory useful for conditional exprs based on the field's type.

- Accept +-nan as a float literal.

  Niche, but seems necessary for completeness. Unfortunately this does
  mean a field named nan (or inf) may cause problems...
2024-11-16 14:15:36 -06:00
Christopher Haster
0ac326d9cb scripts: Reduced table name widths to 8 chars minimum
I still think the 24 (23+1) char minimum is a good default for 2 column
output such as help text, especially if you don't have automatic width
detection. But our result scripts need to be a bit more flexible.

Consider:

  $ make summary
                              code     data    stack  structs
  TOTAL                      68864        0     3744     1520

Vs:

  $ make summary
              code     data    stack  structs
  TOTAL      68864        0     3744     1520

Up until now we were just kind of working around this with cut -c 25- in
our Makefile, but now that our result scripts automatically scale the
table widths, they should really just default to whatever is the most
useful.
2024-11-16 13:39:42 -06:00
Christopher Haster
acf34dce2e scripts: csv.py: Fixed lingering undefined renames in diff mode
This lingering reference to renames was missed when refactoring.
2024-11-16 13:22:40 -06:00
Christopher Haster
4e5d1c5e7d scripts: csv.py: Frac expr tweaks
- Allow single-arg frac:

  - frac(a)    => a/a
  - frac(a, b) => a/b

  This was already supported internally.

- Implicitly cast to frac in frac ops:

  - ratio(3) => ratio(3/3) => 1.0 (100%)
  - total(3) => total(3/3) => 3

  This makes a bit more sense than erroring.
2024-11-16 13:16:22 -06:00
Christopher Haster
d4c835ba89 scripts: csv.py: Fixed divide by zero in ratio
This now returns 1.0 if the total part of the fraction is 0.

There may be a better way to handle this, but the intention is for 0/0
to map to 100% for things like code coverage (cov.py), test coverage
(test.py), etc.
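
Roughly (names hypothetical):

  def ratio(hits, total):
      # 0/0, e.g. nothing to cover, counts as 100%
      return 1.0 if total == 0 else hits / total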
2024-11-16 13:07:35 -06:00
Christopher Haster
cc25b39926 scripts: csv.py: Fixed by exprs (-ba=b) when results are missing fields
This easily happens when merging csvs from scripts with different
results, such as code.py and stack.py output merged by function name.
2024-11-16 13:06:31 -06:00
Christopher Haster
1712a5bd99 scripts: csv.py: Filled out remaining ops, dropped bitwise ops, cleanup
So csv.py should now be mostly feature complete, aside from bugs.

I ended up dropping most of the bitwise operations for now. I can't
really see them being useful since csv.py and related scripts are
usually operating on purely numerical data. Worst case we can always add
them back in at some point.

I also considered dropping the logical/ternary operators, but even
though I don't see an immediate use case, the flexibility
logical/ternary operators add to a language is too much to pass on.

Another interesting thing to note is the extension of all fold functions
to operate on exprs if more than one argument is provided:

- max(1)       => 1, fold=max
- max(1, 2)    => 2, fold=sum
- max(1, 2, 3) => 3, fold=sum

To be honest, this is mainly just to allow a binary max/min function
without awkward naming conflicts.

Other than those changes this was pretty simple fill-out-the-definition
work.
2024-11-16 12:34:56 -06:00
Christopher Haster
ac0aa3633e scripts: csv.py: RExpr decorators to help simplify func/uop/bop parsing
This was trickier than expected since Python's class scope is so
funky (I just ended up using lazy cached __get__ functions that
scan the RExpr class for tagged members), but these decorators help avoid
repeated boilerplate for common expr patterns.

We can even deduplicate binary expr parsing without sacrificing
precedence.
2024-11-16 12:33:41 -06:00
Christopher Haster
4061891a02 scripts: csv.py: Adopting full expr parser for field exprs
This is a work-in-progress, but the general idea is to replace the
existing rename mechanic in csv.py with a full expr parser:

  $ ./scripts/csv.py input.csv -ba=x -fb=y+z

I've been putting this off for a while, as it feels like too big a jump
in complexity for what was intended to be a simple script. But
complexity is a bit funny in programming. Even if a full parser is more
difficult to implement, if it's the right grammar for the job, the
resulting script should end up both easier to understand and easier to
extend.

The original intention was that any sufficiently complicated math could
be implemented in ad-hoc Python scripts that operate directly on the CSV
files, but CSV parsing in Python is annoying enough that this never
really worked well.

But I'm probably overselling the complexity. This is classic CS stuff:

  1. build a syntax tree
  2. map symbols to input fields
  3. typecheck, fold, eval, etc

One neat thing is that in addition to providing type and eval
information, our exprs can also provide information on how to "fold" the
field after eval. This kicks in when merging multiple rows grouped by
-b/--by, and for finding the TOTAL results.

This can be used to merge stack results correctly with max:

  $ ./scripts/csv.py stack.csv \
          -fframe='sum(frame)' -flimit='max(limit)'

Or can be used to find other interesting measurements:

  $ ./scripts/csv.py stack.csv \
          -favg='avg(frame)' -fstddev='stddev(frame)'

These changes also make the eval order of input/output fields much
stricter, which is probably a good thing.

This should replace all of the somewhat hacky fake-expr flags in csv.py:

- --int     => -fa='int(b)'
- --float   => -fa='float(b)'
- --frac    => -fa='frac(b)'
- --sum     => -fa='sum(b)'
- --prod    => -fa='prod(b)'
- --min     => -fa='min(b)'
- --max     => -fa='max(b)'
- --avg     => -fa='avg(b)'
- --stddev  => -fa='stddev(b)'
- --gmean   => -fa='gmean(b)'
- --gstddev => -fa='gstddev(b)'

If you squint you might be able to see a pattern.
2024-11-16 11:46:18 -06:00
Christopher Haster
7cfcc1af1d scripts: Renamed summary.py -> csv.py
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.

Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:

  # prevent local imports
  __import__('sys').path.pop(0)

These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
2024-11-09 12:31:16 -06:00