littlefs

Author	SHA1	Message	Date
Christopher Haster	b2768becaa	scripts: Added -l/--labels to csv.py This gives csv.py access to a hidden feature in our table renderer used by some of the other scripts: fields that affect by-field grouping, but aren't actually printed. For example, this prevents summing same named functions in different files, but only shows the function name in the table render: $ ./scripts/csv.py lfs.code.csv -bfile -bfunction -lfunction function size lfs_alloc 398 lfs_alloc_discard 31 lfs_alloc_findfree 77 ... This is especially useful when enumerating results. For example, this prevents any summing without extra table noise: $ ./scripts/csv.py lfs.code.csv -i -bfunction -fsize -lfunction function size lfs_alloc 398 lfs_alloc_discard 31 lfs_alloc_findfree 77 ... I also tweaked -b/--by field defaults a bit to account to enumerate/label fields a bit better.	2025-03-12 19:10:17 -05:00
Christopher Haster	748815bb46	scripts: Disentangled -r/--hot and -i/--enumerate This removes most of the special behavior around how -r/--hot and -i/--enumerate interact. This does mean -r/--hot risks folding results if -i/--enumerate is not specified, but this is _technically_ a valid operation. For most of the recursive result scripts, I've replaced the "i" field with separate "z" and "i" fields for depth and field number, which I think is a bit more informative/useful. I've also added a default-hidden "off" field to structs.py/ctx.py, since we have that info available. I considered replacing "i" with this, but decided against it since non-zero offsets for union members would risk being confusing/mistake prone.	2025-03-12 19:10:17 -05:00
Christopher Haster	ac30a20d12	scripts: Reworked to support optional json input/output Guh This may have been more work than I expected. The goal was to allowing passing recursive results (callgraph info, structs, etc) between scripts, which is simply not possible with csv files. Unfortunately, this raised a number of questions: What happens if a script receives recursive results? -d/--diff with recursive results? How to prevent folding of ordered results (structs, hot, etc) in piped scripts? etc. And ended up with a significant rewrite of most of the result scripts' internals. Key changes: - Most result scripts now support -O/--output-json in addition to -o/--json, with -O/--output-json including any recursive results in the "children" field. - Most result scripts now support both csv and json as input to relevant flags: -u/--use, -d/--diff, -p/--percent. This is accomplished by looking for a '[' as the first character to decide if an input file is json or csv. Technically this breaks if your json has leading whitespace, but why would you ever keep whitespace around in json? The human-editability of json was already ruined the moment comments were disallowed. - csv.py requires all fields to be explicitly defined, so added -i/--enumerate, -Z/--children, and -N/--notes. At least we can provide some reasonable defaults so you shouldn't usually need to type out the whole field. - Notably, the rendering scripts (plot.py, treemapd3.py, etc) and test/bench scripts do _not_ support json. csv.py can always convert to/from json when needed. - The table renderer now supports diffing recursive results, which is nice for seeing how the hot path changed in stack.py/perf.py/etc. - Moved the -r/--hot logic up into main, so it also affects the outputted results. Note it is impossible for -z/--depth to _not_ affect the outputted results. - We now sort in one pass, which is in theory more efficient. - Renamed -t/--hot -> -r/--hot and -R/--reverse-hot, matching -s/-S. - Fixed an issue with -S/--reverse-sort where only the short form was actually reversed (I misunderstood what argparse passes to Action classes). - csv.py now supports json input/output, which is funny.	2025-03-12 19:09:43 -05:00
Christopher Haster	361cd3fec0	scripts: Added missing sys imports Unfortunately the import sys in the argparse block was hiding missing sys imports. The mistake was assuming the import sys in Python would limit the scope to that if block, but Python's late binding strikes again...	2025-01-28 14:41:45 -06:00
Christopher Haster	0adec7f15c	scripts: Replaced __builtins__ with builtins Apparently __builtins__ is a CPython implementation detail, and behaves differently when executed vs imported??? import builtins is the correct way to go about this.	2025-01-28 14:41:45 -06:00
Christopher Haster	62cc4dbb14	scripts: Disabled local import hack on import Moved local import hack behind if __name__ == "__main__" These scripts aren't really intended to be used as python libraries. Still, it's useful to import them for debugging and to get access to their juicy internals.	2025-01-28 14:41:30 -06:00
Christopher Haster	1d8d0785fc	scripts: More flags to control table renderer, -Q/--small-table, etc Instead of trying to be too clever, this just adds a bunch of small flags to control parts of table rendering: - --no-header - Don't show the header. - --small-header - Don't show by field names. - --no-total - Don't show the total. - -Q/--small-table - Equivalent to --small-header + --no-total. Note that -Q/--small-table replaces the previous -Y/--summary + -c/--compare hack, while also allowing a similar table style for non-compare results.	2024-12-18 14:03:35 -06:00
Christopher Haster	a3ac512cc1	scripts: Adopted Parser class in prettyasserts.py This ended up being a pretty in-depth rework of prettyasserts.py to adopt the shared Parser class. But now prettyasserts.py should be both more robust and faster. The tricky parts: - The Parser class eagerly munches whitespace by default. This is usually a good thing, but for prettyasserts.py we need to keep track of the whitespace somehow in order to write it to the output file. The solution here is a little bit hacky. Instead of complicating the Parser class, we implicitly add a regex group for whitespace when compiling our lexer. Unfortunately this does make last-minute patching of the lexer a bit messy (for things like -p/--prefix, etc), thanks to Python's re.Pattern class not being extendable. To work around this, the Lexer class keeps track of the original patterns to allow recompilation. - Since we no longer tokenize in a separate pass, we can't use the None token to match any unmatched tokens. Fortunately this can be worked around with sufficiently ugly regex. See the 'STUFF' rule. It's a good thing Python has negative lookaheads. On the flip side, this means we no longer need to explicitly specify all possible tokens when multiple tokens overlap. - Unlike stack.py/csv.py, prettyasserts.py needs multi-token lookahead. Fortunately this has a pretty straightforward solution with the addition of an optional stack to the Parser class. We can even have a bit of fun with Python's with statements (though I do wish with statements could have else clauses, so we wouldn't need double nesting to catch parser exceptions). --- In addition to adopting the new Parser class, I also made sure to eliminate intermediate string allocation through heavy use of Python's io.StringIO class. This, plus Parser's cheap shallow chomp/slice operations, gives prettyasserts.py a much needed speed boost. (Honestly, the original prettyasserts.py was pretty naive, with the assumption that it wouldn't be the bottleneck during compilation. This turned out to be wrong.) These changes cut total compile time in ~half: real user sys before (time make test-runner -j): 0m56.202s 2m31.853s 0m2.827s after (time make test-runner -j): 0m26.836s 1m51.213s 0m2.338s Keep in mind this includes both prettyasserts.py and gcc -Os (and other Makefile stuff).	2024-12-17 15:34:44 -06:00
Christopher Haster	4c87d59c7b	scripts: Simplified result->file mapping, dropped collect_dwarf_files This reverts per-result source file mapping, and tears out of a bunch of messy dwarf parsing code. Results from the same .o file are now mapped to the same source file. This was just way too much complexity for slightly better result->file mapping, which risked losing results accidentally mapped to the wrong file. --- I was originally going to revert all the way back to relying strictly on the .o name and --build-dir (`490e1c4`) (this is the simplest solution), but after poking around in dwarf-info a bit, I realized we do have access to the original source file in DW_TAG_compile_unit's DW_AT_comp_dir + DW_AT_name. This is much simpler/more robust than parsing objdump --dwarf=rawline, and avoid needing --build-dir in a bunch of scripts. --- This also reverts stack.py to rely only on the .ci files. These seem as reliable as DW_TAG_compile_unit while simplifying things significantly. Symbol mapping used to be a problem, but this was fixed by using the symbol in the title field instead of the label field (which strips some optimization suffixes?)	2024-12-17 15:34:39 -06:00
Christopher Haster	dad3367e9e	scripts: Adopted Parser in csv.py It's a bit funny, the motivation for a new Parser class came from the success of simple regex + space munching in csv.py, but adopting Parser in csv.py makes sense for a couple reasons: - Consistency and better code sharing with other scripts that need to parse things (stack.py, prettyasserts.py?). - Should be more efficient, since we avoid copying the entire string every time we chomp/slice. Though I don't think this really matters for the size of csv.py's exprs... - No need to write every regex twice! Since Parser remembers the last match.	2024-12-16 19:27:31 -06:00
Christopher Haster	5d777f84ad	scripts: Use depth to limit recursive result collection If we're not using these results, no reason to collect all of the children. Note that we still need to recurse for other measurements (limit, struct size, etc). This has a measurable, but small, impact on runtime: stack.py -z0 -Y: 0.202s stack.py -z1 -Y: 0.162s (~-19.8%) ctx.py -z0 -Y: 0.112s ctx.py -z1 -Y: 0.098s (~-12.5%)	2024-12-16 19:27:12 -06:00
Christopher Haster	6a6ed0f741	scripts: Dropped cycle detection from table renderer Now that cycle detection is always done at result collection time, we don't need this in the table renderer itself. This had a tendency to cause problems for non-function scripts (ctx.py, structs.py).	2024-12-16 19:26:21 -06:00
Christopher Haster	dd389f23ee	scripts: Switched to sorted sets for result notes God, I wish Python had an OrderedSet. This is a fix for duplicate "cycle detected" notes when using -t/--hot. This mix of merging both _hot_notes and _notes in the HotResult class is tricky when the underlying container is a list. The order is unlikely to be guaranteed anyways, when different results with different notes are folded. And if we ever want more control over the order of notes in result scripts we can always change this back later.	2024-12-16 19:22:14 -06:00
Christopher Haster	3e03c2ee7f	scripts: Adopted better input file handling in result scripts - Error on no/insufficient files. Instead of just returning no results. This is more useful when debugging complicated bash scripts. - Use elf magic to allow any file order in perfbd.py/stack.py. This was already implemented in stack.py, now also adopted in perfbd.py. Elf files always start with the magic string "\x7fELF", so we can use this to figure out the types of input files without needing to rely on argument order. This is just one less thing to worry about when invoking these scripts.	2024-12-16 19:13:22 -06:00
Christopher Haster	4325a06277	scripts: Fixed incorrect files on recursive results It's been a while since I've been hurt by Python's late-binding variables. In this case the scope-creep of the "file" variable hid that we didn't actually know which recursive result belonged to which file. Instead we were just assigning whatever the most recent top-level result was. This is fixed by looking up the correct file in childrenof. Though this unfortunately does add quite a bit of noise.	2024-12-16 19:12:46 -06:00
Christopher Haster	c8c12ffae8	scripts: Reverted stack.py to use -fcallgraph-info=su again See previous commit for the issues with stack.py's current approach. I'm convinced dwarf-info simply does not contain enough info to figure out stack usage. There is one last idea, which is to parse the dissassembly. In theory you only need to understand calls, branches (for control-flow), and push/pop instructions to figure out the worst-case stack usage. But this would be ISA-specific and error-prone, so it probably shouldn't _replace_ the -fcallgraph-info=su based stack.py. So, out of ideas, reverting. --- It's worth noting this isn't a trivial revert. There's a couple interesting changes in stack.py: - We now use .o files to map callgraph nodes to relevant symbol names. This should be a bit more robust than relying only on the names in the .ci files, and guarantees function names line up with other symbol-based scripts (code.py, ctx.py, etc). This also lets us warn on missing callgraph nodes, in case the callgraph info is incomplete. - Callgraph parsing should be quite a bit more robust now. Added a small (and reusable?) Parser class. - Moved cycle detection into result collection. This should let us drop cycle detection from the table renderer eventually.	2024-12-16 18:10:23 -06:00
Christopher Haster	0e658b8246	scripts: Attempted reimp of stack.py using dwarf variable tags Problem: I misunderstood the purpose of .debug_frames (objdump --dwarf=frames). The purpose of .debug_frames is not to record the size of function stack frames, but to only tell a debugger how to access the previous function's stack frame. It just so happens that this _coincidentally_ tells you the stack frame size when compiling with -fomit-frame-pointer. With -fno-omit-frame-pointer (common on some archs), .debug_frames just says "hey here's the frame pointer" (DW_CFA_def_cfa_register), which tells us nothing about the function's actual stack usage. So unfortunately .debug_frames does not provide enough info on its own... --- This commit was an attempt to find the actual stack usage by looking at the relevant variable info (DW_TAG_variable, etc) in function's dwarf info, but this approach is also not looking very good... 1. The numbers do not appear correct: before -fcallgraph-info=su: 2720 after --dwarf=info: 3558 after --dwarf=info --no-shrinkwrap: 3922 (this is with -fno-omit-frame-pointer) In hindsight, this approach is fundamentally flawed. While the variable tags does give us a lower bound on stack usage, it doesn't tell us about implicit compiler variables and various stack push/pops as a part of expression evaluation. As far as I can tell there's simply not enough info in dwarf info to find an accurate upper bound on stack usage. 2. This approach is quite a bit more complicated, since we need: 1. Dwarf info (--dwarf=info) to find variable tags. 2. Location info (--dwarf=loc) to map var allocations to address ranges. 3. Range info (--dwarf=Ranges) to map lexical blocks to address ranges when var allocation is implicit (not implemented). 4. And we still need frame info (--dwarf=frames)! since var allocations are frame-relative. 3. Also dwarf info is not guaranteed to contain the whole callgraph. It seems callgraph info is actually _omitted_ with -O0?? I guess this is because the callgraph info is a side-effect of some compiler pass? This seems a bit backwards. Dwarf does have a flag (DW_AT_call_all_calls) to indicate when callgraph info is complete, but it doesn't seem to be set reliably? Even with optimizations, lfsr_bd_sync, _and only lfsr_bd_sync_, is missing the DW_AT_call_all_calls flag. I have no idea why. The flag is still present in lfsr_bd_erase, lfsr_bd_read, and other functions with function pointers... So I think this will probably be reverted.	2024-12-16 18:01:47 -06:00
Christopher Haster	14e7501e5c	scripts: Minor stack.py fixes - Always interpret DW_AT_low_pc/high_pc/call_return_pc as hex. Clang populates these with hex digits without a 0x prefix. - Don't ignore callees with no name. Now that DW_AT_abstract_origin is fixed, a callee with no name should be an error.	2024-12-16 18:01:46 -06:00
Christopher Haster	ac79c88c6f	scripts: Improved cycle detection notes in scripts - Prevented childrenof memoization from hiding the source of a detected cycle. - Deduplicated multiple cycle detected notes. - Fixed note rendering when last column does not have a notes list. Currently this only happens when entry is None (no results).	2024-12-16 18:01:46 -06:00
Christopher Haster	56d888933f	scripts: Reworked stack.py to use dwarf, dropped -fcallgraph-info=su There were a lot of small challenges (see previous commits), but this commit reworks stack.py to rely only on dwarf-info and symbols to build stack + callgraph info. Not only does this remove an annoying dependency on a GCC-specific flag, but it also should give us more correct stack measurements by only penalizing calls for the stack usage at the call site. This should better account for things like shrinkwrapping, which make the -fcallgraph-info=su results look worse than they actually are. To make this work required jumping through a couple hoops: 1. Map symbols -> dwarf entries by address (DW_AT_low_pc). We use symbols here to make sure function names line up with other scripts. Note that there can be multiple dwarf entries with the same name due to optimization passes. Apparently the optimized name is not included because that would be too useful. 2. Find each functions' frame info. This is stored in the .debug_frames section (objdump --dwarf=frames), and requires _yet another state machine_ to parse, but gives us the stack frame info for each function at the instruction level, so that's nice. 3. Find call sites (DW_TAG_call_site). The hierchical nesting of DW_TAG_lexical_blocks gets a bit annoying here, but ultimately we can find all DW_TAG_call_sites by looking at the DW_TAG_subprogram's children tags. 4. Map call sites to frame info. This gets funky. Finding the target function is simple enough, DW_AT_call_origin contains its dwarf offset (but why is this the _origin_?). But we don't actually know what address the call originated from. Fortunately we do know the return address, DW_AT_call_return_pc? The instruction before DW_AT_call_return_pc should be the call instruction. Subtracting 1 will awkwardly put us in the middle of the instruction, but it should at least map to the correct stack frame? And without ISA-specific info it's the best we can do. It's messy, but this should be all the info we need. --- To build confidence in the new script, I included the --no-shrinkwrap flag, which reverts to penalizing each call site for the function's worst-case stack frame. This makes it easy to compare against the -fcallgraph-info=su approach: with -fcallgraph-info=su: 2624 with --dwarf=info --no-shrinkwrap: 2624 I was hoping that accounting for shrinkwrap-like optimizations would reveal a lower stack cost, but for better or worse it seems that worst-case stack usage is unchanged: with --dwarf=info --no-shrinkwrap: 2624 with --dwarf=info: 2624 Still, it's good to know that our stack measurement is correct.	2024-12-16 18:01:46 -06:00
Christopher Haster	faf4d09c34	scripts: Added __repr__ to RInt and friends Just a minor quality of life feature to help debugging these scripts.	2024-12-16 18:01:46 -06:00
Christopher Haster	8526cd9cf1	scripts: Prevented i/children/notes result field collisions Without this, naming a column i/children/notes in csv.py could cause things to break. Unlikely for children/notes, but very likely for i, especially when benchmarking. Unfortunately namedtuple makes this tricky. I _want_ to just rename these to _i/_children/_notes and call the problem solved, but namedtuple reserves all underscore-prefixed fields for its own use. As a workaround, the table renderer now looks for _i/_children/_notes at the _class_ level, as an optional name of which namedtuple field to use. This way Result types can stay lightweight namedtuples while including extra table rendering info without risk of conflicts. This also makes the HotResult type a bit more funky, but that's not a big deal.	2024-12-15 16:36:14 -06:00
Christopher Haster	183ede1b83	scripts: Option for result scripts to force children ordering This extends the recursive part of the table renderer to sort children by the optional "i" field, if available. Note this only affects children entries. The top-level entries are strictly ordered by the relevant "by" fields. I just haven't seen a use case for this yet, and not sorting "i" at the top-level reduces that number of things that can go wrong for scripts without children. --- This also rewrites -t/--hot to take advantage of children ordering by injecting a totally-no-hacky HotResult subclass. Now -t/--hot should be strictly ordered by the call depth! Though note entries that share "by" fields are still merged... This also gives us a way to introduce the "cycle detected" note and respect -z/--depth, so overall a big improvement for -t/--hot.	2024-12-15 16:35:52 -06:00
Christopher Haster	e6ed785a27	scripts: Removed padding from tail notes in tables We don't really need padding for the notes on the last column of tables, which is where row-level notes end up. This may seem minor, but not padding here avoids quite a bit of unnecessary line wrapping in small terminals.	2024-12-15 16:35:29 -06:00
Christopher Haster	512cf5ad4b	scripts: Adopted ctx.py-related changes in other result scripts - Adopted higher-level collect data structures: - high-level DwarfEntry/DwarfInfo class - high-level SymInfo class - high-level LineInfo class Note these had to be moved out of function scope due to pickling issues in perf.py/perfbd.py. These were only function-local to minimize scope leak so this fortunately was an easy change. - Adopted better list-default patterns in Result types: def __new__(..., children=None): return Result(..., children if children is not None else []) A classic python footgun. - Adopted notes rendering, though this is only used by ctx.py at the moment. - Reverted to sorting children entries, for now. Unfortunately there's no easy way to sort the result entries in perf.py/perfbd.py before folding. Folding is going to make a mess of more complicated children anyways, so another solution is needed... And some other shared miscellany.	2024-12-15 15:41:11 -06:00
Christopher Haster	e00db216c1	scripts: Consistent table renderer, cycle detection optional The fact that our scripts' table renderer was slightly different for recursive scripts (stack.py, perf.py) and non-recursive scripts (code.py, structs.py) was a ticking time bomb, one innocent edit away from breaking half the scripts. The makes the table renderer consistent across all scripts, allowing for easy copy-pasting when editing at the cost of some unused code in scripts. One hiccup with this though is the difference in cycle detection behavior between scripts: - stack.py: lfsr_bd_sync '-> lfsr_bd_prog '-> lfsr_bd_sync <-- cycle! - structs.py: lfsr_bshrub_t '-> u '-> bsprout '-> u <-- not a cycle! To solve this the table renderer now accepts a simple detect_cycles flag, which can be set per-script.	2024-12-14 12:25:15 -06:00
Christopher Haster	ef3accc07c	scripts: Tweaked -p/--percent to accept the csv file for diffing This makes the -p/--percent flag a bit more consistent with -d/--diff and -c/--compare, both of which change the printing strategy based on additional context.	2024-11-16 18:01:27 -06:00
Christopher Haster	9a2b561a76	scripts: Adopted -c/--compare in make summary-diff This showcases the sort of high-level result printing where -c/--compare is useful: $ make summary-diff code data stack structs BEFORE 57057 0 3056 1476 AFTER 68864 (+20.7%) 0 (+0.0%) 3744 (+22.5%) 1520 (+3.0%) There was one hiccup though: how to hide the name of the first field. It may seem minor, but the missing field name really does help readability when you're staring at a wall of CLI output. It's a bit of a hack, but this can now be controlled with -Y/--summary, which has the sole purpose of disabling the first field name if mixed with -c/--compare. -c/--compare is already a weird case for the summary row anyways...	2024-11-16 18:01:15 -06:00
Christopher Haster	29eff6f3e8	scripts: Added -c/--compare for comparing specific result rows Example: $ ./scripts/csv.py lfs.code.csv \ -bfunction -fsize \ -clfsr_rbyd_appendrattr function size lfsr_rbyd_appendrattr 3598 lfsr_mdir_commit 5176 (+43.9%) lfsr_btree_commit__.constprop.0 3955 (+9.9%) lfsr_file_flush_ 2729 (-24.2%) lfsr_file_carve 2503 (-30.4%) lfsr_mountinited 2357 (-34.5%) ... snip ... I don't think this is immediately useful for our code/stack/etc measurement scripts, but it's certainly useful in csv.py for comparing results at a high level. And by useful I mean it replaces a 40-line long awk script that has outgrown its original purpose...	2024-11-16 17:59:22 -06:00
Christopher Haster	2fa968dd3f	scripts: csv.py: Fixed divide-by-zero, return +-inf This may make some mathematician mad, but these are informative scripts. Returning +-inf is much more useful than erroring when dealing with several hundred rows of results. And hey, if it's good enough for IEEE 754, it's good enough for us :) Also fixed a division operator mismatch in RFrac that was causing problems.	2024-11-16 16:47:48 -06:00
Christopher Haster	5dc9eabbf7	scripts: csv.py: Fixed use of __div__ vs __truediv__ Not sure if this is an old habit from Python 2, or just because it looks nicer next to __mul__, __mod__, etc, but in Python 3 this should be __truediv__ (or __floordiv__), not __div__.	2024-11-16 16:38:36 -06:00
Christopher Haster	0ac326d9cb	scripts: Reduced table name widths to 8 chars minimum I still think the 24 (23+1) char minimum is a good default for 2 column output such as help text, especially if you don't have automatic width detection. But our result scripts need to be a bit more flexible. Consider: $ make summary code data stack structs TOTAL 68864 0 3744 1520 Vs: $ make summary code data stack structs TOTAL 68864 0 3744 1520 Up until now we were just kind of working around this with cut -c 25- in our Makefile, but now that our result scripts automatically scale the table widths, they should really just default to whatever is the most useful.	2024-11-16 13:39:42 -06:00
Christopher Haster	434479f101	scripts: Adopted csv.py-related result-type tweaks in all scripts - RInt/RFloat now accepts implicitly castable types (mainly RInt(RFloat(x)) and RFloat(RInt(x))). - RInt/RFloat/RFrac are now "truthy", implements __bool__. - More operator support for RInt/RFloat/RFrac: - __pos__ => +a - __neg__ => -a - __abs__ => abs(a) - __div__ => a/b - __mod__ => a%b These work in Python, but are mainly used to implement expr eval in csv.py.	2024-11-16 13:37:15 -06:00
Christopher Haster	7cfcc1af1d	scripts: Renamed summary.py -> csv.py This seems like a more fitting name now that this script has evolved into more of a general purpose high-level CSV tool. Unfortunately this does conflict with the standard csv module in Python, breaking every script that imports csv (which is most of them). Fortunately, Python is flexible enough to let us remove the current directory before imports with a bit of an ugly hack: # prevent local imports __import__('sys').path.pop(0) These scripts are intended to be standalone anyways, so this is probably a good pattern to adopt.	2024-11-09 12:31:16 -06:00
Christopher Haster	007ac97bec	scripts: Adopted double-indent on multiline expressions This matches the style used in C, which is good for consistency: a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)) We were already doing this for multiline control-flow statements, simply because I'm not sure how else you could indent this without making things really confusing: if a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)): do_the_thing() This was the only real difference style-wise between the Python code and C code, so now both should be following roughly the same style (80 cols, double-indent multiline exprs, prefix multiline binary ops, etc).	2024-11-06 15:31:17 -06:00
Christopher Haster	48c2e7784b	scripts: Renamed import math alias m -> mt Mainly to avoid conflicts with match results m, this frees up the single letter variables m for other purposes. Choosing a two letter alias was surprisingly difficult, but mt is nice in that it somewhat matches it (for itertools) and ft (for functools).	2024-11-05 01:58:40 -06:00
Christopher Haster	96ddc72481	scripts: Moved hot path calculation before recursive rendering So now the hot path participates in sorting, folding, etc: $ ./scripts/stack.py ./lfs.ci ./lfs_util.ci \ -Dfunction=lfsr_mount -t -sframe function frame limit lfsr_mount 96 2736 \|-> lfsr_mdir_commit 512 2368 \|-> lfsr_btree_commit__.constprop 336 1648 \|-> lfs_alloc 272 1296 \|-> lfsr_btree_commit 208 1856 \|-> lfsr_btree_lookupnext_ 208 720 \|-> lfsr_mtree_gc 192 2560 \|-> lfsr_mtree_traverse 176 1024 \|-> lfsr_rbyd_lookupnext 160 448 \|-> lfsr_bd_readtag.constprop 128 288 \|-> lfsr_mtree_lookup 128 848 \|-> lfsr_bd_read 80 160 \|-> lfsr_bd_read__ 80 80 \|-> lfsr_fs_gc 80 2640 \|-> lfsr_rbyd_sublookup 64 512 '-> lfsr_rbyd_alloc 16 1312 TOTAL 96 2736 This risks some rather unintuitive behavior now that the hot path rendering no longer matches the call stack, but in theory the extra sorting features are more useful? This is a bit of an experiment, if this is more confusing than useful, we can always revert to the strict call-order ordering. Note that you can _usually_ get the call-order ordering by sorting by limit, but this trick breaks if any call frames are zero sized...	2024-11-05 01:23:01 -06:00
Christopher Haster	ade563cc24	scripts: Removed outdated non-terminating warning from scripts All of these scripts have cycle detectors now, so this warning should not longer be valid.	2024-11-04 18:26:22 -06:00
Christopher Haster	c0a9af1e9a	scripts: Moved recursive entry generation before table rendering This fixes an issue where mixing recursive renderers (-t/--hot or -z/--depth) with defines (-Dfunction=lfsr_mount) would not account for children entry widths. An unexpected side-effect of no longer filtering the children entries. We could continue to try to estimate the width without table rendering, but it would basically need two full recursive pass at this point... Instead, I've just moved the recursive stuff before table rendering, which should remove any issues with width calculation while also deduplicating the recursive passes. It's invasive for a small change, but probably worthwhile long term. The downside is this does mean our recursive scripts now build the full table (including all recursive calls!) before they start printing. When mixed with unbounded recursive depth (-z0 or --depth=0) this can get quite large and cause quite a slow start. But I guess that was the tradeoff in adopting this sort of intermediate table rendering... At least it does make the code simpler and less bug prone...	2024-11-04 18:18:58 -06:00
Christopher Haster	0c3868f92c	scripts: Fully connected graph in stack.py, no more recursive folding This makes -D/--define more useful in stack.py/perf.py/perfbd.py by no longer hiding undfined children entries. For example: $ ./scripts/stack.py lfs.ci lfs_util.ci -Dfunction=lfsr_mount -t function frame limit lfsr_mount 96 2816 \|-> lfsr_fs_gc 80 2720 \|-> lfsr_mtree_gc 176 2640 \|-> lfsr_mdir_commit 576 2464 ... snip ... Now shows all functions in the hot path of lfsr_mount, where before it would only show functions in the hot path of lfsr_mount that were also _named_ lfsr_mount. The previous behavior was technically not wrong... but not very useful (and confusing). --- This was actually quite a bit annoying to get working because of the possibility of function call cycles. I ended up turning stack.py's result type into a fully connected graph, which only works because Python has a cycle detector. (Actually this script is so short-lived we probably wouldn't care if this leaked memory.) A nice side effect of this is now all the recursive scripts (stack.py, perf.py, and perfbd.py) share the same internal result representation and recursive printing logic, which is probably a good thing.	2024-11-04 18:12:57 -06:00
Christopher Haster	711cebfcf3	scripts: Simplified memoization of stack.py's limit calculation While a decorator does a good job of separating concerns here, it's a bit overkill for a single function.	2024-11-04 18:09:33 -06:00
Christopher Haster	d324333903	scripts: Fixed names/lines falling out of sync in diff table renderers As a convenience, -d/--diff in our measurement scripts hides entries that are unchanged by default. Unfortunately this was broken during a recent refactor that ended up filtering the line info but not the actual names. Instead of reverting the broken part of the refactor, I've just moved the filtering up to where we calculate the names. Hopefully this fixes the bug while also simplifying this messy chunk of a logic a bit.	2024-11-04 18:04:58 -06:00
Christopher Haster	e32af5cd8a	scripts: Added -t/--hot to recursive scripts, stack.py, etc This is mainly useful for stack.py, where -t/--hot lets you quickly see everything that contributes to the stack limit for each function. This was (and still is) possible with -s + -z, but it was pretty annoying to use: - The stack trace rendered _diagonally_ as a consequence of -z, which is probably the worst use of screen real estate. - This trick only really worked with -s, which was the opposite order of what you usually want on the command line: -S. Adding a special for-purpose -t/--hot flag makes looking at the hot path much easier, at the cost of more hacky python code (and I _mean_ hacky, making the hot path selection useful while following exising sort rules was annoyingly complicated). Also added -t/--hot to perf.py and perfbd.py for consistency, though it makes a bit less sense there. Also also reworked related code in all three scripts: stack.py, perf.py, perfbd.py. The logic should be a bit more equivalent, and perf.py/perfbd.py detect cycles now.	2024-11-04 18:03:59 -06:00
Christopher Haster	904c2eddd7	scripts: Memoized stack.py's limit calculation This is a pretty classic case for memoization. We don't really need to recalculate every stack limit at every call site. Cuts the runtime in half: before: 0.335s after: 0.139s (-58.5%) --- Unfortunately functools.cache was not fit for purpose. It's stuck using all parameters as the key, which breaks on the "seen" parameter we use for cycle detection that otherwise has no impact on results. Fortunately decorators aren't too difficult in Python, so I just rolled my own (cache1).	2024-11-04 17:54:42 -06:00
Christopher Haster	fc45af3f6e	scripts: Added better cycles detection to stack.py stack.py actually already had a simple cycle detector, since we needed one to calculate stack limits without getting stuck. Copying this simple cycle detector into the actual table rendering code lets us print a nice little "cycle detected" message, instead of just vomiting to stdout forever: $ ./scripts/stack.py lfs.ci lfs_util.ci -z -s function frame limit lfsr_format 320 ∞ \|-> lfsr_mountinited 304 ∞ \| \|-> lfsr_mountmroot 80 ∞ \| \| \|-> lfsr_mountmroot 80 ∞ (cycle detected) \| \| \|-> lfsr_mdir_lookup 48 576 ... snip ... The cycle detector is a bit naive, just building a new set each step, but it gets the job done. As for perf.py and perfbd.py, it turns out they can't actually create cycles, so no need for a cycle detector. This is good because I didn't really want to test these scripts again :)	2024-11-04 17:54:11 -06:00
Christopher Haster	54d77da2f5	Dropped csv field prefixes in scripts The original idea was to allow merging a whole bunch of different csv results into a single lfs.csv file, but this never really happened. It's much easier to operate on smaller context-specific csv files, where the field prefix: - Doesn't really add much information - Requires more typing - Is confusing in how it doesn't match the table field names. We can always use summary.py -fcode_size=size to add prefixes when necessary anyways.	2024-06-02 19:19:46 -05:00
Christopher Haster	169952dec0	Tweaked scripts to render new entry ratios as +∞% We already rely on this symbol in these scripts, so might use it to display the mathematically correct ratio for new entries. This has the added benefit of ordering new entries vs extremely big changes correctly: $ ./scripts/code.py -u test.after.csv -d test.before.csv function (1 added, 0 removed) osize nsize dsize test_a - 49 +49 (+∞%) test_b 19 719 +700 (+3684.2%) test_c 91 191 +100 (+109.9%) TOTAL 110 959 +849 (+771.8%)	2024-06-02 19:19:46 -05:00
Christopher Haster	06bfed7a8b	Interspersed precent/notes in measurement scripts This is a bit more complicated, but make testmarks really showed how confusing this could get. Now, instead of: suite passed time test_alloc 304/304 1.6 (100.0%) test_badblocks 6880/6880 1323.3 (100.0%) ... snip ... test_rbyd 385878/385878 592.7 (100.0%) test_relocations 7899/7899 318.8 (100.0%) TOTAL 548206/548206 6229.7 (100.0%) Percents/notes are interspersed next to their relevant fields: suite passed time test_alloc 304/304 (100.0%) 1.6 test_badblocks 6880/6880 (100.0%) 1323.3 ... snip ... test_rbyd 385878/385878 (100.0%) 592.7 test_relocations 7899/7899 (100.0%) 318.8 TOTAL 548206/548206 (100.0%) 6229.7 Note has no effect on scripts with only a single field (code.py, etc). But it does make multi-field diffs a bit more readable: $ ./scripts/stack.py -u after.stack.csv -d before.stack.csv -p function frame limit lfsr_bd_sync 8 (+100.0%) 216 (+100.0%) lfsr_bd_flush 40 (+25.0%) 208 (+4.0%) ... snip ... lfsr_file_flush 32 (+0.0%) 2424 (-0.3%) lfsr_file_flush_ 216 (-3.6%) 2392 (-0.3%) TOTAL 9008 (+0.4%) 2600 (-0.3%)	2024-06-02 19:19:38 -05:00
Christopher Haster	a9f6b6e903	Renamed internal script result types * -> R* So Int -> RInt, Frac -> RFrac, etc. This just helps distinguish these types from builtin types, which could be confusing.	2024-05-18 13:00:15 -05:00
Christopher Haster	03ea2e6ac5	Tweaked cov.py, summary.py, to render fraction percents as notes This matches how diff percentages are rendered, and simplifies the internal table rendering by making Frac less of a special case. It also allows for other type notes in the future. One concern is how all the notes are shoved to the side, which may make it a bit harder to find related percentages. If this becomes annoying we should probably look into interspersing all notes (including diff percentages) between the relevant columns. Before: function lines branches lfsr_rbyd_appendattr 230/231 99.6% 172/192 89.6% lfsr_rbyd_p_recolor 33/34 97.1% 11/12 91.7% lfs_alloc 40/42 95.2% 21/24 87.5% lfsr_rbyd_appendcompaction 54/57 94.7% 39/42 92.9% ... After: function lines branches lfsr_rbyd_appendattr 230/231 172/192 (99.6%, 89.6%) lfsr_rbyd_p_recolor 33/34 11/12 (97.1%, 91.7%) lfs_alloc 40/42 21/24 (95.2%, 87.5%) lfsr_rbyd_appendcompaction 54/57 39/42 (94.7%, 92.9%) ...	2024-05-18 13:00:15 -05:00

1 2

79 Commits