Moved local import hack behind if __name__ == "__main__"
These scripts aren't really intended to be used as python libraries.
Still, it's useful to import them for debugging and to get access to
their juicy internals.
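A minimal sketch of what the guarded hack might look like at the top of a
script. The import of csv below is only there to show the effect and is an
assumption, not something copied from a specific script:

```python
#!/usr/bin/env python3

# prevent local imports, but only when run as a script, so that
# importing this file for debugging leaves the caller's sys.path alone
if __name__ == "__main__":
    __import__('sys').path.pop(0)

# with the script's directory gone from sys.path, this resolves to the
# standard library csv module rather than a sibling csv.py
import csv
```

When the file is imported instead of executed, __name__ is the module name,
so the pop never runs and the importer's path is left as it was.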
This is now inconsistent with csv.py, and I don't really want to add a
full expr parser to every script that might want to rename fields.
Field renaming (or any expr really!) can be accomplished with
intermediate calls to csv.py anyways. No reason to make these scripts
more complicated than they need to be.
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.
Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:
    # prevent local imports
    __import__('sys').path.pop(0)
These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
This matches the style used in C, which is good for consistency:
    a_really_long_function_name(
            double_indent_after_first_newline(
                single_indent_nested_newlines))
We were already doing this for multiline control-flow statements, simply
because I'm not sure how else you could indent this without making
things really confusing:
    if a_really_long_function_name(
            double_indent_after_first_newline(
                single_indent_nested_newlines)):
        do_the_thing()
This was the only real difference style-wise between the Python code and
C code, so now both should be following roughly the same style (80 cols,
double-indent multiline exprs, prefix multiline binary ops, etc).
Mainly to avoid conflicts with match results m, this frees up the
single-letter variable m for other purposes.
Choosing a two letter alias was surprisingly difficult, but mt is nice
in that it somewhat matches it (for itertools) and ft (for functools).
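As a rough illustration of why this helps, here is a sketch where m holds a
regex match while mt is used for math. The function ceil_kib and its input
format are hypothetical, made up for this example:

```python
import functools as ft
import itertools as it
import math as mt
import re

def ceil_kib(s):
    # m is free to hold the match result, no clash with the math alias
    m = re.match(r'(\d+)\s*bytes$', s)
    if not m:
        raise ValueError('unrecognized size: %r' % s)
    # round the byte count up to whole KiB
    return mt.ceil(int(m.group(1)) / 1024)
```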
- Not as easy to read as --ggplot, the light shades are maybe poorly
suited for plots vs other larger block elements on GitHub. I don't
know, I'm not really a graphic designer.
- GitHub may be a moving target in the future.
- GitHub is already a moving target because it has like 9 different
optional color schemes (which is good!), so most of the time the
colors won't match anyways.
- The neutral gray of --ggplot works just as well outside of GitHub.
Worst case, --github was just a preset color palette, so it could in
theory be emulated with --foreground + --background + --font-color.
Previously, any labeling was _technically_ possible, but tricky to get
right and usually required repeated renderings.
It evolved out of the way colors/formats were provided: a cycled
order-significant list that gets zipped with the datasets. This works
ok for somewhat arbitrary formatting, such as colors/formats, but falls
apart for labels, where it turns out to be somewhat important what
exactly you are labeling.
The new scheme makes the label's relationship explicit, at the cost of
being a bit more verbose:
    $ ./scripts/plotmpl.py bench.csv -obench.svg \
        -Linorder=0,4096,avg,bench_readed \
        -Lreversed=1,4096,avg,bench_readed \
        -Lrandom=2,4096,avg,bench_readed
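One way such arguments could be parsed is sketched below. This is not
necessarily how plotmpl.py does it: parse_label is a hypothetical helper, and
treating the comma-separated values as a dataset key is an assumption:

```python
def parse_label(arg):
    """Split 'label=v0,v1,...' into (label, (v0, v1, ...))."""
    label, sep, values = arg.partition('=')
    if not sep:
        raise ValueError('expected label=values, got %r' % arg)
    return label, tuple(v.strip() for v in values.split(','))

# map each dataset key to the label it should get in the legend
labels = {}
for arg in [
        'inorder=0,4096,avg,bench_readed',
        'reversed=1,4096,avg,bench_readed',
        'random=2,4096,avg,bench_readed']:
    label, key = parse_label(arg)
    labels[key] = label
```

Since each label names its dataset directly, the order of the -L flags no
longer matters.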
This could also be adopted in the CSV manipulation scripts (code.py,
stack.py, summary.py, etc), but I don't think it would actually see that
much use. You can always awk the output to change names and it would add
more complexity to a set of scripts that are probably already way
over-designed.
Note there's a bit of subtlety here: field _types_ are still inferred,
but the intention of the fields, i.e. if the field contains data vs
row name/other properties, must be unambiguous in the scripts.
There is still a _tiny_ bit of inference. For most scripts only one
of --by or --fields is strictly needed, since this makes the purpose of
the other fields unambiguous.
The reason for this change is so the scripts are a bit more reliable,
but also because this simplifies the data parsing/inference a bit.
Oh, and this also changes field inference to use the csv.DictReader's
fieldnames field instead of only inspecting the returned dicts. This
should also save a bit of O(n) overhead when parsing CSV files.
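A sketch of how this inference could work when only one of --by or --fields
is given. The helper open_fields and the sample CSV are hypothetical; only
csv.DictReader and its fieldnames attribute are real:

```python
import csv
import io

def open_fields(f, by=None, fields=None):
    reader = csv.DictReader(f)
    # fieldnames comes from the header row, no need to scan every row
    names = reader.fieldnames or []
    if by is None and fields is None:
        raise ValueError('need at least one of by or fields')
    # whichever one is missing is everything the other didn't claim
    if by is None:
        by = [k for k in names if k not in fields]
    if fields is None:
        fields = [k for k in names if k not in by]
    return by, fields, reader

f = io.StringIO('case,n,bench_readed\nappend,4096,123\n')
by, fields, reader = open_fields(f, fields=['bench_readed'])
```

The reader is returned unconsumed past the header, so the caller can still
iterate over the data rows.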
The whitespace sensitivity of field args was starting to be a problem,
mostly for advanced plotmpl.py usage (which tbf might be appropriately
described as "super hacky" in how it uses CLI parameters):
    ./scripts/plotmpl.py \
        -Dcase=" \
            bench_rbyd_attr_append, \
            bench_rbyd_attr_remove, \
            bench_rbyd_attr_fetch, \
            ..."
This may present problems when parsing CSV files with whitespace, in
theory, maybe. But given the scope of these scripts for littlefs...
just don't do that. Thanks.
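For illustration, whitespace-insensitive parsing of such an argument could
look something like this. parse_define is a hypothetical name and the actual
parsing in the scripts may differ:

```python
def parse_define(arg):
    """Split 'key=v0, v1, ...' into (key, [v0, v1, ...]), ignoring
    whitespace around the key and around each value."""
    key, sep, values = arg.partition('=')
    if not sep:
        raise ValueError('expected key=values, got %r' % arg)
    return key.strip(), [v.strip() for v in values.split(',')]

# roughly what the shell hands over after joining the continued lines
key, values = parse_define(
        'case=         bench_rbyd_attr_append,'
        '         bench_rbyd_attr_remove,'
        '         bench_rbyd_attr_fetch')
```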
With the quantity of data being output by bench.py now, filtering ASAP
while parsing CSV files is a valuable optimization. And thanks to how
CSV files are structured, we can even avoid ever loading the full
contents into RAM.
This does end up with us filtering for defines redundantly in a few
places, but this is well worth the saved overhead from early filtering.
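The idea can be sketched as a generator that drops rows as they are read.
The name collect, the shape of the defines dict, and the sample data are
assumptions for this example:

```python
import csv
import io

def collect(f, defines):
    """Yield only rows whose fields match every define.

    defines maps a field name to the set of accepted string values.
    """
    reader = csv.DictReader(f)
    for row in reader:
        # filter ASAP, rows that don't match are never stored
        if all(row.get(k) in vs for k, vs in defines.items()):
            yield row

data = ('case,n,bench_readed\n'
        'append,4096,10\n'
        'remove,4096,20\n'
        'append,8192,30\n')
rows = list(collect(
        io.StringIO(data),
        {'case': {'append'}, 'n': {'4096'}}))
```

Because csv.DictReader reads one row at a time and the generator yields
lazily, only the rows that survive the filter are ever held in memory.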
Also tried to clean up plot.py/plotmpl.py's data folding path,
though that may have been wasted effort.
These benchmarks are now more useful for seeing how these B-trees perform.
In plot.py/plotmpl.py:
- Added --legend as another alias for -l, --legend-right.
- Allowed omitting datasets from the legend by using empty strings
in --labels.
- Do not sum multiple data points on the same x coordinate. This was a
bad idea that risks invalid results going unnoticed.
As a plus, multiple data points on the same x coordinate can be abused
for a cheap representation of measurement error.
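To make the last point concrete, here is a sketch of keeping repeated x
coordinates as separate points and reading a min/max spread off of them. The
helper spread and the sample points are hypothetical, not taken from
plot.py/plotmpl.py:

```python
import collections as co

def spread(points):
    """Group y values by x without summing them, returning
    {x: (min_y, max_y)} as a cheap error range."""
    ys = co.defaultdict(list)
    for x, y in points:
        ys[x].append(y)
    return {x: (min(v), max(v)) for x, v in sorted(ys.items())}

# two measurements at x=1, one at x=2
points = [(1, 10), (1, 12), (2, 20)]
# every point is kept as-is for plotting, only the order changes
plotted = sorted(points)
errors = spread(points)
```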
- Added both uattr (limited to 256) and id (limited to 65535) benchmarks
covering the main rbyd operations.
- Fixed issue where --defines gets passed to the test/bench runners when
querying id-specific information. After changing the test/bench
runners to prioritize explicit defines, this causes problems for
recorded benchmark results and debug related things.
- In plot.py/plotmpl.py, made --by/-x/-y in subplots behave somewhat
reasonably, contributing to a global dataset and the figure's legend,
colors, etc, but only shown in the specified subplot. This is useful
mainly for showing different -y values on different subplots.
- In plot.py/plotmpl.py, added --labels to allow explicit configuration
of legend labels, much like --colors/--formats/--chars/etc. This
removes one of the main annoying needs for modifying benchmark results.
Driven primarily by a want to compare measurements of different runtime
complexities (it's difficult to fit O(n) and O(log n) on the same plot),
this adds the ability to nest subplots in the same .svg which try to align
as much as possible. This turned out to be surprisingly complicated.
As a part of this, adopted matplotlib's relatively recent
constrained_layout, which behaves much more consistently.
Also dropped --legend-left, no one should really be using that.
The difference between ggplot's gray and GitHub's gray was a bit jarring.
This also adds --foreground and --font-color for this sort of additional
color control without needing to add a new flag for every color scheme
out there.
- Fixed prettyasserts.py parsing when '->' is in expr
- Made prettyasserts.py failures not crash (yay dynamic typing)
- Fixed the initial state of the emubd disk file to match the internal
state in RAM
- Fixed true/false getting changed to True/False in test.py/bench.py
defines
- Fixed accidental substring matching in plot.py's --by comparison
- Fixed a missed LFS_BLOCK_CYCLES in test_superblocks.toml
- Changed test.py/bench.py -v to only show commands being run
Including the test output is still possible with test.py -v -O-, making
the implicit inclusion redundant and noisy.
- Added license comments to bench_runner/test_runner
Note that plotmpl.py tries to share many arguments with plot.py,
allowing plot.py to act as a sort of draft mode for previewing plots
before creating an svg.