1. Being able to inspect results before benchmarks complete is useful
   for tracking their status. It also allows some analysis even if a
   benchmark fails partway through.
2. Moving these scripts out of bench.py lets them be a bit more
   flexible, at the cost of CSV parsing/structuring overhead (see the
   consumer sketch at the end of this note).
3. Writing benchmark measurements immediately avoids RAM buildup from
   storing intermediate measurements for every bench permutation (a
   sketch follows this list). It may shift more pressure onto IO, but
   we end up writing the same number of lines either way, so I'm not
   sure it matters in practice.
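
A minimal sketch of the immediate-write idea, not the actual bench.py
code; `run_one` and `PERMUTATIONS` are hypothetical stand-ins for the
real benchmark runner and parameter grid:

```python
import csv
import itertools

# Hypothetical parameter grid; the real permutations come from bench.py.
PERMUTATIONS = itertools.product(["algo_a", "algo_b"], [1, 2, 4])

def run_one(algo: str, threads: int) -> float:
    """Stand-in for a single benchmark run; returns a timing in seconds."""
    return 0.0  # the real code would measure here

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Header goes out first so a partially written file is still parseable.
    writer.writerow(["algo", "threads", "seconds"])
    for algo, threads in PERMUTATIONS:
        seconds = run_one(algo, threads)
        # Write and flush each row as soon as it exists: nothing
        # accumulates in RAM, and the file can be inspected mid-run.
        writer.writerow([algo, threads, seconds])
        f.flush()
```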
I realize avg.py has quite a bit of overlap with summary.py, but I don't
want to entangle them further; summary.py is already trying to do too
much as it is...
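
For illustration, here is what an avg.py-style consumer of that CSV
might look like, assuming the column names from the sketch above (the
real script's schema and grouping may differ):

```python
import csv
from collections import defaultdict

# Group timings by (algo, threads) permutation, then report the mean.
samples: defaultdict[tuple[str, str], list[float]] = defaultdict(list)
with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        samples[(row["algo"], row["threads"])].append(float(row["seconds"]))

for (algo, threads), values in sorted(samples.items()):
    print(algo, threads, sum(values) / len(values))
```

This is where the parsing/structuring overhead from point 2 shows up:
each downstream script re-reads and re-groups the flat CSV instead of
sharing in-memory structures with bench.py.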