Commit Graph

79 Commits

Author SHA1 Message Date
Christopher Haster
531c2bcc4c Quieted test.py/bench.py status when stdout is aimed at stdout
This is a condition for specifically the -O- pattern. Doing anything
fancier would be too much, so anything clever such as -O/dev/stdout
will still be clobbered.

This was a common enough pattern and the status updates clobbering
stdout was annoying enough that I figured this warranted a special case.
2024-03-20 13:58:22 -05:00
Christopher Haster
76593711ab Added -f/--fail to test.py/bench.py
This just tells test.py/bench.py to pretend the test failed and trigger
any conditional utilities. This can be combined with --gdb to easily
inspect a test that isn't actually failing.

Up until this point I've just been inserting assert(false) when needed,
which is clunky.
2024-03-20 13:50:04 -05:00
Christopher Haster
1422a61d16 Made generated prettyasserts more debuggable
The main star of the show is the adoption of __builtin_trap() for
aborting on assert failure. I discovered this GCC/Clang extension
recently and it integrates much, _much_ better with GDB.

With stdlib's abort(), GDB drops you off in several layers of internal
stdlib functions, which is a pain to navigate out of to get to where the
assert actually happened. With __builtin_trap(), GDB stops immediately,
making debugging quick and easy.

This is great! The pain of debugging needs to come from understanding
the error, not just getting to it.

---

Also tweaked a few things with the internal print functions to make
reading the generated source easier, though I realize this is a rare
thing to do.
2024-02-14 01:14:36 -06:00
Christopher Haster
06a360462a Simplified test/bench suite finding logic in test.py/bench.py
These just take normal paths now, we weren't even using the magic
test/bench suite finding logic since it's easier to just pass everything
explicitly in our Makefile.

The original test/bench suite finding logic was a bad idea anyways. This
is what globs are for, and having custom path chasing logic is
inconsistent and risks confusion.
2024-02-14 00:25:10 -06:00
Christopher Haster
a124ee54e7 Reworked test/bench defines to map to global variables
Motivation:

- Debuggability. Accessing the current test/bench defines from inside
  gdb was basically impossible for some dumb macro-debug-info reason I
  can't figure out.

  In theory, GCC provides a .debug_macro section when compiled with -g3.
  I can see this section with objdump --dwarf=macro, but somehow gdb
  can't seem to find any definitions? I'm guessing the #line source
  remapping is causing things to break somehow...

  Though even if macro-debugging gets fixed, which would be valuable,
  accessing defines in the current test/bench runner can trigger quite
  a bit of hidden machinery. This risks side-effects, which is never
  great when debugging.

  All of this is quite annoying because the test/bench defines are
  usually the most important piece of information when debugging!

  This replaces the previous hidden define machinery with simple global
  variables, which gdb can access no problem.

- Also when debugging we no longer awkwardly step into the test_define
  function all the time!

- In theory, global variables, being a simple memory access, should be
  quite a bit faster than the hidden define machinery. This does matter
  because running tests _is_ a dev bottleneck.

  In practice though, any performance benefit is below the noise floor,
  which isn't too surprising (~630s +-~20s).

- Using global variables for defines simplifies the test/bench runner
  quite a bit.

  Though some of the previous complexity was due to a whole internal
  define caching system, which was supposed to lazily evaluate test
  defines to avoid evaluating defines we don't use. This all proved to
  be useless because the first thing we do when running each test is
  evaluate all defines to generate the test id (lol).

So now, instead of lazily evaluating and caching defines, we just
generate global variables during compilation and evaluate all defines
for each test permutation immediately before running.

This relies heavily on __attribute__((weak)) symbols, and lets the
linker really shine.

As a funny perk this also effectively interns all test/bench defines by
the address of the resulting global variable. So we don't even need to
do string comparisons when mapping suite-level defines to the
runner-level defines.

---

Perhaps the more interesting thing to note, is the change in strategy in
how we actually evaluate the test defines.

This ends up being a surprisingly tricky problem, due to the potential
of mutual recursion between our defines.

Previously, because our define machinery was lazy, we could just
evaluate each define on demand. If a define required another define, it
would lazily trigger another evaluation, implicitly recursing through
C's stack. If cyclic, this would eventually lead to a stack overflow,
but that's ok because it's a user error to let this happen.

The "correct" way, at least in terms of being computationally optimal,
would be to topologically sort the defines and evaluate the resulting
tree from the leaves up.

But I ain't got time for that, so the solution here is equal parts
hacky, simple, and effective.

Basically, we just evaluate the defines repeatedly until they stop
changing:

- Initially, mutually recursive defines may read the uninitialized
  values of their dependencies, and end up with some arbitrarily wrong
  result. But as the defines are repeatedly evaluated, assuming no
  cycles, the correct results should eventually bubble up the tree until
  all defines converge to the correct value.

- This is O(n*e) vs O(n+e), but our define graph is usually quite
  shallow.

- To prevent non-halting, we error after an arbitrary 1000 iterations.
  If you hit this, it's likely because there is a cycle in the define
  graph.

  This is runtime configurable via the new --define-depth flag.

- To keep things consistent and reproducible, we zero initialize all
  defines before the first evaluation.

  I don't think this is strictly necessary, but it's important for the
  test runner to have the exact same results on every run. No one wants
  a "works on my machine" situation when the tests are involved.

Experimentation shows we only need an evaluation depth of 2 to
successfully evaluate the current set of defines:

  $ ./runners/test_runner --list-defines --define-depth=2

And any performance impact is negligible (~630s +-~20s).
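
As a rough illustration of the evaluate-until-stable strategy described
above, here is a minimal Python sketch. It is not the runner's code (the
runner is C): the function name, the dict-of-callables layout, and the
example defines are all assumptions made for illustration.

```python
# Hypothetical sketch, names and data layout are illustrative only.
def evaluate_defines(defines, max_depth=1000):
    """Re-evaluate every define until no value changes.

    `defines` maps a name to a function that takes the current values
    (a dict) and returns that define's new value.
    """
    # zero-initialize so every run starts from the same state
    values = {name: 0 for name in defines}
    for depth in range(1, max_depth + 1):
        changed = False
        for name, fn in defines.items():
            new = fn(values)
            if new != values[name]:
                values[name] = new
                changed = True
        if not changed:
            # a full pass with no changes means we have converged
            return values, depth
    raise RuntimeError("defines did not converge, is there a cycle?")


# BLOCK_SIZE depends on PROG_SIZE, which depends on READ_SIZE
defines = {
    "BLOCK_SIZE": lambda v: 32 * v["PROG_SIZE"],
    "PROG_SIZE": lambda v: v["READ_SIZE"],
    "READ_SIZE": lambda v: 16,
}
values, depth = evaluate_defines(defines)
print(values, depth)
```

Note the pass count returned here includes the final confirming pass, so
it is not necessarily the same number --define-depth expects.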
2024-02-13 18:59:58 -06:00
Christopher Haster
724fc5fc91 Hide gdb info header from test.py/bench.py --gdb
This was too noisy when intermingled with other debug output
test.py/bench.py prints when dropping into gdb.
2024-02-03 18:14:56 -06:00
Christopher Haster
161cd9e6da Fixed race condition killing test processes in test/bench.py
Not sure why we weren't hitting this earlier, but I've been hitting
this race condition a bunch recently and it's annoying.

Now every failed process kills the other test processes unconditionally.

It's not clear if this actually _fixes_ the race condition or just makes
it less likely, but it's good enough to keep the test script user
friendly.
2023-12-17 15:18:26 -06:00
Christopher Haster
d485795336 Removed concept of geometries from test/bench runners
This turned out to not be all that useful.

Tests already take quite a bit to run, which is a good thing! We have a
lot of tests! 942.68s or ~15 minutes of tests at the time of writing to
be exact. But simply multiplying the number of tests by some number of
geometries is heavy handed and not a great use of testing time.

Instead, tests where different geometries are relevant can parameterize
READ_SIZE/PROG_SIZE/BLOCK_SIZE at the suite level where needed. The
geometry system was just another define parameterization layer anyways.

Testing different geometries can still be done in CI by overriding the
relevant defines anyways, and it _might_ be interesting there.
2023-12-06 22:23:41 -06:00
Christopher Haster
6d81b0f509 Changed --context short flag to -C in scripts
This matches diff and grep, and avoids lower-case conflicts in
test.py/bench.py.
2023-11-06 01:59:03 -06:00
Christopher Haster
d1b9a2969f Added -F/--failures to test.py/bench.py to limit failures when -k/--keep-going
The -k/--keep-going option has been more or less useless before this
since it would completely flood the screen/logs when a bug triggers
multiple test failures, which is common.

Some things to note:

- RAM management is tricky with -k/--keep-going, if we try to save logs
  and filter after running everything we quickly fill up memory.

- Failing test cases are a much slower path than successes since we need
  to kill and restart the underlying test_runner, its state can't be
  trusted anymore. This is a-ok since you usually hope for many more
  successes than failures. Unfortunately it can make
  -k/--keep-going quite slow.

---

ALSO -- warning this is a tangent rant-into-the-void -- I have
discovered that Ubuntu has a "helpful" subsystem named Apport that tries
to record/log/report any process crash in the system. It is "disabled" by
default, but the way it's disabled requires LAUNCHING A PYTHON
INTERPRETER to check a flag on every segfault/assert failure.

This is what it does when it's "disabled"!

This subsystem is fundamentally incompatible with any program that
intentionally crashes subprocesses, such as our test runner. The sheer
amount of python interpreters being launched quickly eats through all
available RAM and starts OOM killing half the processes on the system.

If anyone else runs into this, a shallow bit of googling suggests the
best solution is to just disable Apport. It is not a developer friendly
subsystem:

  $ sudo systemctl disable apport.service

Removing Apport brings RAM usage back down to a constant level, even
with absurd numbers of test failures. And here I thought I had a memory
leak somewhere.
2023-11-06 01:55:28 -06:00
Christopher Haster
1e4d4cfdcf Tried to write errors to stderr consistently in scripts 2023-11-05 15:55:07 -06:00
Christopher Haster
fb9277feac Tweaked test.py/bench.py to allow no suites to test compilation
This is mainly to allow bench_runner to at least compile after moving
benches out of tree.

Also cleaned up lingering runner/suite munging leftover from the change
to an optional -R/--runner parameter.
2023-11-03 11:15:45 -05:00
Christopher Haster
39f417db45 Implemented a filesystem traversal that understands file bptrs/btrees
Ended up changing the name of lfsr_mtree_traversal_t -> lfsr_traversal_t,
since this behaves more like a filesystem-wide traversal than an mtree
traversal (it returns several typed objects, not mdirs like the other
mtree functions for one).

As a part of this changeset, lfsr_btraversal_t (was lfsr_btree_traversal_t)
and lfsr_traversal_t no longer return untyped lfsr_data_ts, but instead
return specialized lfsr_{b,t}info_t structs. We weren't even using
lfsr_data_t for its original purpose in lfsr_traversal_t.

Also changed lfsr_traversal_next -> lfsr_traversal_read, you may notice
at this point the changes are intended to make lfsr_traversal_t look
more like lfsr_dir_t for consistency.

---

Internally lfsr_traversal_t now uses a full state machine with its own
enum due to the complexity of traversing the filesystem incrementally.

Because creating diagrams is fun, here's the current full state machine,
though note it will need to be extended for any
parity-trees/free-trees/etc:

  mrootanchor
       |
       v
  mrootchain
  .-'  |
  |    v
  |  mtree ---> openedblock
  '-. | ^           | ^
    v v |           v |
   mdirblock    openedbtree
      | ^
      v |
   mdirbtree

I'm not sure I'm happy with the current implementation, and eventually
it will need to be able to handle in-place repairs to the blocks it
sees, so this whole thing may need a rewrite.

But in the meantime, this passes the new clobber tests in test_alloc, so
it should be enough to prove the file implementation works (which is
definitely not fully tested yet, and some bugs had to be fixed for
the new tests in test_alloc to pass).

---

Speaking of test_alloc.

The inherent cyclic dependency between files/dirs/alloc makes it a bit
hard to know what order to test these bits of functionality in.

Originally I was testing alloc first, because it seems you need to be
confident in your block allocator before you can start testing
higher-level data structures.

But I've gone ahead and reversed this order, testing alloc after
files/dirs. This is because of an interesting observation that if alloc
is broken, you can always increase the test device's size to some absurd
number (-DDISK_SIZE=16777216, for example) to kick the can down the
road.

Testing in this order allows alloc to use more high-level APIs and
focus on corner cases where the allocator's behavior requires subtlety
to be correct (e.g. ENOSPC).
2023-10-14 01:13:40 -05:00
Christopher Haster
52113c6ead Moved the test/bench runner path behind an optional flag
So now instead of needing:

  ./scripts/test.py ./runners/test_runner test_dtree

You can just do:

  ./scripts/test.py test_dtree

Or with an explicit path:

  ./scripts/test.py -R./runners/test_runner test_dtree

This makes it easier to run the script manually. And, while there may be
some hiccups with the implicit relative path, I think in general this will
make the test/bench scripts easier to use.

There was already an implicit runner path, though only if the test suite
was completely omitted. I'm not sure that would ever have actually
been useful...

---

Also increased the permutation field size in --list-*, since I noticed it
was overflowing.
2023-10-14 00:54:28 -05:00
Christopher Haster
e7bf5ad82f Added scripts/crc32c.py
This seems like a useful script to have.
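
For reference, a minimal bitwise CRC-32C (Castagnoli) in Python might
look like the sketch below. This is not the contents of
scripts/crc32c.py, whose options and conventions (initial value, final
xor) are not shown here; the function name and signature are assumptions.

```python
def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82f63b78."""
    # standard convention: invert on the way in and on the way out
    crc ^= 0xffffffff
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff


# the usual check string for CRC implementations
print('%08x' % crc32c(b'123456789'))
```

Passing the previous result back in via crc= lets the checksum be
computed incrementally over chunks.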
2023-09-15 18:42:48 -05:00
Christopher Haster
528f104cb4 Enabled internal test code at the suite-level
Test suites already had the ability to provide suite-level code via the
"code" attribute, but this was placed in the suite's generated source
file, making it inaccessible to internal tests.

This change allows suite code to be placed in the same place as internal
tests, via the "in" attribute, though this has some caveats:

1. Suite-level code generally declares helper functions in global scope.
   We don't parse this code or anything, so name collisions between
   helper functions across different test suites is up to the developer
   to resolve.

2. Internal suite-level code has access to internal functions/variables/
   etc, this means we can't place a copy in our suite's generated source
   and expect it to compile. For this reason, internal suite-level code
   is unavailable for non-internal tests in the suite.

   This also means you only get to place internal suite-level code in a
   single source file. Though this is not really an issue since littlefs
   is basically a single file...
2023-08-19 12:20:13 -05:00
Christopher Haster
4efb55e0d7 In tests/benches, renamed cfg -> CFG
This is to better indicate this is a runner generated variable.
2023-08-04 14:05:07 -05:00
Christopher Haster
1c128afc90 Renamed internal runner field filter -> if_
This makes it more consistent with the actual test field, at the cost of
the symbol collision.
2023-08-04 13:54:10 -05:00
Christopher Haster
5be7bae518 Replaced tn/bn prefixes with an actual dependency system in tests/benches
The previous system of relying on test name prefixes for ordering was
simple, but organizing tests by dependencies and topologically sorting
during compilation is 1. more flexible and 2. simplifies test names,
which get typed a lot.

Note these are not "hard" dependencies, each test suite should work fine
in isolation. These "after" dependencies just hint an ordering when all
tests are run.

As such, it's worth noting the tests should NOT error if a dependency is
missing. This unfortunately makes it a bit hard to catch typos, but
allows faster compilation of a subset of tests.
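
A small Python sketch of this kind of "soft" ordering, where missing
dependencies are skipped rather than reported, is shown here. The
function name and the suite names in the example are hypothetical, and
test.py's real implementation may differ.

```python
def order_suites(after):
    """Order suites so each runs after its "after" dependencies.

    `after` maps a suite name to the names it should run after.
    Dependencies on suites that are not present are silently ignored.
    """
    ordered = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("cyclic dependency involving %r" % name)
        state[name] = "visiting"
        for dep in after[name]:
            if dep in after:  # skip suites that were not compiled in
                visit(dep)
        state[name] = "done"
        ordered.append(name)

    # sort first so the result doesn't depend on argument order
    for name in sorted(after):
        visit(name)
    return ordered


suites = {
    "test_alloc": ["test_files", "test_dirs"],
    "test_files": ["test_dirs"],
    "test_dirs": ["test_mtree"],  # test_mtree is not in this subset
}
print(order_suites(suites))
```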

---

To make this work, the way tests are linked has changed from using a
custom linker section (fun linker magic!) to a weakly linked array
appended to every source file (also fun linker magic!).

At least with this method test.py has strict control over the test
ordering, and doesn't depend on 1. the order in which the linker merges
sections, and 2. the order tests are passed to test.py. I didn't realize
the previous system was so fragile.
2023-08-04 13:33:00 -05:00
Christopher Haster
c5e84e874f Changed how fuzz tests are iterated to allow powerloss-fuzz testing
Instead of iterating over a number of seeds in the test itself, the
seeds are now permuted as a part of normal test defines.

This lets each seed take advantage of other test features, mainly the
ability to test powerlosses heuristically.

This is probably how it should have been done in the first place, but
the permutation tests can't do this since the number of permutations
changes as the size of the test input changes. The test define system
can't handle that very well.

The tradeoffs here are:

- We can't do cross-fuzz checks, such as the balance checks in the rbyd
  tests, though those really should be moved to benchmarks anyways.

- The large number of cheap fuzz permutations skews the total
  permutation count, though I'm not sure this matters.

  before: 3083 permutations (-Gnor)
  after: 409893 permutations (-Gnor)
2023-07-18 21:40:44 -05:00
Christopher Haster
b05db8e3d3 Added support for lists of conditional ifs in test/bench.py
Any conditions in both the suites and cases are anded together to
determine when the test/bench should run.

Accepting a list here makes it easier to compose multiple conditions,
since toml-level elements are a bit easier to modify than strings of
C expressions.
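
One way to picture the anding, as a Python sketch: collect the suite and
case conditions, each either a string or a list of strings, and join
them into a single C expression. The function name and the fallback to
"true" are assumptions, not necessarily what test.py/bench.py does.

```python
def combine_ifs(suite_if=None, case_if=None):
    """Combine suite- and case-level conditions into one C expression."""
    conds = []
    for if_ in (suite_if, case_if):
        if if_ is None:
            continue
        # accept either a single string or a list of strings
        conds.extend([if_] if isinstance(if_, str) else if_)
    if not conds:
        return "true"
    # parenthesize each condition so operator precedence can't leak
    return " && ".join("(%s)" % c for c in conds)


print(combine_ifs(
    "BLOCK_SIZE >= 512",
    ["PROG_SIZE <= 16", "READ_SIZE == 1 || READ_SIZE == 16"]))
```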
2023-06-01 17:40:51 -05:00
Christopher Haster
07244fb2d4 In test/bench.py, added "internal" flag
This marks internal tests/benches (case.in="lfs.c") with an otherwise-unused
flag that is printed during --summary/--list-*. This just helps identify which
tests/benches are internal.
2023-06-01 17:40:48 -05:00
Christopher Haster
82027f3d90 Changed bench/test.py to error if explicit suite/case can't be found
Previously no matches would noop. While this is consistent with an empty
test suite, which contains no tests but shouldn't really error, it made
it easy to miss when a typo caused tests to be skipped.

Also added a bit of color to script-level errors in test/bench.py
2023-06-01 17:16:21 -05:00
Christopher Haster
9b033987ef Renamed --gdb-case => --gdb-permutation for correctness 2023-03-19 01:21:27 -05:00
Christopher Haster
83eba5268d Added support for globs in test.py/bench.py, better -b/-B
This reworks test.py/bench.py a bit to map arguments to ids as a first
step instead of defering as much as possible. This is a better design
and avoids the hackiness around -b/-B. As a plus, test_id globbing is
easy to add.
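
A sketch of what mapping glob arguments to ids up front could look like
in Python, using the standard fnmatch module. The function name and the
example ids are hypothetical.

```python
import fnmatch


def match_ids(patterns, known_ids):
    """Expand glob patterns into a list of known test ids."""
    matched = []
    for pattern in patterns:
        for id_ in fnmatch.filter(known_ids, pattern):
            # keep first-seen order and avoid duplicates
            if id_ not in matched:
                matched.append(id_)
    return matched


ids = ["test_dirs_many", "test_dirs_rename", "test_files_simple"]
print(match_ids(["test_dirs_*", "test_*_many"], ids))
```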
2023-03-17 15:15:53 -05:00
Christopher Haster
59a57cb767 Reworked test_runner/bench_runner to evaluate define permutations lazily
I wondered if walking in Python 2's footsteps was going to run into the
same issues and sure enough, memory backed iterators became unwieldy.

The motivation for this change is that large ranges in tests, such as
iterators over seeds or permutations, became prohibitively expensive to
compile. This meant more iteration moving into tests with more steps to
reproduce failures. This sort of defeats the purpose of the test
framework.

The solution here is to move test permutation generation out of test.py
and into the test runner itself. This allows defines to generate their
values programmatically.

This does conflict with the test framework's support of sets of explicit
permutations, but this is fixed by also moving these "permutation sets"
down into the test runner.

I guess it turns out the closer your representation matches your
implementation the better everything works.

Additionally the define caching layer got a bit of tweaking. We can't
precalculate the defines because of mutual recursion, but we can
precalculate which define/permutation each define id maps to. This is
necessary as otherwise figuring out each define's define-specific
permutation would be prohibitively expensive.
2023-03-17 15:06:56 -05:00
Christopher Haster
a20625be7c Allowed empty suites in test.py/bench.py
This happens when you need to comment out an entire suite due to
temporary changes.
2023-03-17 14:20:09 -05:00
Christopher Haster
9a8e1d93c6 Added some rbyd benchmarks, fixed/tweaked some related scripts
- Added both uattr (limited to 256) and id (limited to 65535) benchmarks
  covering the main rbyd operations

- Fixed issue where --defines gets passed to the test/bench runners when
  querying id-specific information. After changing the test/bench
  runners to prioritize explicit defines, this causes problems for
  recorded benchmark results and debug related things.

- In plot.py/plotmpl.py, made --by/-x/-y in subplots behave somewhat
  reasonably, contributing to a global dataset and the figure's legend,
  colors, etc, but only shown in the specified subplot. This is useful
  mainly for showing different -y values on different subplots.

- In plot.py/plotmpl.py, added --labels to allow explicit configuration
  of legend labels, much like --colors/--formats/--chars/etc. This
  removes one of the main annoying needs for modifying benchmark results.
2023-02-12 17:14:42 -06:00
Christopher Haster
c2147c45ee Added --gdb-pl to test.py for breaking on specific powerlosses
This allows debugging strategies such as binary searching for the point
of "failure", which may be more complex than simply failing an assert.
2022-12-17 12:39:42 -06:00
Christopher Haster
801cf278ef Tweaked/fixed a number of small runner things after a bit of use
- Added support for negative numbers in the leb16 encoding with an
  optional 'w' prefix.

- Changed prettyasserts.py rule to .a.c => .c, allowing other .a.c files
  in the future.

- Updated .gitignore with missing generated files (tags, .csv).

- Removed suite-namespacing of test symbols, these are no longer needed.

- Changed test define overrides to have higher priority than explicit
  defines encoded in test ids. So:

    ./runners/bench_runner bench_dir_open:0f1g12gg2b8c8dgg4e0 -DREAD_SIZE=16

  Behaves as expected.

  Otherwise it's not easy to experiment with known failing test cases.

- Fixed issue where the -b flag ignored explicit test/bench ids.
2022-12-17 12:35:44 -06:00
Christopher Haster
397aa27181 Removed unnecessarily heavy RAM usage from logs in bench/test.py
For long running processes (testing with >1pls) these logs can grow into
multiple gigabytes, humorously we never access more than the last n lines
as requested by --context. Piping the stdout with --stdout does not use
additional RAM.
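
Keeping only the last n lines can be done with a bounded deque. The
following Python sketch shows the idea; the function name is
hypothetical and this is not necessarily how test.py/bench.py do it.

```python
import collections
import io


def last_lines(stream, context):
    """Return only the last `context` lines of a stream."""
    # a bounded deque drops old lines as new ones arrive, so memory
    # use stays proportional to `context`, not to the log size
    tail = collections.deque(maxlen=context)
    for line in stream:
        tail.append(line.rstrip("\n"))
    return list(tail)


log = io.StringIO("".join("line %d\n" % i for i in range(100000)))
print(last_lines(log, 3))
```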
2022-12-06 23:07:28 -06:00
Christopher Haster
eba5553314 Fixed hidden orphans by separating deorphan search into two passes
This happens in rare situations where there is a failed mdir relocation,
interrupted by a power-loss, containing the destination of a directory
rename operation, where the directory being renamed preceded the
relocating mdir in the mdir tail-list. This requires at some point for a
previous directory rename to create a cycle.

If this happens, it's possible for the half-orphan to contain the only
reference to the renamed directory. Since half-orphans contain outdated
state when viewed through the mdir tail-list, the renamed directory
appears to be a full-orphan until we fix the relocating half-orphan.
This causes littlefs to incorrectly remove the renamed directory from
the mdir tail-list, causing catastrophic problems down the line.

The source of the problem is that the two different types of orphans
really operate on two different levels of abstraction: half-orphans fix
failed mdir commits, while full-orphans fix directory removes/renames.
Conflating the two leads to situations where we attempt to fix assumed
problems about the directory tree before we have fixed problems with the
mdir state.

The fix here is to separate out the deorphan search into two passes: one
to fix half-orphans and correct any mdir-commits, restoring the mdirs
and gstate to a known good state, then two to fix failed
removes/renames.

---

This was found with the -Plinear heuristic powerloss testing, which now
runs on more geometries. The failing case was:

  test_relocations_reentrant_renames:112gg261dk1e3f3:123456789abcdefg1h1i1j1k1
  l1m1n1o1p1q1r1s1t1u1v1g2h2i2j2k2l2m2n2o2p2q2r2s2t2

Also fixed/tweaked some parts of the test framework as a part of finding
this bug:

- Fixed off-by-one in exhaustive powerloss state encoding.

- Added --gdb-powerloss-before and --gdb-powerloss-after to help debug
  state changes through a failing powerloss, maybe this should be
  expanded to any arbitrary powerloss number in the future.

- Added lfs_emubd_crc and lfs_emubd_bdcrc to get block/bd crcs for quick
  state comparisons while debugging.

- Fixed bd read/prog/erase counts not being copied during exhaustive
  powerloss testing.

- Fixed small typo in lfs_emubd trace.
2022-11-28 12:51:18 -06:00
Christopher Haster
bcc88f52f4 A couple Makefile-related tweaks
- Changed --(tool)-tool to --(tool)-path in scripts, this seems to be
  a more common name for this sort of flag.

- Changed BUILDDIR to not have implicit slash, makes Makefile internals
  a bit more readable.

- Fixed some outdated names hidden in less-often used ifdefs.
2022-11-17 10:26:26 -06:00
Christopher Haster
1a07c2ce0d A number of small script fixes/tweaks from usage
- Fixed prettyasserts.py parsing when '->' is in expr

- Made prettyasserts.py failures not crash (yay dynamic typing)

- Fixed the initial state of the emubd disk file to match the internal
  state in RAM

- Fixed true/false getting changed to True/False in test.py/bench.py
  defines

- Fixed accidental substring matching in plot.py's --by comparison

- Fixed an LFS_BLOCK_CYCLES in test_superblocks.toml that was missed

- Changed test.py/bench.py -v to only show commands being run

  Including the test output is still possible with test.py -v -O-, making
  the implicit inclusion redundant and noisy.

- Added license comments to bench_runner/test_runner
2022-11-15 13:42:07 -06:00
Christopher Haster
b2a2cc9a19 Added teepipe.py and watch.py 2022-11-15 13:38:13 -06:00
Christopher Haster
3a33c3795b Added perfbd.py and block device performance sampling in bench-runner
Based loosely on Linux's perf tool, perfbd.py uses trace output with
backtraces to aggregate and show the block device usage of all functions
in a program, propagating block device operation costs up the backtrace
for each operation.

This, combined with --trace-period and --trace-freq for
sampling/filtering trace events, allows the bench-runner to very
efficiently record the general cost of block device operations with very
little overhead.

Adopted this as the default side-effect of make bench, replacing
cycle-based performance measurements which are less important for
littlefs.
2022-11-15 13:38:13 -06:00
Christopher Haster
490e1c4616 Added perf.py, a wrapper around Linux's perf tool for perf sampling
This provides 2 things:

1. perf integration with the bench/test runners - This is a bit tricky
   with perf as it doesn't have its own way to combine perf measurements
   across multiple processes. perf.py works around this by writing
   everything to a zip file, using flock to synchronize. As a plus, free
   compression!

2. Parsing and presentation of perf results in a format consistent with
   the other CSV-based tools. This actually ran into a surprising number of
   issues:

   - We need to process raw events to get the information we want, this
     ends up being a lot of data (~16MiB at 100Hz uncompressed), so we
     parallelize the parsing of each decompressed perf file.

   - perf reports raw addresses post-ASLR. It does provide sym+off which
     is very useful, but to find the source of static functions we need to
     reverse the ASLR by finding the delta that produces the best
     symbol<->addr matches.

   - This isn't related to perf, but decoding dwarf line-numbers is
     really complicated. You basically need to write a tiny VM.
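
To make the ASLR point above concrete, here is one possible way to guess
the delta, sketched in Python: let every sample with a known symbol vote
for the offset that would line it up with the symbol table, and take the
most common vote. This is an assumed approach for illustration, not
necessarily what perf.py does, and the names and addresses are made up.

```python
import collections


def find_aslr_delta(symbols, samples):
    """Guess the load offset from (symbol, offset, address) samples.

    `symbols` maps symbol names to their static addresses. Each sample
    votes for the delta that would make it line up with the symbol
    table; the most common delta wins.
    """
    votes = collections.Counter()
    for sym, off, addr in samples:
        if sym in symbols:
            votes[addr - (symbols[sym] + off)] += 1
    if not votes:
        return 0
    delta, _ = votes.most_common(1)[0]
    return delta


symbols = {"lfs_mount": 0x1000, "lfs_bd_read": 0x2000}
samples = [
    ("lfs_mount", 0x4, 0x555555555004),
    ("lfs_bd_read", 0x10, 0x555555556010),
    ("unknown", 0x0, 0x1234),  # ignored, not in the symbol table
]
print(hex(find_aslr_delta(symbols, samples)))
```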

This also turns on perf measurement by default for the bench-runner, but at a
low frequency (100 Hz). This can be decreased or removed in the future
if it causes any slowdown.
2022-11-15 13:38:13 -06:00
Christopher Haster
9507e6243c Several tweaks to script flags
- Changed multi-field flags to action=append instead of comma-separated.
- Dropped short-names for geometries/powerlosses
- Renamed -Pexponential -> -Plog
- Allowed omitting the 0 for -W0/-H0/-n0 and made -j0 consistent
- Better handling of --xlim/--ylim
2022-11-15 13:38:13 -06:00
Christopher Haster
4fe0738ff4 Added bench.py and bench_runner.c for benchmarking
These are really just different flavors of test.py and test_runner.c
without support for power-loss testing, but with support for measuring
the cumulative number of bytes read, programmed, and erased.

Note that the existing define parameterization should work perfectly
fine for running benchmarks across various dimensions:

./scripts/bench.py \
    runners/bench_runner \
    bench_file_read \
    -gnor \
    -DSIZE='range(0,131072,1024)'

Also added a couple basic benchmarks as a starting point.
2022-11-15 13:33:34 -06:00
Christopher Haster
20ec0be875 Cleaned up a number of small tweaks in the scripts
- Added the littlefs license note to the scripts.

- Adopted parse_intermixed_args everywhere for more consistent arg
  handling.

- Removed argparse's implicit help text formatting as it does not
  work with parse_intermixed_args and breaks sometimes.

- Used string concatenation for argparse everywhere, using backslashed
  line continuations only works with argparse because it strips
  redundant whitespace.

- Consistent argparse formatting.

- Consistent openio mode handling.

- Consistent color argument handling.

- Adopted functools.lru_cache in tracebd.py.

- Moved unicode printing behind --subscripts in tracebd.py, making all
  scripts ascii by default.

- Renamed pretty_asserts.py -> prettyasserts.py.

- Renamed struct.py -> struct_.py, the original name conflicts with
  Python's built in struct module in horrible ways.
2022-11-15 13:31:11 -06:00
Christopher Haster
11d6d1251e Dropped namespacing of test cases
The main benefit is small test ids everywhere, though this is with the
downside of needing longer names to properly prefix and avoid
collisions. But this fits into the rest of the scripts with globally
unique names a bit better. This is a C project after all.

The other small benefit is test generators may have an easier time since
per-case symbols can expect to be unique.
2022-09-17 03:03:39 -05:00
Christopher Haster
1fcd82d5d8 Made test.py output parsable by summary.py
Also fixed an issue with truncation that resulted in a bunch of null
bytes being injected into the CSV output.
2022-09-17 03:02:43 -05:00
Christopher Haster
23fba40f20 Added option for updating a CSV file with test results
This is mostly for the bench runner which will contain more interesting
results besides just pass/fail.
2022-09-12 12:17:46 -05:00
Christopher Haster
03c1a4ee2e Added permutations and ranges to test defines
This is really more work for the bench runner. With this change defines
can be manipulated at a rather high level at runtime. Which should be
useful for generating benchmarks across various dimensions.

The define grammar in the test_runner is now a bit more powerful,
accepting:

1. A single value: -DN=42
2. A list of values, which get permuted: -DN=1,2,3
3. A range: -DN=range(10)
4. Some combo: -DN=1,2,range(3,0,-1)
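
As an illustration of the four forms above, a Python sketch that expands
such a value into a flat list of integers. The real parser lives in the
C test_runner; this function name is hypothetical, and Python's range()
semantics are assumed for range(...).

```python
import re


def parse_define(value):
    """Expand a value like '1,2,range(3,0,-1)' into a list of ints."""
    values = []
    # split on commas, but keep each range(...) together
    for token in re.findall(r"range\([^)]*\)|[^,]+", value):
        token = token.strip()
        m = re.fullmatch(r"range\((.*)\)", token)
        if m:
            # base 0 accepts decimal and 0x-prefixed hex
            args = [int(a, 0) for a in m.group(1).split(",")]
            values.extend(range(*args))
        else:
            values.append(int(token, 0))
    return values


for v in ["42", "1,2,3", "range(10)", "1,2,range(3,0,-1)"]:
    print("-DN=%s" % v, "->", parse_define(v))
```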

This is more complex in the test .toml defines, which can also be C
expressions:

1. A single value: define=42
2. A single expression: define='42*42'
3. A list: define=[1,2,3]
4. A comma separated string: define='1,2,3'
5. A range: define='42*range(10)'
6. This mess: define=[1,2,'3,4,range(2)*range(2)+3']
2022-09-11 21:47:14 -05:00
Christopher Haster
bfbe44e70d Dropped permutation number for full leb16-encoded defines
This is probably how the test runner should have been implemented in the
first place, but it took a few tries to get here.

This makes it so the test identifier, which is a bit longer now, fully
encodes the state of the defines in the test. This removes the need for
the extra geometry field and allows reproduction of tests with custom
defines at runtime.

The test runner may have already seemed like a solved problem, but these
changes are really to enable repurposing the test runner as a bench
runner.
2022-09-10 15:19:34 -05:00
Christopher Haster
5a2ff178e0 Changed test identifier separator # -> :
Compare:
- test_dirs#reentrant_many_dir#1#ggg1ggg8#123456789abcdef
- test_dirs:reentrant_many_dir:1:ggg1ggg8:123456789abcdef
2022-09-09 23:15:16 -05:00
Christopher Haster
c7f7094a06 Several tweaks to test.py and test runner
These are just some minor quality of life improvements

- Added a "make build-test" alias
- Made test runner a positional arg for test.py since it is almost
  always required. This shortens the command line invocation most of the
  time.
- Added --context to test.py
- Renamed --output in test.py to --stdout, note this still merges
  stderr. Maybe at some point these should be split, but it's not really
  worth it for now.
- Reworked the test_id parsing code a bit.
- Changed the test runner --step to take a range such as -s0,12,2
- Changed tracebd.py --block and --off to take ranges
2022-09-08 19:54:07 -05:00
Christopher Haster
a208d848e5 Reworked test defines a bit to use one common array layout
Previously didn't think this would work without making test.py aware of
the number of implicit defines, which risks being incredibly fragile.
Fortunately it turns out we can defer the actual array size calculation
until the C preprocessor. This simplifies a few things.

Also added a bitmap-based caching layer for the defines. Since the test
defines have been upgraded to callbacks, recursive defines risk spending
a decent amount of time evaluating on every lookup. Some quick testing
shows 408015154 hits to 46160 misses so that's a good sign.

Also changed the geometries to be their own leb16-encoded part of the
test identifier. This means any geometry can be captured and reproduced
with just the test identifier. Here are the current test geometries:

./runners/test_runner --list-geometries
geometry                    read    prog   erase   count        size  leb16
d,default                     16      16     512    2048     1048576  g1gg2
e,eeprom                       1       1     512    2048     1048576  1gg2
E,emmc                       512     512     512    2048     1048576  gg2
n,nor                          1       1    4096     256     1048576  1ggg1
N,nand                      4096    4096   32768      32     1048576  ggg1ggg8
2022-09-07 01:52:53 -05:00
Christopher Haster
91200e6678 Added tracebd.py, a script for rendering block device operations
Based on a handful of local hacky variations, this sort of trace
rendering is surprisingly useful for getting an understanding of how
different filesystem operations interact with the underlying
block-device.

At some point it would probably be good to reimplement this in a
compiled language. Parsing and tracking the trace output quickly
becomes a bottleneck with the amount of trace output the tests
generate.

Note also that since tracebd.py runs on trace output, it can also be
used to debug logged block-device operations post-run.
2022-09-07 01:52:53 -05:00
Christopher Haster
c9a6e3a95b Added tailpipe.py and improved redirecting test trace/log output over fifos
This mostly involved futzing around with some of the less intuitive
parts of Unix's named-pipes behavior.

This is a bit important since the tests can quickly generate several
gigabytes of trace output.
2022-09-07 01:52:49 -05:00