107 Commits

Author SHA1 Message Date
Christopher Haster
71930a5c01 scripts: Tweaked openio comment
Dang, this touched like every single script.
2025-04-16 15:23:06 -05:00
Christopher Haster
b715e9a749 scripts: Prefer 1;30-37m ansi codes over 90-97m
Reading Wikipedia:

> Later terminals added the ability to directly specify the "bright"
> colors with 90–97 and 100–107.

So if we want to stick to one pattern, we should probably go with
brightness as a separate modifier.

This shouldn't noticeably change any script, unless your terminal
interprets 90-97m colors differently from 1;30-37m, in which case things
should be more consistent now.
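
For illustration, the two ways of requesting a bright color could be written like this in Python. The helper names are hypothetical and not taken from the scripts; this is only a sketch of the two escape-code patterns being compared:

```python
# Hypothetical helpers, for illustration only.

def bright_modifier(color):
    # brightness as a separate modifier: ESC [ 1 ; 30-37 m
    assert 0 <= color <= 7
    return '\x1b[1;%dm' % (30 + color)

def bright_direct(color):
    # "bright" colors specified directly: ESC [ 90-97 m
    assert 0 <= color <= 7
    return '\x1b[%dm' % (90 + color)

RESET = '\x1b[m'

if __name__ == "__main__":
    # how these render depends on the terminal
    print(bright_modifier(4) + 'modifier' + RESET)
    print(bright_direct(4) + 'direct' + RESET)
```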
2025-04-16 15:22:43 -05:00
Christopher Haster
1ac3aae92b scripts: test.py/bench.py: Added -e/--exec shortform flag
Why not, -e/--exec seems useful/general purpose enough to deserve a
shortform flag. Especially since much of our testing involves emulation.

The only risk of conflicts is with -e/--error-* in other scripts, but
the _whole point_ of test.py is to error on failure, so I don't think
this will be an issue.

Note that -E may be more useful for environment variables in the future.

I feel like -e/--exec was more common in other programs, but I've only
found sed -e and perl -e so far. Most programs stick to -c/--command
(bash, python) which would conflict with -c/--compile here.
2025-04-16 15:22:10 -05:00
Christopher Haster
313696ecf9 scripts: Fixed openio issue where some scripts didn't import os
This only failed if "-" was used as an argument (for stdin/stdout), so
the issue was pretty hard to spot.

openio is a heavily copy-pasted function, so it makes sense to just add
the import os to openio directly. Otherwise this mistake will likely
happen again in the future.
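
A minimal sketch of what a copy-paste-safe openio could look like, assuming the function takes a path and a mode and maps "-" to stdin/stdout. The exact signature and body in the scripts may differ:

```python
import sys

def openio(path, mode='r', buffering=-1):
    # import os here so copy-pasting this function can't lose the
    # dependency
    import os
    # allow '-' for stdin/stdout
    if path == '-':
        if 'r' in mode:
            return os.fdopen(os.dup(sys.stdin.fileno()), mode, buffering)
        else:
            return os.fdopen(os.dup(sys.stdout.fileno()), mode, buffering)
    else:
        return open(path, mode, buffering)
```

The os module is only touched on the "-" path, which is consistent with the missing import going unnoticed for regular files.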
2025-03-12 21:18:51 -05:00
Christopher Haster
9e22167a31 scripts: Re-adopted result prefixes
Now that I'm looking into some higher-level scripts, being able to merge
results without first renaming everything is useful.

This gives most scripts an implicit prefix for field fields, but _not_
by fields, allowing easy merging of results from different scripts:

  $ ./scripts/stack.py lfs.ci -o-
  function,stack_frame,stack_limit
  lfs_alloc,288,1328
  lfs_alloc_discard,8,8
  lfs_alloc_findfree,16,32
  ...

At least now these have better support in scripts with the addition of
the --prefix flag (this was tricky for csv.py), which allows explicit
control over field field prefixes:

  $ ./scripts/stack.py lfs.ci -o- --prefix=
  function,frame,limit
  lfs_alloc,288,1328
  lfs_alloc_discard,8,8
  lfs_alloc_findfree,16,32
  ...

  $ ./scripts/stack.py lfs.ci -o- --prefix=wonky_
  function,wonky_frame,wonky_limit
  lfs_alloc,288,1328
  lfs_alloc_discard,8,8
  lfs_alloc_findfree,16,32
  ...
2025-03-12 19:10:17 -05:00
Christopher Haster
ac30a20d12 scripts: Reworked to support optional json input/output
Guh

This may have been more work than I expected. The goal was to allow
passing recursive results (callgraph info, structs, etc) between
scripts, which is simply not possible with csv files.

Unfortunately, this raised a number of questions: What happens if a
script receives recursive results? -d/--diff with recursive results?
How to prevent folding of ordered results (structs, hot, etc) in piped
scripts? etc.

And ended up with a significant rewrite of most of the result scripts'
internals.

Key changes:

- Most result scripts now support -O/--output-json in addition to
  -o/--json, with -O/--output-json including any recursive results in
  the "children" field.

- Most result scripts now support both csv and json as input to relevant
  flags: -u/--use, -d/--diff, -p/--percent. This is accomplished by
  looking for a '[' as the first character to decide if an input file is
  json or csv.

  Technically this breaks if your json has leading whitespace, but why
  would you ever keep whitespace around in json? The human-editability
  of json was already ruined the moment comments were disallowed.

- csv.py requires all fields to be explicitly defined, so added
  -i/--enumerate, -Z/--children, and -N/--notes. At least we can provide
  some reasonable defaults so you shouldn't usually need to type out the
  whole field.

- Notably, the rendering scripts (plot.py, treemapd3.py, etc) and
  test/bench scripts do _not_ support json. csv.py can always convert
  to/from json when needed.

- The table renderer now supports diffing recursive results, which is
  nice for seeing how the hot path changed in stack.py/perf.py/etc.

- Moved the -r/--hot logic up into main, so it also affects the
  outputted results. Note it is impossible for -z/--depth to _not_
  affect the outputted results.

- We now sort in one pass, which is in theory more efficient.

- Renamed -t/--hot -> -r/--hot and -R/--reverse-hot, matching -s/-S.

- Fixed an issue with -S/--reverse-sort where only the short form was
  actually reversed (I misunderstood what argparse passes to Action
  classes).

- csv.py now supports json input/output, which is funny.
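
The first-character check described above can be sketched roughly as follows. The function name load_results is hypothetical, and the real scripts read from files rather than strings:

```python
import csv
import io
import json

def load_results(text):
    # a leading '[' means json, anything else is treated as csv; note
    # that leading whitespace before the '[' would be misdetected as csv
    if text[:1] == '[':
        return json.loads(text)
    else:
        return list(csv.DictReader(io.StringIO(text)))
```

Note that csv fields come back as strings while json keeps its native types, so a real consumer still needs a type-parsing step afterwards.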
2025-03-12 19:09:43 -05:00
Christopher Haster
86f3bad2a4 scripts: Adopted Attr rework in plot.py/plotmpl.py
Unifying these complicated attr-assigning flags across all the scripts
is the main benefit of the new internal Attr system.

The only tricky bit is we need to somehow keep track of all input fields
in case % modifiers reference fields, when we could previously discard
non-data fields.

Tricky but doable.

Updated flags:

- -L/--label -> -L/--add-label
- --colors -> -C/--add-color
- --formats -> -F/--add-format
- --chars -> -*/--add-char/--chars
- --line-chars -> -_/--add-line-char/--line-chars

I've also tweaked Attr to accept glob matches when figuring out group
assignments. This is useful for matching slightly different, but
similarly named results in our benchmark scripts.

There's probably a clever way to do this by injecting new by fields with
csv.py, but just adding globbing is simpler and makes attr assignment
even more flexible.
2025-03-11 18:09:18 -05:00
Christopher Haster
5aada6f54a test.py/bench.py: Limited -d/--disk and -t/--trace to one thread
It doesn't really make sense to write to disk/trace files with multiple
threads, the result usually ends up clobbered and useless.

If we only pass disk/trace files to the first thread, the result is at
least useable, even if it only represents 1/j tests.

This is actually quite a nice way to sample filesystem images in
multithreaded tests.

As a side effect, this also changes test.py/bench.py to no longer pass
-d/--disk or -t/--trace to runner queries, which is probably a good
thing? These should be ignored in queries anyways.
2025-02-08 14:53:47 -06:00
Christopher Haster
42c81ef7de scripts: Switched to tomllib/tomli for toml parsing
Found a bug in our toml parser that's difficult to work around:

  defines.GC_FLAGS = """      =>  {
      LFS_GC_MKCONSISTENT             "GC_FLAGS": "blablabla",
          | LFS_GC_LOOKAHEAD      }   // where did defines go?
  """

This appears to be this bug:

https://github.com/uiri/toml/issues/286

But since it was opened 4 years ago, I think it's safe to say this toml
library is now defunct...

---

Apparently tomllib/tomli is the new hotness, which started as tomli
before being adopted in Python 3.11 as tomllib. Fortunately tomli is still
maintained so we don't have to worry about Python versions too much.

Adopting tomli was relatively straightforward, the only hiccup being
that it doesn't support text files? Curious, but fortunately Python
exposes the underlying binary file handle in f.buffer.
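
A rough sketch of the tomllib/tomli fallback and the f.buffer workaround, assuming either Python 3.11+ or an installed tomli package. The load_toml helper is a hypothetical name:

```python
try:
    import tomllib            # Python >= 3.11
except ImportError:
    import tomli as tomllib   # same API, needs tomli to be installed

def load_toml(f):
    # tomllib.load only accepts binary files; a text-mode file exposes
    # its underlying binary handle as f.buffer
    return tomllib.load(f.buffer if hasattr(f, 'buffer') else f)
```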
2025-01-28 14:41:45 -06:00
Christopher Haster
361cd3fec0 scripts: Added missing sys imports
Unfortunately the import sys in the argparse block was hiding missing
sys imports.

The mistake was assuming the import sys in Python would limit the scope
to that if block, but Python's late binding strikes again...
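
A small standalone illustration of the scoping behavior, not taken from the scripts. The if block stands in for the argparse block at the bottom of a script:

```python
def uses_sys():
    # looks up the global name `sys` at call time, not at definition time
    return sys.maxsize > 0

if True:  # stand-in for the argparse block at the bottom of a script
    # an import inside an if block still binds `sys` at module scope
    import sys

# works, but only because the block above happened to run first
result = uses_sys()
```

If the script is imported instead of run, the if block never executes and uses_sys fails with a NameError.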
2025-01-28 14:41:45 -06:00
Christopher Haster
62cc4dbb14 scripts: Disabled local import hack on import
Moved local import hack behind if __name__ == "__main__"

These scripts aren't really intended to be used as python libraries.
Still, it's useful to import them for debugging and to get access to
their juicy internals.
2025-01-28 14:41:30 -06:00
Christopher Haster
25814ed5cb scripts: Fixed failed subprocess stderr, unconditionally forward
It looks like the failure case in our scripts' subprocess stderr
handling was not tested well during a fix to stderr blocking (a735bcd).

This code was attempting to print stderr only if an error occurred, but
with stderr=None this just results in a NoneType TypeError.

In retrospect, completely hiding stderr is kind of shitty if a
subprocess fails, but it doesn't seem possible to read from both stdout
and stderr with Python's APIs without getting stuck when the stderr's
buffer is full.

It might be possible to work around this with either multithreading,
select calls, or a temp file, but I'm not sure slightly less verbose
scripts are worth the added complexity in every single subprocess call.

For now just reverting to unconditionally forwarding stderr from the
child process. This is the simplest/most robust option.
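
As a sketch of the unconditional forwarding described here, assuming a helper that parses the child's stdout line by line. The run function is hypothetical:

```python
import subprocess
import sys

def run(cmd):
    # stdout is piped so we can parse it; stderr=None means the child
    # inherits our stderr, so its errors are forwarded unconditionally
    proc = subprocess.Popen(cmd,
            stdout=subprocess.PIPE,
            stderr=None,
            universal_newlines=True)
    lines = [line.rstrip('\n') for line in proc.stdout]
    proc.wait()
    if proc.returncode != 0:
        # proc.stderr is None here, so there is nothing to read back
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return lines
```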
2024-12-14 15:08:39 -06:00
Christopher Haster
51b8cdb1f0 scripts: Added -q/--quiet to test.py/bench.py
This will probably only have niche uses, but may be useful for small
test sets or for running specific tests with -O-.

Though it is a bit funny that -q -O- turns test.py/bench.py into more or
less just a complicated way to run a C program.
2024-11-17 23:50:32 -06:00
Christopher Haster
0b450b1184 scripts: Reverted full C exprs in test/bench define ranges
A couple problems:

1. We should probably also support negative ranges, but this is a bit
   annoying since we can't tell if the range is negative or positive
   until expr evaluation.

2. Evaluating the range exprs at compile-time is inconsistent with other
   C exprs in our tests/benches (normal defines, if filters, etc), and
   severely limiting since we can't use other defines before the define
   system is initialized.

3. Attempting to move these range exprs into their own lazily evaluated
   functions does not seem tractable...

   We'd need to evaluate defines to know how many permutations there
   are, but how can we evaluate defines before knowing which permutation
   we're on?

   I think this circular dependency would make the permutation count
   undecidable?

Even if we could move these exprs to their own lazily evaluated
functions (which would solve the inconsistency issue), the complexity
risks outweighing the benefit. Keep in mind it's useful if external
tools can parse our tests. So reverting for now.

Though I am keeping some of the refactoring in test.py/bench.py. Having
a special DRange type is useful if we ever want to add more define
functions in the future.
2024-11-17 23:36:57 -06:00
Christopher Haster
608d8a2bc1 scripts: Enabled full C exprs in test/bench define ranges
This enables full C exprs in test/bench define ranges by simply passing
them on to the C compiler.

So this:

  defines.N = 'range(1,20+1)'

Becomes this, in N's define function:

  if (i < 0 + ((((20+1)-1-(1))/(1) + 1))) return ((i-(0))*(1) + (1));

Which is a bit of a mess, but generates the correct range at runtime.

This allows for much more flexible exprs in range defines without
needing a full expr parser in Python.

Note though that we need to evaluate the range length at compile time.
This is notably before the test/bench define system is initialized, so
all three range args (start, stop, step) are limited to really only
simple C literals and exprs.
2024-11-17 14:36:47 -06:00
Christopher Haster
7cfcc1af1d scripts: Renamed summary.py -> csv.py
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.

Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:

  # prevent local imports
  __import__('sys').path.pop(0)

These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
2024-11-09 12:31:16 -06:00
Christopher Haster
007ac97bec scripts: Adopted double-indent on multiline expressions
This matches the style used in C, which is good for consistency:

  a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines))

We were already doing this for multiline control-flow statements, simply
because I'm not sure how else you could indent this without making
things really confusing:

  if a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines)):
      do_the_thing()

This was the only real difference style-wise between the Python code and
C code, so now both should be following roughly the same style (80 cols,
double-indent multiline exprs, prefix multiline binary ops, etc).
2024-11-06 15:31:17 -06:00
Christopher Haster
48c2e7784b scripts: Renamed import math alias m -> mt
Mainly to avoid conflicts with match results m, this frees up the single
letter variable m for other purposes.

Choosing a two letter alias was surprisingly difficult, but mt is nice
in that it somewhat matches it (for itertools) and ft (for functools).
2024-11-05 01:58:40 -06:00
Christopher Haster
6e2af5bf80 Carved out ckreads, disabled at compile-time by default
This moves all ckread-related logic behind the new opt-in compile-time
LFS_CKREADS flag. So in order to use ckreads you need to 1. define
LFS_CKREADS at compile time, and 2. pass LFS_M_CKREADS during
lfsr_mount.

This was always the plan since, even if ckreads worked perfectly, it
adds a significant amount of baggage (stack mostly) to track the
ck context of all reads.

---

This is the first non-trivial opt-in define in littlefs, so more test
framework features!

test.py and build.py now support the optional ifdef attribute, which
makes it easy to indicate a test suite/case should not be compiled when
a feature is missing.

Also interesting to note is the addition of LFS_IFDEF_CKREADS, which
solves several issues (and general ugliness) related to #ifdefs in
expressions. For example:

  // does not compile :( (can't embed ifdefs in macros)
  LFS_ASSERT(flags == (
          LFS_M_CKPROGS
              #ifdef LFS_CKREADS
              | LFS_M_CKREADS
              #endif
              ))

  // does compile :)
  LFS_ASSERT(flags == (
          LFS_M_CKPROGS
              | LFS_IFDEF_CKREADS(LFS_M_CKREADS, 0)));

---

This brings us way back down to our pre-ckread levels of code/stack:

                   code          stack
  before-ckreads: 36352           2672
  ckreads:        38060 (+4.7%)   3056 (+14.4%)
  after-ckreads:  36428 (+0.2%)   2680 (+0.3%)

Unfortunately, we do end up with a bit more code cost than where we
started. Mainly due to code moving around to support the ckread
infrastructure:

                   code          stack
  lfsr_bd_readtag:  +52 (+23.2%)    +8 (+10.0%)
  lfsr_rbyd_fetch:  +36 (+5.0%)     +8 (+6.2%, cold)
  lfs_toleb128:     -12 (-25.0%)    -4 (-20.0%, cold)
  total:            +76 (+0.2%)     +8 (+0.3%)

But oh well. Note that some of these changes are good even without
ckreads, such as only parsing the last ecksum tag.
2024-08-16 01:04:03 -05:00
Christopher Haster
a735bcd667 Fixed hanging scripts trying to parse stderr
code.py, specifically, was getting messed up by inconsequential GCC
objdump errors on Clang -g3 generated binaries.

Now stderr from child processes is just redirected to /dev/null when
-v/--verbose is not provided.

If we actually depended on redirecting stderr->stdout these scripts
would have been broken when -v/--verbose was provided anyways. Not
really sure what the original code was trying to do...
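
A sketch of the redirect described above, using Python's subprocess.DEVNULL. The collect helper and its verbose parameter are hypothetical stand-ins for the scripts' -v/--verbose handling:

```python
import subprocess
import sys

def collect(cmd, verbose=False):
    # without -v/--verbose the child's stderr goes to /dev/null, so a
    # noisy child can never block on a full stderr pipe
    proc = subprocess.Popen(cmd,
            stdout=subprocess.PIPE,
            stderr=None if verbose else subprocess.DEVNULL,
            universal_newlines=True)
    out = proc.stdout.read()
    proc.wait()
    return proc.returncode, out
```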
2024-06-20 13:04:07 -05:00
Christopher Haster
54d77da2f5 Dropped csv field prefixes in scripts
The original idea was to allow merging a whole bunch of different csv
results into a single lfs.csv file, but this never really happened. It's
much easier to operate on smaller context-specific csv files, where the
field prefix:

- Doesn't really add much information
- Requires more typing
- Is confusing in how it doesn't match the table field names.

We can always use summary.py -fcode_size=size to add prefixes when
necessary anyways.
2024-06-02 19:19:46 -05:00
Christopher Haster
3c5319e125 Tweaked test/bench id globbing to avoid duplicating cases
Before, globs that match both the suite name and case name would end up
running the case twice. Which is a bit of a problem, since all
cases contain their suite name as a prefix...

  test_f* => run test_files
             |-> run test_files_hello
             |-> run test_files_trunc
             ...
             run test_files_hello
             run test_files_trunc
             ...

Now we only run matching test cases if no suites were found.

This has the side-effect of making the universal glob, "*", equivalent
to no test ids, which is nice:

  $ ./scripts/test.py -j -b '*'  # equivalent
  $ ./scripts/test.py -j -b      #

This is useful for running a specific problematic test first before
running all of the tests:

  $ ./scripts/test.py -j -b test_files_trunc '*'
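
The suites-before-cases rule can be sketched with fnmatch. The resolve_ids function and the suites dict (suite name to list of case names) are hypothetical, not the actual test.py internals:

```python
import fnmatch

def resolve_ids(ids, suites):
    # suites: hypothetical dict of suite name -> list of case names
    selected = []
    for id in ids:
        matched = [s for s in suites if fnmatch.fnmatchcase(s, id)]
        if matched:
            # matching suites win, their cases run as part of the suite
            selected.extend(('suite', s) for s in matched)
        else:
            # only fall back to cases if no suite matched
            selected.extend(('case', c)
                    for s in suites
                    for c in suites[s]
                    if fnmatch.fnmatchcase(c, id))
    return selected
```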
2024-05-29 23:09:45 -05:00
Christopher Haster
31eebc1328 Added -a/--all to test.py/bench.py for bypassing test/bench filters
These really shouldn't be used all that often. Test filters are usually
used to protect against invalid test configurations, so if you bypass
test filters, expect things to fail!

But some filters just prevent test cases from taking too long. In these
cases being able to manually bypass the filter is useful for debugging/
benchmarking/etc...
2024-05-28 16:46:40 -05:00
Christopher Haster
e247135805 Fixed test.py crashing on malformed test ids
Sometimes, if test_runner errors before running any tests, the last test
id can end up being None. This broke test output writing, which expected
to be able to parse an id. Instead we should just ignore the malformed
id (it's not like we can write anything relevant about any tests here),
and report it to the user at a higher level.
2024-05-28 14:50:57 -05:00
Christopher Haster
9c9a409524 Added fuzz test attribute
This acts as a marker to indicate a fuzz test. It should reference a
define, usually SEED, that can be randomized to get interesting test
permutations.

This is currently unused, but could lead to some interesting uses such
as time-based fuzz testing. It's also just useful for inspecting the
tests (make test-list).
2024-05-28 12:44:44 -05:00
Christopher Haster
a5fe2706bd Added runtime measurements to test.py -o/--output
Now that we have ~20 minutes of tests, it's good to know _why_ the tests
take ~20 minutes, and if this time is being spent well.

This adds the field test_time to test.py's -o/--output, which reports
the runtime of each test in seconds. This can be organized by suite,
case, etc, with our existing csv scripts.

Note I've limited the precision to only milliseconds (%.6f).
Realistically, this is plenty of precision, and with the number of tests
we have extra digits can really add up!

                             lines                   bytes
  test.csv before:          525593          58432541 56MiB
  test.csv full precision:  525593 (+0.0%)  69817693 67MiB (+19.5%)
  test.csv milli precision: 525593 (+0.0%)  63162935 60MiB (+8.1%)

It still takes a bit of time to process this (50.3s), but now we can see
the biggest culprits of our ~20 minute test time:

  $ ./scripts/summary.py test.csv -bcase -ftest_time -S
  case                                               test_time
  ...
  test_fwrite_hole_compaction                             74.4
  test_fwrite_incr                                       109.7
  test_dirs_mkdir_fuzz                                   115.3
  test_fwrite_overwrite_compaction                       132.4
  test_rbyd_fuzz_append_removes                          134.0
  test_rbyd_fuzz_mixed                                   136.3
  test_rbyd_fuzz_sparse                                  137.4
  test_fwrite_w_seek                                     144.1
  test_rbyd_fuzz_create_deletes                          144.8
  test_dirs_rm_many_backwards                            208.4
  test_dirs_rm_many                                      273.8
  test_fwrite_fuzz_unaligned                             283.2
  test_dread_recursive_rm                                316.7
  test_fwrite_fuzz_aligned                               551.0
  test_dirs_general_fuzz                                 552.8
  test_dirs_rm_fuzz                                      632.7
  test_fwrite_reversed                                   719.0
  test_dirs_mv_fuzz                                     1984.8
  TOTAL                                                 7471.3

Note this machine has 6 cores, 12 hthreads, 7471.3/60/6 => 20.8m, which
is why I don't run these tests single threaded.
2024-05-11 23:37:59 -05:00
Christopher Haster
c3dc7cca10 Fixed underflow issue with truncating test/bench -C/--context
There was no check on context > stdout, so requesting more context than
was actually printed by the test could result in a negative value.
Python "helpfully" interpreted this as a negative index, resulting in
somewhat random context lengths.

This, combined with my tendency to just default to a large number like
--context=100, led to me thinking a test was printing much less than it
actually was...

Don't get me wrong, I love Python, and I think Python's negative indices
are a clever way to add flexibility to slice notation, but the
value-dependent semantics are a pretty unfortunate footgun...
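
The footgun is easy to reproduce. The following is an assumed shape of the truncation logic, not the actual test.py code:

```python
def last_lines_buggy(stdout, context):
    # if context > len(stdout) the start index goes negative, and Python
    # quietly treats it as an offset from the end
    return stdout[len(stdout)-context:]

def last_lines(stdout, context):
    # clamp to zero so oversized contexts just return everything
    return stdout[max(len(stdout)-context, 0):]

lines = ['a', 'b', 'c', 'd', 'e']
# asking for 7 lines of context from 5 lines of output
buggy = last_lines_buggy(lines, 7)
fixed = last_lines(lines, 7)
```

Here buggy ends up with only the last two lines, since 5-7 = -2 is read as "two from the end", while fixed returns all five.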
2024-04-09 20:04:07 -05:00
Christopher Haster
2dcde5579b Fixed issue with test.py/bench.py -f/--fail not killing runners
While the -f/--fail logic was correctly terminating the test.py/bench.py
runner thread, it was not terminating the actual underlying test
process. This was causing test.py/bench.py to hang until the test runner
completed all pending tests, which could take quite some time.

This wasn't noticed earlier because test.py/bench.py still reports the
test as failed, and most uses of -f/--fail involve specifying a specific
test case, which usually terminates quite quickly.

What's more interesting is this termination logic was copied from the
handling of ctrl-C/SIGINT/KeyboardInterrupt, but this issue is not
present there because SIGINT would be sent to all processes in the
process tree, terminating the child process anyways.

Fixed by adding an explicit proc.kill() to test.py/bench.py before
tearing down the runner thread.
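
In isolation, the explicit kill looks something like this. It is only a sketch, with a sleeping Python child standing in for the test runner process:

```python
import subprocess
import sys

# stand-in for a long-running test runner process
proc = subprocess.Popen(
        [sys.executable, '-c', 'import time; time.sleep(60)'])

# stopping the thread that watches proc does not stop proc itself, so
# kill the child explicitly before tearing anything else down
proc.kill()
proc.wait()
```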
2024-04-01 17:15:13 -05:00
Christopher Haster
531c2bcc4c Quieted test.py/bench.py status when stdout is aimed at stdout
This is a condition for specifically the -O- pattern. Doing anything
fancier would be too much, so anything clever such as -O/dev/stdout
will still be clobbered.

This was a common enough pattern and the status updates clobbering
stdout was annoying enough that I figured this warranted a special case.
2024-03-20 13:58:22 -05:00
Christopher Haster
76593711ab Added -f/--fail to test.py/bench.py
This just tells test.py/bench.py to pretend the test failed and trigger
any conditional utilities. This can be combined with --gdb to easily
inspect a test that isn't actually failing.

Up until this point I've just been inserting assert(false) when needed,
which is clunky.
2024-03-20 13:50:04 -05:00
Christopher Haster
1422a61d16 Made generated prettyasserts more debuggable
The main star of the show is the adoption of __builtin_trap() for
aborting on assert failure. I discovered this GCC/Clang extension
recently and it integrates much, _much_ better with GDB.

With stdlib's abort(), GDB drops you off in several layers of internal
stdlib functions, which is a pain to navigate out of to get to where the
assert actually happened. With __builtin_trap(), GDB stops immediately,
making debugging quick and easy.

This is great! The pain of debugging needs to come from understanding
the error, not just getting to it.

---

Also tweaked a few things with the internal print functions to make
reading the generated source easier, though I realize this is a rare
thing to do.
2024-02-14 01:14:36 -06:00
Christopher Haster
06a360462a Simplified test/bench suite finding logic in test.py/bench.py
These just take normal paths now, we weren't even using the magic
test/bench suite finding logic since it's easier to just pass everything
explicitly in our Makefile.

The original test/bench suite finding logic was a bad idea anyways. This
is what globs are for, and having custom path chasing logic is
inconsistent and risks confusion.
2024-02-14 00:25:10 -06:00
Christopher Haster
a124ee54e7 Reworked test/bench defines to map to global variables
Motivation:

- Debuggability. Accessing the current test/bench defines from inside
  gdb was basically impossible for some dumb macro-debug-info reason I
  can't figure out.

  In theory, GCC provides a .debug_macro section when compiled with -g3.
  I can see this section with objdump --dwarf=macro, but somehow gdb
  can't seem to find any definitions? I'm guessing the #line source
  remapping is causing things to break somehow...

  Though even if macro-debugging gets fixed, which would be valuable,
  accessing defines in the current test/bench runner can trigger quite
  a bit of hidden machinery. This risks side-effects, which is never
  great when debugging.

  All of this is quite annoying because the test/bench defines are
  usually the most important piece of information when debugging!

  This replaces the previous hidden define machinery with simple global
  variables, which gdb can access no problem.

- Also when debugging we no longer awkwardly step into the test_define
  function all the time!

- In theory, global variables, being a simple memory access, should be
  quite a bit faster than the hidden define machinery. This does matter
  because running tests _is_ a dev bottleneck.

  In practice though, any performance benefit is below the noise floor,
  which isn't too surprising (~630s +-~20s).

- Using global variables for defines simplifies the test/bench runner
  quite a bit.

  Though some of the previous complexity was due to a whole internal
  define caching system, which was supposed to lazily evaluate test
  defines to avoid evaluating defines we don't use. This all proved to
  be useless because the first thing we do when running each test is
  evaluate all defines to generate the test id (lol).

So now, instead of lazily evaluating and caching defines, we just
generate global variables during compilation and evaluate all defines
for each test permutation immediately before running.

This relies heavily on __attribute__((weak)) symbols, and lets the
linker really shine.

As a funny perk this also effectively interns all test/bench defines by
the address of the resulting global variable. So we don't even need to
do string comparisons when mapping suite-level defines to the
runner-level defines.

---

Perhaps the more interesting thing to note, is the change in strategy in
how we actually evaluate the test defines.

This ends up being a surprisingly tricky problem, due to the potential
of mutual recursion between our defines.

Previously, because our define machinery was lazy, we could just
evaluate each define on demand. If a define required another define, it
would lazily trigger another evaluation, implicitly recursing through
C's stack. If cyclic, this would eventually lead to a stack overflow,
but that's ok because it's a user error to let this happen.

The "correct" way, at least in terms of being computationally optimal,
would be to topologically sort the defines and evaluate the resulting
tree from the leaves up.

But I ain't got time for that, so the solution here is equal parts
hacky, simple, and effective.

Basically, we just evaluate the defines repeatedly until they stop
changing:

- Initially, mutually recursive defines may read the uninitialized
  values of their dependencies, and end up with some arbitrarily wrong
  result. But as the defines are repeatedly evaluated, assuming no
  cycles, the correct results should eventually bubble up the tree until
  all defines converge to the correct value.

- This is O(n*e) vs O(n+e), but our define graph is usually quite
  shallow.

- To prevent non-halting, we error after an arbitrary 1000 iterations.
  If you hit this, it's likely because there is a cycle in the define
  graph.

  This is runtime configurable via the new --define-depth flag.

- To keep things consistent and reproducible, we zero initialize all
  defines before the first evaluation.

  I don't think this is strictly necessary, but it's important for the
  test runner to have the exact same results on every run. No one wants
  a "works on my machine" situation when the tests are involved.

Experimentation shows we only need an evaluation depth of 2 to
successfully evaluate the current set of defines:

  $ ./runners/test_runner --list-defines --define-depth=2

And any performance impact is negligible (~630s +-~20s).
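
The evaluate-until-stable strategy can be modeled in a few lines of Python. This is a sketch under assumptions: the real runner is generated C operating on global variables, the defines dict of callables is hypothetical, and depth here counts full passes (including the final pass that detects no change), which may not match --define-depth exactly:

```python
def evaluate_defines(defines, depth=1000):
    # defines: hypothetical dict of name -> function of current values
    # zero initialize so results are reproducible
    values = {k: 0 for k in defines}
    for _ in range(depth):
        changed = False
        for k, f in defines.items():
            v = f(values)
            if v != values[k]:
                values[k] = v
                changed = True
        # stop once a full pass changes nothing
        if not changed:
            return values
    raise RuntimeError('defines did not converge, cycle?')

defines = {
    'A': lambda d: d['B'] + 1,   # depends on B
    'B': lambda d: d['C'] * 2,   # depends on C
    'C': lambda d: 3,            # leaf
}
values = evaluate_defines(defines)
```

A is deliberately listed before its dependencies, so the first pass computes it from a zeroed B and later passes correct it.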
2024-02-13 18:59:58 -06:00
Christopher Haster
724fc5fc91 Hide gdb info header from test.py/bench.py --gdb
This was too noisy when intermingled with other debug output
test.py/bench.py prints when dropping into gdb.
2024-02-03 18:14:56 -06:00
Christopher Haster
161cd9e6da Fixed race condition killing test processes in test/bench.py
Not sure why we weren't hitting this earlier, but I've been hitting
this race condition a bunch recently and it's annoying.

Now every failed process kills the other test processes unconditionally.

It's not clear if this actually _fixes_ the race condition or just makes
it less likely, but it's good enough to keep the test script user
friendly.
2023-12-17 15:18:26 -06:00
Christopher Haster
d485795336 Removed concept of geometries from test/bench runners
This turned out to not be all that useful.

Tests already take quite a bit to run, which is a good thing! We have a
lot of tests! 942.68s or ~15 minutes of tests at the time of writing to
be exact. But simply multiplying the number of tests by some number of
geometries is heavy handed and not a great use of testing time.

Instead, tests where different geometries are relevant can parameterize
READ_SIZE/PROG_SIZE/BLOCK_SIZE at the suite level where needed. The
geometry system was just another define parameterization layer anyways.

Testing different geometries can still be done in CI by overriding the
relevant defines anyways, and it _might_ be interesting there.
2023-12-06 22:23:41 -06:00
Christopher Haster
6d81b0f509 Changed --context short flag to -C in scripts
This matches diff and grep, and avoids lower-case conflicts in
test.py/bench.py.
2023-11-06 01:59:03 -06:00
Christopher Haster
d1b9a2969f Added -F/--failures to test.py/bench.py to limit failures when -k/--keep-going
The -k/--keep-going option has been more or less useless before this
since it would completely flood the screen/logs when a bug triggers
multiple test failures, which is common.

Some things to note:

- RAM management is tricky with -k/--keep-going, if we try to save logs
  and filter after running everything we quickly fill up memory.

- Failing test cases are a much slower path than successes, since we
  need to kill and restart the underlying test_runner (its state can't
  be trusted anymore). This is a-ok since you usually hope for many
  more successes than failures. Unfortunately it can make
  -k/--keep-going quite slow.

---

ALSO -- warning this is a tangent rant-into-the-void -- I have
discovered that Ubuntu has a "helpful" subsystem named Apport that tries
to record/log/report any process crash in the system. It is "disabled" by
default, but the way it's disabled requires LAUNCHING A PYTHON
INTERPRETER to check a flag on every segfault/assert failure.

This is what it does when it's "disabled"!

This subsystem is fundamentally incompatible with any program that
intentionally crashes subprocesses, such as our test runner. The sheer
amount of python interpreters being launched quickly eats through all
available RAM and starts OOM killing half the processes on the system.

If anyone else runs into this, a shallow bit of googling suggests the
best solution is to just disable Apport. It is not a developer friendly
subsystem:

  $ sudo systemctl disable apport.service

Removing Apport brings RAM usage back down to a constant level, even
with absurd numbers of test failures. And here I thought I had a memory
leak somewhere.
2023-11-06 01:55:28 -06:00
Christopher Haster
1e4d4cfdcf Tried to write errors to stderr consistently in scripts
2023-11-05 15:55:07 -06:00
Christopher Haster
fb9277feac Tweaked test.py/bench.py to allow no suites to test compilation
This is mainly to allow bench_runner to at least compile after moving
benches out of tree.

Also cleaned up lingering runner/suite munging leftover from the change
to an optional -R/--runner parameter.
2023-11-03 11:15:45 -05:00
Christopher Haster
39f417db45 Implemented a filesystem traversal that understands file bptrs/btrees
Ended up changing the name of lfsr_mtree_traversal_t -> lfsr_traversal_t,
since this behaves more like a filesystem-wide traversal than an mtree
traversal (it returns several typed objects, not mdirs like the other
mtree functions for one).

As a part of this changeset, lfsr_btraversal_t (was lfsr_btree_traversal_t)
and lfsr_traversal_t no longer return untyped lfsr_data_ts, but instead
return specialized lfsr_{b,t}info_t structs. We weren't even using
lfsr_data_t for its original purpose in lfsr_traversal_t.

Also changed lfsr_traversal_next -> lfsr_traversal_read, you may notice
at this point the changes are intended to make lfsr_traversal_t look
more like lfsr_dir_t for consistency.

---

Internally lfsr_traversal_t now uses a full state machine with its own
enum due to the complexity of traversing the filesystem incrementally.

Because creating diagrams is fun, here's the current full state machine,
though note it will need to be extended for any
parity-trees/free-trees/etc:

  mrootanchor
       |
       v
  mrootchain
  .-'  |
  |    v
  |  mtree ---> openedblock
  '-. | ^           | ^
    v v |           v |
   mdirblock    openedbtree
      | ^
      v |
   mdirbtree

I'm not sure I'm happy with the current implementation, and eventually
it will need to be able to handle in-place repairs to the blocks it
sees, so this whole thing may need a rewrite.

But in the meantime, this passes the new clobber tests in test_alloc, so
it should be enough to prove the file implementation works (which
definitely is not fully tested yet, and some bugs had to be fixed for
the new tests in test_alloc to pass).

---

Speaking of test_alloc.

The inherent cyclic dependency between files/dirs/alloc makes it a bit
hard to know what order to test these bits of functionality in.

Originally I was testing alloc first, because it seems you need to be
confident in your block allocator before you can start testing
higher-level data structures.

But I've gone ahead and reversed this order, testing alloc after
files/dirs. This is because of an interesting observation that if alloc
is broken, you can always increase the test device's size to some absurd
number (-DDISK_SIZE=16777216, for example) to kick the can down the
road.

Testing in this order allows alloc to use more high-level APIs and
focus on corner cases where the allocator's behavior requires subtlety
to be correct (e.g. ENOSPC).
2023-10-14 01:13:40 -05:00
Christopher Haster
52113c6ead Moved the test/bench runner path behind an optional flag
So now instead of needing:

  ./scripts/test.py ./runners/test_runner test_dtree

You can just do:

  ./scripts/test.py test_dtree

Or with an explicit path:

  ./scripts/test.py -R./runners/test_runner test_dtree

This makes it easier to run the script manually. And, while there may be
some hiccups with the implicit relative path, I think in general this will
make the test/bench scripts easier to use.

There was already an implicit runner path, though only if the test suite
was completely omitted. I'm not sure that would ever have actually
been useful...

---

Also increased the permutation field size in --list-*, since I noticed it
was overflowing.
2023-10-14 00:54:28 -05:00
Christopher Haster
e7bf5ad82f Added scripts/crc32c.py
This seems like a useful script to have.
2023-09-15 18:42:48 -05:00
Christopher Haster
528f104cb4 Enabled internal test code at the suite-level
Test suites already had the ability to provide suite-level code via the
"code" attribute, but this was placed in the suite's generated source
file, making it inaccessible to internal tests.

This change allows suite code to be placed in the same place as internal
tests, via the "in" attribute, though this has some caveats:

1. Suite-level code generally declares helper functions in global scope.
   We don't parse this code or anything, so name collisions between
   helper functions across different test suites are up to the developer
   to resolve.

2. Internal suite-level code has access to internal functions/variables/
   etc, this means we can't place a copy in our suite's generated source
   and expect it to compile. For this reason, internal suite-level code
   is unavailable for non-internal tests in the suite.

   This also means you only get to place internal suite-level code in a
   single source file. Though this is not really an issue since littlefs
   is basically a single file...
2023-08-19 12:20:13 -05:00
Christopher Haster
4efb55e0d7 In tests/benches, renamed cfg -> CFG
This is to better indicate this is a runner generated variable.
2023-08-04 14:05:07 -05:00
Christopher Haster
1c128afc90 Renamed internal runner field filter -> if_
This makes it more consistent with the actual test field, at the cost of
the symbol collision.
2023-08-04 13:54:10 -05:00
Christopher Haster
5be7bae518 Replaced tn/bn prefixes with an actual dependency system in tests/benches
The previous system of relying on test name prefixes for ordering was
simple, but organizing tests by dependencies and topologically sorting
during compilation is 1. more flexible and 2. simplifies test names,
which get typed a lot.

Note these are not "hard" dependencies, each test suite should work fine
in isolation. These "after" dependencies just hint an ordering when all
tests are run.

As such, it's worth noting the tests should NOT error if a dependency is
missing. This unfortunately makes it a bit hard to catch typos, but
allows faster compilation of a subset of tests.
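
A minimal sketch of this kind of soft ordering, assuming each suite simply
lists the suites it should run after. This is not the test.py
implementation. The function order_suites and the suite names test_files,
test_dirs, and test_missing are hypothetical, and raising on a cycle is
an assumption:

```python
def order_suites(after):
    """Topologically sort suites by their soft "after" hints.

    after maps a suite name to the list of suites it should run after.
    Names that are not themselves keys of `after` are silently ignored,
    so compiling only a subset of suites is not an error.
    """
    ordered = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("cyclic dependency at %r" % name)
        state[name] = "visiting"
        for dep in after[name]:
            # soft dependency: skip suites that aren't present
            if dep in after:
                visit(dep)
        state[name] = "done"
        ordered.append(name)

    # visit in sorted order so the result doesn't depend on input order
    for name in sorted(after):
        visit(name)
    return ordered


order = order_suites({
    "test_alloc": ["test_files", "test_dirs"],
    "test_files": ["test_dtree"],
    "test_dirs": ["test_dtree"],
    "test_dtree": ["test_missing"],  # missing suite, ignored
})
```

Visiting names in sorted order is one way to make the result independent
of the order suites are passed in.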

---

To make this work the way tests are linked has changed from using a custom
linker section (fun linker magic!) to a weakly linked array appended to
every source file (also fun linker magic!).

At least with this method test.py has strict control over the test
ordering, and doesn't depend on 1. the order in which the linker merges
sections, and 2. the order tests are passed to test.py. I didn't realize
the previous system was so fragile.
2023-08-04 13:33:00 -05:00
Christopher Haster
c5e84e874f Changed how fuzz tests are iterated to allow powerloss-fuzz testing
Instead of iterating over a number of seeds in the test itself, the
seeds are now permuted as a part of normal test defines.

This lets each seed take advantage of other test features, mainly the
ability to test powerlosses heuristically.
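
To illustrate, treating the seed as just another define means every seed
becomes its own permutation. A minimal sketch, where the function
permutations and the define names SEED and N are hypothetical and not
taken from the real runner:

```python
import itertools


def permutations(defines):
    """Yield one dict per combination of define values.

    defines maps a define name to an iterable of its possible values.
    """
    names = sorted(defines)
    for values in itertools.product(*(defines[n] for n in names)):
        yield dict(zip(names, values))


# three seeds times two sizes gives six independent test permutations
perms = list(permutations({"SEED": range(3), "N": [1, 10]}))
```

Since each permutation is an ordinary test run, per-permutation features
such as powerloss testing apply to each seed individually.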

This is probably how it should have been done in the first place, but
the permutation tests can't do this since the number of permutations
changes as the size of the test input changes. The test define system
can't handle that very well.

The tradeoffs here are:

- We can't do cross-fuzz checks, such as the balance checks in the rbyd
  tests, though those really should be moved to benchmarks anyways.

- The large number of cheap fuzz permutations skews the total
  permutation count, though I'm not sure this matters.

  before: 3083 permutations (-Gnor)
  after: 409893 permutations (-Gnor)
2023-07-18 21:40:44 -05:00
Christopher Haster
b05db8e3d3 Added support for lists of conditional ifs in test/bench.py
Any conditions in both the suites and cases are anded together to
determine when the test/bench should run.

Accepting a list here makes it easier to compose multiple conditions,
since toml-level elements are a bit easier to modify than strings of
C expressions.
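
A rough sketch of how such lists might be folded into a single C
expression. The helper combine_ifs and the example conditions are
hypothetical, and the real scripts may combine conditions differently:

```python
def combine_ifs(suite_if=None, case_if=None):
    """And together suite-level and case-level conditions.

    Each argument may be None, a single C expression string, or a list
    of C expression strings.
    """
    def as_list(x):
        if x is None:
            return []
        if isinstance(x, (list, tuple)):
            return list(x)
        return [x]

    conds = as_list(suite_if) + as_list(case_if)
    # no conditions means the test/bench always runs
    if not conds:
        return "true"
    # parenthesize each condition so operator precedence can't leak
    return " && ".join("(%s)" % c for c in conds)


expr = combine_ifs(
    "BLOCK_SIZE >= 512",
    ["PROG_SIZE <= 64", "READ_SIZE == 1"])
```

Here expr is a single string that could be emitted into a generated
C condition.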
2023-06-01 17:40:51 -05:00
Christopher Haster
07244fb2d4 In test/bench.py, added "internal" flag
This marks internal tests/benches (case.in="lfs.c") with an otherwise-unused
flag that is printed during --summary/--list-*. This just helps identify which
tests/benches are internal.
2023-06-01 17:40:48 -05:00