littlefs

Author	SHA1	Message	Date
Christopher Haster	c87361508b	scripts: test.py/bench.py: Added --no-internal to skip internal tests The --no-internal flag avoids building any internal tests/benches (tests/benches with in="lfs3.c"), which can be useful for quickly testing high-level things while refactoring. Refactors tend to break all the internal tests, and it can be a real pain to update everything. Note that --no-internal can be injected into the build with TESTCFLAGS: TESTCFLAGS=--no-internal make test-runner -j \ && ./scripts/test.py -j -b For a curious data point, here's the current number of internal/non-internal tests: suites cases perms total: 24 808 633968/776298 internal: 22 (91.7%) 532 (65.8%) 220316/310247 (34.8%) non-internal: 2 ( 8.3%) 276 (34.2%) 413652/466051 (65.2%) It's interesting to note that while internal tests have more test cases, the non-internal tests generate a larger number of test permutations. This is probably because internal tests tend to target specific corner cases/known failure points, and don't invite many variants. --- While --no-internal may be useful for high-level testing during a refactor, I'm not sure it's a good idea to rely on it for _debugging_ a refactor. The whole point of internal testing is to catch low-level bugs early, with as little unnecessary state as possible. Skipping these to debug integration tests is a bit counterproductive!	2025-07-20 09:53:53 -05:00
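A minimal sketch of how a --no-internal style filter could work, assuming each test case is represented as a dict whose optional `in` key names the source file it is compiled into. The dict shape, `filter_cases`, and the internal case name are hypothetical, not test.py's actual internals:

```python
def filter_cases(cases, no_internal=False):
    """Return the cases to build, dropping internal ones if requested."""
    if not no_internal:
        return list(cases)
    # internal cases are compiled into a specific source file via their
    # `in` attribute (e.g. in="lfs3.c"), so skip anything that has one
    return [case for case in cases if not case.get('in')]


# hypothetical cases for illustration
cases = [
    {'name': 'test_internal_example', 'in': 'lfs3.c'},
    {'name': 'test_files_hello'},
]
```

Filtering before code generation, rather than at run time, is what lets such a flag skip building internal tests entirely.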
Christopher Haster	7b330d67eb	Renamed config -> cfg Note this includes both the lfs3_config -> lfs3_cfg structs as well as the LFS3_CONFIG -> LFS3_CFG include define: - LFS3_CONFIG -> LFS3_CFG - struct lfs3_config -> struct lfs3_cfg - struct lfs3_file_config -> struct lfs3_file_cfg - struct lfs3_bd_config -> struct lfs3_bd_cfg - cfg -> cfg We were already using cfg as the variable name everywhere. The fact that these names were different was an inconsistency that should be fixed since we're committing to an API break. LFS3_CFG is already out-of-date from upstream, and there's plans for a config rework, but I figured I'd go ahead and change it as well to lower the chances it gets overlooked. --- Note this does _not_ affect LFS3_TAG_CONFIG. Having the on-disk vs driver-level config take slightly different names is not a bad thing.	2025-07-18 18:29:41 -05:00
Christopher Haster	0c19a68536	scripts: test.py/bench.py: Added support for multiple header files Like test.py --gdb-script, being able to specify multiple header files seems useful and is easy enough to add. --- Note that the default is only used if no other header files are specified, so this _replaces_ the default header file: $ ./scripts/test.py --include=my_header.h If you don't want to replace the default header file, you currently need to specify it explicitly: $ ./scripts/test.py \ --include=runners/test_runner.h \ --include=my_header.h	2025-07-04 18:08:11 -05:00
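The "default is replaced, not extended" behavior can be sketched with argparse. This is an illustration under assumptions, not test.py's code; `get_includes` and `DEFAULT_INCLUDES` are hypothetical names. The subtle part is that argparse's `action='append'` appends to a non-empty default instead of replacing it, so the default is applied manually:

```python
import argparse

# default header, as named in the commit message
DEFAULT_INCLUDES = ['runners/test_runner.h']

parser = argparse.ArgumentParser()
# action='append' collects repeated --include flags into a list. default=None
# (instead of DEFAULT_INCLUDES) matters: argparse appends to a non-empty
# default rather than replacing it, so the default would always stick around
parser.add_argument('--include', action='append', default=None)


def get_includes(argv):
    args = parser.parse_args(argv)
    # fall back to the default only if no --include flags were given
    return args.include or list(DEFAULT_INCLUDES)
```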
Christopher Haster	0b804c092b	scripts: gdb: Added some useful GDB scripts to test.py --gdb These just invoke the existing dbg*.py python scripts, but allow quick references to variables in the debugged process: (gdb) dbgflags o file->b.o.flags LFS3_O_RDWR 0x00000002 Open a file as read and write LFS3_o_REG 0x10000000 Type = regular-file LFS3_o_UNSYNC 0x01000000 File's metadata does not match disk Quite neat and useful! This works by injecting dbg.gdb.py via gdb -x, which includes the necessary python hooks to add these commands to gdb. This can be overridden/extended with test.py/bench.py's --gdb-script flag. Currently limited to scripts that seem the most useful for process internals: - dbgerr - Decode littlefs error codes - dbgflags - Decode littlefs flags - dbgtag - Decode littlefs tags	2025-07-04 18:08:04 -05:00
Christopher Haster	8cc81aef7d	scripts: Adopt __get__ binding for write/writeln methods This actually binds our custom write/writeln functions as methods to the file object: def writeln(self, s=''): self.write(s) self.write('\n') f.writeln = writeln.__get__(f) This doesn't really gain us anything, but is a bit more correct and may be safer if other code messes with the file's internals.	2025-06-27 12:56:03 -05:00
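A self-contained version of the pattern above, using an in-memory `io.StringIO` as a stand-in for a real file. `function.__get__(obj)` returns a bound method, so `self` is filled in automatically on each call:

```python
import io


def writeln(self, s=''):
    # `self` is whatever file object this function gets bound to
    self.write(s)
    self.write('\n')


f = io.StringIO()
# bind writeln to f, so f.writeln('x') calls writeln(f, 'x')
f.writeln = writeln.__get__(f)
f.writeln('hello')
f.writeln()
```

Assigning the plain function instead (`f.writeln = writeln`) would not bind it, since functions stored on an instance are not turned into methods; callers would have to pass the file explicitly.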
Christopher Haster	213dba6f6d	scripts: test.py/bench.py: Added ifndef attribute for tests/benches As you might expect, this is the inverse of ifdef, and is useful for supporting opt-out flags. I don't think ifdef + ifndef is powerful enough to handle _all_ compile-time corner cases, but they at least provide convenient handling for the most common flags. Worst case, tests/benches can always include explicit #if/#ifdef/#ifndef statements in the code itself.	2025-06-24 15:17:04 -05:00
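One way a test generator might turn ifdef/ifndef attributes into preprocessor guards around generated C. This is a sketch only; `guard` is a hypothetical helper and the `#if defined(...)` spelling is an assumption about the emitted code:

```python
def guard(body, ifdef=(), ifndef=()):
    """Wrap generated C in a preprocessor guard (hypothetical helper).

    ifdef lists defines that must be set, ifndef lists defines that must
    not be set; with neither, the body is emitted unguarded.
    """
    conds = ([f'defined({d})' for d in ifdef]
            + [f'!defined({d})' for d in ifndef])
    if not conds:
        return body
    return '#if ' + ' && '.join(conds) + '\n' + body + '\n#endif\n'
```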
Christopher Haster	6eba1180c8	Big rename! Renamed lfs -> lfs3 and lfsr -> lfs3	2025-05-28 15:00:04 -05:00
Christopher Haster	275ca0e0ec	scripts: bench.py: Fixed issue where cumul results were mixed together Whoops, looks like cumulative results were overlooked when multiple bench measurements per bench were added. We were just adding all cumulative results together! This led to some very confusing bench results. The solution here is to keep track of per-measurement cumulative results via a Python dict. Which adds some memory usage, but definitely not enough to be noticeable in the context of the bench-runner.	2025-05-15 16:16:41 -05:00
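The fix amounts to keying the running sums by measurement. A minimal sketch, assuming measurements arrive as hypothetical `(m, n, value)` tuples (the `m`/`n` names borrow the bench fields mentioned elsewhere in this log):

```python
from collections import defaultdict


def add_cumulative(measurements):
    """Attach a running sum to each (m, n, value) measurement.

    One running total is kept per measurement name m in a dict, so
    unrelated measurements are never summed together.
    """
    cumul = defaultdict(int)
    results = []
    for m, n, value in measurements:
        cumul[m] += value
        results.append((m, n, value, cumul[m]))
    return results
```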
Christopher Haster	71930a5c01	scripts: Tweaked openio comment Dang, this touched like every single script.	2025-04-16 15:23:06 -05:00
Christopher Haster	b715e9a749	scripts: Prefer 1;30-37m ansi codes over 90-97m Reading Wikipedia: > Later terminals added the ability to directly specify the "bright" > colors with 90–97 and 100–107. So if we want to stick to one pattern, we should probably go with brightness as a separate modifier. This shouldn't noticeably change any script, unless your terminal interprets 90-97m colors differently from 1;30-37m, in which case things should be more consistent now.	2025-04-16 15:22:43 -05:00
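For reference, the two spellings differ only in how brightness is expressed. A tiny sketch using the 1;30-37m form (`colorize` is a hypothetical helper):

```python
def colorize(s, color):
    """Wrap s in an ANSI SGR sequence, spelling bright colors as 1;3Xm.

    color is a base color code 30-37 (e.g. 31 for red). SGR 1 is formally
    "bold/increased intensity", which many terminals render as the bright
    variant; the alternative spelling would be 90-97 (color + 60).
    """
    return f'\x1b[1;{color}m{s}\x1b[m'
```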
Christopher Haster	1ac3aae92b	scripts: test.py/bench.py: Added -e/--exec shortform flag Why not, -e/--exec seems useful/general purpose enough to deserve a shortform flag. Especially since much of our testing involves emulation. The only risk of conflicts is with -e/--error-* in other scripts, but the _whole point_ of test.py is to error on failure, so I don't think this will be an issue. Note that -E may be more useful for environment variables in the future. I feel like -e/--exec was more common in other programs, but I've only found sed -e and perl -e so far. Most programs stick to -c/--command (bash, python) which would conflict with -c/--compile here.	2025-04-16 15:22:10 -05:00
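With argparse, a short form is just an extra option string on the same argument. A sketch of how an exec prefix might be parsed and prepended to the runner command; splitting with `shlex.split` and the `runner_cmd` helper are assumptions, not test.py's actual behavior:

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
# -e and --exec share one destination (args.exec); shlex.split turns a
# string like 'qemu-arm -L /usr/arm-linux-gnueabi' into an argv prefix
parser.add_argument('-e', '--exec', type=shlex.split, default=[])


def runner_cmd(argv, runner='./runners/test_runner'):
    # prefix the runner with the exec command, e.g. an emulator
    return parser.parse_args(argv).exec + [runner]
```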
Christopher Haster	313696ecf9	scripts: Fixed openio issue where some scripts didn't import os This only failed if "-" was used as an argument (for stdin/stdout), so the issue was pretty hard to spot. openio is a heavily copy-pasted function, so it makes sense to just add the import os to openio directly. Otherwise this mistake will likely happen again in the future.	2025-03-12 21:18:51 -05:00
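A sketch of what an openio-style helper with a local `import os` might look like. The fd-dup approach for "-" is an assumption about how such a helper could work, not a copy of the scripts' code:

```python
import sys


def openio(path, mode='r', buffering=-1):
    # os is imported inside openio itself, so copy-pasting this function
    # into a new script can't silently drop the import; it's only needed
    # for the '-' (stdin/stdout) case, which is why the bug was easy to miss
    import os
    if path == '-':
        # dup the fd so closing the returned file leaves stdin/stdout open
        if 'r' in mode:
            return os.fdopen(os.dup(sys.stdin.fileno()), mode, buffering)
        else:
            return os.fdopen(os.dup(sys.stdout.fileno()), mode, buffering)
    else:
        return open(path, mode, buffering)
```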
Christopher Haster	9e22167a31	scripts: Re-adopted result prefixes Now that I'm looking into some higher-level scripts, being able to merge results without first renaming everything is useful. This gives most scripts an implicit prefix for field fields, but _not_ by fields, allowing easy merging of results from different scripts: $ ./scripts/stack.py lfs.ci -o- function,stack_frame,stack_limit lfs_alloc,288,1328 lfs_alloc_discard,8,8 lfs_alloc_findfree,16,32 ... At least now these have better support in scripts with the addition of the --prefix flag (this was tricky for csv.py), which allows explicit control over field field prefixes: $ ./scripts/stack.py lfs.ci -o- --prefix= function,frame,limit lfs_alloc,288,1328 lfs_alloc_discard,8,8 lfs_alloc_findfree,16,32 ... $ ./scripts/stack.py lfs.ci -o- --prefix=wonky_ function,wonky_frame,wonky_limit lfs_alloc,288,1328 lfs_alloc_discard,8,8 lfs_alloc_findfree,16,32 ...	2025-03-12 19:10:17 -05:00
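The prefix rule (prefix "field" fields, never "by" fields) can be sketched on a single result row. `apply_prefix` is a hypothetical helper; the row mirrors the stack.py output shown above:

```python
def apply_prefix(row, by_fields, prefix):
    """Prefix the "field" (measured) columns of a result row, leaving the
    "by" (key) columns untouched so rows from different scripts still line
    up when merged."""
    return {(k if k in by_fields else prefix + k): v
            for k, v in row.items()}


row = {'function': 'lfs_alloc', 'frame': 288, 'limit': 1328}
```

With `prefix='stack_'` this yields the default `function,stack_frame,stack_limit` columns, and `prefix=''` leaves the row unchanged.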
Christopher Haster	ac30a20d12	scripts: Reworked to support optional json input/output Guh. This may have been more work than I expected. The goal was to allow passing recursive results (callgraph info, structs, etc) between scripts, which is simply not possible with csv files. Unfortunately, this raised a number of questions: What happens if a script receives recursive results? -d/--diff with recursive results? How to prevent folding of ordered results (structs, hot, etc) in piped scripts? etc. And ended up with a significant rewrite of most of the result scripts' internals. Key changes: - Most result scripts now support -O/--output-json in addition to -o/--json, with -O/--output-json including any recursive results in the "children" field. - Most result scripts now support both csv and json as input to relevant flags: -u/--use, -d/--diff, -p/--percent. This is accomplished by looking for a '[' as the first character to decide if an input file is json or csv. Technically this breaks if your json has leading whitespace, but why would you ever keep whitespace around in json? The human-editability of json was already ruined the moment comments were disallowed. - csv.py requires all fields to be explicitly defined, so added -i/--enumerate, -Z/--children, and -N/--notes. At least we can provide some reasonable defaults so you shouldn't usually need to type out the whole field. - Notably, the rendering scripts (plot.py, treemapd3.py, etc) and test/bench scripts do _not_ support json. csv.py can always convert to/from json when needed. - The table renderer now supports diffing recursive results, which is nice for seeing how the hot path changed in stack.py/perf.py/etc. - Moved the -r/--hot logic up into main, so it also affects the outputted results. Note it is impossible for -z/--depth to _not_ affect the outputted results. - We now sort in one pass, which is in theory more efficient. - Renamed -t/--hot -> -r/--hot and -R/--reverse-hot, matching -s/-S. 
- Fixed an issue with -S/--reverse-sort where only the short form was actually reversed (I misunderstood what argparse passes to Action classes). - csv.py now supports json input/output, which is funny.	2025-03-12 19:09:43 -05:00
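The '[' sniffing heuristic described in this commit, sketched as a standalone reader (`read_results` is a hypothetical name; the real scripts differ):

```python
import csv
import io
import json


def read_results(f):
    """Read results from either json or csv.

    Following the commit's heuristic, input starting with '[' is treated as
    json (a list of objects), anything else as csv. json with leading
    whitespace would be misread as csv.
    """
    data = f.read()
    if data.startswith('['):
        return json.loads(data)
    return list(csv.DictReader(io.StringIO(data)))
```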
Christopher Haster	86f3bad2a4	scripts: Adopted Attr rework in plot.py/plotmpl.py Unifying these complicated attr-assigning flags across all the scripts is the main benefit of the new internal Attr system. The only tricky bit is we need to somehow keep track of all input fields in case % modifiers reference fields, when we could previously discard non-data fields. Tricky but doable. Updated flags: - -L/--label -> -L/--add-label - --colors -> -C/--add-color - --formats -> -F/--add-format - --chars -> -*/--add-char/--chars - --line-chars -> -_/--add-line-char/--line-chars I've also tweaked Attr to accept glob matches when figuring out group assignments. This is useful for matching slightly different, but similarly named results in our benchmark scripts. There's probably a clever way to do this by injecting new by fields with csv.py, but just adding globbing is simpler and makes attr assignment even more flexible.	2025-03-11 18:09:18 -05:00
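Glob-based attr assignment could look roughly like this, assuming attrs are kept as an ordered list of hypothetical `(pattern, attr)` pairs where the first match wins:

```python
import fnmatch


def find_attr(attrs, name, default=None):
    """First attr whose glob pattern matches name, else default.

    fnmatchcase is used so matching is case-sensitive on every OS.
    """
    for pattern, attr in attrs:
        if fnmatch.fnmatchcase(name, pattern):
            return attr
    return default


# hypothetical color assignments, e.g. from repeated -C/--add-color flags
colors = [('bench_file_*', 'red'), ('bench_dir_*', 'blue')]
```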
Christopher Haster	5aada6f54a	test.py/bench.py: Limited -d/--disk and -t/--trace to one thread It doesn't really make sense to write to disk/trace files with multiple threads; the result usually ends up clobbered and useless. If we only pass disk/trace files to the first thread, the result is at least usable, even if it only represents 1/j tests. This is actually quite a nice way to sample filesystem images in multithreaded tests. As a side effect, this also changes test.py/bench.py to no longer pass -d/--disk or -t/--trace to runner queries, which is probably a good thing? These should be ignored in queries anyways.	2025-02-08 14:53:47 -06:00
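A sketch of building per-thread runner commands so only the first thread writes disk/trace output. `thread_args` and the `--disk=`/`--trace=` spelling are assumptions for illustration:

```python
def thread_args(runner_cmd, thread, disk=None, trace=None):
    """Build the runner command for one worker thread.

    Only thread 0 writes the disk image/trace file, so the output stays a
    coherent sample of 1/j tests instead of being clobbered by every thread.
    """
    cmd = list(runner_cmd)
    if thread == 0:
        if disk is not None:
            cmd.append(f'--disk={disk}')
        if trace is not None:
            cmd.append(f'--trace={trace}')
    return cmd
```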
Christopher Haster	42c81ef7de	scripts: Switched to tomllib/tomli for toml parsing Found a bug in our toml parser that's difficult to work around: defines.GC_FLAGS = """ => { LFS_GC_MKCONSISTENT "GC_FLAGS": "blablabla", | LFS_GC_LOOKAHEAD } // where did defines go? """ This appears to be this bug: https://github.com/uiri/toml/issues/286 But since it was opened 4 years ago, I think it's safe to say this toml library is now defunct... --- Apparently tomllib/tomli is the new hotness, which started as tomli before being adopted in Python 3.11 as tomllib. Fortunately tomli is still maintained so we don't have to worry about Python versions too much. Adopting tomli was relatively straightforward, the only hiccup being that it doesn't support text files? Curious, but fortunately Python exposes the underlying binary file handle in f.buffer.	2025-01-28 14:41:45 -06:00
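A sketch of the tomllib/tomli fallback and the f.buffer trick. `load_toml` is a hypothetical helper; `tomllib.load` does require a binary file, which is why the text file's underlying `buffer` is passed:

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # maintained backport with the same API


def load_toml(path):
    # tomllib.load() requires a binary file; f.buffer is the binary file
    # object underneath a text-mode file, so text-oriented helpers (like a
    # script's openio) can still be used to open the file
    with open(path) as f:
        return tomllib.load(f.buffer)
```

A multi-line string like the GC_FLAGS example above then parses as expected, with the newline right after the opening `"""` trimmed per the TOML spec.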
Christopher Haster	361cd3fec0	scripts: Added missing sys imports Unfortunately the import sys in the argparse block was hiding missing sys imports. The mistake was assuming the import sys in Python would limit the scope to that if block, but Python's late binding strikes again...	2025-01-28 14:41:45 -06:00
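The underlying rule is that Python has no block scope: an import inside an `if` binds the name for the whole enclosing scope. At module level, `import sys` under `if __name__ == "__main__":` creates a global every function can see when run as a script. The same rule inside a function (`show_platform` is a hypothetical example):

```python
def show_platform(verbose):
    if verbose:
        # binds `sys` for the whole function, not just this if block
        import sys
    # `sys` resolves here when the branch ran, and raises
    # UnboundLocalError (a subclass of NameError) when it didn't
    return sys.platform
```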
Christopher Haster	62cc4dbb14	scripts: Disabled local import hack on import Moved local import hack behind if __name__ == "__main__" These scripts aren't really intended to be used as python libraries. Still, it's useful to import them for debugging and to get access to their juicy internals.	2025-01-28 14:41:30 -06:00
Christopher Haster	25814ed5cb	scripts: Fixed failed subprocess stderr, unconditionally forward It looks like the failure case in our scripts' subprocess stderr handling was not tested well during a fix to stderr blocking (`a735bcd`). This code was attempting to print stderr only if an error occurred, but with stderr=None this just results in a NoneType TypeError. In retrospect, completely hiding stderr is kind of shitty if a subprocess fails, but it doesn't seem possible to read from both stdout and stderr with Python's APIs without getting stuck when the stderr's buffer is full. It might be possible to work around this with either multithreading, select calls, or a temp file, but I'm not sure slightly less verbose scripts are worth the added complexity in every single subprocess call. For now just reverting to unconditionally forwarding stderr from the child process. This is the simplest/most robust option.	2024-12-14 15:08:39 -06:00
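Unconditional forwarding means passing `stderr=None`, so the child inherits our stderr and only stdout needs draining. A minimal sketch (`run` is a hypothetical helper, written in the scripts' double-indent style):

```python
import subprocess


def run(cmd):
    """Run cmd, capturing stdout and forwarding stderr unchanged.

    With stderr=None the child writes straight to our stderr, so there is
    no second pipe to drain and no risk of blocking on a full stderr buffer.
    """
    with subprocess.Popen(cmd,
            stdout=subprocess.PIPE,
            stderr=None,
            universal_newlines=True) as proc:
        lines = list(proc.stdout)
    # Popen.__exit__ waits for the process, so returncode is set here
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return lines
```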
Christopher Haster	51b8cdb1f0	scripts: Added -q/--quiet to test.py/bench.py This will probably only have niche uses, but may be useful for small test sets or for running specific tests with -O-. Though it is a bit funny that -q -O- turns test.py/bench.py into more or less just a complicated way to run a C program.	2024-11-17 23:50:32 -06:00
Christopher Haster	0b450b1184	scripts: Reverted full C exprs in test/bench define ranges A couple of problems: 1. We should probably also support negative ranges, but this is a bit annoying since we can't tell if the range is negative or positive until expr evaluation. 2. Evaluating the range exprs at compile-time is inconsistent with other C exprs in our tests/benches (normal defines, if filters, etc), and severely limiting since we can't use other defines before the define system is initialized. 3. Attempting to move these range exprs into their own lazily evaluated functions does not seem tractable... We'd need to evaluate defines to know how many permutations there are, but how can we evaluate defines before knowing which permutation we're on? I think this circular dependency would make the permutation count undecidable? Even if we could move these exprs to their own lazily evaluated functions (which would solve the inconsistency issue), the complexity risks outweighing the benefit. Keep in mind it's useful if external tools can parse our tests. So reverting for now. Though I am keeping some of the refactoring in test.py/bench.py. Having a special DRange type is useful if we ever want to add more define functions in the future.	2024-11-17 23:36:57 -06:00
Christopher Haster	608d8a2bc1	scripts: Enabled full C exprs in test/bench define ranges This enables full C exprs in test/bench define ranges by simply passing them on to the C compiler. So this: defines.N = 'range(1,20+1)' Becomes this, in N's define function: if (i < 0 + ((((20+1)-1-(1))/(1) + 1))) return ((i-(0))*(1) + (1)); Which is a bit of a mess, but generates the correct range at runtime. This allows for much more flexible exprs in range defines without needing a full expr parser in Python. Note though that we need to evaluate the range length at compile time. This is notably before the test/bench define system is initialized, so all three range args (start, stop, step) are limited to really only simple C literals and exprs.	2024-11-17 14:36:47 -06:00
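The generated C above can be reproduced by a small string template that passes start/stop/step through untouched. `range_to_c` is a hypothetical sketch; the count formula `(stop-1-start)/step + 1` assumes a positive step and `stop > start`, matching the limitation noted in the revert commit:

```python
def range_to_c(start, stop, step='1', offset=0):
    """Emit a C snippet mapping permutation index i to range(start, stop, step).

    start/stop/step are C expression strings, passed through so the C
    compiler does the arithmetic; offset is the number of permutations
    consumed by earlier range segments.
    """
    count = f'(((({stop})-1-({start}))/({step}) + 1))'
    return (f'if (i < {offset} + {count}) '
            f'return ((i-({offset}))*({step}) + ({start}));')
```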
Christopher Haster	f385f8f778	bench: Tweaked bench.py to include cumulative measurements This was the one piece needed to be able to replace amor.py with csv.py. The missing feature in csv.py is the ability to keep track of a running-sum, but this is a bit of a hack in amor.py considering we otherwise view csv entries as unordered. We could add a running-sum to csv.py, or instead, just include a running sum as a part of our bench output. We have all the information there anyways, and if it simplifies the mess that is our csv scripts, that's a win. --- This also replaces the bench "meas", "iter", and "size" fields with the slightly simpler "m" (measurement? metric?) and "n" fields. It's up to the specific benchmark exactly how to interpret "n", but one field is sufficient for existing scripts.	2024-11-16 17:29:05 -06:00
Christopher Haster	7cfcc1af1d	scripts: Renamed summary.py -> csv.py This seems like a more fitting name now that this script has evolved into more of a general purpose high-level CSV tool. Unfortunately this does conflict with the standard csv module in Python, breaking every script that imports csv (which is most of them). Fortunately, Python is flexible enough to let us remove the current directory before imports with a bit of an ugly hack: # prevent local imports __import__('sys').path.pop(0) These scripts are intended to be standalone anyways, so this is probably a good pattern to adopt.	2024-11-09 12:31:16 -06:00
Christopher Haster	b08c66e387	scripts: Fixed case-level flags in bench.py A typo meant we were setting all case-level flags to suite-level flags in bench.py. And because suite-level flags are more-or-less just ored case-level flags, all case-level flags would end up shared. Fixed via untypo.	2024-11-07 00:16:15 -06:00
Christopher Haster	007ac97bec	scripts: Adopted double-indent on multiline expressions This matches the style used in C, which is good for consistency: a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)) We were already doing this for multiline control-flow statements, simply because I'm not sure how else you could indent this without making things really confusing: if a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)): do_the_thing() This was the only real difference style-wise between the Python code and C code, so now both should be following roughly the same style (80 cols, double-indent multiline exprs, prefix multiline binary ops, etc).	2024-11-06 15:31:17 -06:00
Christopher Haster	48c2e7784b	scripts: Renamed import math alias m -> mt Mainly to avoid conflicts with match results m, this frees up the single letter variables m for other purposes. Choosing a two letter alias was surprisingly difficult, but mt is nice in that it somewhat matches it (for itertools) and ft (for functools).	2024-11-05 01:58:40 -06:00
Christopher Haster	6e2af5bf80	Carved out ckreads, disabled at compile-time by default This moves all ckread-related logic behind the new opt-in compile-time LFS_CKREADS flag. So in order to use ckreads you need to 1. define LFS_CKREADS at compile time, and 2. pass LFS_M_CKREADS during lfsr_mount. This was always the plan since, even if ckreads worked perfectly, it adds a significant amount of baggage (stack mostly) to track the ck context of all reads. --- This is the first non-trivial opt-in define in littlefs, so more test framework features! test.py and bench.py now support the optional ifdef attribute, which makes it easy to indicate a test suite/case should not be compiled when a feature is missing. Also interesting to note is the addition of LFS_IFDEF_CKREADS, which solves several issues (and general ugliness) related to #ifdefs in expressions. For example: // does not compile :( (can't embed ifdefs in macros) LFS_ASSERT(flags == ( LFS_M_CKPROGS #ifdef LFS_CKREADS | LFS_M_CKREADS #endif )) // does compile :) LFS_ASSERT(flags == ( LFS_M_CKPROGS | LFS_IFDEF_CKREADS(LFS_M_CKREADS, 0))); --- This brings us way back down to our pre-ckread levels of code/stack: code stack before-ckreads: 36352 2672 ckreads: 38060 (+4.7%) 3056 (+14.4%) after-ckreads: 36428 (+0.2%) 2680 (+0.3%) Unfortunately, we do end up with a bit more code cost than where we started. Mainly due to code moving around to support the ckread infrastructure: code stack lfsr_bd_readtag: +52 (+23.2%) +8 (+10.0%) lfsr_rbyd_fetch: +36 (+5.0%) +8 (+6.2%, cold) lfs_toleb128: -12 (-25.0%) -4 (-20.0%, cold) total: +76 (+0.2%) +8 (+0.3%) But oh well. Note that some of these changes are good even without ckreads, such as only parsing the last ecksum tag.	2024-08-16 01:04:03 -05:00
Christopher Haster	a735bcd667	Fixed hanging scripts trying to parse stderr code.py, specifically, was getting messed up by inconsequential GCC objdump errors on Clang -g3 generated binaries. Now stderr from child processes is just redirected to /dev/null when -v/--verbose is not provided. If we actually depended on redirecting stderr->stdout these scripts would have been broken when -v/--verbose was provided anyways. Not really sure what the original code was trying to do...	2024-06-20 13:04:07 -05:00
Christopher Haster	54d77da2f5	Dropped csv field prefixes in scripts The original idea was to allow merging a whole bunch of different csv results into a single lfs.csv file, but this never really happened. It's much easier to operate on smaller context-specific csv files, where the field prefix: - Doesn't really add much information - Requires more typing - Is confusing in how it doesn't match the table field names. We can always use summary.py -fcode_size=size to add prefixes when necessary anyways.	2024-06-02 19:19:46 -05:00
Christopher Haster	3c5319e125	Tweaked test/bench id globbing to avoid duplicating cases Before, globs that match both the suite name and case name would end up running the case twice. Which is a bit of a problem, since all cases contain their suite name as a prefix... test_f* => run test_files |-> run test_files_hello |-> run test_files_trunc ... run test_files_hello run test_files_trunc ... Now we only run matching test cases if no suites were found. This has the side-effect of making the universal glob, "", equivalent to no test ids, which is nice: $ ./scripts/test.py -j -b '' # equivalent $ ./scripts/test.py -j -b # This is useful for running a specific problematic test first before running all of the tests: $ ./scripts/test.py -j -b test_files_trunc '*'	2024-05-29 23:09:45 -05:00
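The "cases only if no suites matched" rule, sketched for a single glob with `fnmatch`. The `expand` helper and the suite-to-cases dict are hypothetical; this does not attempt to model the special "" glob:

```python
import fnmatch


def expand(suites, pattern):
    """Expand one glob into case names without double-counting.

    suites maps suite name -> list of case names. Since every case name
    starts with its suite name, case names are only matched when no suite
    matched the glob.
    """
    matched = [s for s in suites if fnmatch.fnmatchcase(s, pattern)]
    if matched:
        return [c for s in matched for c in suites[s]]
    return [c for cases in suites.values() for c in cases
            if fnmatch.fnmatchcase(c, pattern)]


suites = {
    'test_files': ['test_files_hello', 'test_files_trunc'],
    'test_dirs': ['test_dirs_mkdir'],
}
```

Duplicates across separate globs are still allowed, which is what makes `test_files_trunc '*'` run the problematic case first.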
Christopher Haster	31eebc1328	Added -a/--all to test.py/bench.py to bypass test/bench filters These really shouldn't be used all that often. Test filters are usually used to protect against invalid test configurations, so if you bypass test filters, expect things to fail! But some filters just prevent test cases from taking too long. In these cases being able to manually bypass the filter is useful for debugging/ benchmarking/etc...	2024-05-28 16:46:40 -05:00
Christopher Haster	c3dc7cca10	Fixed underflow issue with truncating test/bench -C/--context There was no check on context > stdout, so requesting more context than was actually printed by the test could result in a negative value. Python "helpfully" interpreted this as a negative index, resulting in somewhat random context lengths. This, combined with my tendency to just default to a large number like --context=100, led to me thinking a test was printing much less than it actually was... Don't get me wrong, I love Python, and I think Python's negative indices are a clever way to add flexibility to slice notation, but the value-dependent semantics are a pretty unfortunate footgun...	2024-04-09 20:04:07 -05:00
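The footgun is easy to reproduce. Python clamps slice starts below `-len`, so the bug only appears when `len(lines) < context < 2*len(lines)`. A sketch with hypothetical `tail_buggy`/`tail` helpers, assuming context is taken by slicing a list of output lines:

```python
def tail_buggy(lines, context):
    # len(lines)-context goes negative when context > len(lines), and
    # Python treats a negative start as counting back from the end
    return lines[len(lines)-context:]


def tail(lines, context):
    # clamp the start so asking for too much context returns everything
    return lines[max(len(lines)-context, 0):]
```

For 60 lines of output and `context=100`, the buggy version returns only the last 40 lines.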
Christopher Haster	2dcde5579b	Fixed issue with test.py/bench.py -f/--fail not killing runners While the -f/--fail logic was correctly terminating the test.py/bench.py runner thread, it was not terminating the actual underlying test process. This was causing test.py/bench.py to hang until the test runner completed all pending tests, which could take quite some time. This wasn't noticed earlier because test.py/bench.py still reports the test as failed, and most uses of -f/--fail involve specifying a specific test case, which usually terminates quite quickly. What's more interesting is this termination logic was copied from the handling of ctrl-C/SIGINT/KeyboardInterrupt, but this issue is not present there because SIGINT would be sent to all processes in the process tree, terminating the child process anyways. Fixed by adding an explicit proc.kill() to test.py/bench.py before tearing down the runner thread.	2024-04-01 17:15:13 -05:00
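A sketch of the difference between abandoning the reader and killing the runner. `run_until_failure` and `is_failure` are hypothetical; the point is the explicit `proc.kill()` before returning:

```python
import subprocess


def run_until_failure(cmd, is_failure):
    """Stream a runner's stdout, stopping at the first failing line.

    Returning early only stops our reader; the runner process would keep
    running its remaining tests, so it is killed explicitly.
    """
    proc = subprocess.Popen(cmd,
            stdout=subprocess.PIPE,
            universal_newlines=True)
    try:
        for line in proc.stdout:
            if is_failure(line):
                proc.kill()
                return line
        return None
    finally:
        proc.stdout.close()
        proc.wait()
```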
Christopher Haster	531c2bcc4c	Quieted test.py/bench.py status when stdout is aimed at stdout This is a condition for specifically the -O- pattern. Doing anything fancier would be too much, so anything clever such as -O/dev/stdout will still be clobbered. This was a common enough pattern and the status updates clobbering stdout was annoying enough that I figured this warranted a special case.	2024-03-20 13:58:22 -05:00
Christopher Haster	76593711ab	Added -f/--fail to test.py/bench.py This just tells test.py/bench.py to pretend the test failed and trigger any conditional utilities. This can be combined with --gdb to easily inspect a test that isn't actually failing. Up until this point I've just been inserting assert(false) when needed, which is clunky.	2024-03-20 13:50:04 -05:00
Christopher Haster	1422a61d16	Made generated prettyasserts more debuggable The main star of the show is the adoption of __builtin_trap() for aborting on assert failure. I discovered this GCC/Clang extension recently and it integrates much, _much_ better with GDB. With stdlib's abort(), GDB drops you off in several layers of internal stdlib functions, which is a pain to navigate out of to get to where the assert actually happened. With __builtin_trap(), GDB stops immediately, making debugging quick and easy. This is great! The pain of debugging needs to come from understanding the error, not just getting to it. --- Also tweaked a few things with the internal print functions to make reading the generated source easier, though I realize this is a rare thing to do.	2024-02-14 01:14:36 -06:00
Christopher Haster	06a360462a	Simplified test/bench suite finding logic in test.py/bench.py These just take normal paths now, we weren't even using the magic test/bench suite finding logic since it's easier to just pass everything explicitly in our Makefile. The original test/bench suite finding logic was a bad idea anyways. This is what globs are for, and having custom path chasing logic is inconsistent and risks confusion.	2024-02-14 00:25:10 -06:00
Christopher Haster	a124ee54e7	Reworked test/bench defines to map to global variables Motivation: - Debuggability. Accessing the current test/bench defines from inside gdb was basically impossible for some dumb macro-debug-info reason I can't figure out. In theory, GCC provides a .debug_macro section when compiled with -g3. I can see this section with objdump --dwarf=macro, but somehow gdb can't seem to find any definitions? I'm guessing the #line source remapping is causing things to break somehow... Though even if macro-debugging gets fixed, which would be valuable, accessing defines in the current test/bench runner can trigger quite a bit of hidden machinery. This risks side-effects, which is never great when debugging. All of this is quite annoying because the test/bench defines are usually the most important piece of information when debugging! This replaces the previous hidden define machinery with simple global variables, which gdb can access no problem. - Also when debugging we no longer awkwardly step into the test_define function all the time! - In theory, global variables, being a simple memory access, should be quite a bit faster than the hidden define machinery. This does matter because running tests _is_ a dev bottleneck. In practice though, any performance benefit is below the noise floor, which isn't too surprising (~630s +-~20s). - Using global variables for defines simplifies the test/bench runner quite a bit. Though some of the previous complexity was due to a whole internal define caching system, which was supposed to lazily evaluate test defines to avoid evaluating defines we don't use. This all proved to be useless because the first thing we do when running each test is evaluate all defines to generate the test id (lol). So now, instead of lazily evaluating and caching defines, we just generate global variables during compilation and evaluate all defines for each test permutation immediately before running. 
This relies heavily on __attribute__((weak)) symbols, and lets the linker really shine. As a funny perk this also effectively interns all test/bench defines by the address of the resulting global variable. So we don't even need to do string comparisons when mapping suite-level defines to the runner-level defines. --- Perhaps the more interesting thing to note, is the change in strategy in how we actually evaluate the test defines. This ends up being a surprisingly tricky problem, due to the potential of mutual recursion between our defines. Previously, because our define machinery was lazy, we could just evaluate each define on demand. If a define required another define, it would lazily trigger another evaluation, implicitly recursing through C's stack. If cyclic, this would eventually lead to a stack overflow, but that's ok because it's a user error to let this happen. The "correct" way, at least in terms of being computationally optimal, would be to topologically sort the defines and evaluate the resulting tree from the leaves up. But I ain't got time for that, so the solution here is equal parts hacky, simple, and effective. Basically, we just evaluate the defines repeatedly until they stop changing: - Initially, mutually recursive defines may read the uninitialized values of their dependencies, and end up with some arbitrarily wrong result. But as the defines are repeatedly evaluated, assuming no cycles, the correct results should eventually bubble up the tree until all defines converge to the correct value. - This is O(n*e) vs O(n+e), but our define graph is usually quite shallow. - To prevent non-halting, we error after an arbitrary 1000 iterations. If you hit this, it's likely because there is a cycle in the define graph. This is runtime configurable via the new --define-depth flag. - To keep things consistent and reproducible, we zero initialize all defines before the first evaluation. 
I don't think this is strictly necessary, but it's important for the test runner to have the exact same results on every run. No one wants a "works on my machine" situation when the tests are involved. Experimentation shows we only need an evaluation depth of 2 to successfully evaluate the current set of defines: $ ./runners/test_runner --list-defines --define-depth=2 And any performance impact is negligible (~630s +-~20s).	2024-02-13 18:59:58 -06:00
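The runner does this in C, but the iterate-until-stable idea is easy to see in a Python sketch. `eval_defines` and the define names other than BLOCK_SIZE are hypothetical, and this loop's iteration count is not necessarily the same as what --define-depth counts:

```python
def eval_defines(defines, depth=1000):
    """Evaluate mutually dependent defines by iterating to a fixed point.

    defines maps name -> function(env) -> value. Everything starts at 0
    (for reproducibility), and all defines are re-evaluated until none
    change, which converges when the dependency graph has no cycles.
    """
    env = {name: 0 for name in defines}
    for _ in range(depth):
        changed = False
        for name, f in defines.items():
            v = f(env)
            if v != env[name]:
                env[name] = v
                changed = True
        if not changed:
            return env
    # not converging after `depth` passes most likely means a cycle
    raise RuntimeError(f'defines did not converge after {depth} iterations')


# hypothetical defines; BLOCK_COUNT first reads an uninitialized BLOCK_SIZE
defines = {
    'BLOCK_COUNT': lambda e: (e['DISK_SIZE'] // e['BLOCK_SIZE']
            if e['BLOCK_SIZE'] else 0),
    'BLOCK_SIZE': lambda e: 4096,
    'DISK_SIZE': lambda e: 1024*1024,
}
```

Here BLOCK_COUNT starts out wrong (0) and picks up the correct value on the second pass, the "bubbling up" behavior described above.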
Christopher Haster	724fc5fc91	Hide gdb info header from test.py/bench.py --gdb This was too noisy when intermingled with other debug output test.py/bench.py prints when dropping into gdb.	2024-02-03 18:14:56 -06:00
Christopher Haster	161cd9e6da	Fixed race condition killing test processes in test/bench.py Not sure why we weren't hitting this earlier, but I've been hitting this race condition a bunch recently and it's annoying. Now every failed process kills the other test processes unconditionally. It's not clear if this actually _fixes_ the race condition or just makes it less likely, but it's good enough to keep the test script user friendly.	2023-12-17 15:18:26 -06:00
Christopher Haster	d485795336	Removed concept of geometries from test/bench runners This turned out to not be all that useful. Tests already take quite a bit to run, which is a good thing! We have a lot of tests! 942.68s or ~15 minutes of tests at the time of writing to be exact. But simply multiplying the number of tests by some number of geometries is heavy-handed and not a great use of testing time. Instead, tests where different geometries are relevant can parameterize READ_SIZE/PROG_SIZE/BLOCK_SIZE at the suite level where needed. The geometry system was just another define parameterization layer anyways. Testing different geometries can still be done in CI by overriding the relevant defines, and it _might_ be interesting there.	2023-12-06 22:23:41 -06:00
Christopher Haster	6d81b0f509	Changed --context short flag to -C in scripts This matches diff and grep, and avoids lower-case conflicts in test.py/bench.py.	2023-11-06 01:59:03 -06:00
Christopher Haster	d1b9a2969f	Added -F/--failures to test.py/bench.py to limit failures when -k/--keep-going The -k/--keep-going option has been more or less useless before this since it would completely flood the screen/logs when a bug triggers multiple test failures, which is common. Some things to note: - RAM management is tricky with -k/--keep-going: if we try to save logs and filter after running everything, we quickly fill up memory. - Failing test cases are a much slower path than successes since we need to kill and restart the underlying test_runner, as its state can't be trusted anymore. This is a-ok since you usually hope for many more successes than failures. Unfortunately it can make -k/--keep-going quite slow. --- ALSO -- warning this is a tangent rant-into-the-void -- I have discovered that Ubuntu has a "helpful" subsystem named Apport that tries to record/log/report any process crash in the system. It is "disabled" by default, but the way it's disabled requires LAUNCHING A PYTHON INTERPRETER to check a flag on every segfault/assert failure. This is what it does when it's "disabled"! This subsystem is fundamentally incompatible with any program that intentionally crashes subprocesses, such as our test runner. The sheer number of python interpreters being launched quickly eats through all available RAM and starts OOM killing half the processes on the system. If anyone else runs into this, a shallow bit of googling suggests the best solution is to just disable Apport. It is not a developer-friendly subsystem: $ sudo systemctl disable apport.service Removing Apport brings RAM usage back down to a constant level, even with absurd numbers of test failures. And here I thought I had a memory leak somewhere.	2023-11-06 01:55:28 -06:00
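One way to read "limit failures" that keeps memory bounded is: keep running every case, but retain details for only the first N failures and merely count the rest. The C sketch below illustrates that reading under that assumption; failures_t, record_failures, and MAX_SHOWN are hypothetical, and this is not how test.py itself is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHOWN 8

// Retained failure indices (bounded) plus a running total of all failures.
typedef struct {
    size_t shown[MAX_SHOWN];
    size_t shown_count;
    size_t total;
} failures_t;

// Walk every result (keep-going), retaining at most max_shown failures, so
// memory stays bounded no matter how many cases fail.
void record_failures(const bool *passed, size_t n, size_t max_shown,
        failures_t *f) {
    assert(max_shown <= MAX_SHOWN);
    f->shown_count = 0;
    f->total = 0;
    for (size_t i = 0; i < n; i++) {
        if (passed[i]) {
            continue;
        }
        f->total += 1;
        if (f->shown_count < max_shown) {
            f->shown[f->shown_count++] = i;
        }
    }
}
```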
Christopher Haster	1e4d4cfdcf	Tried to write errors to stderr consistently in scripts	2023-11-05 15:55:07 -06:00
Christopher Haster	2be3ff57c5	Moved post-bench amor/avg analysis out into amor.py and avg.py 1. Being able to inspect results before benchmarks complete was useful to track their status. It also allows some analysis even if a benchmark fails. 2. Moving these scripts out of bench.py allows them to be a bit more flexible, at the cost of CSV parsing/structuring overhead. 3. Writing benchmark measurements immediately avoids RAM buildup as we store intermediate measurements for each bench permutation. This may increase the IO bottleneck, but we end up writing the same number of lines, so not sure... I realize avg.py has quite a bit of overlap with summary.py, but I don't want to entangle them further. summary.py is already trying to do too much as is...	2023-11-04 13:16:50 -05:00
Christopher Haster	fb9277feac	Tweaked test.py/bench.py to allow no suites to test compilation This is mainly to allow bench_runner to at least compile after moving benches out of tree. Also cleaned up lingering runner/suite munging leftover from the change to an optional -R/--runner parameter.	2023-11-03 11:15:45 -05:00
Christopher Haster	e8bdd4d381	Reworked bench.py/bench_runner/how bench measurements are recorded This is based on how bench.py/bench_runners have actually been used in practice. The main changes have been to make the output of bench.py more readily consumable by plot.py/plotmpl.py without needing a bunch of hacky intermediary scripts. Now instead of a single per-bench BENCH_START/BENCH_STOP, benches can have multiple named BENCH_START/BENCH_STOP invocations to measure multiple things in one run: BENCH_START("fetch", i, STEP); lfsr_rbyd_fetch(&lfs, &rbyd_, rbyd.block, CFG->block_size) => 0; BENCH_STOP("fetch"); Benches can also now report explicit results for non-io measurements: BENCH_RESULT("usage", i, STEP, rbyd.eoff); The extra iter/size parameters to BENCH_START/BENCH_RESULT also allow some extra information to be calculated post-bench. This information gets tagged with an extra bench_agg field to help organize results in plot.py/plotmpl.py: - bench_meas=<meas>+amor, bench_agg=raw - amortized results - bench_meas=<meas>+div, bench_agg=raw - per-byte results - bench_meas=<meas>+avg, bench_agg=avg - average over BENCH_SEED - bench_meas=<meas>+min, bench_agg=min - minimum over BENCH_SEED - bench_meas=<meas>+max, bench_agg=max - maximum over BENCH_SEED --- Also removed all bench.tomls for now. This may seem counterproductive in a commit to improve benchmarking, but I'm not sure there's actual value to keeping bench cases committed in tree. These were always quick to fall out of date (at the time of this commit most of the low-level bench.tomls, rbyd, btree, etc., no longer compiled), and most benchmarks were one-off collections of scripts/data with results too large/cumbersome to commit and keep updated in tree. I think the better way to approach benchmarking is a separate repo (multiple repos?) with all related scripts/state/code and results committed into a hopefully reproducible snapshot. Keeping the bench.tomls in that repo makes more sense in this model. 
There may be some value to having benchmarks in CI in the future, but for that to make sense they would need to actually fail on performance regression. How to do that isn't so clear. Anyways we can always address this in the future rather than now.	2023-11-03 10:27:17 -05:00
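A rough C sketch of the post-bench derivations listed above (+amor, +div, and avg/min/max over seeds). The exact formulas bench.py/amor.py/avg.py use are not stated here, so this assumes "amortized" means the running total divided by the number of iterations so far and "per-byte" means each measurement divided by the size it was taken at. bench_amor, bench_per_byte, and bench_agg are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// +amor (assumed): running total divided by iterations so far
void bench_amor(const double *meas, size_t n, double *amor) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += meas[i];
        amor[i] = total / (double)(i + 1);
    }
}

// +div (assumed): each measurement divided by the size it was taken at
void bench_per_byte(const double *meas, const double *size, size_t n,
        double *per_byte) {
    for (size_t i = 0; i < n; i++) {
        per_byte[i] = (size[i] != 0.0) ? meas[i] / size[i] : 0.0;
    }
}

// +avg/+min/+max: one measurement aggregated over several seeds
typedef struct {
    double avg;
    double min;
    double max;
} bench_agg_t;

bench_agg_t bench_agg(const double *runs, size_t seeds) {
    assert(seeds > 0);
    bench_agg_t agg = {0.0, runs[0], runs[0]};
    for (size_t i = 0; i < seeds; i++) {
        agg.avg += runs[i];
        if (runs[i] < agg.min) {
            agg.min = runs[i];
        }
        if (runs[i] > agg.max) {
            agg.max = runs[i];
        }
    }
    agg.avg /= (double)seeds;
    return agg;
}
```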
Christopher Haster	39f417db45	Implemented a filesystem traversal that understands file bptrs/btrees Ended up changing the name of lfsr_mtree_traversal_t -> lfsr_traversal_t, since this behaves more like a filesystem-wide traversal than an mtree traversal (for one, it returns several typed objects, not mdirs like the other mtree functions). As a part of this changeset, lfsr_btraversal_t (was lfsr_btree_traversal_t) and lfsr_traversal_t no longer return untyped lfsr_data_ts, but instead return specialized lfsr_{b,t}info_t structs. We weren't even using lfsr_data_t for its original purpose in lfsr_traversal_t. Also changed lfsr_traversal_next -> lfsr_traversal_read; you may notice at this point the changes are intended to make lfsr_traversal_t look more like lfsr_dir_t for consistency. --- Internally lfsr_traversal_t now uses a full state machine with its own enum due to the complexity of traversing the filesystem incrementally. Because creating diagrams is fun, here's the current full state machine, though note it will need to be extended for any parity-trees/free-trees/etc: [state diagram, flattened and only partly recoverable: mrootanchor -> mrootchain -> mtree -> openedblock, with mtree also visiting mdirblock/mdirbtree and openedblock also visiting openedbtree] I'm not sure I'm happy with the current implementation, and eventually it will need to be able to handle in-place repairs to the blocks it sees, so this whole thing may need a rewrite. But in the meantime, this passes the new clobber tests in test_alloc, so it should be enough to prove the file implementation works (which definitely is not fully tested yet, and some bugs had to be fixed for the new tests in test_alloc to pass). --- Speaking of test_alloc: the inherent cyclic dependency between files/dirs/alloc makes it a bit hard to know what order to test these bits of functionality in. Originally I was testing alloc first, because it seems you need to be confident in your block allocator before you can start testing higher-level data structures. 
But I've gone ahead and reversed this order, testing alloc after files/dirs. This is because of an interesting observation that if alloc is broken, you can always increase the test device's size to some absurd number (-DDISK_SIZE=16777216, for example) to kick the can down the road. Testing in this order allows alloc to use more high-level APIs and focus on corner cases where the allocator's behavior requires subtlety to be correct (e.g. ENOSPC).	2023-10-14 01:13:40 -05:00
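To illustrate what "a full state machine with its own enum" buys for incremental traversal, here is a heavily simplified C sketch: each read call does a bounded amount of work and returns the type of the next object visited. The state names come from the commit message, but the toy filesystem shape (counts of mroots, mdirs, and opened files) and the transition order are assumptions; traversal_t, traversal_init, and traversal_read are hypothetical and are not lfsr_traversal_t's implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    T_MROOTANCHOR,
    T_MROOTCHAIN,
    T_MTREE,        // dispatch state, never returned to the caller
    T_MDIRBLOCK,
    T_MDIRBTREE,
    T_OPENEDBLOCK,
    T_OPENEDBTREE,
    T_DONE,
} tstate_t;

typedef struct {
    tstate_t state;
    size_t mroots;  // toy filesystem shape
    size_t mdirs;
    size_t opened;
    size_t i;       // position within the current list
} traversal_t;

void traversal_init(traversal_t *t,
        size_t mroots, size_t mdirs, size_t opened) {
    t->state = T_MROOTANCHOR;
    t->mroots = mroots;
    t->mdirs = mdirs;
    t->opened = opened;
    t->i = 0;
}

// Returns the type of the next object visited, or T_DONE when finished.
tstate_t traversal_read(traversal_t *t) {
    for (;;) {
        switch (t->state) {
        case T_MROOTANCHOR:
            t->state = T_MROOTCHAIN;
            t->i = 0;
            return T_MROOTANCHOR;
        case T_MROOTCHAIN:
            if (t->i < t->mroots) {
                t->i++;
                return T_MROOTCHAIN;
            }
            t->state = T_MTREE;
            t->i = 0;
            break;
        case T_MTREE:
            // next mdir if any remain, otherwise move on to opened files
            if (t->i < t->mdirs) {
                t->state = T_MDIRBLOCK;
            } else {
                t->state = T_OPENEDBLOCK;
                t->i = 0;
            }
            break;
        case T_MDIRBLOCK:
            t->state = T_MDIRBTREE;
            return T_MDIRBLOCK;
        case T_MDIRBTREE:
            t->i++;
            t->state = T_MTREE;
            return T_MDIRBTREE;
        case T_OPENEDBLOCK:
            if (t->i >= t->opened) {
                t->state = T_DONE;
                break;
            }
            t->state = T_OPENEDBTREE;
            return T_OPENEDBLOCK;
        case T_OPENEDBTREE:
            t->i++;
            t->state = T_OPENEDBLOCK;
            return T_OPENEDBTREE;
        case T_DONE:
        default:
            return T_DONE;
        }
    }
}
```

Because all progress lives in the struct rather than on the C stack, a caller can stop between reads and resume later, which is the property that makes an explicit enum worthwhile here.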


76 Commits