These scripts can't easily share the common logic, but separating
field details from the print/merge/csv logic should make the common
part of these scripts much easier to create/modify going forward.
This also tweaked the behavior of summary.py slightly.
This also adds coverage support to the new test framework, which, due
to its reduced scope, no longer needs aggregation and can be much
simpler. Really all we need to do is pass --coverage to GCC, which
builds its .gcda files during testing in a multi-process-safe manner.
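For reference, --coverage is just shorthand for GCC's -fprofile-arcs
and -ftest-coverage; a hypothetical invocation (littlefs's actual build
rules may differ):

$ gcc --coverage -c lfs.c -o lfs.o
$ gcc --coverage lfs.o test_runner.o -o test_runner
$ ./test_runner   # each run updates the .gcda files in place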
The addition of branch coverage leverages information that was available
in both lcov and gcov.
This was made easier with the addition of --json-format to gcov in
GCC 9.0; however, gcov's lax backwards compatibility for its
intermediary options is a bit concerning. Hopefully --json-format
sticks around for a while.
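For reference, a hypothetical gcov invocation using the JSON format:

$ gcov --json-format lfs.c
# emits lfs.c.gcov.json.gz, a gzipped JSON report of line/branch coverage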
GCC is a bit annoying here: it can't generate .ci files without
generating the related .o files, though I suppose the alternative risks
duplicating a large amount of compilation work (littlefs is really
a small project).
Previously we rebuilt the .o files anytime we needed .ci files
(callgraph info used for stack.py). This changes it so we always
build .ci files as a side-effect of compilation. This is similar
to the .d file generation, though it may be annoying if the system
cc doesn't support -fcallgraph-info.
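For reference, the flag involved, assuming the system cc is a recent
GCC (the =su variant includes the stack-usage info stack.py needs):

$ gcc -c -fcallgraph-info=su lfs.c -o lfs.o   # also emits lfs.ci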
A small mistake in test.py's control flow meant a failing test job
would successfully kill all other test jobs, but then humorously start
up a new process to continue testing.
This simplifies the interaction between code generation and the
test-runner.
In theory it also reduces compilation dependencies, but internal tests
make this difficult.
This mostly required adding names for each test case, adding
declarations for previously-implicit variables (the new test framework
is more conservative about what it declares, but the small extra effort
to add declarations is well worth the simplicity and improved
readability), and tweaking things to work with not-really-constant
defines.
Also renamed test_ -> test, replacing the old ./scripts/test.py.
Unfortunately git seems to have had a hard time tracking this rename.
- Added --exec for wrapping the test-runner with external commands, such as
Qemu or Valgrind.
- Added --valgrind, which just aliases --exec=valgrind with a few extra
flags useful during testing.
- Dropped the "valgrind" type for tests. These aren't separate tests
that run in the test-runner, and I don't see a need for disabling
Valgrind for any tests. This can be added back later if needed.
- Readded support for dropping directly into gdb after a test failure,
  either at the assert failure, the entry point of the test case, or
  the entry point of the test runner, with --gdb, --gdb-case, or
  --gdb-main respectively.
- Added --isolate for running each test permutation in its own process;
  this is required for associating Valgrind errors with the right test
  case.
- Fixed an issue where explicit test identifiers conflicted with the
  per-stage test identifiers generated as part of --by-suite and
  --by-case.
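For example, a couple of hypothetical invocations combining these flags
(test names made up):

$ ./scripts/test.py --valgrind --isolate   # Valgrind, one process per permutation
$ ./scripts/test.py test_dirs --gdb-case   # drop into gdb at the failing case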
Previously, test defines were implemented as layers of index-mapped
uintmax_t arrays. This worked well for lookup, but limited defines to
constants computed at compile-time. And since test defines themselves
are actually evaluated at _run-time_ (yeah, they've deviated quite a
bit from the original, compile-time-evaluated defines, which makes the
name less fitting), this meant defines couldn't depend on other
defines. This was limiting, since a lot of test defines relied on
defines generated from the geometry being tested.
This new implementation uses callbacks for the per-case defines. This
means they can easily contain full C statements, which can depend on
other test defines. This does mean you can create infinitely-recursive
defines, but the test-runner will just break at run-time, so don't do
that.
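As a rough sketch of the idea (names and signatures hypothetical, not
the actual test-runner API), each per-case define becomes a callback
that can evaluate other defines on demand:

#include <stdint.h>
#include <stddef.h>

// hypothetical define ids
enum { READ_SIZE, BLOCK_SIZE, DISK_SIZE, BLOCK_COUNT };

// evaluates a define through the test-runner's layered lookup
extern uintmax_t test_define(size_t id);

// per-case defines are callbacks, so they can contain full C
// statements and depend on other defines at run-time
typedef uintmax_t (*test_define_cb_t)(void *data);

// example: BLOCK_COUNT derived from two other defines
static uintmax_t block_count_cb(void *data) {
    (void)data;
    return test_define(DISK_SIZE) / test_define(BLOCK_SIZE);
}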
One concern is that there might be a performance hit for evaluating all
defines through callbacks, but if there is, it's well below the noise
floor:
- constants: 43.55s
- callbacks: 42.05s
- Added internal tests, which can run tests inside other source files,
  allowing access to "private" functions and data.
  Note this required a special bit of handling around defining and
  later undefining test configurations, so as not to pollute the
  namespace of the source file, since a single file can end up with
  test cases from different suites/configuration namespaces.
- Removed unnecessary/unused permutation argument to generated test
functions.
- Some cleanup to progress output of test.py.
- Expanded test defines to allow for lists of configurations.
  These are useful for changing multi-dimensional test configurations
  without leading to extremely large, less useful configuration
  combinations.
- Made warnings more visible during test parsing
- Added lfs_testbd.h to implicit test includes
- Fixed issue with not closing files in ./scripts/explode_asserts.py
- Added `make test_runner` and `make test_list` build rules for
  convenience
- Added --disk/--trace/--output options for information-heavy debugging
- Renamed --skip/--count/--every to --start/--stop/--step.
  This matches common terminology for ranges, and frees up --skip to be
  used for skipping test cases in the future.
- Better handling of SIGTERM: now all tests are killed, reported as
  failures, and testing is halted regardless of -k.
  This is a compromise: you throw away the rest of the tests, which is
  normally what -k is for, but it prevents annoying-to-terminate
  processes when debugging, which is a very interactive process.
In the test-runner, defines are parameterized constants (limited
to integers) that are generated from the test suite tomls, resulting
in many permutations of each test.
In order to make this efficient, these defines are implemented as
multi-layered lookup tables, using per-layer/per-scope indirect
mappings. This lets the test-runner and test suites define their
own defines with compile-time indexes independently. It also makes
building of the lookup tables very efficient, since they can be
incrementally populated as we expand the test permutations.
The four current define layers and when we need to build them:

layer                     defines       predefine_map  define_map
user-provided overrides   per-run       per-run        per-suite
per-permutation defines   per-perm      per-case       per-perm
per-geometry defines      per-perm      compile-time   -
default defines           compile-time  compile-time   -
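A rough sketch of how the indirect mappings might fit together (names
hypothetical, not the actual implementation):

#include <stdint.h>
#include <stddef.h>

// one layer's indirect mapping from global define ids to values
typedef struct test_define_map {
    const uintmax_t *defines; // this layer's define values
    const size_t *map;        // define id -> index into defines, -1 if absent
    size_t count;             // number of ids this layer knows about
} test_define_map_t;

// layers ordered from most- to least-specific, per the table above
extern const test_define_map_t test_define_maps[4];

uintmax_t test_define(size_t id) {
    // the first layer that maps the id wins, letting overrides shadow
    // per-permutation, per-geometry, and default defines
    for (size_t l = 0; l < 4; l++) {
        const test_define_map_t *m = &test_define_maps[l];
        if (id < m->count && m->map[id] != (size_t)-1) {
            return m->defines[m->map[id]];
        }
    }
    return 0; // not defined in any layer
}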
- Added filtering based on suite, case, perm, type, geometry
- Added --skip, --count, and --every (will be used for parallelism)
- Implemented --list-defines
- Better helptext for flags with arguments
- Other minor tweaks
- Indirect index map instead of bitmap+sparse array
- test_define_t and test_type_t
- Added back conditional filtering
- Added suite-level defines and filtering
This moves defines entirely into the runtime of the test_runner,
simplifying things and reducing the amount of generated code that needs
to be built, at the cost of limiting test defines to uintmax_t types.
This is implemented using a set of index-based scopes (created by
test.py) that allow different layers to override defines from other
layers, accessible through the global `test_define` function.
layers:
1. command-line overrides
2. per-case defines
3. per-geometry defines
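So, for example, a per-run override from the command-line (flag syntax
hypothetical) shadows both the per-case and per-geometry layers:

$ ./scripts/test.py -DBLOCK_SIZE=512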
This tries a different design for testing. The goals are to make the
test infrastructure a bit simpler, with clear stages for building and
running, and faster, by avoiding rebuilding lfs.c n-times.
As noted in Python's subprocess library:
> This will deadlock when using stdout=PIPE and/or stderr=PIPE and the
> child process generates enough output to a pipe such that it blocks
> waiting for the OS pipe buffer to accept more data.
Curiously, this only became a problem when updating to Ubuntu 20.04
in CI (python3.6 -> python3.8).
Avoids redundant counting of structs shared in multiple .c files, which
is very common. This is different from the other scripts,
code.py/data.py/stack.py, but this difference makes sense as struct
declarations have a very different lifetime.
This requires parsing an additional section of the DWARF info
(--dwarf=rawline) to get the declaration file info.
---
Interpreting the results reported by ./scripts/structs.py is a bit more
complicated than for the other scripts; structs aren't used in a
consistent manner, so the cost of a large struct depends on the context
in which it is used.
But that being said, there really isn't much reason to report
internal-only structs. These structs really only exist for type-checking
in internal algorithms, and their cost will end up reflected in other RAM
measurements, either stack, heap, or other.
Using errors=replace in Python's utf-8 decoding makes these scripts
more resilient to underlying errors, rather than just throwing an
unhelpfully generic decode error.
- Changed `make summary` to show a one line summary
- Added `make lfs.csv` rule, which is useful for finding more info with
other scripts
- Fixed small issue in ./scripts/summary.py
- Added *.ci (callgraph) and *.csv (script output) to CI
A full summary of static measurements (code size, stack usage, etc) can now
be found with:
make summary
This is done through the combination of a new ./scripts/summary.py
script and the ability of existing scripts to merge into existing csv
files, allowing multiple results to be merged either in a pipeline, or
in parallel with a single ./scripts/summary.py call.
The ./scripts/summary.py script can also be used to quickly compare
different builds or configurations. This is a proper implementation
of a similar but hacky shell script that has already been very useful
for making optimization decisions:
$ ./scripts/structs.py new.csv -d old.csv --summary
name (2 added, 0 removed)   code            stack   structs
TOTAL                       28648 (-2.7%)    2448      1012
Also some other small tweaks to scripts:
- Removed state saving diff rules. This isn't the most useful way to
handle comparing changes.
- Added short flags for --summary (-Y) and --files (-F), since these
are quite often used.
- Added -L/--depth argument to show dependencies for scripts/stack.py;
  this replaces calls.py
- Additional internal restructuring to avoid repeated code
- Removed incorrect diff percentage when there is no actual size
- Consistent percentage rendering in test.py
This required a patch to the --diff flag for the scripts to ignore
a missing file. This enables a useful one-liner for making comparisons
with potentially missing previous versions:
./scripts/code.py lfs.o -d lfs.o.code.csv -o lfs.o.code.csv
function (0 added, 0 removed)     old     new    diff
TOTAL                           25476   25476      +0
One downside: these previous files are easy to delete as a part of make
clean, which limits their usefulness for comparing configuration
changes...
Note this detects loops (recursion) and renders them as infinity.
Currently littlefs has a single recursive function, and you can see
how this infects the full call graph. Eventually this recursion should
be removed.
This is to avoid unexpected script behavior even though data.py should
always return 0 bytes for littlefs. Maybe a check for this should be
added to CI?
Now with -s/--sort and -S/--reverse-sort for sorting the functions by
size.
You may wonder why add reverse-sort, since its utility doesn't seem
worth the cost to implement (these are just helper scripts after all).
The reason is that reverse-sort is quite useful on the command-line,
where scrollback may be truncated and you only care about the largest
entries.
Outside of the command-line, normal sort is preferred.
Fortunately the difference is just the sign in the sort key.
Note this conflicts with the short --summary flag, so that has been
removed.
I confirmed that the same number of tests are run
with "make test" on:
* Ubuntu with and without this change
* macOS with this change
> ====== results ======
> tests passed 817/817 (100.00%)
> tests failed 0/817 (0.00%)
Currently this is just lfs.c and lfs_util.c. Previously this included
the block devices, but this meant all of the scripts needed to
explicitly deselect the block devices to avoid reporting build
size/coverage info on them.
Note that test.py still explicitly adds the block devices for compiling
tests, which is their main purpose. Humorously this means the block
devices will probably be compiled into most builds in this repo anyways.
This was lost in the Travis -> GitHub transition. In serializing some
of the jobs, I missed that we need to clean between tests with
different geometry configurations. Otherwise we end up running outdated
binaries, which explains some of the weird test behavior we were
seeing.
Also tweaked a few script things:
- Better subprocess error reporting (dump stderr on failure)
- Fixed a BUILDDIR rule issue in test.py
- Changed test-not-run status to None instead of undefined
Now littlefs's Makefile can work with a custom build directory
for compilation output. Just set the BUILDDIR variable and the Makefile
will take care of the rest.
make BUILDDIR=build size
This makes it very easy to compare builds with different compile-time
configurations or different cross-compilers.
This meant most of code.py's build isolation was no longer needed, so I
revisited the scripts and cleaned/tweaked a number of things.
Also brought code.py in line with coverage.py, fixing some of the
inconsistencies that were created while developing these scripts.
One change to note was removing the inline measuring logic; I realized
this feature is unnecessary thanks to GCC's -fkeep-static-functions and
-fno-inline flags.
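For reference, a hypothetical measurement-only compile using these
flags:

$ gcc -c -fkeep-static-functions -fno-inline lfs.c -o lfs.o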
Since we already have fairly complicated scripts, I figured it wouldn't
be too hard to use the gcov tools and directly parse their output. Boy
was I wrong.
The gcov intermediary format is a bit of a mess. In version 5.4, a
text-based intermediary format is written to a single .gcov file per
executable. This changed sometime before version 7.5, when gcov started
writing separate .gcov files per .o file. And in version 9 this
intermediary format was entirely replaced with an incompatible json
format!
Ironically, this means the internal-only .gcda/.gcno binary format has
actually been more stable than the intermediary format.
Also there's no way to avoid temporary .gcov files generated in the
project root, which risks messing with how test.py runs parallel tests.
Fortunately this looks like it will be fixed in gcov version 9.
---
Ended up switching to lcov, which was the right way to go. lcov handles
all of the gcov parsing, provides an easily parsable output, and even
provides a set of higher-level commands to manage coverage collection
from different runs.
Since this is all provided by lcov, I was able to simplify coverage.py
quite a bit. Now it just parses the .info files output by lcov.
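The lcov side is roughly (paths hypothetical):

$ lcov --capture --directory . --output-file lfs.info   # gather .gcda results
$ lcov --add-tracefile a.info --add-tracefile b.info --output-file lfs.info   # merge runs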
Now coverage information can be collected if you provide the --coverage
flag to test.py. Internally this uses GCC's gcov instrumentation along
with a new script, coverage.py, to parse *.gcov files.
The main use for this is finding coverage info during CI runs. There's a
risk that the instrumentation may make it more difficult to debug, so I
decided to not make coverage collection enabled by default.
Mostly taken from .travis.yml; the biggest changes were around how to
get the status updates to work.
We can't use a token on PRs the same way we could in Travis, so instead
we use a second workflow that checks every pull request for "status"
artifacts, and creates the actual statuses in the "workflow_run" event,
where we have full access to repo secrets.
Inspired by Linux's Bloat-O-Meter, code_size.py wraps nm to provide
function-level code size, and supports detailed comparison between
different builds.
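Roughly, the underlying nm invocation might look like this (GNU
binutils flags shown; the script's exact arguments may differ):

$ nm --size-sort --print-size lfs.o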
One difference is that code_size.py invokes littlefs's build system
similarly to test.py, creating a duplicate build in the "sizes"
directory. This makes it easy to monitor a cross-compiled build size
while simultaneously testing on the host machine.
Not sure how this went unnoticed; I guess this is the first bug that
needed in-depth inspection after a last-minute argument cleanup in the
debug scripts.
- Standardized littlefs debug statements to use hex prefixes and
brackets for printing pairs.
- Removed the entry behavior for readtree and made -t the default.
This is because 1. the CTZ skip-list parsing was broken, which is not
surprising, and 2. the entry parsing was more complicated than useful.
This functionality may be better implemented as a proper filesystem
read script, complete with directory tree dumping.
- Changed test.py's --gdb argument to take [init, main, assert],
this matches the names of the stages in C's startup.
- Added printing of tail to all mdir dumps in readtree/readmdir.
- Added a print for if any mdirs are corrupted in readtree.
- Added debug script side-effects to .gitignore.
- Changed readmdir.py to print the metadata pair and revision count,
which is useful when debugging commit issues.
- Added truncated data view to readtree.py by default. This does mean
readtree.py must read all files on the filesystem to show the
truncated data, hopefully this does not end up being a problem.
- Made the overall representation hopefully more readable, including
  moving the superblock under the root dir and userattrs under files,
  and fixing a gstate rendering issue.
- Added rendering of soft-tails as dotted-arrows, hopefully this isn't
too noisy.
- Fixed explode_asserts.py off-by-1 in #line mapping caused by a strip
call in the assert generation eating newlines. The script matches
line numbers between the original+modified files by emitting assert
statements that use the same number of lines. An off-by-1 here causes
the entire file to map lines incorrectly, which can be very annoying.
- Added caching to Travis install dirs, because otherwise
pip3 install fails randomly
- Increased size of littlefs-fuse disk because test script has
a larger footprint now
- Skip a couple of reentrant tests under byte-level writes because
the tests just take too long and cause Travis to bail due to no
output for 10m
- Fixed various Valgrind errors
- Suppressed uninit checks for tests where LFS_BLOCK_ERASE_VALUE == -1.
  In this case rambd goes uninitialized, which is fine for rambd's
  purposes. Note I couldn't figure out how to limit this suppression
  to only the malloc in rambd; this doesn't seem possible with
  Valgrind.
- Fixed memory leaks in exhaustion tests
- Fixed off-by-1 string null-terminator issue in paths tests
- Fixed an lfs_file_sync issue revealed by fixing memory leaks in
  exhaustion tests (see the sketch after this list). Getting ENOSPC
  during a file write puts the file in a bad state where littlefs
  doesn't know how to write it out safely. In this case, lfs_file_sync
  and lfs_file_close return 0 without writing out state so that
  device-side resources can still be cleaned up. To recover from
  ENOSPC, the file needs to be reopened and the writes recreated. Not
  sure if there is a better way to handle this.
- Added some quality-of-life improvements to Valgrind testing
- Fit Valgrind messages into truncated output when not in verbose mode
- Turned on origin tracking
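As a sketch of the lfs_file_sync/ENOSPC situation described above
(error handling and the actual rewrites elided, path hypothetical):

#include "lfs.h"

// sketch: recovering from ENOSPC during a write
static int write_with_recovery(lfs_t *lfs, lfs_file_t *file,
        const void *buf, lfs_size_t size) {
    lfs_ssize_t res = lfs_file_write(lfs, file, buf, size);
    if (res == LFS_ERR_NOSPC) {
        // the file is now in a bad state; close returns 0 without
        // writing out state so device-side resources can be cleaned up
        lfs_file_close(lfs, file);
        // free up space somehow, then reopen and recreate the writes
        int err = lfs_file_open(lfs, file, "path",
                LFS_O_WRONLY | LFS_O_CREAT);
        if (err) {
            return err;
        }
        // ... redo the writes from scratch ...
    }
    return 0;
}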
RAM-backed testing is faster than file-backed testing. This is why
test.py uses rambd by default.
So why add support for tmpfs-backed disks if we can already run tests in
RAM? For reentrant testing.
Under reentrant testing we simulate power-loss by forcefully exiting
the test program at specific times. To make this power-loss meaningful,
we need to persist the disk across these power-losses. However, it's
interesting to note this persistence doesn't actually need to be backed
by persistent storage.
It may be possible to rearchitect the tests to simulate power-loss a
different way, by, say, using coroutines or setjmp/longjmp to leave
behind ongoing filesystem operations without terminating the program
completely. But at this point, I think it's best to work with what we
have.
And simply putting the test disks into a tmpfs mount-point seems to
work just fine.
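For example (mount-point path hypothetical):

$ mount -t tmpfs -o size=128m tmpfs tests/disks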
Note this does force serialization of the tests, which isn't otherwise
required. Currently they are only serialized due to limitations in
test.py. If a future change wants to parallelize the tests, it may need
to rework RAM-backed reentrant tests.