code.py, specifically, was getting messed up by inconsequential GCC
objdump errors on Clang -g3 generated binaries.
Now stderr from child processes is just redirected to /dev/null when
-v/--verbose is not provided.
If we actually depended on redirecting stderr->stdout these scripts
would have been broken when -v/--verbose was provided anyways. Not
really sure what the original code was trying to do...
These are basically the same, but with reverting asserts-as-hints the
unreachable statements give the compiler more info. It also avoids
unused variable warnings, and matches the if-err pattern we use for
propagating errors elsewhere.
Code changes were a bit weird, seems the compiler decided to uninline
lfsr_data_fromgeomtry and lfsr_data_fromleb128 for some reason:
code stack
before: 33882 2560
after: 33886 (+0.0%) 2560 (+0.0%)
The ability of GCC is just insufficient for asserts-as-hints and at some
point it's not worth trying to workaround this.
Code with asserts should never be worse than code without asserts, so we
might as well just disable asserts completely when not debugging:
code stack
hint-assert (before): 33918 2592
no-assert (after): 33882 (-0.1%) 2560 (-1.2%)
Clang does no better here (targeting x86):
code
clang+hint-asserts (before): 51946
clang+no-asserts (after): 51663 (-0.5%)
Maybe in the future some builtin will let us force pure expressions. It
would be interesting to revisit assert driven optimizations at some
point.
With rcompat/wcompat flags, on-disk minor version bumps will hopefully
not be needed for a long time (ever?). And if the on-disk version never
changes, why was a word to report it every lfsr_fs_stat call?
But this may be something to listen to user feedback on. Worst case we
can always readd fsinfo.disk_version if users find it useful.
Code changes:
code stack
before: 33922 2592
after: 33918 (-0.0%) 2592 (+0.0%)
All in-flight blocks should be tracked by at least an open file handle
when calling lfsr_mdir_commit. It's the internals of lfsr_mdir_commit
that are really protected by the lfsr_alloc_ckpoint system.
Making lfsr_mdir_commit implicitly call lfsr_alloc_ckpoint removes one
step that was easy to forget, and means we don't need to pepper every
function with lfsr_alloc_ckpoint willy-nilly.
This also merges lfsr_fs_prepareappend -> lfsr_fs_mkconsistent, which
were really two names for the same function. lfsr_fs_grow not calling
lfsr_fs_prepareappend made this name potentially confusing.
Code changes:
code stack
before: 33942 2592
after: 33922 (-0.1%) 2592 (+0.0%)
While it may be useful to know when/why lfsr_fs_fixgrm fails, at this
point in lfsr_rename/lfsr_remove the operation has already succeeded as
far as the filesystem is concerned.
It's counterintuitive, but ignoring these errors actually tells the user
_more_ information, specifically whether or not the operation completed
on disk.
At least we can log the error via LFS_WARN, and such errors will likely
come up again in a future operation, such as the call to lfsr_fs_fixgrm
on the next filesystem mutation.
This was noticed in test_grow, which tests error code-paths quite a bit
more than any other test.
Code changes:
code stack
before: 33934 2592
after: 33942 (+0.0%) 2592 (+0.0%)
So LFSR_TAG_MOVE is last, as it's rather complex and also needs to
handle bsprouts/bshrubs.
Compiler noise:
code stack
before: 33926 2592
after: 33934 (+0.0%) 2592 (+0.0%)
Not sure how this was missed for so long, but we completely forget about
in-flight mroot attrs if we happen to uninline the mtree.
I guess this was missed because only some late-stage fs ops need to
commit mroot attrs (lfsr_fs_grow, lfsr_setattr, upgrades, etc), but
being able to commit to the mroot is definitely an operation we need to
support.
Fixing this in a non-awkward way was a bit tricky. We need some way to
commit both the provided attr-list and our new mtree, but all of the
lower layers only accept a single attr-list. The solution here
is to add a special tail-recursive LFSR_TAG_ATTRS that can be used to
chain together multiple attr-lists. This solves the problem quite
elegantly and may actually be useful in the future?
It takes a bit of code:
code stack
before: 33850 2584
after: 33926 (+0.2%) 2592 (+0.3%)
But this solves our final lfsr_fs_grow-related bug. No more mroot-split
hacks in test_grow, and we can now grow any stuck filesystem.
Well this turned into a never-ending can of worms...
I guess the good news is our newly added lfsr_grow_incr_* tests are
_very_ good at finding post-error-resume bugs.
Implementation-wise, this was fairly straightforward thanks to prior
work by BrianPugh, kaetemi, and myself:
1. Made block_count pseudo-optional by adding lfs.block_count so we can
mutate it based on what we find on-disk.
This was done a bit different from the previous implementation,
instead of setting block_count=0 to read the block_count from disk,
we allow any block_count <= the configured block_count.
This matches how we handle name_limit/file_limit/etc, and allows
users to mount a filesystem with unknown block_count while asserting
an upper bound.
2. Added lfsr_fs_grow, which can grow the filesystem.
The is basically the same as the previous implementation except we're
a bit more careful with the lookahead buffer.
I thought the previous impl might have been broken w.r.t. lookahead
buffer, but fortunately it's only broken in a way that makes us think
newly available blocks are temporarily in-use. Which is a bit funny.
One interesting thing that came out with more aggressive tests is
that it's possible to get locked-up in lfsr_fs_preparemutation trying
to clean up grms/orphans before we change the filesystem size.
Fortunately it turns out we don't _really_ need to call
lfsr_fs_preparemutation here. This gets a bit delicate, but means we
should always be able to grow a full filesystem.
To test this I've added both the simple grow/error tests from the
previous version, as well as a set of fuzz tests (a la test_relocations
and friends) that incrementally grow the filesystem when encountering
LFS_ERR_NOSPC. These have a surprising amount coverage, testing
lfsr_fs_grow, lfsr_fs_stat, lfsr_fs_size, and resuming operations after
encountering an error.
Which also means they found bugs:
- lfs_alloc_setinuse was not broken before, because lookahead.start was
always a multiple of lookahead_size. But now with lfs_alloc_discard,
this invariant may not be true.
I've just changed all lookahead.start updates to mod block_count. This
adds a bit of code, but is much easier to reason about.
While fixing this, I also added an assert to never allocate blocks
{0,1} in lfs_alloc. This is a good assert to have, but did require
some tweaks to test_btree to avoid these blocks.
- We were incorrectly patching grms in lfsr_mdir_commit when mdelta=0.
Funnily enough we also proceed to ignore the patched grm most of the
time when mdelta=0, so this went unnoticed.
- It turns out we're completely ignoring rid=-1 attrs if we split the
mroot. Not sure how this was missed. It's a bit important.
Note this is still broken. Fixing this requires some rather invasive
changes to lfsr_mdir_commit's internal logic that should probably be
in another commit...
Note again fwrite_fuzz is omitted. Currently the state of data in opened
files is undefined after a failed write, so this wouldn't really be
testing anything interesting...
More features = more code, and all of this bug fixing meant several
things contributed to code/stack changes in this commit:
code stack
before: 33654 2592
+variable block_count: 33646 (-0.0%) 2584 (+0.0%)
+lfsr_fs_grow: 33818 (+0.5%) 2584 (-0.3%)
+lookahead-start-fix: 33842 (+0.6%) 2584 (-0.3%)
+grm-patch-fix (after): 33850 (+0.6%) 2584 (-0.3%)
Wild that variable block_count actually saves code/stack. I guess the
indirect lfs->cfg->block_count load can get costly...
Inspired by errno's/dbgerr.py's -l/--list, this gives a quick and easy
list of the current tag encodings, which can be very useful:
$ ./scripts/dbgtag.py -l
LFSR_TAG_NULL 0x0000 v--- ---- ---- ----
LFSR_TAG_CONFIG 0x00tt v--- ---- -ttt tttt
LFSR_TAG_MAGIC 0x0003 v--- ---- ---- --11
LFSR_TAG_VERSION 0x0004 v--- ---- ---- -1--
... snip ...
We already need to keep dbgtag.py in-sync or risk a bad debugging
experience, so we might as well let it tell us all the information it
currently knows.
Also yay for self-inspecting code, I don't know if it's bad that I'm
becoming a fan of parsing information out of comments...
This is based on moreutils's errno program, but specialized for
littlefs's error codes. Everything is hardcoded in dbgerr.py, so this
should be portable to any OS really.
Hopefully this will be useful for debugging littlefs errors:
$ ./scripts/dbgerr.py -84
LFS_ERR_CORRUPT -84 Corrupted
The idea behind LFS_ERR_UNKNOWN is to reserve -1 as a general purpose
"Idunno what error code to use just bail" error. Discouraged for
production code, but very useful for hacking things together either for
quick prototypes or when trying to bootstrap a system.
-1 was actually already carved out for this purpose, but not documented.
I've used it heavily in other project, and it would be good to actually
codify this so others know the option is available.
This should end up being one of the allowed bd error codes, along with
LFS_ERR_IO, LFS_ERR_CORRUPT, etc.
---
If you try to map this to Linux errno codes, this does unfortunately
conflict with EPERM (-1), but what is EPERM other than an arbitrary "you
can't do that" error? Outside of Linux/POSIX, I think you can probably
get away with using EACCES (-13) for permission related errors instead.
I think the tradeoff is well worth it given we're not _really_
constrained to exact POSIX semantics.
We really only rely on cksum here.
Passing weight was useful for an assert, but it clutters up the function
call with a parameter that isn't used normally.
Removing ths saves a bit of code:
code stack
before: 33670 2592
after: 33654 (-0.0%) 2592 (+0.0%)
The theory is that lfsr_opened_isopen is a relatively special case, and
the main cause of LFS_ASSERT side-effects. All other LFS_ASSERTs are
either limited to simple expressions, or small static-inline functions.
Unfortunately, even without lfsr_opened_isopen asserts, it seems
asserts-as-hints still results in worse code/stack costs:
code stack
no-assert (before): 33626 2552
hint-assert (after): 33670 (+0.1%) 2592 (+1.6%)
Digging around in the low-level assembly, it seems that what is
happening is the increased number of calls to static-inline functions is
causing the compiler to prefer to not-inline functions more often. Then,
even if calls would be eliminated as dead-code, the damage is done to
the containing function.
It's not entirely clear how this could be avoided. Maybe a separate
LFS_ASSERT for only pure expressions? This may not be worth trying to
solve outside of the compiler...
I realized the reason asserting on opened/closed file handles added so
much extra code was because our LFS_ASSERT macro doesn't properly
eliminate side-effects.
Consider this assert:
LFS_ASSERT(lfsr_opened_isopen(lfs, &dir->o));
Expanded:
((lfsr_opened_isopen(lfs, &dir->o))
? (void)0
: __builtin_unreachable());
Even though the compiler knows lfsr_opened_isopen must return true
here, it doesn't know what possible side-effects calling
lfsr_opened_isopen may have, and can't eliminate the function call.
This is quite a bit more obvious if you did something like:
LFS_ASSERT(lfsr_file_sync(&lfs, &file) == 0);
But since lfsr_opened_isopen is a static inline function, it gets a
little bit less clear. Even worse, whether or not the call is eliminated
probably depends if it's actually inlined and other compiler
optimization noise.
---
This commit effectively reverts LFS_ASSERT as a compiler hint, making
LFS_ASSERT an empty string if LFS_NO_ASSERT is defined.
This may not be the optimal solution, but it at least keeps the
programmer's intuition that anything in LFS_ASSERT has zero impact when
asserts are disabled. It would be nice if there was some way to tell the
compiler that an expression should have no side-effects, but as far as
I'm aware this is not currently possible.
Measurements show that just disabling asserts wins in both code and
stack over trying to leverage asserts-as-hints:
code stack
hint-assert (before): 33904 2584
no-assert (after): 33626 (-0.8%) 2552 (-1.2%)
At least both of these win over leaving asserts enabled, so
asserts-as-hints does eliminate _most_ code (measured here
with a simple assert-loop, since I assume that would have the smallest
code footprint):
code stack
loop-assert: 36874 2616
hint-assert (before): 33904 (-8.1%) 2584 (-1.2%)
no-assert (after): 33626 (-8.8%) 2552 (-2.4%)
Unfortunately asserts-as-hints was doing quite a bit of heavy lifting
at preventing overzealous GCC warnings. It took quite a few tweaks to
get GCC to shut up, and I'm still not entirely sure the best way to tell
GCC that some functions only return negative values. Currently I just
limit certain error checks to only check for negative values, which is
not great, but at least gets the code compiling again...
This adopts upstream opened/closed assertions, which are useful for
catching user mistakes (note the bug fixes in our tests):
- Assert if already open in lfsr_*_open
- Assert if not open in lfsr_file_* and lfsr_dir_* functions
- Assert if any files/dirs are still open in lfsr_unmount
Unfortunately this had a surprising code cost for what really should
have been a noop as far as the compiler is concerned. And saved a bit of
stack? Maybe our assertion hints are causing a surprising amount of code
movement? I'm really not sure what's going on and this deserves more
investigation:
code stack
before: 33686 2592
after: 33904 (+0.6%) 2584 (-0.3%)
At the very least this didn't add a noticable amount of testing time. I
was a bit concerned because our orphan/zombie testing grows opened-list
operations ~O(n^2), but any measurable overhead is less than how much
our test runtime swings between runs (+-~20s).
Before, lfsr_mount would return LFS_ERR_INVAL if it could not mount the
filesystem for any reason. This matches POSIX's mount behavior, but is,
in my humble opinion, unhelpful... A corrupted filesystem image is an
"invalid parameter"?
This splits lfsr_mount's failed-to-mount behavior into two error codes:
- LFS_ERR_CORRUPT - Failed to mount because something was corrupted.
Unlikely disk contains a littlefs image.
- LFS_ERR_NOTSUP - Failed to mount because on-disk filesystem is
incompatible. Reconfiguring your driver may successfully mount.
This offers a bit more of a hint to users on why mount failed. Though
relevant error logs will probably have more useful information. Worst
case users can always treat CORRUPT/NOTSUP the same after calling
lfsr_mount.
Code changes:
code stack
before: 33674 2592
after: 33686 (+0.0%) 2592 (+0.0%)
I realized we really can't do anything if we find a file of unknown
type... If we don't understand a file's data structure, we can't really
do any bookkeeping. Allocating new blocks will probably corrupt unknown
files since we can't traverse any related B-trees, and mdir compaction
would be an absolute mess.
So, instead, just print an error and bail during mount.
Eventually we could at least fallback to readonly mode, but this is
currently a TODO item.
This also means the LFS_ERR_NOTSUP logic in lfsr_mtree_pathlookup is no
longer needed. Since, even with readonly fallback, we should never
mutate a filesystem with unknown file types.
Maybe in the future we could have a sort of known-but-not-supported mode
for file types? So special file types could not be support, but at least
understood enough to support traversal/remove/rename/etc?
Code changes:
code stack
before: 33694 2592
after: 33674 (-0.1%) 2592 (+0.0%)
These don't really rely on any advanced file operations, and can run in
parallel.
This was a leftover from when test_incompat+test_compat were merged, and
test_compat should probably run after all file operations are thoroughly
tested.
Returning the actual on-disk file type is probably more useful for users
as this gives them more information.
I was originally concerned about collisions with future internal types,
LFS_TYPE_TRAVERSAL, etc, needed for internal opened-list tracking, but
it turns out we can avoid problems by starting internal types at 0x80,
since on-disk file types are only 7-bits.
Code changes:
code stack
before: 33710 2592
after: 33694 (-0.0%) 2592 (+0.0%)
This adds a couple things so our unknown file types don't just cause our
filesystem to fall over:
- lfsr_mount now prints a warning on any unknown file types found at
mount time. Since we're already iterating over all files to find
orphans, this is basically free.
- Added LFS_TYPE_UNKNOWN to represent files with an unknown/unsupported
type. This is now returned by lfsr_stat/lfsr_dir_read for files of any
unknow type.
- Added LFS_ERR_NOTSUP. This is now returned by functions that attempt
to modify a file of unknown type, and my have more use cases in the
future.
It's tempting to allow remove/rename on unknown file types, but since
we don't know what data structures these may be referencing, doing so
would likely leak storage. Or worse. Shrubs for example would just
explode if you only moved the metadata entry.
This also adds test_incompat_unknown to test these cases.
Code changes are minimal, though there are a number of extra conditions
to check for unknown file types. The lfsr_mount condition is
particularly fun as it should be completely optimized out when debug
statements are disabled:
code stack
before: 33670 2592
after: 33710 (+0.1%) 2592 (+0.0%)
Unlike the other test_compat tests, the test_incompat tests cover
specific corner cases and don't require any special linking. We probably
always want to run these, and keeping them merged with test_compat risks
the entire suite being omitted at some point.
The test_compat tests are a bit special and probably deserves a
dedicated test suite.
test_compat has been very useful for testing compatibility on patch and
minor releases.
Though, in porting the tests, I've realized these are actually really
flimsy w.r.t. API changes... lfsp_config notably relies on compatible
struct layouts, which is _not_ guaranteed by littlefs's compatibility
rules.
For this reason I've restricted these tests to only run if LFS_VERSION
doesn't change, though this may be worth reinvestigating in the future.
test_compat on minor API releases would be quite valuable...
lfsr_fs_stat is also not quite up to date with upstream yet. It's really
just a small shim copying over static configs at the moment (except for
name_limit/file_limit). This is because we're still missing most of what
would actually be interesting here: variable block counts, minor
versions, etc.
And of course a minimal lfsr_fs_stat means minimal code changes:
code stack
before: 33642 2592
after: 33670 (+0.1%) 2592 (+0.0%)
See comments/previous commits. lfsr_fs_mkconsistent allows running
internal consistency operations without any other filesystem changes.
Implementation-wize, this just calls lfsr_fs_preparemutation which we
already need to, uh, prepare for mutation. Though it may do some
additional work in the future, such as setting compat flags, version
numbers, etc.
Added mkconsistent permutations to what seems like the relevant tests:
- test_forphans - easy for lfsr_fs_mkconsistent to accidentally delete
orphans/zombies.
- test_powerloss - heavy fuzz tests over powerloss-related consistency
operations, though this does multiply every permutation by ~2x...
Code cost minimal. I guess this is what it costs to make an internal
function non-static:
code stack
before: 33634 2592
after: 33642 (+0.0%) 2592 (+0.0%)
Changed:
- lfsr_mkdir(&lfs, "/") => LFS_ERR_EXIST
- lfsr_file_open(&lfs, &file, "/", *) => LFS_ERR_ISDIR
Unchanged:
- lfsr_remove(&lfs, "/") => LFS_ERR_INVAL
- lfsr_rename(&lfs, "/", *) => LFS_ERR_INVAL
- lfsr_rename(&lfs, *, "/") => LFS_ERR_INVAL
This better matches what Linux, etc, does: prefering a normal
dir-related error unless the only issue is that the dir in question is
the root.
Though Linux, etc, usually return EBUSY, which seems to also be used for
special device files. We could add LFS_ERR_BUSY, but I'm not sure it's
really worth it for such a rare error. It's not like the name would help
anything...
Internally, lfsr_mtree_pathlookup always returns LFS_ERR_INVAL for root,
so this unfortunately requires a bit more code to map to the correct
errors:
code stack
before: 33598 2592
after 33634 (+0.1%) 2592 (+0.0%)
Turns out there's very _very_ small powerloss hole in our current
perturb logic.
We rely on tag valid bits to validate perturb bits, but these
intentionally don't end up in the commit checksum. This means there will
always be a powerloss hole when we write the last valid bit. If we lose
power after writing that bit, suddenly the remaining commit and any
following commits may appear as valid.
Now, this is really unlikely considering we need to lose power exactly
when we write the cksum tag's valid bit, and our nonce helps protect
against this. But a hole is a hole.
The solution here is to include the _current_ perturb bit (q) in the
commit's cksum tag, alongside the _next_ perturb bit (p). This will be
included in the commit's checksum, but _not_ in the canonical checksum,
allowing the commit's checksum validate the current perturb state
without ruining our erased-state agnostic checksums:
.---+---+---+---. . . .---+---+---+---. \ \ \ \
|v| tag | |v| tag | | | | |
+---+---+---+---+ +---+---+---+---+ | | | |
| commit | | commit | | | | |
| | | | +-. | | |
+---+---+---+---+ +---+---+---+---+ / | | | |
|v|qp-------------. |v|qp| tag | | . . .
+---+---+---+---+ | +---+---+---+---+ | . . .
| cksum | | | cksum | | . . .
+---+---+---+---+ | +---+---+---+---+ | . . .
| padding | | | padding | | . . .
| | | | | | . . .
+---+---+---+---+ | . +---+---+---+---+ | | | |
| erased | +-> |v------------------' | | |
| | | +---+---+---+---+ | | |
. . | | commit | +-. | +- rbyd
. . | |.----------------. | | | | cksum
| +| -+---+---+---+ | / | +-. /
+-> |v|qp| tag | '-----' | |
| +- ^ ---+---+---+ / |
'------' cksum ----------------'
+---+---+---+---+
| padding |
| |
+---+---+---+---+
| erased |
| |
. .
. .
(Ok maybe this diagram needs work...)
This adds another thing that needs to be checked during rbyd fetch, and
note, we _do_ need to explicitly check this, but it solves the problem.
If power is loss after v, q would be invalid, and if power is lost after
q, our cksum would be invalid.
Note this would have also been an issue for the previous cksum + parity
perturb scheme.
Code changes:
code stack
before: 33570 2592
after: 33598 (+0.1%) 2592 (+0.0%)
The previous cksum + parity scheme worked, but needing to calculate both
cksum + parity on slightly different sets of metadata felt overly
complicated. After taking a step back, I've realized the problem is that
we're trying to force perturb effects to be implicit via the parity. If we
instead actually implement perturb effects explicitly, things get quite
a bit simpler...
This does add a bit more logic to the read path, but I don't think it's
worse than the mess we needed to parse separate cksum + parity.
Now, the perturb bit has the explicit behavior of inverting all tag
valid bits in the following commit. Which is conveniently the same as
xoring the crc32c with 00000080 before parsing each tag:
.---+---+---+---. . . .---+---+---+---. \ \ \ \
|v| tag | |v| tag | | | | |
+---+---+---+---+ +---+---+---+---+ | | | |
| commit | | commit | | | | |
| | | | +-. | | |
+---+---+---+---+ +---+---+---+---+ / | | | |
|v|p--------------. |v|p| tag | | . . .
+---+---+---+---+ | +---+---+---+---+ | . . .
| cksum | | | cksum | | . . .
+---+---+---+---+ | +---+---+---+---+ | . . .
| padding | | | padding | | . . .
| | | | | | . . .
+---+---+---+---+ | . +---+---+---+---+ | | | |
| erased | +-> |v------------------' | | |
| | | +---+---+---+---+ | | |
. . | | commit | +-. | +- rbyd
. . | | | | | | | cksum
| +---+---+---+---+ / | +-. /
'-> |v----------------------' | |
+---+---+---+---+ / |
| cksum ----------------'
+---+---+---+---+
| padding |
| |
+---+---+---+---+
| erased |
| |
. .
. .
With this scheme, we don't need to calculate a separate parity, because
each valid bit effectively validates the current state of the perturb
bit.
We also don't need extra logic to omit valid bits from the cksum,
because flipping all valid bits effectively makes perturb=0 the
canonical metadata encoding and cksum.
---
I also considered only inverting the first valid bit, which would have
the additional benefit of allowing entire commits to be crc32ced at
once, but since we don't actually track when we've started a commit
this turned out to be quite a bit more complicated than I thought.
We need someway to validate the first valid bit, otherwise it could be
flipped by a failed prog and we'd never notice. This is fine, we can
store a copy of the previous perturb bit in the next cksum tag, but it
does mean we need to track the perturb bit for the duration of the
commit. So we'd end up needing to track both start-of-commit and the
perturb bit state, which starts getting difficult to fit into our rbyd
struct...
It's easier and simpler to just flip every valid bit. As a plus this
means every valid bit contributes to validating the perturb bit.
---
Also renamed LFSR_TAG_PERTURB -> LFSR_TAG_NOISE just to avoid confusion.
Though not sure if this tag should stick around...
The end result is a nice bit of code/stack savings, which is what we'd
expect with a simpler scheme:
code stack
before: 33746 2600
after: 33570 (-0.5%) 2592 (-0.3%)
Turns out we don't need SHRUBALLOC, as we can infer if we need to reset
the shrub based on if it already exists in our mdir. Not in mdir =>
needs to alloc/reset.
This saves an internal tag and a bit of code:
code stack
before: 33770 2600 (+0.0%)
after: 33746 (-0.1%) 2600 (+0.0%)
Now that we are testing more powerloss behaviors, test_powerloss is the
longest running test suite by a decent margin:
Before:
$ ./scripts/summary.py test.csv -bsuite -ftime -Stime
... snip ...
test_rbyd 578.6
test_fwrite 984.5
test_badblocks 1341.5
test_exhaustion 1648.3
test_powerloss 2192.3 <--
TOTAL 7378.6
$ ./scripts/summary.py test.csv -bcase -ftime -Stime
... snip ...
test_fwrite_fuzz_aligned 247.2
test_exhaustion_file_fuzz 287.7
test_exhaustion_dir_fuzz 307.2
test_exhaustion_orphanzombie_fuzz 389.1
test_exhaustion_orphanzombiedir_fuzz 531.5
test_powerloss_file_pl_fuzz 787.7 <--
test_badblocks_single_dir_many 840.2
test_powerloss_filedir_pl_fuzz 1366.7 <--
TOTAL 7378.6
But testing more things is better than testing the same thing more.
Worst case you can always manually override OPS, -DOPS=1024, if you have
CI cycles to spare. Though note with our linear powerloss heuristic,
the tail end of long running tests also recieves fewer powerlosses,
which reduces the usefulness of running these tests longer.
These *_pl_fuzz tests also now match the default number of OPS in
test_relocations.
These emulate powerloss behavior where only some of the bits being
progged are actually progged if there is a powerloss. This behavior was
the original motivation for our ecksums/fcrcs, so it's good to have this
tested.
As a simplification, these only test the extremes:
- LFS_EMUBD_POWERLOSS_SOMEBITS => one bit progged
- LFS_EMUBD_POWERLOSS_MOSTBITS => all-but-one bit progged
Also they flips bits instead of preserving exact partial prog behavior,
but this is allowed (progs can have any intermediate value), has the
same effect as partial progs, and should encourage failed progs.
This required a number of tweaks in emubd: moved powerloss before prog,
moved mutate after powerloss, etc, but these shouldn't affect other
powerloss behaviors. Handling powerloss after prog was only to avoid
power_cycles=1 being useless, it's not strictly required.
Good news is testing so far suggests our ecksum design is sound.
Now, instead of reverting only the first block on powerloss, _all_
blocks since the last sync are reverted (except the in-flight block, if
you reverted that it would be the same as noop powerloss).
It was a bit frustrating trying to reproduce known holes in our sync
logic before this, but reverting all blocks really is the worst case,
so we should have quite a bit more confidence going forward.
This was a bit tricky to implement without memory leaks everywhere,
since we need to be able to resume for exhaustive powerloss testing. But
emubd's copy-on-write block emulation really shines here.
It's counter-intuitive, but no top-level API should return
LFS_ERR_CORRUPT. Instead, if we can't make progress because of a corrupt
block, we should return LFS_ERR_NOSPC. This makes it easier for users to
write code that is well behaved even when a device is end-of-life.
It's up to our mroot extension algorithm to make sure this case can't be
reached in normal operation unless the device is _actually_ at
end-of-life.
Because mroot extension is a bit of a special case, we weren't
converting these corrupt errors to nospc errors consistently. This is
fixed now, along with a couple more hopefully-useful logging statements.
Found while playing around with test_exhaustion + block_recycles=-1.
This should assert on bad wear-leveling, but LFS_ERR_CORRUPT was
unexpected. Added an explicit test because this is an easy thing to let
split through:
- test_badblocks_mrootanchor_wear
Code changes were surprisingly minimal, I wonder if constants are being
swapped out somewhere low-level?
code stack
before: 33766 2600
after: 33770 (+0.0%) 2600 (+0.0%)
This number was an incorrect result caued by overallocating the
small-table array (64-ints vs 64-bytes), and has unfortunately crept
into too many places...
The original idea was to allow merging a whole bunch of different csv
results into a single lfs.csv file, but this never really happened. It's
much easier to operate on smaller context-specific csv files, where the
field prefix:
- Doesn't really add much information
- Requires more typing
- Is confusing in how it doesn't match the table field names.
We can always use summary.py -fcode_size=size to add prefixes when
necessary anyways.
This is equivalent to just omitting the flag, but makes it a bit easier
to switch between summary.py and more specific scripts such as code.py,
where -u/--use is needed to operate on csv files.
We already rely on this symbol in these scripts, so might use it to
display the mathematically correct ratio for new entries.
This has the added benefit of ordering new entries vs extremely big
changes correctly:
$ ./scripts/code.py -u test.after.csv -d test.before.csv
function (1 added, 0 removed) osize nsize dsize
test_a - 49 +49 (+∞%)
test_b 19 719 +700 (+3684.2%)
test_c 91 191 +100 (+109.9%)
TOTAL 110 959 +849 (+771.8%)
This is a bit more complicated, but make testmarks really showed how
confusing this could get.
Now, instead of:
suite passed time
test_alloc 304/304 1.6 (100.0%)
test_badblocks 6880/6880 1323.3 (100.0%)
... snip ...
test_rbyd 385878/385878 592.7 (100.0%)
test_relocations 7899/7899 318.8 (100.0%)
TOTAL 548206/548206 6229.7 (100.0%)
Percents/notes are interspersed next to their relevant fields:
suite passed time
test_alloc 304/304 (100.0%) 1.6
test_badblocks 6880/6880 (100.0%) 1323.3
... snip ...
test_rbyd 385878/385878 (100.0%) 592.7
test_relocations 7899/7899 (100.0%) 318.8
TOTAL 548206/548206 (100.0%) 6229.7
Note has no effect on scripts with only a single field (code.py, etc).
But it does make multi-field diffs a bit more readable:
$ ./scripts/stack.py -u after.stack.csv -d before.stack.csv -p
function frame limit
lfsr_bd_sync 8 (+100.0%) 216 (+100.0%)
lfsr_bd_flush 40 (+25.0%) 208 (+4.0%)
... snip ...
lfsr_file_flush 32 (+0.0%) 2424 (-0.3%)
lfsr_file_flush_ 216 (-3.6%) 2392 (-0.3%)
TOTAL 9008 (+0.4%) 2600 (-0.3%)
This lets you view the first n lines of output instead of the last n
lines, as though the output was piped through head.
This is how the standard watch command works, and can be more useful
when most of the information is at the top, such as in our dbg*.py
scripts (watch.py was originally used as a sort of inotifywait-esque
build runner, which is the main reason it's different).
To make this work, RingIO (renamed from LinesIO) now uses terminal
height as a part of its canvas rendering. This has the added benefit of
more rigorously enforcing the canvas boundaries, but risks breaking when
not associated with a terminal. But that raises the question, does
RingIO even make sense without a terminal?
Worst case you can bypass all of this with -z/--cat.
The main test additions are the test_powerloss tests, intended to be
high-level tests over difficult/weird powerloss environments (such as
out-of-order writes!):
- test_powerloss_dir_many - 2242 pls
- test_powerloss_file_many - 8856 pls
- test_powerloss_file_pl_fuzz - 384508 pls
- test_powerloss_filedir_pl_fuzz - 268339 pls
But there was also a bunch of other test movement in the late-stage/
high-level tests. I'm trying to keep the core of these tests somewhat
consistent so we have a nice template to extend for future testing, in
case we want to test other environmentalish concerns, but not all of
these tests make sense in all of these contexts:
badblocks powerloss relocations exhaustion
dir_many y y y
dir_fuzz y y y
file_many y y y
file_fuzz y y y
fwrite_fuzz y y
orphanzombie_fuzz y y y
orphanzombiedir_fuzz y y y
file_pl_fuzz y y
filedir_pl_fuzz y y
Why not:
- dir/file_many+exhaustion? - Needs to be unbounded
- dir/file_fuzz+powerloss? - Takes O(n^2)
- fwrite_fuzz+powerloss? - Takes O(n^2)
- fwrite_fuzz+relocations? - Doesn't really test anything
- orphanzombie*_fuzz+powerloss? - Powerloss kills zombies
- file*_pl_fuzz+badblocks? - PL + Badblocks currently incompactible
- file*_pl_fuzz+exhaustion? - PL + Badblocks currently incompactible
---
Of course, in order to actually get out-of-order write testing working,
we need to implement out-of-order write syncing.
Fortunately this was a simple exercise in placing lfsr_bd_sync calls
before any mdir commits where we may have unsynced data:
- in lfsr_file_sync, to sync any pending file data
- in lfsr_mdir_commit, to sync any mroot/mtree changes
We also call lfsr_bd_sync _after_ mdir commits in case users expect to
sequence any filesystem-external operations such as network, UI, etc. In
theory this could be optional, but no users have really requested it
yet, so leave that for a potential future improvement:
- in lfsr_mdir_commit
- in lfsr_formatinited (really just because we don't go through
lfsr_mdir_commit)
Note that lfsr_rbyd_commit has been relaxed in the scheme. It only
flushes caches, and does _not_ call lfsr_bd_sync. This is useful for
allowing multiple B-tree nodes to be written out-of-order, also long as
the whole thing is synchronized before any mdir commit.
All of these lfsr_bd_sync calls add a bit of code, but not really an
amount to care about:
code stack
before: 33678 2600
after: 33766 (+0.3%) 2600 (+0.0%)
More information upstream (f2a6f45, fc2aa33, 7873d81), but this adds
LFS_EMUBD_POWERLOS_OOO for testing out-of-order block devices that
require sync to be called for things to serialize. It's a simple
implementation, just reverts the first write since last sync on
powerloss, but gets the job done.
Cherry-picking these changes required reverting emubd's scratch buffer,
but carrying around an extra ~block_size of memory isn't a big deal
here.
This sort of reverts the addition of lfsr_bd_unprog, but with a slightly
better API. lfsr_bd_unprog was too much of a hack, and isn't really
generalizable. The align flag isn't necessarily any better, but at least
it's the simplest/least-confusing solution available.
And it's net savings, code-wise:
code stack lfs_t
before: 33690 2608 164
after: 33678 (-0.0%) 2600 (-0.3%) 160 (-2.4%)
The only real use case for the bd runtime bounds checks is to abort rbyd
commits when they run off the end of the block. Since rbyd's now have
their own set of low-level append functions, we're better off doing the
bounds checks there and changing all of the lfsr_bd_* bounds checks to
asserts.
Block overflows are a particularly easy mistake to make, and one that
would be good to catch early.
One interesting thing to note: We're now using LFSR_TAG_DSIZE for range
checks instead of the actual tag encoding. This may seem suboptimal, but
if LFSR_TAG_DSIZE can't fit in the remaining space in the block, the
cksum tag wouldn't be able to fit anyways. So we're not really wasting
any space.
This saves a nice bit of code:
code stack
before: 33690 2608
after: 33610 (-0.2%) 2608 (+0.0%)
While exploring the test_badblocks ERASENOOP failure more, I realized
the problem is that we are nesting crc32cs.
To be clear, using crc32cs to validate progs in general is not an issue,
that is perfectly fine on paper. The issue is that we were using crc32cs
to validate progs _that contain crc32cs_.
Looking at the collision, we can see the fully expanded lleb128s we use
for our cksum tags:
00 00 00 ff b0 02 00 87 80 80 00 3e c0 7f 7e => bdfa9b10
ab 77 de c2 b0 03 00 87 80 80 00 3e 38 d5 22 => bdfa9b10
'-.-' ^ '----.----' '----.----'
'----|------|-----------|-- cksum tag
'------|-----------|-- cksum weight (0)
'-----------|-- cksum size + padding
'-- cksum crc32c
So we ended up perfectly aligning the cksum's crc32c with our cache
line. Lucky us.
Unfortunately funny math makes it so that whenever a crc32c contains a
crc32c, the inner crc32c sort of cancels itself out from the outer
crc32c. So these two messages end up mathematically equivalent, even
though they contain different data:
crc(m) = m(x) x^|P|-1 mod P
crc(m ++ crc(m)) = (m(x) x^|P|-1 + (m(x) x^|P|-1 mod P)) x^|P|-1 mod P
crc(m ++ crc(m)) = (m(x) x^|P|-1 + m(x) x^|P|-1) x^|P|-1 mod P
crc(m ++ crc(m)) = 0 x^|P|-1 mod P
crc(m ++ crc(m)) = 0
So using a crc32c to check progs is not fit for purpose.
This leaves us with a couple options:
1. Use a different checksum, or do something like rearranging bytes to
avoid this cancelling out issue. Unfortunately this gets tricky since
crc32cs are linear, simply using an xor mask won't work...
2. Don't check progs at such a low-level, but at a high-level using the
rbyd/data block crc32cs. Since this would mean only one crc32c, this
would avoid nesting issues. Unfortunately this would probably come
with quite a high code cost to try to keep track of both the
before+after rbyd cksums everywhere...
3. Just read back the data into the rcache to compare at the byte-level,
which would mean clobbering our rcache when prog checking is enabled.
This commit goes with option 3., which is probably the simplest. It also
removes any question of crc32c collision, which could be a real nuisance
when debugging low-level block device operations, a use case where prog
checking will hopefully be quite valuable.
Clobbering the rcache also has the advantage of reverting the prog
>= read requirement, which is nice for flexibility. Though this needs to
be tested.
---
There was a bit of a hiccup, and that was how prog checking interacts
with lfsr_bd_cpy. lfsr_bd_cpy used the rcache to hold data being copied
to/from disk, but this data needs to be checked, and prog checking would
clobber the rcache. Problems! I guess this is one footgun of the
internal lfsr_bd_readnext API...
The solution is to instead turn this around and use the pcache to hold
any copied data, since this would not be clobbered when prog checking.
This has some other knock-on effects, mainly that we can't take
advantage of read hints in lfsr_bd_cpy, but has the added advantage of
potentially not clobbering the rcache at all when no checking progs.
Code changes were fairly minimal:
code stack
before: 33718 2608
after: 33690 (-0.1%) 2608 (+0.0%)
The initial goal was the simplify these layers. Keyword being initial.
Unfortunately these layers are both complex and subtle, so the goal
shifted more to be rigorous and reliable.
This mainly meant rearranging our prog/read loops to follow a consistent
style, with higher-priority buffers being sorted out before flushing
things. This gets a bit tricky with wanting to support both cache
bypassing and buffer-lending prognext/readnext, but with some redundant
prognext/readnext calls it's doable.
We also now aggressively discard rcaches on pcache conflicts. This
change does rely on the prog >= read assumption. Discarding rcaches
means we should no longer have overlapping caches, so hopefully no more
zombie rcache issues.
Our bypassing heuristic was also tweaked a bit. Now, in addition to
alignment, >= read/prog_size, and >= hint requirements, we also require
operations to be >= r/pcache_size. This should improve cache usage when
r/pcache_size >> read/prog_size, since we were too eager to bypass
before.
Long story short, this ended up being more just things shifting around
than a significant simplification of the bd layers. At least we ended up
with a nice bit of stack savings:
code stack
before: 33682 2640
after: 33718 (+0.1%) 2608 (-1.2%)
Also, test_badblocks with LFS_EMUBD_BADBLOCK_ERASENOOP is now failing. I
was worried the amount of fuzz testing we do would eventually end up
with a naturally occuring crc32c collision, and sure enough we did! Yayy
yyyy...
00 00 00 ff b0 02 00 87 80 80 00 3e c0 7f 7e => bdfa9b10
ab 77 de c2 b0 03 00 87 80 80 00 3e 38 d5 22 => bdfa9b10
Need to think about what to do with this... For now I've just commented
out the problematic test.