The documentation does not match the implementation here. The intended
behavior of superblock expansion was to duplicate the current superblock
entry into the new superblock:
.--------. .--------.
.|littlefs|->|littlefs|
||bs=4096 | ||bs=4096 |
||bc=256 | ||bc=256 |
||crc32 | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
The main benefit is that we can rely on the magic string "littlefs"
always residing in blocks 0x{0,1}, even if the superblock chain has
multiple superblocks.
The downside is that earlier superblocks in the superblock chain may
contain out-of-date configuration. This is a bit annoying, and risks
hard-to-reach bugs, but in theory shouldn't break anything as long as
the filesystem is aware of this.
Unfortunately this was lost at some point during refactoring in the
early v2-alpha work. A lot of code was moving around in this stage, so
it's a bit hard to track down the change and if it was intentional. The
result is superblock expansion creates a valid linked-list of
superblocks, but only the last superblock contains a valid superblock
entry:
.--------. .--------.
.|crc32 |->|littlefs|
|| | ||bs=4096 |
|| | ||bc=256 |
|| | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
What's interesting is this isn't invalid as far as lfs_mount is
concerned. lfs_mount is happy as long as a superblock entry exists
anywhere in the superblock chain. This is good for compat flexibility,
but is the main reason this has gone unnoticed for so long.
---
With the benefit of more time to think about the problem, it may have
been more preferable to copy only the "littlefs" magic string and NOT
the superblock entry:
.--------. .--------.
.|littlefs|->|littlefs|
||crc32c | ||bs=4096 |
|| | ||bc=256 |
|| | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
This would allow for simple "littlefs" magic string checks without the
risks associated with out-of-date superblock entries.
Unfortunately the current implementation errors if it finds a "littlefs"
magic string without an associated superblock entry, so such a change
would not be compatible with old drivers.
---
This commit tweaks superblock expansion to duplicate the superblock
entry instead of simply moving it to the new superblock. And adds tests
over the magic string "littlefs" both before and after superblock
expansion.
Found by rojer and Nikola Kosturski
Some forms of storage, mainly anything with an FTL, eMMC, SD, etc, do
not guarantee a strict write order for writes to different blocks. It
would be good to test that this doesn't break littlefs.
This adds LFS_EMUBD_POWERLOSS_OOO to lfs_emubd, which tells lfs_emubd to
try to break any order-dependent code on powerloss.
The behavior right now is a bit simple, but does result in test
breakage:
1. Save the state of the block on first write (erase really) after
sync/init.
2. On powerloss, revert the first write to its original state.
This might be a bit confusing when debugging, since the block will
appear to time-travel, but doing anything fancier would make emubd quite
a bit more complicated.
You could also get a bit fancier with which/how many blocks to revert,
but this should at least be sufficient to make sure bd sync calls are in
the right place.
Inlined files live in metadata and decrease storage requirements, but
may be limited to improve metadata-related performance. This is
especially important given the current plague of metadata performance.
Though decreasing inline_max may make metadata more dense and increase
block usage, so it's important to benchmark if optimizing for speed.
The underlying limits of inlined files haven't changed:
1. Inlined files need to fit in RAM, so <= cache_size
2. Inlined files need to fit in a single attr, so <= attr_max
3. Inlined files need to fit in 1/8 of a block to avoid metadata
overflow issues, this is after limiting by metadata_max,
so <= min(metadata_max, block_size)/8
By default, the largest possible inline_max is used. This preserves
backwards compatibility and is probably a good default for most use
cases.
This does have the awkward effect of requiring inline_max=-1 to
indicate disabled inlined files, but I don't think there's a good
way around this.
This extends lfs_fs_gc to now handle three things:
1. Calls mkconsistent if not already consistent
2. Compacts metadata > compact_thresh
3. Populates the block allocator
Which should be all of the janitorial work that can be done without
additional on-disk data structures.
Normally, metadata compaction occurs when an mdir is full, and results in
mdirs that are at most block_size/2.
Now, if you call lfs_fs_gc, littlefs will eagerly compact any mdirs that
exceed the compact_thresh configuration option. Because the resulting
mdirs are at most block_size/2, it only makes sense for compact_thresh to
be >= block_size/2 and <= block_size.
Additionally, there are some special values:
- compact_thresh=0 => defaults to ~88% block_size, may change
- compact_thresh=-1 => disables metadata compaction during lfs_fs_gc
Note that compact_thresh only affects lfs_fs_gc. Normal compactions
still only occur when full.
- Renamed lfs.free -> lfs.lookahead
- Renamed lfs.free.off -> lfs.lookahead.start
- Renamed lfs.free.i -> lfs.lookahead.next
- Renamed lfs.free.ack -> lfs.lookahead.ckpoint
- Renamed lfs_alloc_ack -> lfs_alloc_ckpoint
These have been named a bit confusingly, and I think the new names make
their relevant purposes a bit clearer.
At the very it's clear lfs.lookahead is related to the lookahead buffer.
(and doesn't imply a closed free-bitmap).
The idea is in the future this function may be extended to support other
block janitorial work. In such a case calling this lfs_fs_gc provides a
more general name that can include other operations.
This is currently just wishful thinking, however.
- Test that the code actually runs.
- Test that lfs_fs_findfreeblocks does not break block allocations.
- Test that lfs_fs_findfreeblocks does not error when no space is
available, it should only errors when the block is actually needed.
The initial implementation for this was provided by kaetemi, originally
as a mount flag. However, it has been modified here to be self-contained
in an explicit runtime function that can be called after mount.
The reasons for an explicit function:
1. lfs_mount stays a strictly readonly operation, and avoids pulling in
all of the write machinery.
2. filesystem-wide operations such as lfs_fs_grow can be a bit risky,
and irreversable. The action of growing the filesystem should be very
intentional.
---
One concern with this change is that this will be the first function
that changes metadata in the superblock. This might break tools that
expect the first valid superblock entry to contain the most recent
metadata, since only the last superblock in the superblock chain will
contain the updated metadata.
In separating the configuration of littlefs from the physical geometry
of the underlying device, we can no longer rely solely on lfs_config to
contain all of the information necessary for the simulated block devices
we use for testing.
This adds a new lfs_*bd_config struct for each of the block devices, and
new erase_size/erase_count fields. The erase_* name was chosen since
these reflect the (simulated) physical erase size and count of
erase-sized blocks, unlike the block_* variants which represent logical
block sizes used for littlefs's bookkeeping.
It may be worth adopting erase_size/erase_count in littlefs's config at
some point in the future, but at the moment doesn't seem necessary.
Changing the lfs_bd_config structs to be required is probably a good
idea anyways, as it moves us more towards separating the bds from
littlefs. Though we can't quite get rid of the lfs_config parameter
because of the block-device API in lfs_config. Eventually it would be
nice to get rid of it, but that would require API breakage.
The intention is to help interop with older minor versions of littlefs.
Unfortunately, since lfs2.0 drivers cannot mount lfs2.1 images, there are
situations where it would be useful to write to write strictly lfs2.0
compatible images. The solution here adds a "disk_version" configuration
option which determines the behavior of lfs2.1 dependent features.
Normally you would expect this to only change write behavior. But since the
main change in lfs2.1 increased validation of erased data, we also need to
skip this extra validation (fcrc) or see terrible slowdowns when writing.
In terms of ease-of-use, a user familiar with other filesystems expects
block_usage in fsinfo. But in terms of practicality, block_usage can be
expensive to find in littlefs, so if it's not needed in the resulting
fsinfo, that operation is wasteful.
It's not clear to me what the best course of action is, but since
block_usage can always be added to fsinfo later, but not removed without
breaking backwards compatibility, I'm leaving this out for now.
Block usage can still be found by explicitly calling lfs_fs_size.
Version are now returned with major/minor packed into 32-bits,
so 0x00020001 is the current disk version, for example.
1. This needed to change to use a disk_* prefix for consistency with the
defines that already exist for LFS_VERSION/LFS_DISK_VERSION.
2. Encoding the version this way has the nice side effect of making 0 an
invalid value. This is useful for adding a similar config option
that needs to have reasonable default behavior for backwards
compatibility.
In theory this uses more space, but in practice most other config/status
is 32-bits in littlefs. We would be wasting this space for alignment
anyways.
This function naturally doesn't exist in the previous version. We should
eventually add these calls when we can expect the previous version to
support this function, though it's a bit unclear when that should happen.
Or maybe not! Maybe this is testing more of the previous version than we
really care about.
LFS_VERSION -> LFS_DISK_VERSION
These tests shouldn't depend on LFS_VERSION. It's a bit subtle, but
LFS_VERSION versions the API, and LFS_DISK_VERSION versions the
on-disk format, which is what test_compat should be testing.
Currently this includes:
- minor_version - on-disk minor version
- block_usage - estimated number of in-use blocks
- name_max - configurable name limit
- file_max - configurable file limit
- attr_max - configurable attr limit
These are currently the only configuration operations that need to be
written to disk. Other configuration is either needed to mount, such as
block_size, or does not change the on-disk representation, such as
read/prog_size.
This also includes the current block usage, which is common in other
filesystems, though a more expensive to find in littlefs. I figure it's
not unreasonable to make lfs_fs_stat no worse than block allocation,
hopefully this isn't a mistake. It may be worth caching the current
usage after the most recent lookahead scan.
More configuration may be added to this struct in the future.
lfs_fs_mkconsistent allows running the internal consistency operations
(desuperblock/deorphan/demove) on demand and without any other
filesystem changes.
This can be useful for front-loading and persisting consistency operations
when you don't want to pay for this cost on the first write to the
filesystem.
Conveniently, this also offers a way to force the on-disk minor version
to bump, if that is wanted behavior.
Idea from kasper0
The underlying issue is that lfs_fs_deorphan did not updating gstate
correctly. The way it determined if there are any orphans remaining in
the filesystem was by subtracting the number of found orphans from an
internal counter.
This internal counter is a leftover from a previous implementation that
allowed leaving the lfs_fs_deorphan loop early if we know the number of
expected orphans. This can happen during recursive mdir relocations, but
with only a single bit in the gstate, can't happen during mount. If we
detect orphans during mount, we set this internal counter to 1, assuming
we will find at least one orphan.
But this presents a problem, what if we find _no_ orphans? If this happens
we never decrement the internal counter of orphans, so we would never
clear the bit in the gstate. This leads to a running lfs_fs_deorphan
on more-or-less every mutable operation in the filesystem, resulting in
an extreme performance hit.
The solution here is to not subtract the number of found orphans, but assume
that when our lfs_fs_deorphan loop finishes, we will have no orphans, because
that's the whole point of lfs_fs_deorphan.
Note that the early termination of lfs_fs_deorphan was dropped because
it would not actually change the runtime complexity of lfs_fs_deorphan,
adds code cost, and risks fragile corner cases such as this one.
---
Also added tests to assert we run lfs_fs_deorphan at most once.
Found by kasper0 and Ldd309
This just means a rewrite of the superblock entry with the new minor
version.
Though it's interesting to note, we don't need to rewrite the superblock
entry until the first write operation in the filesystem, an optimization
that is already in use for the fixing of orphans and in-flight moves.
To keep track of any outdated minor version found during lfs_mount, we
can carve out a bit from the reserved bits in our gstate. These are
currently used for a counter tracking the number of orphans in the
filesystem, but this is usually a very small number so this hopefully
won't be an issue.
In-device gstate tag:
[-- 32 --]
[1|- 11 -| 10 |1| 9 ]
^----^-----^--^--^-- 1-bit has orphans
'-----|--|--|-- 11-bit move type
'--|--|-- 10-bit move id
'--|-- 1-bit needs superblock
'-- 9-bit orphan count
This is a bit tricky since we need two different version of littlefs in
order to test for most compatibility concerns.
Fortunately we already have scripts/changeprefix.py for version-specific
symbols, so it's not that hard to link in the previous version of
littlefs in CI as a separate set of symbols, "lfsp_" in this case.
So that we can at least test the compatibility tests locally, I've added
an ifdef against the expected define "LFSP" to define a set of aliases
mapping "lfsp_" symbols to "lfs_" symbols. This is manual at the moment,
and a bit hacky, but gets the job done.
---
Also changed BUILDDIR creation to derive subdirectories from a few
Makefile variables. This makes the subdirectories less manual and more
flexible for things like LFSP. Note this wasn't possible until BUILDDIR
was changed to default to "." when omitted.
Removed the weird alignment requirement from the general truncate tests.
This explicitly hid off-by-one truncation errors.
These tests now reveal the same issue as the block-sized truncation test
while also testing for other potential off-by-one errors.
When truncation is done on a file to the block size, there seems to be
an error where it points to an incorrect block. Perform a write /
truncate / readback operation to verify this issue.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
- General cleanup from integration, including cleaning up some older
commit code
- Partial-prog tests do not make sense when prog_size == block_size
(there can't be partial-progs!)
- Fixed signed-comparison issue in modified filebd
This change is necessary to handle out-of-order writes found by pjsg's
fuzzing work.
The problem is that it is possible for (non-NOR) block devices to write
pages in any order, or to even write random data in the case of a
power-loss. This breaks littlefs's use of the first bit in a page to
indicate the erase-state.
pjsg notes this behavior is documented in the W25Q here:
https://community.cypress.com/docs/DOC-10507
---
The basic idea here is to CRC the next page, and use this "erase-state CRC" to
check if the next page is erased and ready to accept programs.
.------------------. \ commit
| metadata | |
| | +---.
| | | |
|------------------| | |
| erase-state CRC -----. |
|------------------| | | |
| commit CRC ---|-|-'
|------------------| / |
| padding | | padding (doesn't need CRC)
| | |
|------------------| \ | next prog
| erased? | +-'
| | | |
| v | /
| |
| |
'------------------'
This is made a bit annoying since littlefs doesn't actually store the
page (prog_size) in the superblock, since it doesn't need to know the
size for any other operation. We can work around this by storing both
the CRC and size of the next page when necessary.
Another interesting note is that we don't need to any bit tweaking
information, since we read the next page every time we would need to
know how to clobber the erase-state CRC. And since we only read
prog_size, this works really well with our caching, since the caches
must be a multiple of prog_size.
This also brings back the internal lfs_bd_crc function, in which we can
use some optimizations added to lfs_bd_cmp.
Needs some cleanup but the idea is passing most relevant tests.
When you add a function to every benchmark suite, you know if should
probably be provided by the benchmark runner itself. That being said,
randomness in tests/benchmarks is a bit tricky because it needs to be
strictly controlled and reproducible.
No global state is used, allowing tests/benches to maintain multiple
randomness stream which can be useful for checking results during a run.
There's an argument for having global prng state in that the prng could
be preserved across power-loss, but I have yet to see a use for this,
and it would add a significant requirement to any future test/bench runner.
These are just incorrect limits in the tests that can be triggered by
powerloss testing, which can end up with more metadata-pairs than
without powerloss testing due to orphans.
- Fixed prettyasserts.py parsing when '->' is in expr
- Made prettyasserts.py failures not crash (yay dynamic typing)
- Fixed the initial state of the emubd disk file to match the internal
state in RAM
- Fixed true/false getting changed to True/False in test.py/bench.py
defines
- Fixed accidental substring matching in plot.py's --by comparison
- Fixed a missed LFS_BLOCk_CYCLES in test_superblocks.toml that was
missed
- Changed test.py/bench.py -v to only show commands being run
Including the test output is still possible with test.py -v -O-, making
the implicit inclusion redundant and noisy.
- Added license comments to bench_runner/test_runner
These are really just different flavors of test.py and test_runner.c
without support for power-loss testing, but with support for measuring
the cumulative number of bytes read, programmed, and erased.
Note that the existing define parameterization should work perfectly
fine for running benchmarks across various dimensions:
./scripts/bench.py \
runners/bench_runner \
bench_file_read \
-gnor \
-DSIZE='range(0,131072,1024)'
Also added a couple basic benchmarks as a starting point.
The main benefit is small test ids everywhere, though this is with the
downside of needing longer names to properly prefix and avoid
collisions. But this fits into the rest of the scripts with globally
unique names a bit better. This is a C project after all.
The other small benefit is test generators may have an easier time since
per-case symbols can expect to be unique.
This is really more work for the bench runner. With this change defines
can be manipulated at a rather high level at runtime. Which should be
useful for generating benchmarks across various dimensions.
The define grammar in the test_runner is now a bit more powerful,
accepting:
1. A single value: -DN=42
2. A list of values, which get permuted: -DN=1,2,3
3. A range: -DN=range(10)
4. Some combo: -DN=1,2,range(3,0,-1)
This is more complex in the test .toml defines, which can also be C
expressions:
1. A single value: define=42
2. A single expression: define='42*42'
3. A list: define=[1,2,3]
4. A comma separated string: define='1,2,3'
5. A range: define='42*range(10)'
6. This mess: define=[1,2,'3,4,range(2)*range(2)+3']
On one hand this seems like the wrong place for these tests, on the
other hand, it's good to know that the block device is behaving as
expected when debugging the filesystem.
Maybe this should be moved to an external program for users to test
their block devices in the future?
This mostly required names for each test case, declarations of
previously-implicit variables since the new test framework is more
conservative with what it declares (the small extra effort to add
declarations is well worth the simplicity and improved readability),
and tweaks to work with not-really-constant defines.
Also renamed test_ -> test, replacing the old ./scripts/test.py,
unfortunately git seems to have had a hard time with this.
This was caused by the new lfs_file_rawseek optimization that can skip
flushing when calculated file->pos is unchanged combined with an
implicit expectation in lfs_file_truncate that lfs_file_rawseek
unconditionally sets file->pos.
Because of this assumption, lfs_file_truncate could leave file->pos in
an outdated state while changing the internal file metadata. Humorously,
this was always gauranteed to trigger the skip in lfs_file_rawseek when
we try to restore the file->pos, leaving the file->cache used to do the
CTZ skip-list lookup in a potentially bad state.
The easiest fix is to just update file->pos correctly. Note we don't
want to explicitly flush since we can leverage the same noop
optimization if we truncate to the file position. Which I've added a
test for.
These two features have been much requested by users, and have even had
several PRs proposed to fix these in several cases. Before this, these
error conditions usually were caught by internal asserts, however
asserts prevented users from implementing their own workarounds.
It's taken me a while to provide/accept a useful recovery mechanism
(returning LFS_ERR_CORRUPT instead of asserting) because my original thinking
was that these error conditions only occur due to bugs in the filesystem, and
these bugs should be fixed properly.
While I still think this is mostly true, the point has been made clear
that being able to recover from these conditions is definitely worth the
code cost. Hopefully this new behaviour helps the longevity of devices
even if the storage code fails.
Another, less important, reason I didn't want to accept fixes for these
situations was the lack of tests that prove the code's value. This has
been fixed with the new testing framework thanks to the additional of
"internal tests" which can call C static functions and really take
advantage of the internal information of the filesystem.
With the superblock expansion stuff, the test_format tests have grown
to test more advanced superblock-related features. This is fine but
deserves a rename so it's more clear.
Also fixed a typo that meant tests never ran with block cycles.