Because multiple, out-of-date superblocks can exist in our superblock
chain, we need to be careful to make sure newer superblock entries
override older superblock entries.
If we see an older on-disk minor version in the superblock chain, we
were correctly overriding the on-disk minor version, but we were also
leaving the "needs superblock" bit set in our consistency state.
This isn't a hard-error, but would lead to a superblock rewrite every
mount. The rewrite would make no progress, as the out-of-date version is
effectively immutable at this point, and just waste prog cycles.
This should fix that by clearing the "needs superblock" bit if we see a
newer on-disk minor version.
The documentation does not match the implementation here. The intended
behavior of superblock expansion was to duplicate the current superblock
entry into the new superblock:
.--------. .--------.
.|littlefs|->|littlefs|
||bs=4096 | ||bs=4096 |
||bc=256 | ||bc=256 |
||crc32 | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
The main benefit is that we can rely on the magic string "littlefs"
always residing in blocks 0x{0,1}, even if the superblock chain has
multiple superblocks.
The downside is that earlier superblocks in the superblock chain may
contain out-of-date configuration. This is a bit annoying, and risks
hard-to-reach bugs, but in theory shouldn't break anything as long as
the filesystem is aware of this.
Unfortunately this was lost at some point during refactoring in the
early v2-alpha work. A lot of code was moving around in this stage, so
it's a bit hard to track down the change and if it was intentional. The
result is superblock expansion creates a valid linked-list of
superblocks, but only the last superblock contains a valid superblock
entry:
.--------. .--------.
.|crc32 |->|littlefs|
|| | ||bs=4096 |
|| | ||bc=256 |
|| | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
What's interesting is this isn't invalid as far as lfs_mount is
concerned. lfs_mount is happy as long as a superblock entry exists
anywhere in the superblock chain. This is good for compat flexibility,
but is the main reason this has gone unnoticed for so long.
---
With the benefit of more time to think about the problem, it may have
been more preferable to copy only the "littlefs" magic string and NOT
the superblock entry:
.--------. .--------.
.|littlefs|->|littlefs|
||crc32c | ||bs=4096 |
|| | ||bc=256 |
|| | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
This would allow for simple "littlefs" magic string checks without the
risks associated with out-of-date superblock entries.
Unfortunately the current implementation errors if it finds a "littlefs"
magic string without an associated superblock entry, so such a change
would not be compatible with old drivers.
---
This commit tweaks superblock expansion to duplicate the superblock
entry instead of simply moving it to the new superblock. And adds tests
over the magic string "littlefs" both before and after superblock
expansion.
Found by rojer and Nikola Kosturski
Long story short we aren't calling sync correctly in littlefs. This
fixes that.
Some forms of storage, mainly anything with an FTL, eMMC, SD, etc, do
not guarantee a strict write order for writes to different blocks. In
theory this is what bd sync is for, to tell the bd when it is important
for the writes to be ordered.
Currently, littlefs calls bd sync after committing metadata. This is
useful as it ensures that user code can rely on lfs_file_sync for
ordering external side-effects.
But this is insufficient for handling storage with out-of-order writes.
Consider the simple case of a file with one data block:
1. lfs_file_write(blablabla) => writes data into a new data block
2. lfs_file_sync() => commits metadata to point to the new data block
But with out-of-order writes, the bd is free to reorder things such that
the metadata is updated _before_ the data is written. If we lose power,
that would be bad.
The solution to this is to call bd sync twice: Once before we commit
the metadata to tell the bd that these writes must be ordered, and once
after we commit the metadata to allow ordering with user code.
As a small optimization, we only call bd sync if the current file is not
inlined and has actually been modified (LFS_F_DIRTY). It's possible for
inlined files to be interleaved with writes to other files.
Found by MFaehling and alex31
By "luck" the previous code somehow managed to not be broken, though it
was possible to traverse the same file twice in lfs_fs_traverse/size
(which is not an error).
The problem was an underlying assumption in lfs_dir_get that it would
never be called when the requested id is pending removal because of a
powerloss. The assumption was either:
1. lfs_dir_find would need to be called first to find the id, and it
would correctly toss out pending-rms with LFS_ERR_NOENT.
2. lfs_fs_mkconsistent would be implicitly called before any filesystem
traversals, cleaning up any pending-rms. This is at least true for
allocator scans.
But, as noted by andriyndev, both lfs_fs_traverse and lfs_fs_size can
call lfs_fs_get with a pending-rm id if called in a readonly context.
---
By "luck" this somehow manages to not break anything:
1. If the pending-rm id is >0, the id is decremented by 1 in lfs_fs_get,
returning the previous file entry during traversal. Worst case, this
reports any blocks owned by the previous file entry twice.
Note this is not an error, lfs_fs_traverse/size may return the same
block multiple times due to underlying copy-on-write structures.
2. More concerning, if the pending-rm id is 0, the id is decremented by
1 in lfs_fs_get and underflows. This underflow propagates into the
type field of the tag we are searching for, decrementing it from
0x200 (LFS_TYPE_STRUCT) to 0x1ff (LFS_TYPE_INTERNAL(UNUSED)).
Fortunately, since this happens to underflow to the INTERNAL tag
type, the type intended to never exist on disk, we should never find
a matching tag during our lfs_fs_get search. The result? lfs_dir_get
returns LFS_ERR_NOENT, which is actually what we want.
Also note that LFS_ERR_NOENT does not terminate the mdir traversal
early. If it did we would have missed files instead of duplicating
files, which is a slightly worse situation.
---
The fix is to add an explicit check for pending-rms in lfs_dir_get, just
like in lfs_dir_find. This avoids relying on unintended underflow
propagation, and should make the internal API behavior more consistent.
This is especially important for potential future gc extensions.
Found by andriyndev
So instead of lfs_file_rawopencfg, it's now lfs_file_opencfg_.
The "raw" prefix is annoying, doesn't really add meaning ("internal"
would have been better), and gets in the way of finding the relevant
function implementations.
I have been using _s as suffixes for unimportant name collisions in
other codebases, and it seems to work well at reducing wasted brain
cycles naming things. Adopting it here avoids the need for "raw"
prefixes.
It's quite a bit like the use of prime symbols to resolve name
collisions in math, e.g. x' = x + 1. Which is even supported in Haskell
and is quite nice there.
And the main benefit: Now if you search for the public API name, you get
the internal function first, which is probably what you care about.
Here is the exact script:
sed -i 's/_raw\([a-z0-9_]*\)\>/_\1_/g' $(git ls-tree -r HEAD --name-only | grep '.*\.c')
Inlined files live in metadata and decrease storage requirements, but
may be limited to improve metadata-related performance. This is
especially important given the current plague of metadata performance.
Though decreasing inline_max may make metadata more dense and increase
block usage, so it's important to benchmark if optimizing for speed.
The underlying limits of inlined files haven't changed:
1. Inlined files need to fit in RAM, so <= cache_size
2. Inlined files need to fit in a single attr, so <= attr_max
3. Inlined files need to fit in 1/8 of a block to avoid metadata
overflow issues, this is after limiting by metadata_max,
so <= min(metadata_max, block_size)/8
By default, the largest possible inline_max is used. This preserves
backwards compatibility and is probably a good default for most use
cases.
This does have the awkward effect of requiring inline_max=-1 to
indicate disabled inlined files, but I don't think there's a good
way around this.
This extends lfs_fs_gc to now handle three things:
1. Calls mkconsistent if not already consistent
2. Compacts metadata > compact_thresh
3. Populates the block allocator
Which should be all of the janitorial work that can be done without
additional on-disk data structures.
Normally, metadata compaction occurs when an mdir is full, and results in
mdirs that are at most block_size/2.
Now, if you call lfs_fs_gc, littlefs will eagerly compact any mdirs that
exceed the compact_thresh configuration option. Because the resulting
mdirs are at most block_size/2, it only makes sense for compact_thresh to
be >= block_size/2 and <= block_size.
Additionally, there are some special values:
- compact_thresh=0 => defaults to ~88% block_size, may change
- compact_thresh=-1 => disables metadata compaction during lfs_fs_gc
Note that compact_thresh only affects lfs_fs_gc. Normal compactions
still only occur when full.
When lfs_rename() is called trying to rename (move) a file to an
existing directory, LFS_ERR_ISDIR is (correctly) returned. However, in
the opposite case, if one tries to rename (move) a directory to a path
currently occupied by a regular file, LFS_ERR_NOTDIR should be
returned (since the error is that the destination is NOT a directory),
but in reality, LFS_ERR_ISDIR is returned in this case as well.
This commit fixes the code so that in the latter case, LFS_ERR_NOTDIR
is returned.
I think realistically no one is using this. It's already only partially
supported and untested.
Worst case, if someone does depend on this we can always revert.
lfs_init handles the checks/asserts of most configuration, moving these
checks near lfs_init attempts to keep all of these checks nearby each
other.
Also updated the comments to avoid somtimes-ambiguous range notation.
And removed negative bounds checks. Negative bounds should be obviously
incorrect, and 0 is _technically_ not illegal for any define (though
admittedly unlikely to be correct).
This drops the lookahead buffer from operating on 32-bit words to
operating on 8-bit bytes, and removes any alignment requirement. This
may have some minor performance impact, but it is unlikely to be
significant when you consider IO overhead.
The original motivation for 32-bit alignment was an attempt at
future-proofing in case we wanted some more complex on-disk data
structure. This never happened, and even if it did, it could have been
added via additional config options.
This has been a significant pain point for users, since providing
word-aligned byte-sized buffers in C can be a bit annoying.
- Renamed lfs.free -> lfs.lookahead
- Renamed lfs.free.off -> lfs.lookahead.start
- Renamed lfs.free.i -> lfs.lookahead.next
- Renamed lfs.free.ack -> lfs.lookahead.ckpoint
- Renamed lfs_alloc_ack -> lfs_alloc_ckpoint
These have been named a bit confusingly, and I think the new names make
their relevant purposes a bit clearer.
At the very it's clear lfs.lookahead is related to the lookahead buffer.
(and doesn't imply a closed free-bitmap).
Superblock expansion is an irreversible operation. In an effort to
prevent superblock expansion from claiming valuable scratch space
(important for small, <~8 block filesystems), littlefs prevents
superblock expansion when the disk is "mostly full".
In true computer-scientist fashion, this "mostly full" threshold was
set to ~50%.
As pointed out by gbolgradov and rojer, >~50% utilization is not
uncommon, and it can lead to a situation where superblock expansion does
not occur in a relatively healthy filesystem, causing focused wear at
the root.
To remedy this, the threshold is now increased to ~88% (7/8) full.
This may change in the future and should probably be eventually user
configurable.
Found by gbolgradov and rojer
The idea is in the future this function may be extended to support other
block janitorial work. In such a case calling this lfs_fs_gc provides a
more general name that can include other operations.
This is currently just wishful thinking, however.
The initial implementation for this was provided by kaetemi, originally
as a mount flag. However, it has been modified here to be self-contained
in an explicit runtime function that can be called after mount.
The reasons for an explicit function:
1. lfs_mount stays a strictly readonly operation, and avoids pulling in
all of the write machinery.
2. filesystem-wide operations such as lfs_fs_grow can be a bit risky,
and irreversable. The action of growing the filesystem should be very
intentional.
---
One concern with this change is that this will be the first function
that changes metadata in the superblock. This might break tools that
expect the first valid superblock entry to contain the most recent
metadata, since only the last superblock in the superblock chain will
contain the updated metadata.
The code-cost wasn't that bad: 16556 B -> 16754 B (+1.2%)
But moving write support of older versions behind a compile-time flag
allows us to be a bit more liberal with what gets added to support older
versions, since the cost won't hit most users.
The intention is to help interop with older minor versions of littlefs.
Unfortunately, since lfs2.0 drivers cannot mount lfs2.1 images, there are
situations where it would be useful to write to write strictly lfs2.0
compatible images. The solution here adds a "disk_version" configuration
option which determines the behavior of lfs2.1 dependent features.
Normally you would expect this to only change write behavior. But since the
main change in lfs2.1 increased validation of erased data, we also need to
skip this extra validation (fcrc) or see terrible slowdowns when writing.
In terms of ease-of-use, a user familiar with other filesystems expects
block_usage in fsinfo. But in terms of practicality, block_usage can be
expensive to find in littlefs, so if it's not needed in the resulting
fsinfo, that operation is wasteful.
It's not clear to me what the best course of action is, but since
block_usage can always be added to fsinfo later, but not removed without
breaking backwards compatibility, I'm leaving this out for now.
Block usage can still be found by explicitly calling lfs_fs_size.
Version are now returned with major/minor packed into 32-bits,
so 0x00020001 is the current disk version, for example.
1. This needed to change to use a disk_* prefix for consistency with the
defines that already exist for LFS_VERSION/LFS_DISK_VERSION.
2. Encoding the version this way has the nice side effect of making 0 an
invalid value. This is useful for adding a similar config option
that needs to have reasonable default behavior for backwards
compatibility.
In theory this uses more space, but in practice most other config/status
is 32-bits in littlefs. We would be wasting this space for alignment
anyways.