forked from Imagelibrary/littlefs
Adopted lfs_ctz_index implementation using popcount
This reduces the O(n^2logn) runtime to read a file to only O(nlogn).
The extra O(n) did not touch the disk, so it isn't a problem until the
files become very large, but this solution comes with very little cost.

Long story short, you can find the block index + offset pair for a
CTZ linked-list with this series of formulas:

n' = floor(N / (B - 2w/8))
N' = (B - 2w/8)n' + (w/8)popcount(n')
off' = N - N'
n, off =
  n'-1, off'+B                if off' < 0
  n',   off'+(w/8)(ctz(n')+1) if off' >= 0

For the long story, you will need to see the updated DESIGN.md
109
DESIGN.md
@@ -292,7 +292,7 @@ We can find the runtime complexity by looking at the path to any block from
 the block containing the most pointers. Every step along the path divides
 the search space for the block in half. This gives us a runtime of O(logn).
 To get to the block with the most pointers, we can perform the same steps
-backwards, which keeps the asymptotic runtime at O(log n). The interesting
+backwards, which puts the runtime at O(2logn) = O(logn). The interesting
 part about this data structure is that this optimal path occurs naturally
 if we greedily choose the pointer that covers the most distance without passing
 our target block.
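The greedy choice described in this hunk can be sketched in C. This is only an illustration: `ctz_skiplist_hops` is a hypothetical helper, not a littlefs function. It assumes block `i` stores `ctz(i)+1` pointers, that pointer `k` leads to block `i - 2^k`, and that the GCC/Clang builtins `__builtin_ctz` and `__builtin_clz` are available.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: counts the hops needed to walk from block `cur`
// back to block `target` when always taking the longest usable pointer.
static unsigned ctz_skiplist_hops(uint32_t cur, uint32_t target) {
    assert(cur >= target);  // the skip-list only points backwards
    unsigned hops = 0;
    while (cur > target) {
        // assumed layout: block cur has pointers k = 0..ctz(cur)
        unsigned max_k = (unsigned)__builtin_ctz(cur);
        // largest k with 2^k <= cur - target, so we never pass the target
        unsigned fit_k = 31u - (unsigned)__builtin_clz(cur - target);
        unsigned k = max_k < fit_k ? max_k : fit_k;
        cur -= (uint32_t)1 << k;
        hops += 1;
    }
    return hops;
}
```

Under these assumptions, walking from block 8 to block 0 takes a single hop, while walking from block 7 to block 0 takes three (7 to 6, 6 to 4, 4 to 0).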
@@ -304,17 +304,18 @@ in a block, this is pretty reasonable.

 Unfortunately, the CTZ skip-list comes with a few questions that aren't
 straightforward to answer. What is the overhead? How do we handle more
-pointers than we can store in a block?
+pointers than we can store in a block? How do we store the skip-list in
+a directory entry?

 One way to find the overhead per block is to look at the data structure as
 multiple layers of linked-lists. Each linked-list skips twice as many blocks
-as the previous linked-list. Or another way of looking at it is that each
+as the previous linked-list. Another way of looking at it is that each
 linked-list uses half as much storage per block as the previous linked-list.
 As we approach infinity, the number of pointers per block forms a geometric
 series. Solving this geometric series gives us an average of only 2 pointers
 per block.

 $$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n}\left(\text{ctz}(i)+1\right) = \sum_{i=0}^{\infty}\frac{1}{2^i} = 2$$

 Finding the maximum number of pointers in a block is a bit more complicated,
 but since our file size is limited by the integer width we use to store the
@@ -322,7 +323,7 @@ size, we can solve for it. Setting the overhead of the maximum pointers equal
 to the block size we get the following equation. Note that a smaller block size
 results in more pointers, and a larger word width results in larger pointers.

 $$B = \frac{w}{8}\left\lceil\log_2\left(\frac{2^w}{B - 2\frac{w}{8}}\right)\right\rceil$$

 where:
 B = block size in bytes
@@ -335,8 +336,102 @@ widths:

 Since littlefs uses a 32 bit word size, we are limited to a minimum block
 size of 104 bytes. This is a perfectly reasonable minimum block size, with most
-block sizes starting around 512 bytes. So we can avoid the additional logic
-needed to avoid overflowing our block's capacity in the CTZ skip-list.
+block sizes starting around 512 bytes. So we can avoid additional logic to
+avoid overflowing our block's capacity in the CTZ skip-list.
+
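The 104-byte figure can be checked numerically. The sketch below is an assumption-laden illustration, not littlefs code: `min_block_size` is a hypothetical helper that searches for the smallest block size B whose worst-case pointer overhead, (w/8) * ceil(log2(2^w / (B - 2w/8))), still fits in the block. It uses `<=` rather than strict equality so the search always terminates.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: smallest block size in bytes that can hold the
// maximum number of pointers for a word width of w bits.
static uint32_t min_block_size(unsigned w) {
    assert(w % 8 == 0 && w >= 8 && w <= 32);
    uint32_t ptr = w / 8;  // bytes per pointer
    for (uint32_t B = 2*ptr + 1; ; B += 1) {
        // average data bytes per block, using the 2-pointer average
        uint64_t data = B - 2*ptr;
        // p = ceil(log2(2^w / data)): smallest p with data * 2^p >= 2^w
        unsigned p = 0;
        while ((data << p) < ((uint64_t)1 << w)) {
            p += 1;
        }
        if ((uint64_t)ptr * p <= B) {
            return B;
        }
    }
}
```

For w = 32 this search stops at B = 104, matching the minimum quoted in the text.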
+So, how do we store the skip-list in a directory entry? A naive approach would
+be to store a pointer to the head of the skip-list, the length of the file
+in bytes, the index of the head block in the skip-list, and the offset in the
+head block in bytes. However, this is a lot of information, and we can observe
+that a file size maps to only one block index + offset pair. So it should be
+sufficient to store only the pointer and file size.
+
+But there is one problem: calculating the block index + offset pair from a
+file size doesn't have an obvious implementation.
+
+We can start by just writing down an equation. The first idea that comes to
+mind is to just use a for loop to sum together blocks until we reach our
+file size. We can write this equation as a summation:
+
+$$N = \sum_{i=1}^{n}\left[B - \frac{w}{8}\left(\text{ctz}(i)+1\right)\right]$$
+
+where:
+B = block size in bytes
+w = word width in bits
+n = block index in skip-list
+N = file size in bytes
+
+And this works quite well, but is not trivial to calculate. This equation
+requires O(n) to compute, which brings the entire runtime of reading a file
+to O(n^2logn). Fortunately, the additional O(n) does not need to touch disk,
+so it is not completely unreasonable. But if we could solve this equation into
+a form that is easily computable, we can avoid a big slowdown.
+
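The O(n) loop mentioned here corresponds to the lines this commit removes from `lfs_ctz_index`. A standalone sketch follows, assuming 4-byte pointers (w = 32) and passing the block size directly instead of through `lfs_t`. The name `ctz_index_naive` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical standalone form of the loop-based lookup.
// On entry *off is a position in the file. On return it is the offset
// inside the returned block, pointer area included.
static uint32_t ctz_index_naive(uint32_t block_size, uint32_t *off) {
    assert(block_size >= 104);  // minimum block size quoted for 32-bit words
    uint32_t i = 0;
    while (*off >= block_size) {
        i += 1;
        *off -= block_size;
        // block i starts with ctz(i)+1 pointers of 4 bytes each
        *off += 4*((uint32_t)__builtin_ctz(i) + 1);
    }
    return i;
}
```

With a 512-byte block, file position 1019 is the last byte of block 1 (offset 511), and position 1020 is the first data byte of block 2, at offset 8 behind its two pointers.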
+Unfortunately, the summation of the CTZ instruction presents a big challenge.
+How would you even begin to reason about integrating a bitwise instruction?
+Fortunately, there is a powerful tool I've found useful in these situations:
+The [On-Line Encyclopedia of Integer Sequences (OEIS)](https://oeis.org/).
+If we work out the first couple of values in our summation, we find that
+ctz(i)+1 maps to [A001511](https://oeis.org/A001511), and its partial summation
+maps to [A005187](https://oeis.org/A005187), and surprisingly, both of these
+sequences have relatively trivial equations! This leads us to the completely
+unintuitive property:
+
+$$\sum_{i=1}^{n}\left(\text{ctz}(i)+1\right) = 2n - \text{popcount}(n)$$
+
+where:
+ctz(i) = the number of trailing bits that are 0 in i
+popcount(i) = the number of bits that are 1 in i
+
+I find it bewildering that these two seemingly unrelated bitwise instructions
+are related by this property. But if we start to dissect this equation we can
+see that it does hold. As n approaches infinity, we do end up with an average
+overhead of 2 pointers as we found earlier. And popcount seems to handle the
+error from this average as it accumulates in the CTZ skip-list.
+
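The relationship between the running sum of ctz(i)+1 and popcount is easy to check by brute force. The sketch below assumes the sum starts at i = 1 and uses the GCC/Clang builtins. All three function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: sums ctz(i)+1 for i = 1..n the slow way.
static uint32_t pointer_sum_loop(uint32_t n) {
    uint32_t sum = 0;
    for (uint32_t i = 1; i <= n; i++) {
        sum += (uint32_t)__builtin_ctz(i) + 1;
    }
    return sum;
}

// Hypothetical helper: the closed form 2n - popcount(n).
static uint32_t pointer_sum_closed(uint32_t n) {
    assert(n <= UINT32_MAX / 2);  // keep 2*n from overflowing
    return 2*n - (uint32_t)__builtin_popcount(n);
}

// Returns 1 if both forms agree for every n in 0..limit, else 0.
static int pointer_sum_agrees(uint32_t limit) {
    for (uint32_t n = 0; n <= limit; n++) {
        if (pointer_sum_loop(n) != pointer_sum_closed(n)) {
            return 0;
        }
    }
    return 1;
}
```

For example, n = 7 gives 1+2+1+3+1+2+1 = 11 from the loop and 2*7 - 3 = 11 from the closed form.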
+Now we can substitute into the original equation to get a trivial equation
+for a file size:
+
+$$N = Bn - \frac{w}{8}\left(2n - \text{popcount}(n)\right)$$
+
+Unfortunately, we're not quite done. The popcount function is non-injective,
+so we can only find the file size from the block index, not the other way
+around. However, we can guess and correct. Consider a block index n' that
+is greater than or equal to n. We can find one pretty easily:
+
+$$n' = \left\lfloor\frac{N}{B - 2\frac{w}{8}}\right\rfloor$$
+
+where:
+n' >= n
+
+We can plug n' back into our popcount equation to find an N' file size that
+is greater than N. However, we need to rearrange our terms a bit to avoid
+integer overflow:
+
+$$N' = \left(B - 2\frac{w}{8}\right)n' + \frac{w}{8}\text{popcount}(n')$$
+
+where:
+N' >= N
+
+Now that we have N', we can find our block offset:
+
+$$\text{off}' = N - N'$$
+
+where:
+off' >= off, our byte offset in the block
+
+Now we're getting somewhere. N' is greater than or equal to N, and as long as
+the number of pointers per block is bounded by the block size, it can only be
+different by at most one block. So we have two cases that can be determined by
+the sign of off'. If off' is negative, we correct n' and add a block to off'.
+Note that we also need to incorporate the overhead of the last block to get
+the right offset.
+
+$$n, \text{off} = \begin{cases} n'-1,\ \text{off}'+B & \text{if } \text{off}' < 0 \\ n',\ \text{off}'+\frac{w}{8}\left(\text{ctz}(n')+1\right) & \text{if } \text{off}' \geq 0 \end{cases}$$
+
+It's a lot of math, but computers are very good at math. With these equations
+we can solve for the block index + offset while only needing to store the file
+size in O(1).

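To see the guess-and-correct steps end to end, here is a standalone sketch modeled on the new `lfs_ctz_index` body in this commit. It assumes 4-byte pointers (w = 32) and takes the block size as a plain argument. The names `ctz_index_closed`, `ctz_index_loop` and `ctz_lookups_agree` are hypothetical, and the loop version is included only as a reference to compare against.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical closed-form lookup: maps file position *off to a block
// index (returned) and an offset inside that block (written to *off).
static uint32_t ctz_index_closed(uint32_t block_size, uint32_t *off) {
    assert(block_size >= 104);  // minimum block size quoted for 32-bit words
    uint32_t size = *off;
    uint32_t i = size / (block_size - 2*4);  // guess, never too small
    if (i == 0) {
        return 0;  // block 0 stores no pointers
    }

    // file position at which block i begins
    uint32_t start = (block_size - 2*4)*i
            + 4*(uint32_t)__builtin_popcount(i-1) + 2*4;
    int64_t noff = (int64_t)size - (int64_t)start;

    if (noff < 0) {
        // guess was one block too far: step back and add a block
        *off = (uint32_t)(noff + (int64_t)block_size);
        return i - 1;
    } else {
        // guess was right: skip the pointers at the start of block i
        *off = (uint32_t)noff + 4*((uint32_t)__builtin_ctz(i) + 1);
        return i;
    }
}

// Hypothetical loop-based reference with the same contract.
static uint32_t ctz_index_loop(uint32_t block_size, uint32_t *off) {
    uint32_t i = 0;
    while (*off >= block_size) {
        i += 1;
        *off -= block_size;
        *off += 4*((uint32_t)__builtin_ctz(i) + 1);
    }
    return i;
}

// Returns 1 if both lookups agree for every position in 0..max_pos.
static int ctz_lookups_agree(uint32_t block_size, uint32_t max_pos) {
    for (uint32_t pos = 0; pos <= max_pos; pos++) {
        uint32_t a = pos;
        uint32_t b = pos;
        if (ctz_index_closed(block_size, &a) != ctz_index_loop(block_size, &b)
                || a != b) {
            return 0;
        }
    }
    return 1;
}
```

The closed form does a fixed amount of arithmetic per lookup, where the loop takes one iteration per block. Like the text, it relies on the guess being off by at most one block.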
 Here is what it might look like to update a file stored with a CTZ skip-list:
 ```
19
lfs.c
@@ -1004,16 +1004,23 @@ int lfs_dir_rewind(lfs_t *lfs, lfs_dir_t *dir) {

 /// File index list operations ///
 static int lfs_ctz_index(lfs_t *lfs, lfs_off_t *off) {
-    lfs_off_t i = 0;
-
-    while (*off >= lfs->cfg->block_size) {
-        i += 1;
-        *off -= lfs->cfg->block_size;
-        *off += 4*(lfs_ctz(i) + 1);
+    lfs_off_t size = *off;
+    lfs_off_t i = size / (lfs->cfg->block_size-2*4);
+    if (i == 0) {
+        return 0;
     }

+    lfs_off_t nsize = (lfs->cfg->block_size-2*4)*i + 4*lfs_popc(i-1) + 2*4;
+    lfs_soff_t noff = size - nsize;
+
+    if (noff < 0) {
+        *off = noff + lfs->cfg->block_size;
+        return i-1;
+    } else {
+        *off = noff + 4*(lfs_ctz(i) + 1);
         return i;
+    }
 }

 static int lfs_ctz_find(lfs_t *lfs,
         lfs_cache_t *rcache, const lfs_cache_t *pcache,
@@ -41,6 +41,10 @@ static inline uint32_t lfs_npw2(uint32_t a) {
     return 32 - __builtin_clz(a-1);
 }

+static inline uint32_t lfs_popc(uint32_t a) {
+    return __builtin_popcount(a);
+}
+
 static inline int lfs_scmp(uint32_t a, uint32_t b) {
     return (int)(unsigned)(a - b);
 }
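`lfs_popc` as added here depends on the GCC/Clang builtin `__builtin_popcount`. For a toolchain without that builtin, one possible fallback is the classic clear-lowest-set-bit loop. This is only a sketch and not part of the commit. `popc_portable` and `popc_self_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical portable fallback: counts the bits that are 1 in a.
static inline uint32_t popc_portable(uint32_t a) {
    uint32_t count = 0;
    while (a != 0) {
        a &= a - 1;  // clears the lowest set bit
        count += 1;
    }
    return count;
}

// Hypothetical self-check against a few hand-counted values.
static void popc_self_check(void) {
    assert(popc_portable(0) == 0);
    assert(popc_portable(1) == 1);
    assert(popc_portable(0xbu) == 3);          // 1011 in binary
    assert(popc_portable(0x80000000u) == 1);
    assert(popc_portable(0xffffffffu) == 32);
}
```

The loop runs once per set bit, so it is slower than a hardware popcount instruction but needs no compiler support.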