Christopher Haster abe68c0844 rbyd-rr: Reworking rbyd range removal to try to preserve rby structure
This is the start of (yet another) rework of rbyd range removals, this
time in an effort to preserve the rby structure that maps to a balanced
2-3-4 tree. Specifically, the property that all search paths have the
same number of black edges (2-3-4 nodes).

This is currently incomplete, as you can probably tell from the mess,
but this commit at least gets a working altn/alta encoding in place
necessary for representing empty 2-3-4 nodes. More on that below.

---

First the problem:

My assumption, when implementing the previous range removal algorithms,
was that we only needed to maintain the existing height of the tree.

The existing rbyd operations limit the height to strictly log n. And
while we can't _reduce_ the height to maintain perfect balance, we can
at least avoid _increasing_ the height, which means the resulting tree
should have a height <= log n. Since our rbyds are bounded by the
block_size b, this means that in the worst case our rbyd's height can
never exceed log b, right?

Well, not quite.

This is true the instant after the remove operation. But there is an
implicit assumption that future rbyd operations will still be able to
maintain height <= log n after the remove operation. This turns out to
not be true.

The problem is that our rbyd appends only maintain height <= log n if
our rby structure is preserved. If the rby structure is broken, rbyd
append assumes an rby structure that doesn't exist, which can lead to an
increasingly unbalanced tree.

Consider this happily balanced tree:

         .-------o-------.                    .--------o
     .---o---.       .---o---.            .---o---.    |
   .-o-.   .-o-.   .-o-.   .-o-.        .-o-.   .-o-.  |
  .o. .o. .o. .o. .o. .o. .o. .o.      .o. .o. .o. .o. |
  a b c d e f g h i j k l m n o p  =>  a b c d e f g h i
                   '------+------'
                        remove

After a range removal it looks pretty bad, but note the height is still
<= log n (old n not the new n). We are still <= log b.

But note what happens if we start to insert attrs into the short half of
the tree:

         .--------o
     .---o---.    |
   .-o-.   .-o-.  |
  .o. .o. .o. .o. |
  a b c d e f g h i

                  .-----o
         .--------o .-+-r
     .---o---.    | | | |
   .-o-.   .-o-.  | | | |
  .o. .o. .o. .o. | | | |
  a b c d e f g h i j'k'l'

                      .-------------o
                  .---o   .---+-----r
         .--------o .-o .-o .-o .-+-r
     .---o---.    | | | | | | | | | |
   .-o-.   .-o-.  | | | | | | | | | |
  .o. .o. .o. .o. | | | | | | | | | |
  a b c d e f g h i j'k'l'm'n'o'p'q'r'

Our right side is generating a perfectly balanced tree as expected, but
the left side is suddenly twice as far from the root! height(r')=3,
height(a)=6!

The problem is when we append l', we don't really know how tall the tree
is. We only know l' has one black edge, which assuming rby structure is
preserved, means all other attrs must have one black edge, so creating a
new root is justified.

In reality this just makes the tree grow increasingly unbalanced,
increasing the height of the tree by worst case log n every range
removal.
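
To put rough numbers on that growth, here is a back-of-the-envelope
sketch. It assumes log means log2, treats block_size as the upper bound
on n, and takes the worst case of one extra log n per range removal at
face value. height_bound is a hypothetical helper, not something in
littlefs:

```python
import math

def height_bound(block_size, range_removals=0):
    """Loose worst-case height if every range removal adds another log b."""
    log_b = math.ceil(math.log2(block_size))
    # one log b for the balanced tree, plus one more per range removal
    return log_b * (1 + range_removals)

intact = height_bound(4096)        # rby structure preserved
broken = height_bound(4096, 3)     # three range removals, structure broken
```

With a 4096 byte block the intact bound is 12, while three unlucky range
removals already allow a height of 48 under these assumptions.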

---

It's interesting to note this was discovered while debugging
test_fwrite_overwrite, specifically:

  test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10

It turns out the append fragments -> delete fragments -> append/carve
block + becksum loop contains the perfect sequence of attrs necessary to
turn this tree imbalance into a linked-list!

                        .->         0 data w1 1
                      .-b->         1 data w1 1
                      | .->         2 data w1 1
                    .-b-b->         3 data w1 1
                    |   .->         4 data w1 1
                    | .-b->         5 data w1 1
                    | | .->         6 data w1 1
                .---b-b-b->         7 data w1 1
                |       .->         8 data w1 1
                |     .-b->         9 data w1 1
                |     | .->        10 data w1 1
                |   .-b-b->        11 data w1 1
                | .-b----->        12 data w1 1
              .-y-y------->        13 data w1 1
              |         .->        14 data w1 1
            .-y---------y->        15 data w1 1
            |           .->        16 data w1 1
          .-y-----------y->        17 data w1 1
          |             .->        18 data w1 1
        .-y-------------y->        19 data w1 1
        |               .->        20 data w1 1
      .-y---------------y->        21 data w1 1
      |                 .->        22 data w1 1
    .-y-----------------y->        23 data w1 1
    |                   .->        24 data w1 1
  .-y-------------------y->        25 data w1 1
  |                   .--->        26 data w1 1
  |                   | .->   27-2047 block w2021 10
  b-------------------r-b->           becksum 5

Note, to reproduce this you need to step through with a breakpoint on
lfsr_bshrub_commit. This only shows up in the file's intermediary btree,
which at the time of writing ends up at block 0xb8:

  $ ./scripts/test.py \
        test_fwrite_overwrite:1181h1g2i1gg2l15o10p11r1gg8s10 \
        -ddisk --gdb -f

  $ ./scripts/watch.py -Kdisk -b \
        ./scripts/dbgrbyd.py -b4096 disk 0xb8 -t

  (then b lfsr_bshrub_commit and continue a bunch)

---

So, we need to preserve the rby structure.

Note pruning red/yellow alts is not an issue. These aren't black, so we
aren't changing the number of black edges in the tree. We've just
effectively reduced a 3/4 node into a 2/3 node:

      .-> a
  .---b-> b              .-> a <- 2 black
  | .---> c            .-b-> b
  | | .-> d            | .-> c
  b-r-b-> e <- rm  =>  b-b-> d <- 2 black

The tricky bit is pruning black alts. Naively this changes the number of
black edges/2-3-4 nodes in the tree, which is bad:

    .-> a
  .-b-> b              .-> a <- 2 black
  | .-> c            .-b-> b
  b-b-> d <- rm  =>  b---> c <- 1 black
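
The two prunes above can be checked with a small toy model. This is
only a sketch, not littlefs code: Alt and node_counts are hypothetical
names, and it assumes a red alt belongs to the same 2-3-4 node as the
next black alt on the trunk:

```python
from collections import namedtuple

# Hypothetical toy model, not the on-disk format: an alt has a color
# ("b" black or "r" red), a branch it can jump to (up), and the rest of
# the trunk (down). Leaves are plain strings.
Alt = namedtuple("Alt", "color up down")

def node_counts(tree, seen=0):
    """Map each leaf to the number of 2-3-4 nodes on its search path."""
    if isinstance(tree, str):
        return {tree: seen}
    if tree.color == "b":
        # a black alt closes a node, both branches have passed through it
        counts = node_counts(tree.up, seen + 1)
        counts.update(node_counts(tree.down, seen + 1))
    else:
        # taking a red alt leaves the node early, falling through does not
        counts = node_counts(tree.up, seen + 1)
        counts.update(node_counts(tree.down, seen))
    return counts

# pruning under a red alt: b-r-b-> e becomes b-b-> d
before_red = Alt("b", Alt("b", "a", "b"), Alt("r", "c", Alt("b", "d", "e")))
after_red = Alt("b", Alt("b", "a", "b"), Alt("b", "c", "d"))

# naively pruning a black alt: b-b-> d becomes b---> c
after_black = Alt("b", Alt("b", "a", "b"), "c")
```

In this model every leaf of before_red and after_red sits below 2 nodes,
while after_black leaves a and b at 2 but c at 1.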

It's tempting to just make the alt red at this point, effectively
merging the sibling 2-3-4 node. This maintains balance in the subtree,
but still removes a black edge, causing problems for our parent:

      .-> a
    .-b-> b                .-> a <- 3 black
    | .-> c              .-b-> b
  .-b-b-> d              | .-> c
  |   .-> e            .-b-b-> d
  | .-b-> f            | .---> e
  | | .-> g            | | .-> f
  b-b-b-> h <- rm  =>  b-r-b-> g <- 2 black

In theory you could propagate this all the way up to the root, and this
_would_ probably give you a perfect self-balancing range removal
algorithm... but it's recursive... and littlefs can't be recursive...

               .-> s
             .-b-> t                              .-> s
             | .-> u                        .-----b-> t
           .-b-b-> v                        |     .-> u
           |   .-> w                        | .---b-> v
           | .-b-> x                        | | .---> w
  | |      | | .-> y           | | | |      | | | .-> x
  b-b- ... b-b-b-> z <- rm =>  r-b-r-b- ... r-b-r-b-> y

So instead, an alternative solution. What if we allowed black alts that
point nowhere? A sort of noop 2-3-4 node that serves only to maintain
the rby structure?

    .-> a
  .-b-> b              .-> a <- 2 black
  | .-> c            .-b-> b
  b-b-> d <- rm  =>  b-b-> c <- 2 black

I guess that would technically make this a 1-2-3-4 tree.

This does add extra overhead for writing noop alts, which are otherwise
useless, but it seems to solve most of our problems: 1. does not
increase the height of the tree, 2. maintains the rby structure, 3.
tail-recursive.
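
A toy comparison of naive pruning and noop alts on the 16-attr example
from earlier. This is a sketch under simplifying assumptions, not the
rbyd algorithm: every alt is black, a tuple stands for one alt, None
stands for a branch that points nowhere, and the helper names are
hypothetical:

```python
def perfect(leaves):
    """Build a perfectly balanced tree, every tuple is one black alt."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (perfect(leaves[:mid]), perfect(leaves[mid:]))

def remove(tree, doomed, keep_noops):
    """Remove the leaves in doomed from the tree."""
    if tree is None:
        return None
    if isinstance(tree, str):
        return None if tree in doomed else tree
    up, down = (remove(t, doomed, keep_noops) for t in tree)
    if up is None and down is None:
        return None
    if not keep_noops and (up is None or down is None):
        # naive pruning: drop the black alt and splice in the survivor
        return up if down is None else down
    # noop alt: keep the black alt even if one branch points nowhere
    return (up, down)

def black_counts(tree, seen=0):
    """Map each remaining leaf to the number of black alts above it."""
    if tree is None:
        return {}
    if isinstance(tree, str):
        return {tree: seen}
    counts = {}
    for branch in tree:
        counts.update(black_counts(branch, seen + 1))
    return counts

tree = perfect(list("abcdefghijklmnop"))
naive = black_counts(remove(tree, set("jklmnop"), keep_noops=False))
noop = black_counts(remove(tree, set("jklmnop"), keep_noops=True))
```

Here naive ends up with i directly under the root (1 black alt, versus 4
for a through h), while noop keeps all nine remaining attrs at 4 without
making any path longer than it was before the removal.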

And, thanks to the preserved rby structure, we can say that in the worst
case the height of our rbyds will never exceed log b again, even with
range removals.

If we apply this strategy to our original example, you can see how the
preserved rby structure sort of "absorbs" new red alts, preventing
further unbalancing:

         .-------o-------.                    .--------o
     .---o---.       .---o---.            .---o---.    o
   .-o-.   .-o-.   .-o-.   .-o-.        .-o-.   .-o-.  o
  .o. .o. .o. .o. .o. .o. .o. .o.      .o. .o. .o. .o. o
  a b c d e f g h i j k l m n o p  =>  a b c d e f g h i
                   '------+------'
                        remove

Reinserting:

         .--------o
     .---o---.    o
   .-o-.   .-o-.  o
  .o. .o. .o. .o. o
  a b c d e f g h i

         .----------------o
     .---o---.            o
   .-o-.   .-o-.   .------o
  .o. .o. .o. .o. .o. .-+-r
  a b c d e f g h i j'k'l'm'

         .----------------------------o
     .---o---.          .-------------o
   .-o-.   .-o-.    .---o   .---+-----r
  .o. .o. .o. .o. .-o .-o .-o .-o .-+-r
  a b c d e f g h i j'k'l'm'n'o'p'q'r's'

Much better!

---

This commit makes some big steps towards this solution, mainly codifying
a now-special alt-never/alt-always (altn/alta) encoding to represent
these noop 1-nodes.

Technically, since null (0) tags are not allowed, these already exist as
altle 0/altgt 0 and don't need any extra carve-out encoding-wise:

  LFSR_TAG_ALT   0x4kkk  v1dc kkkk -kkk kkkk
  LFSR_TAG_ALTN  0x4000  v10c 0000 -000 0000
  LFSR_TAG_ALTA  0x6000  v11c 0000 -000 0000
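
A sketch of how a tool might tell these apart. The two tag values come
from the table above; the masks are assumptions read off the
"v1dc kkkk -kkk kkkk" pattern (d as bit 13, the k bits as the key, the
"-" bit ignored), and classify is a hypothetical helper, not a function
from the dbg scripts:

```python
TAG_ALT = 0x4000   # from the table above: bit 14 marks an alt
TAG_ALTN = 0x4000  # altle 0, never followed since null tags don't exist
TAG_ALTA = 0x6000  # altgt 0, always followed for the same reason

# Assumed masks, read off the "v1dc kkkk -kkk kkkk" pattern
MASK_DIR = 0x2000  # d: clear for altle, set for altgt
MASK_KEY = 0x0f7f  # k bits, the "-" bit (0x0080) is ignored here

def classify(tag):
    """Name the kind of alt, ignoring the v and c bits."""
    if not tag & TAG_ALT:
        return "not an alt"
    gt = bool(tag & MASK_DIR)
    if (tag & MASK_KEY) == 0:
        return "alta" if gt else "altn"
    return "altgt" if gt else "altle"
```

Because the v and c bits are masked out, an altn or alta is recognized
in either color and with either valid bit.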

We actually already used altas to terminate unreachable tags during
range removals, but this behavior was implicit. Now, altns have very
special treatment as a part of determining bounds during appendattr
(both unreachable gt/le alts are represented as altns). For this reason
I think the new names are warranted.

I've also added these encodings to the dbg*.py scripts for, well,
debuggability, and added a special case to dbgrbyd.py -j to avoid
unnecessary altn jump noise.

As a part of debugging, I've also extended dbgrbyd.py's tree renderer to
show trivial prunable alts. Unsure about keeping this. On one hand it's
useful to visualize the exact alt structure, on the other hand it likely
adds quite a bit of noise to the more complex dbg scripts.

The current state of things is a mess, but at least tests are passing!

Though we aren't actually reclaiming any altns yet... We're definitely
_not_ preserving the rby structure at the moment, and if you look at the
output from the tests, the resulting tree structure is hilariously bad.

But at least the path forward is clear.
2024-04-01 16:23:14 -05:00

littlefs

A little fail-safe filesystem designed for microcontrollers.

   | | |     .---._____
  .-----.   |          |
--|o    |---| littlefs |
--|     |---|          |
  '-----'   '----------'
   | | |

Power-loss resilience - littlefs is designed to handle random power failures. All file operations have strong copy-on-write guarantees and if power is lost the filesystem will fall back to the last known good state.

Dynamic wear leveling - littlefs is designed with flash in mind, and provides wear leveling over dynamic blocks. Additionally, littlefs can detect bad blocks and work around them.

Bounded RAM/ROM - littlefs is designed to work with a small amount of memory. RAM usage is strictly bounded, which means RAM consumption does not change as the filesystem grows. The filesystem contains no unbounded recursion and dynamic memory is limited to configurable buffers that can be provided statically.

Example

Here's a simple example that updates a file named boot_count every time main runs. The program can be interrupted at any time without losing track of how many times it has been booted and without corrupting the filesystem:

#include <stdio.h>
#include "lfs.h"

// variables used by the filesystem
lfs_t lfs;
lfs_file_t file;

// configuration of the filesystem is provided by this struct
const struct lfs_config cfg = {
    // block device operations
    .read  = user_provided_block_device_read,
    .prog  = user_provided_block_device_prog,
    .erase = user_provided_block_device_erase,
    .sync  = user_provided_block_device_sync,

    // block device configuration
    .read_size = 16,
    .prog_size = 16,
    .block_size = 4096,
    .block_count = 128,
    .cache_size = 16,
    .lookahead_size = 16,
    .block_cycles = 500,
};

// entry point
int main(void) {
    // mount the filesystem
    int err = lfs_mount(&lfs, &cfg);

    // reformat if we can't mount the filesystem
    // this should only happen on the first boot
    if (err) {
        lfs_format(&lfs, &cfg);
        lfs_mount(&lfs, &cfg);
    }

    // read current count
    uint32_t boot_count = 0;
    lfs_file_open(&lfs, &file, "boot_count", LFS_O_RDWR | LFS_O_CREAT);
    lfs_file_read(&lfs, &file, &boot_count, sizeof(boot_count));

    // update boot count
    boot_count += 1;
    lfs_file_rewind(&lfs, &file);
    lfs_file_write(&lfs, &file, &boot_count, sizeof(boot_count));

    // remember the storage is not updated until the file is closed successfully
    lfs_file_close(&lfs, &file);

    // release any resources we were using
    lfs_unmount(&lfs);

    // print the boot count
    printf("boot_count: %lu\n", (unsigned long)boot_count);
}

Usage

Detailed documentation (or at least as much detail as is currently available) can be found in the comments in lfs.h.

littlefs takes in a configuration structure that defines how the filesystem operates. The configuration struct provides the filesystem with the block device operations and dimensions, tweakable parameters that tradeoff memory usage for performance, and optional static buffers if the user wants to avoid dynamic memory.

The state of the littlefs is stored in the lfs_t type which is left up to the user to allocate, allowing multiple filesystems to be in use simultaneously. With the lfs_t and configuration struct, a user can format a block device or mount the filesystem.

Once mounted, the littlefs provides a full set of POSIX-like file and directory functions, with the deviation that the allocation of filesystem structures must be provided by the user.

All POSIX operations, such as remove and rename, are atomic, even in the event of power loss. Additionally, file updates are not actually committed to the filesystem until sync or close is called on the file.

Other notes

Littlefs is written in C, and specifically should compile with any compiler that conforms to the C99 standard.

All littlefs calls have the potential to return a negative error code. The errors can be either one of those found in the enum lfs_error in lfs.h, or an error returned by the user's block device operations.

In the configuration struct, the prog and erase functions provided by the user may return an LFS_ERR_CORRUPT error if the implementation can already detect corrupt blocks. However, the wear leveling does not depend on the return code of these functions; instead, all data is read back and checked for integrity.
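
The read-back idea can be illustrated with a small simulation. This is a sketch of the concept only, not littlefs's implementation: FlakyBlock and prog_and_verify are hypothetical names, and the only value taken from lfs.h is LFS_ERR_CORRUPT (-84).

```python
LFS_ERR_CORRUPT = -84  # value of LFS_ERR_CORRUPT in lfs.h

class FlakyBlock:
    """Hypothetical in-memory block whose byte at `bad` never changes."""
    def __init__(self, size, bad=None):
        self.data = bytearray(b"\xff" * size)
        self.bad = bad

    def prog(self, off, buf):
        for i, byte in enumerate(buf):
            if off + i != self.bad:
                self.data[off + i] = byte
        return 0  # reports success even if a byte did not stick

    def read(self, off, size):
        return bytes(self.data[off:off + size])

def prog_and_verify(block, off, buf):
    """Program buf, then read it back instead of trusting the return code."""
    err = block.prog(off, buf)
    if err:
        return err
    if block.read(off, len(buf)) != buf:
        return LFS_ERR_CORRUPT
    return 0

ok = prog_and_verify(FlakyBlock(16), 0, b"abcd")
bad = prog_and_verify(FlakyBlock(16, bad=2), 0, b"abcd")
```

The healthy block returns 0, while the block with a stuck byte is caught by the comparison even though its prog call reported success.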

If your storage caches writes, make sure that the provided sync function flushes all the data to memory and ensures that the next read fetches the data from memory; otherwise data integrity cannot be guaranteed. If the write function does not perform caching, and therefore each read or write call hits the memory, the sync function can simply return 0.

Design

At a high level, littlefs is a block based filesystem that uses small logs to store metadata and larger copy-on-write (COW) structures to store file data.

In littlefs, these ingredients form a sort of two-layered cake, with the small logs (called metadata pairs) providing fast updates to metadata anywhere on storage, while the COW structures store file data compactly and without any wear amplification cost.

Both of these data structures are built out of blocks, which are fed by a common block allocator. By limiting the number of erases allowed on a block per allocation, the allocator provides dynamic wear leveling over the entire filesystem.

                    root
                   .--------.--------.
                   | A'| B'|         |
                   |   |   |->       |
                   |   |   |         |
                   '--------'--------'
                .----'   '--------------.
       A       v                 B       v
      .--------.--------.       .--------.--------.
      | C'| D'|         |       | E'|new|         |
      |   |   |->       |       |   | E'|->       |
      |   |   |         |       |   |   |         |
      '--------'--------'       '--------'--------'
      .-'   '--.                  |   '------------------.
     v          v              .-'                        v
.--------.  .--------.        v                       .--------.
|   C    |  |   D    |   .--------.       write       | new E  |
|        |  |        |   |   E    |        ==>        |        |
|        |  |        |   |        |                   |        |
'--------'  '--------'   |        |                   '--------'
                         '--------'                   .-'    |
                         .-'    '-.    .-------------|------'
                        v          v  v              v
                   .--------.  .--------.       .--------.
                   |   F    |  |   G    |       | new F  |
                   |        |  |        |       |        |
                   |        |  |        |       |        |
                   '--------'  '--------'       '--------'

More details on how littlefs works can be found in DESIGN.md and SPEC.md.

  • DESIGN.md - A fully detailed dive into how littlefs works. I would suggest reading it as the tradeoffs at work are quite interesting.

  • SPEC.md - The on-disk specification of littlefs with all the nitty-gritty details. May be useful for tooling development.

Testing

The littlefs comes with a test suite designed to run on a PC using the emulated block device found in the bd directory. The tests assume a Linux environment and can be started with make:

make test

License

The littlefs is provided under the BSD-3-Clause license. See LICENSE.md for more information. Contributions to this project are accepted under the same license.

Individual files contain the following tag instead of the full license text.

SPDX-License-Identifier:    BSD-3-Clause

This enables machine processing of license information based on the SPDX License Identifiers, which are available here: http://spdx.org/licenses/

Related projects

  • littlefs-fuse - A FUSE wrapper for littlefs. The project allows you to mount littlefs directly on a Linux machine. Can be useful for debugging littlefs if you have an SD card handy.

  • littlefs-js - A javascript wrapper for littlefs. I'm not sure why you would want this, but it is handy for demos. You can see it in action here.

  • littlefs-python - A Python wrapper for littlefs. The project allows you to create images of the filesystem on your PC. Check if littlefs will fit your needs, create images for a later download to the target memory or inspect the content of a binary image of the target memory.

  • mklfs - A command line tool built by the Lua RTOS guys for making littlefs images from a host PC. Supports Windows, Mac OS, and Linux.

  • Mbed OS - The easiest way to get started with littlefs is to jump into Mbed which already has block device drivers for most forms of embedded storage. littlefs is available in Mbed OS as the LittleFileSystem class.

  • SPIFFS - Another excellent embedded filesystem for NOR flash. As a more traditional logging filesystem with full static wear-leveling, SPIFFS will likely outperform littlefs on small memories such as the internal flash on microcontrollers.

  • Dhara - An interesting NAND flash translation layer designed for small MCUs. It offers static wear-leveling and power-resilience with only a fixed O(|address|) pointer structure stored on each block and in RAM.
