Implemented mtree path/dname lookup, rudimentary lfsr_mkdir/lfsr_dir_read

This makes it now possible to create directories in the new system.

The new system now uses a single global "mtree" to store all metadata
entries in the filesystem. In this system, a directory is simply a range
of metadata entries. This has a number of benefits, but does come with
its own problems:

1. We need to indicate which directory each file belongs to. To do this
   the file's name entry has been changed to a tuple of leb128-encoded
   directory-id + actual file name:

     01 66 69 6c 65 2e 74 78 74  .file.txt
     ^  '----------+----------'
     '-------------|------------ leb128 directory-id
                   '------------ ascii/utf8 name

   If we include the directory-id as part of filename comparison, files
   should naturally be next to other files in the same directory.

2. We need a way to allocate directory-ids for new directories. This
   turns out to be a bit more tricky than I expected.

   We can't use any mid/bid/rid inherent to the mtree, because these
   change on any file creation/deletion. And since we commit the did
   into the tree, that's not acceptable.

   Initially I thought you could just find the largest did and
   increment, but this gives you no way to reclaim deleted dids. And
   sure, deleted dids have no storage consumption, but eventually you
   will overflow the did integer. Since this can suddenly happen in a
   filesystem that's been in a steady-state for years, that's pretty
   unacceptable.

   One solution is to do a simple linear search over the mtree for an
   unused did. But with a runtime of O(n^2 log(n)), this raises
   performance concerns.

   Sidenote: It's interesting to note that the Linux kernel's
   allocation of process-ids, a very similar problem, is surprisingly
   complex and relies on a radix-tree of bitmaps (struct idr). This
   suggests I'm not missing an obvious solution somewhere.

   The solution I settled on here is to instead treat the set of dids
   as a sort of hash table:

   1. Hash the full directory path into a did.
   2. Perform a linear search until we have no collision.

     leb128(truncate28(crc32c("dir")))
          .--------'
          v
     9e cd c8 30 66 69 6c 65 2e 74 78 74  ...0file.txt
     '----+----' '----------+----------'
          '-----------------|------------ leb128 directory-id
                            '------------ ascii/utf8 name

   In the worst case, this can still exhibit O(n^2 log(n)) performance
   when nearly all dids are in use. However that seems unlikely to
   happen in practice, since we don't truncate our hashes, unlike
   normal hash tables. An additional 32-bit word for each file is a
   small price to pay for a low chance of collisions.

   In the current implementation, I do truncate the hash to 28 bits.
   Since we encode the hash with leb128, and hashes are statistically
   random, this gives us better usage of the leb128 encoding. However
   it does limit a 32-bit littlefs to 256 Mi directories.

   Maybe this should be a configurable limit in the future.

   But that highlights another benefit of this scheme. It's easy to
   change in the future without disk changes.

3. We need a way to know if a directory-id is allocated, even if the
   directory is empty.

   For this we just introduce a new tag: LFSR_TAG_DSTART, which is an
   empty file entry that indicates the directory at the given did in
   the mtree is allocated.

   To create/delete these atomically with the reference in our parent
   directory, we can use the GRM system for atomic renames.

   Note this isn't implemented yet.

This is also the first time we finally get around to testing all of the
dname lookup functions, so this did find a few bugs, mostly around
reporting the root correctly.
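
The did-prefixed names from problem 1 and the hash-then-probe allocation
from problem 2 can be sketched in a few lines of Python. This is only an
illustration, not the lfsr implementation: the helper names (leb128,
dname, alloc_did) are hypothetical, a plain Python set stands in for the
DSTART lookups in the mtree, and crc32c here is the standard CRC-32C
(Castagnoli) with the usual init/final inversion, which may not match the
CRC conventions lfsr actually uses.

```python
DID_BITS = 28                     # hash truncation described above
DID_MASK = (1 << DID_BITS) - 1

def crc32c(data, crc=0):
    # bitwise CRC-32C, reflected polynomial 0x82f63b78
    # (assumption: standard init/xorout of 0xffffffff)
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff

def leb128(n):
    # unsigned leb128: 7 bits per byte, low bits first,
    # high bit set on every byte except the last
    out = bytearray()
    while True:
        b = n & 0x7f
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def dname(did, name):
    # name entry = leb128 directory-id + ascii/utf8 name
    return leb128(did) + name.encode('utf8')

def alloc_did(path, allocated):
    # 1. hash the full directory path into a did
    did = crc32c(path.encode('utf8')) & DID_MASK
    # 2. linear search until we have no collision
    #    (this sketch never terminates if all 2^28 dids are taken)
    while did in allocated:
        did = (did + 1) & DID_MASK
    allocated.add(did)
    return did

if __name__ == '__main__':
    print(dname(1, 'file.txt').hex(' '))
    allocated = set()
    print('%07x' % alloc_did('dir', allocated))
    # ordering entries by (did, name) keeps each directory's files adjacent
    entries = [(2, 'b.txt'), (1, 'z.txt'), (2, 'a.txt'), (1, 'a.txt')]
    print(sorted(entries))
```

Here dname(1, 'file.txt') produces the bytes 01 66 69 6c 65 2e 74 78 74
from the first diagram. Note the sort is over (did, name) tuples rather
than the raw encoded bytes, since leb128 is little-endian and a bytewise
comparison of the encoded did would not order dids numerically.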
2023-07-05 13:34:50 -05:00
parent 0bb1e0b8b5
commit da810aca26
6 changed files with 751 additions and 166 deletions
--- a/scripts/dbgrbyd.py
+++ b/scripts/dbgrbyd.py
@@ -20,10 +20,11 @@ COLORS = [
 TAG_NULL        = 0x0000
 TAG_SUPERMAGIC  = 0x0003
 TAG_SUPERCONFIG = 0x0004
-TAG_NAME        = 0x0100
-TAG_BRANCH      = 0x0100
-TAG_REG         = 0x0101
-TAG_DIR         = 0x0102
+TAG_NAME        = 0x0200
+TAG_BRANCH      = 0x0200
+TAG_DSTART      = 0x0201
+TAG_REG         = 0x0202
+TAG_DIR         = 0x0203
 TAG_STRUCT      = 0x0300
 TAG_INLINED     = 0x0300
 TAG_BLOCK       = 0x0302
@@ -31,6 +32,7 @@ TAG_BTREE       = 0x0303
 TAG_MROOT       = 0x0304
 TAG_MDIR        = 0x0305
 TAG_MTREE       = 0x0306
+TAG_DID         = 0x0307
 TAG_UATTR       = 0x0400
 TAG_SATTR       = 0x0500
 TAG_ALT         = 0x4000
@@ -38,6 +40,7 @@ TAG_CRC         = 0x2000
 TAG_FCRC        = 0x2100


+
 # parse some rbyd addr encodings
 # 0xa     -> [0xa]
 # 0xa.b   -> ([0xa], b)
@@ -130,6 +133,7 @@ def tagrepr(tag, w, size, off=None):
    elif (tag & 0xff00) == TAG_NAME:
        return '%s%s %d' % (
            'branch' if tag == TAG_BRANCH
+                else 'dstart' if tag == TAG_DSTART
                else 'reg' if tag == TAG_REG
                else 'dir' if tag == TAG_DIR
                else 'name 0x%02x' % (tag & 0xff),
@@ -143,6 +147,7 @@ def tagrepr(tag, w, size, off=None):
                else 'mroot' if tag == TAG_MROOT
                else 'mdir' if tag == TAG_MDIR
                else 'mtree' if tag == TAG_MTREE
+                else 'did' if tag == TAG_DID
                else 'struct 0x%02x' % (tag & 0xff),
            ' w%d' % w if w else '',
            size)