Commit Graph

1147 Commits

Author SHA1 Message Date
Christopher Haster
6d8eb948d1 Tweaked tracebd.py to prioritize progs over erases
Yes, erases are the more costly operation that we should highlight. But,
aside from broken code, you can never prog more than you erase.

This makes it more useful to prioritize progs over erases, so erases
without an overlaying prog show up as a relatively unique blue,
indicating regions of memory that have been erased but not progged.

Too many erased-but-not-progged regions indicate a potentially wasteful
algorithm.
2023-10-24 02:18:40 -05:00
Christopher Haster
d1e79bffc7 Renamed crystallize_size -> crystal_size
The original name was a bit of a mouthful.

Also dropped the default crystal_size in the test/bench runners from
block_size/4 -> block_size/8. I'm already noticing large amounts of
inflation when blocks are fragmented, though I am experimenting with a
rather small fragment_size right now.

Future benchmarking/experimentation is required to figure out good values
for these.
2023-10-23 12:27:44 -05:00
Christopher Haster
e25d11c33c Extended new "fragmenting" write strategy to file btrees
Note this is really just a proof of concept, and tests are not passing.
There are also a number of hacks holding everything together that really
need to be cleaned up.

I was hoping it would be possible to deduplicate the carveshrub/carvetree
functions the same way shrub/tree readnext functions were deduplicated.
These both share a lot of subtle logic, and in theory operate on minor
variations of the same underlying rbyd structure, but in practice
several issues get in the way:

- While the logic is the same, the way changes are played out is very
  different: btrees commit attributes to the btree immediately, whereas
  shrubs build up a bounded attr list to commit to the shrub via an mdir
  commit (see the sketch after this list).

  In theory shrubs could be committed immediately, but it would be
  wasteful. And btrees can't commit a bounded attribute list because 1.
  rm attrs may need to be split into an unbounded number across
  multiple rbyds, 2. fragmenting blocks may create an unbounded
  headache, and 3. attribute lists can't span multiple rbyds so we'd
  need to manually play them out anyways.

- We need to allocate a new btree in carvetree, but in carveshrub we
  defer allocation to mdir commit time (because of the potential for
  failed commits). This complicates things.

- The unions with sprouts/direct bptrs are often very similar, but need
  different handling when carving. This gets a bit tricky.

- In theory you could switch between building attrs for shrubs and
  immediate commits for btrees, but since the immediate commits _change
  the tree_, the carving math changes subtly.

- carveshrub needs to do several auxiliary things: track the shrub estimate,
  build attrs in RAM, etc. carvetree needs to do several auxiliary
  things: dereference bptrs, fragment bptrs, allocate new btrees, etc.
  If these can be deduplicated it would likely result in code savings,
  but also risks increased RAM costs from trying to do too many things
  at once.

  The cost of two functions may also be more cognitive than real, since
  the subtlety here is just math. And computers happen to be pretty
  good at math.

  Though this concern may be unfounded, and deduplicating these functions
  is still an enticing and interesting idea to explore.
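
To make the first difference concrete, here's a rough contrast sketch
with hypothetical names (not the actual littlefs internals): btrees can
apply each attr to the on-disk rbyd immediately, while shrubs queue a
bounded attr list to be played out by a later mdir commit.

  #include <stddef.h>

  struct attr {int tag; int weight;};

  // stand-in for an immediate rbyd commit
  static int rbyd_commit(const struct attr *attrs, size_t count) {
      (void)attrs;
      (void)count;
      return 0;
  }

  // btree path: commit right away, the tree changes underneath us
  static int btree_carve_step(struct attr attr) {
      return rbyd_commit(&attr, 1);
  }

  // shrub path: append to a bounded list, nothing changes until the
  // mdir commit plays the attrs out
  static int shrub_carve_step(struct attr *attrs, size_t *count,
          size_t cap, struct attr attr) {
      if (*count >= cap) {
          return -1; // the attr list must stay bounded
      }
      attrs[(*count)++] = attr;
      return 0;
  }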

I've already noticed some concerning performance once a write exceeds
our crystallization threshold. This makes sense, as our current strategy
is to completely rewrite any data region over our crystallization
threshold. But I wonder if there's a way to exclude the first block in
our region from the crystallization heuristic...

Anyways, some good progress here, but more work to be done.
2023-10-21 22:51:48 -05:00
Christopher Haster
c815c19c20 New "fragmenting" write strategy
The attempt to implement in-rbyd data slicing, being lazily coalesced
during rbyd compaction, failed pretty much completely.

Slicing is a very enticing write strategy, getting both minimal overhead
post-compaction and fast random write speeds, but the idea has some
fundamental conflicts with how we play out attrs post-compaction.

This idea might work in a more powerful filesystem, but brings back the
need to simulate rbyds in RAM, which is something I really don't want to
do (complex, bug-prone, likely adds code cost, may not even be tractable).

So, third time's the charm?

---

This new write strategy writes only datas and bptrs, and avoids dagging
by completely rewriting any regions of data larger than a configurable
crystallization threshold.

This loses most of the benefits of data crystallization: random writes
will now usually need to rewrite a full block. But as a tradeoff, our
data at rest is always stored with optimal overhead.

And at least data crystallization still saves space when our data isn't
block aligned, or in sparse files. From reading up on some other
filesystem designs it seems this is a desirable optimization sometimes
referred to as "tail-packing" or "block suballocation".

Some other changes from just having more time to think about the
problem:

1. Instead of scanning to figure out our current crystal size, we can
   use a simple heuristic of 1. look up left block, 2. look up right
   block, 3. assume any data between these blocks contributes to our
   current crystal.

   This is just a heuristic, so in the worst case you could write just
   the first and last byte of a block, which is enough to trigger
   compaction into a block.
   But on the plus side this avoids issues with small holes preventing
   blocks from being formed.

   This approach brings the number of btree lookups down from
   O(crystallize_size) to 2.

2. I've gone ahead and dropped the previous scheme of coalesce_size
   + fragment_size and instead adopted a single fragment_size that
   controls the size of, well, fragments, i.e. data elements stored
   directly in trees.

   This affects both the inlined shrub as well as fragments stored in
   the inner nodes of the btree. I believe it's very similar to what is
   often called "pages" in logging filesystems, though I'm going to
   avoid that term for now because it's a bit overloaded.

   Previously, neighboring writes that, when combined, would exceed our
   coalesce_size just weren't combined. Now they are combined up to our
   fragment_size, potentially splitting the right fragment.

   Before (fragment_size=8):

     .---+---+---+---+---+---+---+---.
     |            8 bytes            |
     '---+---+---+---+---+---+---+---'
                         +
                         .---+---+---+---+---.
                         |      5 bytes      |
                         '---+---+---+---+---'
                         =
     .---+---+---+---+---+---+---+---+---+---.
     |      5 bytes      |      5 bytes      |
     '---+---+---+---+---+---+---+---+---+---'

   After:

     .---+---+---+---+---+---+---+---.
     |            8 bytes            |
     '---+---+---+---+---+---+---+---'
                         +
                         .---+---+---+---+---.
                         |      5 bytes      |
                         '---+---+---+---+---'
                         =
     .---+---+---+---+---+---+---+---+---+---.
     |            8 bytes            |2 bytes|
     '---+---+---+---+---+---+---+---+---+---'

   This leads to better fragment alignment (much like our block
   strategy), and minimizes tree overhead.

   Any neighboring data to the right is only coalesced if it fits in the
   current fragment, or would be rewritten (carved) anyways, to avoid
   unnecessary data rewriting.

   For example (fragment_size=8):

     .---+---+---+---+---+---+---+---+---+---+---+---+---+---.
     |        6 bytes        |        6 bytes        |2 bytes|
     '---+---+---+---+---+---+---+---+---+---+---+---+---+---'
                                 +
                         .---+---+---+---+---.
                         |      5 bytes      |
                         '---+---+---+---+---'
                                 =
     .---+---+---+---+---+---+---+---+---+---+---+---+---+---.
     |            8 bytes            |    4 bytes    |2 bytes|
     '---+---+---+---+---+---+---+---+---+---+---+---+---+---'
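
   To make the re-splitting math concrete, here's a small standalone
   sketch (illustrative only; frag_size plays the role of fragment_size):

     #include <stdint.h>
     #include <stdio.h>

     // split a coalesced region [start,end) into fragments, aligned
     // relative to the left fragment's start
     static void split_fragments(uint32_t start, uint32_t end,
             uint32_t frag_size) {
         while (start < end) {
             uint32_t size = end - start;
             if (size > frag_size) {
                 size = frag_size;
             }
             printf("fragment: off=%u size=%u\n", start, size);
             start += size;
         }
     }

     int main(void) {
         // the 8-byte fragment above coalesced with the 5-byte write
         // covers [0,10), which re-splits into 8 bytes + 2 bytes
         split_fragments(0, 10, 8);
         return 0;
     }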

Other than these changes, this commit is mostly a bunch of carveshrub
rewriting again, which continues to be nuanced and annoying to get
bug-free.
2023-10-21 22:05:46 -05:00
Christopher Haster
907c24beeb Renamed a number of things related to shrubs/trees
- -> lfsr_shrub_t
- -> lfsr_tree_t

The idea here is to adopt "shrub" as an umbrella term for the
shrub/sprout union, and "tree" as an umbrella term for the bptr/btree
union. I think this is a bit better than calling shrub/sprout "inlined"
which is a _very_ overloaded term in this codebase (inlined in the tree?
the mdir? inlined in the C struct?).
2023-10-19 21:18:44 -05:00
Christopher Haster
2940555caa Attempted to implement slice dereferencing
But already there are some pretty fundamental problems.

The main issue is that, while we correctly dereference slices during
compaction, pending commits that get delayed after compaction still
point to the old block. I'm not sure there's an easy way around this
aside from aborting compaction commits or fully simulating commits,
both of which seem too costly to implement...

Coalescing during compaction is flawed as well, since our attributes
will be outdated by the time they are committed if there is a
compaction...

Looks like it's back to the drawing board. Either our approach to
compaction needs to change, or this slice/coalescing work needs to be
reverted/redesigned...
2023-10-19 01:05:22 -05:00
Christopher Haster
865477d7e1 Changing coalesce strategy, reimplemented shrub/btree carve
Note this is already showing better code reuse, which is a good sign,
though maybe that's just the benefit of reimplementing similar logic
multiple times.

Now both reading and carving end up in the same lfsr_btree_readnext and
lfsr_btree_buildcarve functions for both btrees and shrubs. Both btrees
and shrubs are fundamentally rbyds, so we can share a lot of
functionality as long as we redirect to the correct commit function at
the last minute. This surprising opportunity for deduplication was
noticed while putting together the dbg scripts.
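
Roughly, the redirect idea looks something like this (hypothetical names,
just a sketch of the plan below):

  #include <stddef.h>

  struct attr {int tag; int weight;};

  // the final commit step is the only thing that differs: btrees commit
  // to the rbyd directly, shrubs queue attrs for a later mdir commit
  typedef int (*commit_fn)(void *ctx,
          const struct attr *attrs, size_t count);

  static int buildcarve(void *ctx, commit_fn commit,
          struct attr *attrs, size_t count) {
      // ...shared readnext/carve math would build up attrs here...
      return commit(ctx, attrs, count);
  }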

Planned logic (not actual function names):

  lfsr_file_readnext -> lfsr_shrub_readnext
            |                    |
            |                    v
            '---------> lfsr_btree_readnext

  lfsr_file_flushbuffer -> lfsr_shrub_carve ------------.
            .---------------------'                     |
            v                                           v
  lfsr_file_flushshrub  -> lfsr_btree_carve -> lfsr_btree_buildcarve

The btree part of the above is only hypothetical at the moment, though.
Not even the shrubs can survive compaction right now.

The reason is the new SLICE tag which needs low-level support in rbyd
compact. SLICE introduces indirect references to data located in the same
rbyd, which removes any copying cost associated with coalescing.
Previously, a large coalesce_size risked O(n^2) runtime when
incrementally appending small amounts of data, but with SLICEs we can defer
coalescing to compaction time, where the copy is effectively free.

This compaction-time-coalescing is also hypothetical, which is why our
tests are failing. But the theory is promising.

I was originally against this idea because of how it crosses abstraction
layers, requiring some very low-level code that absolutely can not be
omitted in a simpler littlefs driver. But after working on the actual
file writing code for a while I've become convinced the tradeoff is
worth it.

Note coalesce_size will likely still need to be configurable. Data in
fragmenting/sparse btrees is still susceptible to coalescing, and the
impact of internal fragmentation isn't clear when data sizes approach
the hard block_size/2 limit.
2023-10-17 23:21:18 -05:00
Christopher Haster
fce1612dc0 Reverted to separate BTREE/BRANCH encodings, reordered on-disk structs
My current thinking is that these are conceptually different types, with
BTREE tags representing the entire btree, and BRANCH tags representing
only the inner btree nodes. We already have multiple btree tags anyways:
btrees attached to files, the mtree, and in the future maybe a bmaptree.

Having separate tags also makes it possible to store a btree in a btree,
though I don't think we'll ever use this functionality.

This also removes the redundant weight field from branches. The
redundant weight field is only a minor cost relative to storage, but it
also takes up a bit of RAM when encoding. Though measurements show this
isn't really significant.

New encodings:

  btree encoding:        branch encoding:
  .---+- -+- -+- -+- -.  .---+- -+- -+- -+- -.
  | weight            |  | blocks            |
  +---+- -+- -+- -+- -+  '                   '
  | blocks            |  '                   '
  '                   '  +---+- -+- -+- -+- -+
  '                   '  | trunk             |
  +---+- -+- -+- -+- -+  +---+- -+- -+- -+- -'
  | trunk             |  |     cksum     |
  +---+- -+- -+- -+- -'  '---+---+---+---'
  |     cksum     |
  '---+---+---+---'

Code/RAM changes:

            code          stack
  before:  30836           2088
  after:   30944 (+0.4%)   2080 (-0.4%)

Also reordered other on-disk structs with weight/size, so such structs
always have weight/size as the first field. This may enable some
optimizations around decoding the weight/size without needing to know
the specific type in some cases.
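
For example, assuming these fields are leb128-encoded (as elsewhere in
the design), a reader could decode just the leading weight/size without
knowing the specific struct. A minimal sketch:

  #include <stdint.h>
  #include <stddef.h>

  // decode one unsigned leb128 (up to 32 bits), returning the number of
  // bytes consumed, or -1 if truncated/overlong
  static int leb128_decode(const uint8_t *buf, size_t len, uint32_t *value) {
      uint32_t v = 0;
      for (size_t i = 0; i < len && i < 5; i++) {
          v |= (uint32_t)(buf[i] & 0x7f) << (7*i);
          if (!(buf[i] & 0x80)) {
              *value = v;
              return (int)i + 1;
          }
      }
      return -1;
  }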

---

This change shouldn't have affected functionality, but it revealed a bug
in a dtree test, where a did gets caught in an mdir split and the split
name makes the did unreachable.

Marking this as a TODO for now. The fix is going to be a bit involved
(fundamental changes to the opened-mdir list), and similar work is
already planned to make removed files work.
2023-10-15 14:53:07 -05:00
Christopher Haster
1d5946b5ea Renamed mblocks -> mptr
Since we internally need a bptr type (a block pointer, which is a bit
more complicated than just a single address), calling our mdir pairs
mptrs makes sense.
2023-10-14 14:11:20 -05:00
Christopher Haster
173de4388b Added file tags to rendering of inner tree tags in dbglfs.py
Now -i/--inner will also show the file tags that reference the
underlying data structure.

The difference is subtle but useful:

  littlefs v2.0 0x{0,1}.eee, rev 315, weight 0.256, bd 4096x262144
  {0000,0001}:  -1.1 hello  reg 8192, btree 0x5121.d50 8143
    0000.0efc:       +          0-8142 btree w8143 11             ...
    5121.0d50:       | .-+      0-4095 block w4096 6              ...
                     | | '->    0-4095 block w4096 0x5117.0 4096  ...
                     '-+-+   4096-8142 block w4047 6              ...
                         '-> 4096-8142 block w4047 0x5139.0 4047  ...
2023-10-14 04:47:25 -05:00
Christopher Haster
fbb6a27b05 Changed crystallization strategy in btrees to rely on coalescing
This is a pretty big rewrite, but is necessary to avoid "dagging".

"Dagging" (I just made this term up) is when you transform a pure tree
into a directed acyclic graph (DAG). Normally DAGs are perfectly fine in
a copy-on-write system, but in littlefs's case, it creates havoc for
future block allocator plans, and its interaction with parity blocks
raises some uncomfortable questions.

How does dagging happen?

Consider an innocent little btree with a single block:

  .-----.
  |btree|
  |     |
  '-----'
     |
     v
  .-----.
  |abcde|
  |     |
  '-----'

Say we wanted to write a small amount of data in the middle of our
block. Since the data is so small, the previous scheme would simply
inline the data, carving the left and right sibling (in this case the
same block) to make space:

    .-----.
    |btree|
    |     |
    '-----'
    .' v '.
    |  c' |
    '.   .'
     v   v
    .-----.
    |ab de|
    |     |
    '-----'

Oh no! A DAG!

With the potential for multiple pointers to reference the same block in
our btree, some invariants break down:

- Blocks no longer have a single reference
- If you remove a reference you can no longer assume the block is free
- Knowing when a block is free requires scanning the whole btree
- This split operation effectively creates two blocks, does that mean
  we need to rewrite parity blocks?

---

To avoid this whole situation, this commit adopts a new crystallization
algorithm.

Instead of allowing crystallization data to be arbitrarily fragmented,
we eagerly coalesce any data under our crystallization threshold, and if
we can't coalesce, we compact everything into a block.

Much like a Knuth heap, simply checking both siblings to coalesce has
the effect that any data will always coalesce up to the maximum size
where possible. And when checking for siblings, we can easily find the
block alignment.
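
One way to read the sibling check (illustrative only, the real code also
has to worry about alignment and carving):

  #include <stdint.h>

  enum action {COALESCE_LEFT, COALESCE_RIGHT, COMPACT_TO_BLOCK};

  // eagerly coalesce with a sibling while under the crystallization
  // threshold, otherwise give up and compact everything into a block
  static enum action decide(uint32_t left, uint32_t size, uint32_t right,
          uint32_t crystallize_size) {
      if (left + size <= crystallize_size) {
          return COALESCE_LEFT;
      } else if (size + right <= crystallize_size) {
          return COALESCE_RIGHT;
      } else {
          return COMPACT_TO_BLOCK;
      }
  }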

This also has the effect of always rewriting blocks if we are writing a
small amount of data into a block. Unfortunately I think this is just
necessary in order to avoid dagging.

At the very least crystallization is still useful for files not quite
block aligned at the edges, and sparse files. This also avoids concerns
of random writes inflating a file via sparse crystallization.
2023-10-14 01:25:41 -05:00
Christopher Haster
a81691744a Reworked lfsr_file_read a bit
- Merged lfsr_file_read_ back into lfsr_file_read; I don't think we need
  stateless reads in the end.

- Tweaked reads to use conservative hints instead of just filling all
  cache lines with whatever is in the retrieved datas.

- Switched to if/else for sprout/shrub and bptr/btree checks. Though
  this had no effect on code size, which isn't too surprising.
2023-10-14 01:25:31 -05:00
Christopher Haster
57aa513163 Tweaked debug prints to show more information during mount
Now when you mount littlefs, the debug print shows a bit more info:

  lfs.c:7881:debug: Mounted littlefs v2.0 0x{0,1}.c63 w43.256, bd 4096x256

To disassemble this a bit:

  littlefs v2.0 0x{0,1}.c63 w43.256, bd 4096x256
            ^ ^   '-+-'  ^   ^   ^        ^   ^
            '-|-----|----|---|---|--------|---|-- major version
              '-----|----|---|---|--------|---|-- minor version
                    '----|---|---|--------|---|-- mroot blocks
                         |   |   |        |   |   (1st is active)
                         '---|---|--------|---|-- mroot trunk
                             '---|--------|---|-- mtree weight
                                 '--------|---|-- mleaf weight
                                          '---|-- block size
                                              '-- block count

dbglfs.py also shows the block device geometry now, as read from the
mroot:

  $ ./scripts/dbglfs.py disk -B4096
  littlefs v2.0 0x{0,1}.c63, rev 1, weight 43.256, bd 4096x256
  ...

This may be over-optimizing for testing, but the reason the mount debug
is only one line is to avoid slowing down/cluttering test output. Both
powerloss testing and remounts completely fill the output with mount
prints that aren't actually all that useful.

Also switched to preferring parens in debug info, mainly for mismatched
things.
2023-10-14 01:25:26 -05:00
Christopher Haster
5ecd6d59cd Tweaked config and gstate reprs in dbglfs.py to be more readable
Mainly aligning things; it was easy for the previous repr to become a
visual mess.

This also represents the config more like how we represent other tags,
since they've changed from a monolithic config block to separate
attributes.
2023-10-14 01:25:20 -05:00
Christopher Haster
b936e33643 Tweaked dbg scripts to resize tag repr based on weight
This is a compromise between padding the tag repr correctly and parsing
speed.

If we don't have to traverse an rbyd (for, say, tree printing), we don't
want to since parsing rbyds can get quite slow when things get big
(remember this is a filesystem!). This makes tag padding a bit of a hard
sell.

Previously this was hardcoded to 22 characters, but with the new file
struct printing it quickly became apparent this would be a problematic
limit:

  12288-15711 block w3424 0x1a.0 3424  67 64 79 70 61 69 6e 71  gdypainq

It's interesting to note that this has only become an issue for large
trees, where the weight/size in the tag can be arbitrarily large.

Fortunately we already have the weight of the rbyd after fetch, so we
can use a heuristic similar to the id padding:

  tag padding = 21 + nlog10(max(weight,1)+1)
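
In C the heuristic works out to something like this (the script itself is
Python, this just illustrates the math, assuming nlog10 means the number
of decimal digits):

  #include <stdint.h>

  // number of decimal digits needed to print x
  static unsigned nlog10(uint32_t x) {
      unsigned n = 0;
      do {
          n += 1;
          x /= 10;
      } while (x);
      return n;
  }

  // tag padding heuristic; weight <= 1 gives the old fixed padding of 22
  static unsigned tag_padding(uint32_t weight) {
      return 21 + nlog10(((weight > 1) ? weight : 1) + 1);
  }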

---

Also dropped extra information with the -x/--device flag. It hasn't
really been useful and was implemented inconsistently. Maybe -x/--device
should just be dropped completely...
2023-10-14 01:25:14 -05:00
Christopher Haster
c8b60f173e Extended dbglfs.py to show file data structures
You can now pass -s/--structs to dbglfs.py to show any file data
structures:

  $ ./scripts/dbglfs.py disk -B4096 -f -s -t
  littlefs v2.0 0x{0,1}.9cf, rev 3, weight 0.256
  {0000,0001}:  -1.1 hello  reg 128, trunk 0x0.993 128
    0000.0993:           .->    0-15 shrubinlined w16 16     6b 75 72 65 65 67 73 63  kureegsc
                       .-+->   16-31 shrubinlined w16 16     6b 65 6a 79 68 78 6f 77  kejyhxow
                       | .->   32-47 shrubinlined w16 16     65 6f 66 75 76 61 6a 73  eofuvajs
                     .-+-+->   48-63 shrubinlined w16 16     6e 74 73 66 67 61 74 6a  ntsfgatj
                     |   .->   64-79 shrubinlined w16 16     70 63 76 79 6c 6e 72 66  pcvylnrf
                     | .-+->   80-95 shrubinlined w16 16     70 69 73 64 76 70 6c 6f  pisdvplo
                     | | .->  96-111 shrubinlined w16 16     74 73 65 69 76 7a 69 6c  tseivzil
                     +-+-+-> 112-127 shrubinlined w16 16     7a 79 70 61 77 72 79 79  zypawryy

This supports the same -b/-t/-i options found in dbgbtree.py, with the
one exception being -z/--struct-depth which is lowercase to avoid
conflict with the -Z/--depth used to indicate the filesystem tree depth.

I think this is a surprisingly reasonable way to show the inner
structure of files without clobbering the user's console with file
contents.

Don't worry, if clobbering is desired, -T/--no-truncate still dumps all
of the file content.

Though it's still up to the user to manually apply the sprout/shrub
overlay. That step is still complex enough that it's not implemented in
this tool yet.

2023-10-14 01:25:08 -05:00
Christopher Haster
66e6ce4bfb Enabled no-coalescing file tests, fixed sprout->shrub transition bug
Oh hey, it's that piece of complexity I was worried about.

The problem was that the position calculation for new appended
right_data depended on left_overlap, which fell out of sync when
transitioning from sprout->shrub.

The fix here is to keep left_overlap/right_overlap up to date with the
model that the sprout->shrub transition is effectively doing a
shrub-wide rm first.

Hacky, but hopefully avoids bugs in the future by keeping all of these
variables in a reasonable state...

There may be a simpler way to think about how this code should function,
but I just can't see it. This may deserve a rewrite in the future.
2023-10-14 01:25:01 -05:00
Christopher Haster
92e1fafbc4 Merged sprout and shrub carving paths
Noticed a lot of duplicate conditions, so tried merging these two code
paths. This does risk a difficult-to-read/maintain function, since there
are some rather tricky subtleties with the sprout -> shrub transition.
On the other hand, the code reuse does mean fewer conditions to worry
about.

Merging these code paths also saves a bit of code:

           code          stack
  before: 30960           2256
  after:  30700 (-0.8%)   2256 (+0.0%)
2023-10-14 01:24:50 -05:00
Christopher Haster
addaa8fe3e Implemented data coalescing in sprout->shrub conversion
Note we still end up with a shrub, even if the file could revert back to
a sprout. This is just a simplification for the inlined file logic. We
never implicitly revert to a sprout.
2023-10-14 01:22:54 -05:00
Christopher Haster
e43b4c7d9a Implemented data coalescing in carveinlined, though it is a bit hacky
The hacky part is how we interact with the scratch datas array in
multiple places. This code isn't generalizable.
2023-10-14 01:22:27 -05:00
Christopher Haster
da5b6c0751 Reworked lfsr_file_carveinlined a bit, prefer no rm tag where possible
This mostly figures out how things might work with coalescing, without
fully implementing coalescing yet.

One noteworthy thing: previously, when carving right data, we would
remove the right data and rewrite it. This was to accommodate implicit
splits:

  buf:         [bbbb]
  shrub:     [llrrrrrr]

  1. rm      [ll]
  2. append  [llrr]
  3. append  [llbbbbrr]

An implicit split being when the left sibling and right sibling are the
same data:

  buf:         [bbbb]
  shrub:     [llllllll]

  1. carve   [ll]
  2. append  [llbbbb]
  3. append  [llbbbbll]

By separating out the split logic, this rm can be avoided:

  buf:         [bbbb]
  shrub:     [llrrrrrr]

  1. carve   [llrr]
  2. append  [llbbbbrr]

This comes at the cost of making our implicit split take more steps (in
code), though I believe the behavior is less subtle/more understandable:

  buf:         [bbbb]
  shrub:     [llllllll]

  1. carve   [ll]
  2. append  [llll]
  3. append  [llbbbbll]

As a plus, we avoid looking up the same sibling twice when doing
implicit splits.
2023-10-14 01:14:56 -05:00
Christopher Haster
aa64c85317 Deduplicated shrub updates into lfsr_file_carveinlined
lfsr_file_carveinlined writes data into a shrub, while handling both the
carving logic of data we might be overlapping, and any hole logic we
need to fill out the tree.

This provides a nice, relatively simple but flexible, operation for all
shrub updates:

  static int lfsr_file_carveinlined(lfs_t *lfs, lfsr_file_t *file,
          lfs_off_t pos, lfs_off_t weight, lfs_soff_t delta,
          lfsr_data_t data);

I'm quite happy with how these internal carveinlined/carvebtree
functions are coming together. It's nice to have all of that logic in
one place, even if it's a bit complex.
2023-10-14 01:14:41 -05:00
Christopher Haster
39f417db45 Implemented a filesystem traversal that understands file bptrs/btrees
Ended up changing the name of lfsr_mtree_traversal_t -> lfsr_traversal_t,
since this behaves more like a filesystem-wide traversal than an mtree
traversal (for one, it returns several typed objects, not mdirs like the
other mtree functions).

As a part of this changeset, lfsr_btraversal_t (was lfsr_btree_traversal_t)
and lfsr_traversal_t no longer return untyped lfsr_data_ts, but instead
return specialized lfsr_{b,t}info_t structs. We weren't even using
lfsr_data_t for its original purpose in lfsr_traversal_t.

Also changed lfsr_traversal_next -> lfsr_traversal_read, you may notice
at this point the changes are intended to make lfsr_traversal_t look
more like lfsr_dir_t for consistency.

---

Internally lfsr_traversal_t now uses a full state machine with its own
enum due to the complexity of traversing the filesystem incrementally.

Because creating diagrams is fun, here's the current full state machine,
though note it will need to be extended for any
parity-trees/free-trees/etc:

  mrootanchor
       |
       v
  mrootchain
  .-'  |
  |    v
  |  mtree ---> openedblock
  '-. | ^           | ^
    v v |           v |
   mdirblock    openedbtree
      | ^
      v |
   mdirbtree
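
As a sketch, the states might map to an enum something like this (my
reading of the state names above, not the actual implementation):

  enum traversal_state {
      T_MROOTANCHOR,  // start at the fixed mroot anchor
      T_MROOTCHAIN,   // follow the chain of extended mroots
      T_MTREE,        // walk the mtree's inner nodes
      T_MDIRBLOCK,    // yield each mdir's blocks
      T_MDIRBTREE,    // walk btrees referenced by the current mdir
      T_OPENEDBLOCK,  // yield blocks of opened (not yet synced) files
      T_OPENEDBTREE,  // walk btrees of opened (not yet synced) files
  };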

I'm not sure I'm happy with the current implementation, and eventually
it will need to be able to handle in-place repairs to the blocks it
sees, so this whole thing may need a rewrite.

But in the meantime, this passes the new clobber tests in test_alloc, so
it should be enough to prove the file implementation works. (which is
definitely is not fully tested yet, and some bugs had to be fixed for
the new tests in test_alloc to pass).

---

Speaking of test_alloc.

The inherent cyclic dependency between files/dirs/alloc makes it a bit
hard to know what order to test these bits of functionality in.

Originally I was testing alloc first, because it seems you need to be
confident in your block allocator before you can start testing
higher-level data structures.

But I've gone ahead and reversed this order, testing alloc after
files/dirs. This is because of an interesting observation that if alloc
is broken, you can always increase the test device's size to some absurd
number (-DDISK_SIZE=16777216, for example) to kick the can down the
road.

Testing in this order allows alloc to use more high-level APIs and
focus on corner cases where the allocator's behavior requires subtlety
to be correct (e.g. ENOSPC).
2023-10-14 01:13:40 -05:00
Christopher Haster
881c46f562 Tweaked lfsr_mtree_traversal_next to no longer write the mtree/mroot
This was a kludge due to needing lfs->mtree initialized to traverse the
mtree, the assumption being that future traversals should strictly
update the mtree/mroot to the existing state.

Moving code around (and adopting an actual state machine, which will be
needed for btree traversal) made this no longer necessary.

Now the mtree/mroot is only initialized in lfsr_mountinited, as it
should be.
2023-10-14 01:13:33 -05:00
Christopher Haster
4996b8419d Implemented most of file btree reading/writing
Still needs testing, though the byte-level fuzz tests were already causing
blocks to crystallize. I noticed this because of test failures, which
are now fixed.

Note the block allocator currently doesn't understand file btrees. To
get the current tests passing requires -DDISK_SIZE=16777216 or greater.

It's probably also worth noting there's a lot that's not implemented
yet! Data checksums and write validation for one. Also ecksums. And we
should probably have some sort of special handling for linear writes so
linear writes (the most common) don't end up with a bunch of extra
crystallizing writes.

Also the fact that btrees can become DAGs now is an oversight and a bit
concerning. Will that work with a closed allocator? Block parity?
2023-10-14 01:12:26 -05:00
Christopher Haster
1e13124091 Tweaked LFS_ASSERT impl to use __builtin_unreachable
First, realized that the LFS_UNREACHABLE logic was flipped after a
confusing test bug (damn double negatives). But also realized LFS_ASSERT
could be tweaked to "call" __builtin_unreachable() on assert failure to
act as a sort of compiler hint.
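
Roughly, the tweak looks something like this (a sketch, not the exact
definition):

  #include <assert.h>

  #if !defined(LFS_NO_INTRINSICS) && (defined(__GNUC__) || defined(__clang__))
  // on failure, "call" __builtin_unreachable() so the compiler may assume
  // the condition holds; note with asserts disabled only the hint remains
  #define LFS_ASSERT(test) do { \
          if (!(test)) { \
              assert(test); \
              __builtin_unreachable(); \
          } \
      } while (0)
  #else
  #define LFS_ASSERT(test) assert(test)
  #endif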

Turns out this hint saves a little bit of code, note both builds have
LFS_UNREACHABLE fixed:

                                   code          stack
  without __builtin_unreachable:  28408           1928
  with __builtin_unreachable:     28324 (-0.3%)   1920 (+0.0%)

Since __builtin_unreachable is a compiler extension, its usage respects
LFS_NO_INTRINSICS.
2023-10-14 01:11:51 -05:00
Christopher Haster
07e977bb43 Progress on file btrees
Added lfsr_bptr_t to represent block pointers (maybe we should rename
mblocks back to mptr), added fetching of btrees/bptrs in
lfsr_file_opencfg, added estimate tracking to our shrubs so we actually
know when to create a btree, and implemented most of the high-level
btree logic.

It's not working yet, but the biggest idea introduced here is how we
handle block alignment.

See, we really don't want awkward btree topologies to form where small
amounts of data get stuck between blocks:

  .-----.--.-----.
  |     |  |     |
  |     |  |     |
  '-----'--'-----'

This is wasteful, as the middle bit of data either gets represented as a
full block with its data partially covered, or as data inlined in the
btree, which comes with ~2x overhead.

The solution here is to scan for a block on either the left or right to
derive our block alignment from.

Unfortunately, since our sibling blocks could have been carved, this
requires scanning all the way from pos-2*B+1 to pos+2*B-1, a total of
4*B-2, to make sure we find a sibling if there is one.

  worst case left  worst case right
   .-----.-----.    .-----.-----.
   | xxxx|     |    |p    |xxxxx|
   |xxxxx|    p|    |     |xxxx |
   '-----'-----'    '-----'-----'
    '----+----'      '----+----'
    pos-2*bs+1       pos+2*bs-1

Fortunately, at this stage, data should have had many chances to
coalesce, so hopefully the actual scan overhead should be much smaller
in practice.

Writing data to a file linearly, for example, only needs a single lookup
to find the previous block.
2023-10-14 01:09:45 -05:00
Christopher Haster
52113c6ead Moved the test/bench runner path behind an optional flag
So now instead of needing:

  ./scripts/test.py ./runners/test_runner test_dtree

You can just do:

  ./scripts/test.py test_dtree

Or with an explicit path:

  ./scripts/test.py -R./runners/test_runner test_dtree

This makes it easier to run the script manually. And, while there may be
some hiccups with the implicit relative path, I think in general this will
make the test/bench scripts easier to use.

There was already an implicit runner path, though only if the test suite
was completely omitted. I'm not sure that would ever have actually
been useful...

---

Also increased the permutation field size in --list-*, since I noticed it
was overflowing.
2023-10-14 00:54:28 -05:00
Christopher Haster
df32211bda Changed -t/--dtree to -f/--files in dbglfs.py
This flag makes more sense to me and avoids conflicts with the
-d/--delta flag used for gstate.
2023-10-14 00:54:06 -05:00
Christopher Haster
a2aa25aa8e Tweaked dbgrbyd.py to show -1 tag rids 2023-10-14 00:53:31 -05:00
Christopher Haster
8c0f99890d Tweaked appendattrs to not need to save changes to rid_ 2023-10-14 00:52:18 -05:00
Christopher Haster
ef691d4cfe Tweaked rbyd lookup/append to use 0 lower rid bias
Previously our lower/upper bounds were initialized to -1..weight. This
made a lot of the math unintuitive and confusing, and it's not really
necessary to support -1 rids (-1 rids arise naturally in order-statistic
trees that can have weight=0).

The tweak here is to use lower/upper bounds initialized to 0..weight,
which makes the math behave as expected. -1 rids naturally arise from
rid = upper-1.
2023-10-14 00:52:00 -05:00
Christopher Haster
501f8cbe10 Implemented lfsr_file_fruncate
This is an exciting new function, made possible by the order-statistic
nature of our rbyds and btrees.

lfsr_file_fruncate is like truncate, but from the front. It can trim
data off of the front of files, and grow files from the front,
effectively prefixing files with zeros cheaply.

This may have some niche use cases for prefixing files with headers, but
the real killer is making logging files trivial. Up until now logging
into a file has always resulted in awkward file-swapping code when a
file gets full. Now maintaining a log is just a single fruncate call.

---

Implementation wise, lfsr_file_fruncate is very similar to
lfsr_file_truncate, except we need to always inject holes into all file
trees to adjust file contents correctly.
2023-10-14 00:51:26 -05:00
Christopher Haster
5adc1f54b7 Implemented and tested lfsr_file_truncate
Not much to say here. We need to modify trees a bit, but at least it's
relatively straightforward.
2023-10-14 00:45:32 -05:00
Christopher Haster
981e64f524 Added more seek tests, fixed some annoying POSIX/etc subtleties
What do you think a file's size becomes when you:

1. seek past the end of a file
2. call write with zero data?

POSIX/etc explicitly mentions this case, noting that zero-sized
writes should never update the file size.

This clashes with the assumption that file writes always update the file
position, but I suppose it makes a bit of practical sense if you want
zero-sized file writes to be idempotent.
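
For reference, the equivalent behavior with POSIX calls (error handling
omitted):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int main(void) {
      int fd = open("f.bin", O_RDWR | O_CREAT | O_TRUNC, 0666);
      lseek(fd, 1024, SEEK_SET);  // seek well past the end of the empty file
      write(fd, "", 0);           // zero-sized write
      struct stat st;
      fstat(fd, &st);
      // st.st_size is still 0, the zero-sized write never grew the file
      printf("size: %lld\n", (long long)st.st_size);
      close(fd);
      return 0;
  }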
2023-10-14 00:38:49 -05:00
Christopher Haster
0638b09d18 Switched to using mid to tell which files belong in a compaction
This avoids the previous issues with block state for null inlined data,
and we're already testing the rid anyways for splits.

In theory we don't need the block for inlined data at all, but it is
convenient as it allows us to use the existing internal rbyd/data APIs
without needing to move data around. Though it may be worth looking into
alternative layouts at some point.
2023-10-14 00:33:55 -05:00
Christopher Haster
69993da7e1 Small cleanup of inlined compaction update conditions
This deduplicates quite a bit of logic which is very satisfying.

It could be even better if the block field was located in the same place
for both sprouts and shrubs...
2023-10-14 00:33:04 -05:00
Christopher Haster
a6357e8a5c Renamed test_ftree->files, added fuzz tests, fixed a bug
The bug was a simple miscalculation of how much data to truncate when
carving a left-neighbor that also has a hole.
2023-10-14 00:31:08 -05:00
Christopher Haster
cbbd77708d Actually made the previous commit work
The logic behind relying on pre-commit inlined state to clear any failed
commits was sound, but built on the wrong assumption that file->inlined
would always contain the mdir's block. This was not true for
null-inlined, i.e. no inlined data, since this doesn't really live
anywhere.

Changed file's inlined state to track the mdir block, even when we have
no inlined data. A bit redundant, but a nice invariant to rely on in
lfsr_mdir_compact__.

This invariant also only affects lfsr_mdir_compact__, since this is the
only place inlined data can change blocks.
2023-10-14 00:29:22 -05:00
Christopher Haster
58be838916 Tweaked compact to use pre-commit inlined state
This seems more correct and avoids an extra set of inlined state copies.

Win win.
2023-10-14 00:28:56 -05:00
Christopher Haster
488ba4b650 Fixed mdir estimate during compaction to include shrubs
Once again another function we need to nearly-completely duplicate
thanks to the recursive nature of our shrubs.

I wasn't planning to test this at this stage, but it turns out
byte-level syncs quickly fill up mdirs, triggering early ERANGE asserts
unless we split.

A 32-byte, byte-level-synced shrub already takes up 928 bytes when
including tree overhead, 1856 bytes if you include the unsynced
copy, which is very close to the 2048 byte threshold for splitting
4KiB blocks.
2023-10-14 00:20:25 -05:00
Christopher Haster
b008c2af75 Fixed bug where pre-compact commit clobbered inlined files, other tweaks
We were not properly resetting the staged shrub in lfsr_mdir_commit__,
well, we were sometimes, but only when transitioning from a sprout to a
shrub.

Also tweaked the mdir commit logic to try to only use the staging
inlined state. This just simplifies how much state needs to be
considered when debugging and may result in less data fetches.
2023-10-14 00:13:16 -05:00
Christopher Haster
edc4cb2fa9 Changed TEST_PLS to track number of powerlosses seen by the current test
This turned out to have limited use for the tests themselves. I was
hoping to avoid the mount->format->mount fallback when powerloss
testing, but we still need it in case format was interrupted.

Still, TEST_PLS is very useful for debugging.

Previously it was difficult to set a breakpoint at a specific location,
and after a specific powerloss event. Now all you need is this in gdb:

  b <line> if test_pls == <pls>
2023-10-14 00:11:20 -05:00
Christopher Haster
582dc5f1b2 Added some tests, quick seek impl, fixed bugs
Turns out it's hard to test file holes without seek.

It's interesting to note most of seek's buffer flush work actually
occurs lazily in lfsr_file_write, so lfsr_file_seek turns out to be a
relatively simple function.
2023-10-14 00:09:27 -05:00
Christopher Haster
0724b9a8c4 Really revamped flushbuffer, now leveraging overwriting grow tags
I had completely forgotten about overwriting grow tags, that is, tags
that change the attr's weight while also changing the tag itself.
2023-10-14 00:06:55 -05:00
Christopher Haster
c2d33a1843 Reworked btree-commit/flushbuffer to incrementally build attrs
This basically turns these functions into tiny bounded compilers, which
is interesting to think about. I wonder if this sort of evolution led to
how queries are compiled in modern databases.

This method of attr generation is both easier to use and more flexible.

It also saves some code, but note lfsr_file_flushbuffer underwent
significant tweaking leveraging this, so the actual code savings are a
bit muddy:

            code          stack
  before:  25672           2024
  after:   25452 (-0.9%)   1920 (-5.4%)
2023-10-14 00:01:00 -05:00
Christopher Haster
dc8dce8f0c Introduced coalesce_size and crystallize_size, deduplicated test cfg
- coalesce_size - The amount of data allowed to coalesce into single
  data entries.

- crystallize_size - How much data is allowed to be written to btree
  inner nodes before needing to be compacted into a block.

Also deduplicated the test config, which is something I've been wanting
to do for a while. It doesn't make sense to need to modify several different
instantiations of lfs_config every time a config option is added or
removed...
2023-10-13 23:56:33 -05:00
Christopher Haster
2b950bb16b Reworked flushbuffer logic to merge neighboring pieces of data
This gets pretty ugly and mainly just involves a lot of subtle range
logic.

Our CAT data representation really shines here, but all of the scratch
datas do come with a code/ram cost:

            code          stack
  before:  25448           1920
  after:   25672 (+0.9%)   2024 (+5.1%)
2023-10-13 23:48:54 -05:00
Christopher Haster
4334a848a3 Tweaked mdir commit so it handles all inlined file staging
This saves a bit of code:

            code          stack
  before:  25552           1920
  after:   25448 (-0.4%)   1920 (+0.0%)

But more importantly, this simplifies things and moves all of the
staging/updating logic into lfsr_mdir_commit, where most of the
subtle post-compaction interactions play out.
2023-10-13 23:46:17 -05:00
Christopher Haster
02ae6050de Changed lfsr_data_t internals, added LFSR_DATA_CAT
The main purpose of this change is to introduce LFSR_DATA_CAT, a
generalized way to concatenate various data references internally.

As a side-effect lfsr_data_t has been completely restructured. Now,
lfsr_data_t can be in one of 4 modes:

If the size field's sign bit=0, the lfsr_data_t points in-device. A new
count field determines the encoding:

  sign(size)=0, count=0 => inlined:

    .---+---+---+---.
    |     size      |
    |---+---+---+---|
    |c=0| inlined d |  note inlined data is just enough to hold
    |---+           |  one encoded leb128
    | ata...        |
    '---------------'

  sign(size)=0, count=1 => direct:

    .---+---+---+---.   .---+---+---+---.
    |     size      | .>| data...       |
    |---+---+---+---| | |       .       |
    |c=1|           | | .       .       .
    |---+---+---+---| | .       .       .
    | direct ptr -----' .               .
    '---------------'

  sign(size)=0, count>=2 => indirect:

    .---+---+---+---.   .---+---+---+---.   .---+---+---+---.
    |     size      | .>|     size      | .>| data...       |
    |---+---+---+---| | |---+---+---+---| | |       .       |
    |c>1|           | | |c=1|           | | .       .       .
    |---+---+---+---| | |---+---+---+---| | .       .       .
    | indirect ptr ---' | direct ptr -----' .               .
    '---------------'   '---------------'   .---+---+---+---.
                        |     size      | .>| data...       |
                        |---+---+---+---| | |       .       |
                        |c=1|           | | .       .       .
                        |---+---+---+---| | .       .       .
                        | direct ptr -----' .               .
                        '---+---+---+---'
                        |       .       |
                        |       .       |
                        .       .       .
                        .               .
                        .               .

  note only one indirect layer is allowed due to no recursion

If the size field's sign bit=1, the lfsr_data_t points on-disk:

  sign(size)=1 => on-disk:

    .---+---+---+---.          .....
    |     size      |      ..''     ''..
    |---+---+---+---|     :    :        :
    |     block ------+->|            ..:|
    |---+---+---+---| |  |......( )::::::|
    |      off -------'  |:::'    :      |
    '---------------'     :'       :    :
                           ''..     :.''
                               '''''
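
In C, one way to model these four modes (illustrative only, not the
actual lfsr_data_t layout):

  #include <stdint.h>

  typedef struct data {
      int32_t size;               // sign bit=0: in-device, =1: on-disk
      union {
          struct {                // in-device
              uint16_t count;     // 0: inlined, 1: direct, >=2: indirect
              union {
                  uint8_t inlined[5];          // one encoded leb128
                  const uint8_t *direct;       // points at a flat buffer
                  const struct data *indirect; // one layer of direct datas
              } u;
          } device;
          struct {                // on-disk
              uint32_t block;
              uint32_t off;
          } disk;
      } u;
  } data_t;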

My goal with this commit was to test the new implementation and see how
it would impact code/RAM size before adopting it in the actual file
handling code, and the results are... not great...

            code          stack
  before:  24668           1840
  after:   25552 (+3.5%)   1920 (+4.2%)

I think most of the new cost comes from the now correct handling of
read/cmp with concatenated datas, which previously would just assert.
This change gives us LFSR_DATA_CAT, so I will be working with it for
now, but this may be worth looking at again in the future. Maybe the
correct handling of read/cmp should just be reverted to an assert...
2023-10-13 23:45:41 -05:00