10 Commits

Author SHA1 Message Date
Christopher Haster
3820be180d scripts: Adopted crc32c lib when available
Jumping from a simple Python implementation to the fully hardware
accelerated crc32c library basically deletes any crc32c related
bottlenecks:

  crc32c.py disk (1MiB) w/  crc32c lib: 0m0.027s
  crc32c.py disk (1MiB) w/o crc32c lib: 0m0.844s

This uses the same try-import trick we use for inotify_simple, so we get
the speed improvement without losing portability.

---

In dbgbmap.py:

  dbgbmap.py w/  crc32c lib:             0m0.273s
  dbgbmap.py w/o crc32c lib:             0m0.697s
  dbgbmap.py w/  crc32c lib --no-ckdata: 0m0.269s
  dbgbmap.py w/o crc32c lib --no-ckdata: 0m0.490s
  dbgbmap.old.py:                        0m0.231s

The bulk of the runtime is still in Rbyd.fetch, but this is now
dominated by leb128 decoding, which makes sense. We do ~twice as many
fetches in the new dbgbmap.py in order to calculate the gcksum (which
we then ignore...).
2025-04-16 15:22:34 -05:00
Christopher Haster
5f06558cbe scripts: Added dbgbmapd3.py for bmap -> svg rendering
Like codemapd3.py this include an interactive UI for viewing the
underlying filesystem graph, including:

- mode-tree - Shows all reachable blocks from a given block
- mode-branches - Shows immediate children of a given block
- mode-references - Shows parents of a given block
- mode-redund - Shows sibling blocks in redund groups (This is
  currently just mdir pairs, but the plan is to add more)

This is _not_ a full filesystem explorer, so we don't embed all block
data/metadata in the svg. That's probably a project for another time.
However we do include interesting bits such as trunk addresses,
checksums, etc.

An example:

  # create an filesystem image
  $ make test-runner -j
  $ ./scripts/test.py -B test_files_many -a -ddisk -O- \
          -DBLOCK_SIZE=1024 \
          -DCHUNK=10 \
          -DSIZE=2050 \
          -DN=128 \
          -DBLOCK_RECYCLES=1
  ... snip ...
  done: 2/2 passed, 0/2 failed, 164pls!, in 0.16s

  # generate bmap svg
  $ ./scripts/dbgbmapd3.py disk -b1024 -otest.svg \
          -W1400 -H750 -Z --dark
  updated test.svg, littlefs v0.0 1024x1024 0x{26e,26f}.d8 w64.128, cksu
  m 41ea791e

And open test.svg in a browser of your choice.

Here's what the current colors mean:

- yellow => mdirs
- blue   => btree nodes
- green  => data blocks
- red    => corrupt/conflict issue
- gray   => unused blocks

But like codemapd3.py the output is decently customizable. See -h/--help
for more info.

And, just like codemapd3.py, this is based on ideas from d3 and
brendangregg's flamegraphs:

- d3 - https://d3js.org
- brendangregg's flamegraphs - https://github.com/brendangregg/FlameGraph

Note we don't actually use d3... the name might be a bit confusing...

---

One interesting change from the previous dbgbmap.py is the addition of
"corrupt" (bad checksum) and "conflict" (multiple parents) blocks, which
can help find bugs.

You may find the "conflict" block reporting a bit strange. Yes it's
useful for finding block allocation failures, but won't naturally formed
dags in file btrees also be reported as "conflicts"?

Yes, but the long-term plan is to move away from dags and make littlefs
a pure tree (for block allocator and error correction reasons). This
hasn't been implemented yet, so for now dags will result in false
positives.

---

Implementation wise, this script was pretty straightforward given prior
dbglfs.py and codemapd3.py work.

However there was an interesting case of https://xkcd.com/1425:

- Traverse the filesystem and build a graph - easy
- Tile a rectangle with n nice looking rectangles - uhhh

I toyed around with an analytical approach (something like block width =
sqrt(canvas_width*canvas_height/n) * block_aspect_ratio), but ended up
settling on an algorithm that divides the number of columns by 2 until
we hit our target aspect ratio.

This algorithm seems to work quite well, runs in only O(log n), and
perfectly tiles the grid for powers-of-two. Honestly the result is
better than I was expecting.
2025-04-16 15:22:17 -05:00
Christopher Haster
82f4fd3c0f scripts: Dropped list/tuple distinction in Rbyd.fetch
Also tweaked how we fetch shrubs, adding Rbyd.fetchshrub and
Btree.fetchshrub instead of overloading the bd argument.

Oh, and also added --trunk to dbgmtree.py and dbglfs.py. Actually
_using_ --trunk isn't advised, since it will probably just result in a
corrupted filesystem, but these scripts are for accessing things that
aren't normally allowed anyways.

The reason for dropping the list/tuple distinction is because it was a
big ugly hack, unpythonic, and likely to catch users (and myself) by
surprise. Now, Rbyd.fetch and friends always require separate
block/trunk arguments, and the exercise of deciding which trunk to use
is left up to the caller.
2025-04-16 15:22:11 -05:00
Christopher Haster
73127470f9 scripts: Adopted rbydaddr/tagrepr changes across scripts
Just some minor tweaks:

- rbydaddr: Return list instead of tuple, note we rely on the type
  distinction in Rbyd.fetch now.

- tagrepr: Rename w -> weight.
2025-04-16 15:21:59 -05:00
Christopher Haster
c028735741 scripts: dbgblock.py/dbgcat.py: Fixed some range corner cases
Mainly fixing unbounded ranges, which required a bit of tweaking of when
we flatten block arguments.

This adopts the trick of using slice as the representation of, well,
slices in arguments instead of tuples. This avoids type confusion with
rbydaddr also returning tuples (of tuples!).
2025-04-16 15:21:51 -05:00
Christopher Haster
62cc4dbb14 scripts: Disabled local import hack on import
Moved local import hack behind if __name__ == "__main__"

These scripts aren't really intended to be used as python libraries.
Still, it's useful to import them for debugging and to get access to
their juicy internals.
2025-01-28 14:41:30 -06:00
Christopher Haster
7cfcc1af1d scripts: Renamed summary.py -> csv.py
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.

Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:

  # prevent local imports
  __import__('sys').path.pop(0)

These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
2024-11-09 12:31:16 -06:00
Christopher Haster
e3fdc3dbd7 scripts: Added simple mroot cycle detectors to dbg scripts
These work by keeping a set of all seen mroots as we descend down the
mroot chain. Simple, but it works.

The downside of this approach is that the mroot set grows unbounded, but
it's unlikely we'll ever have enough mroots in a system for this to
really matter.

This fixes scripts like dbgbmap.py getting stuck on intentional mroot
cycles created for testing. It's not a problem for a foreground script
to get stuck in an infinite loop, since you can just kill it, but a
background script getting stuck at 100% CPU is a bit more annoying.
2024-11-07 11:46:39 -06:00
Christopher Haster
007ac97bec scripts: Adopted double-indent on multiline expressions
This matches the style used in C, which is good for consistency:

  a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines))

We were already doing this for multiline control-flow statements, simply
because I'm not sure how else you could indent this without making
things really confusing:

  if a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines)):
      do_the_thing()

This was the only real difference style-wise between the Python code and
C code, so now both should be following roughly the same style (80 cols,
double-indent multiline exprs, prefix multiline binary ops, etc).
2024-11-06 15:31:17 -06:00
Christopher Haster
fe772e08cd Added dbgcat.py for extracted raw data from block devices
dbgcat.py is basically the same as dbgblock.py except:

- dbgcat.py pipes the block's contents directly to stdout.

  The intention is to enable external tools with Unix pipes while
  letting dbgcat.py take care of block address decoding:

    $ ./scripts/dbgcat.py disk -b4096 0x1.8 -n8 | grep littlefs
    littlefs

- Unlike dbgblock.py, dbgcat.py accepts multiple block addresses,
  concatenating the data that resides at those blocks.

  This is dbgCAT.py after all.
2024-04-01 16:28:51 -05:00