This is a pretty classic case for memoization. We don't really need to
recalculate every stack limit at every call site.
Cuts the runtime by more than half:
before: 0.335s
after: 0.139s (-58.5%)
---
Unfortunately functools.cache was not fit for purpose. It always uses
all parameters as the key, which breaks on the "seen" parameter we use
for cycle detection. "seen" otherwise has no impact on results, so it
should not be part of the key.
Fortunately decorators aren't too difficult in Python, so I just rolled
my own (cache1).
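
For reference, a minimal sketch of what a first-argument-only cache
could look like. This is not the code from this commit: the call graph,
frame sizes, and the limit() function are hypothetical, and only the
cache1 name and the "seen" parameter come from the description above.
The example graph is deliberately acyclic.

```python
import functools


def cache1(f):
    """Memoize f on its first positional argument only (illustrative sketch)."""
    memo = {}

    @functools.wraps(f)
    def wrapper(key, *args, **kwargs):
        # Extra arguments (e.g. "seen") are passed through but never keyed on.
        if key not in memo:
            memo[key] = f(key, *args, **kwargs)
        return memo[key]

    return wrapper


# Hypothetical data: per-function frame sizes and an acyclic call graph.
FRAMES = {"main": 32, "parse": 16, "emit": 24, "read": 8}
CALLS = {"main": ["parse", "emit"], "parse": ["read"], "emit": ["read"], "read": []}

calls = []  # records every real (uncached) evaluation


@cache1
def limit(name, seen=frozenset()):
    """Worst-case stack use of name plus its deepest callee chain."""
    calls.append(name)
    if name in seen:
        return 0  # cycle guard: stop instead of recursing forever
    seen = seen | {name}
    return FRAMES[name] + max((limit(c, seen) for c in CALLS[name]), default=0)


result = limit("main")
```

Here "read" is reached via both "parse" and "emit" with different
"seen" sets, so a cache keyed on all parameters would miss on the
second visit. Keyed on the name alone, it is evaluated once.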