scripts: Enabled symbol->dwarf mapping via address

We have symbol->addr info and dwarf->addr info (DW_AT_low_pc), so why
not use this to map symbols to dwarf entries?

This should hopefully be more reliable than the current name-based
heuristic, but it only works for functions (DW_TAG_subprogram).

Note that we still have to fuzzy match due to thumb-bit weirdness (small
rant below).

---

Ok. Why in Thumb does the symbol table include the thumb bit, but the
dwarf info does not?? Would it really have been that hard to add the
thumb bit to DW_AT_low_pc so symbols and dwarf entries match?

So, because of Thumb, we can't expect either the address or name to
match exactly. The best we can do is binary search and expect the symbol
to point somewhere _within_ the dwarf's DW_AT_low_pc/DW_AT_high_pc
range.
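
A minimal sketch of the range lookup described above, not the code in
this commit. `Entry`, `Sym`, and `find_by_addr` are hypothetical
stand-ins for illustration, and the addresses are made up. It keeps a
separate list of start addresses so it does not depend on the `key=`
argument of `bisect`, which needs Python 3.10+:

```python
import bisect
from collections import namedtuple

# hypothetical stand-ins for dwarf entries and symbols
Entry = namedtuple('Entry', ['name', 'addr', 'size'])
Sym = namedtuple('Sym', ['name', 'addr'])

def find_by_addr(entries, addr):
    # sort by start address so we can binary search
    entries = sorted(entries, key=lambda e: e.addr)
    starts = [e.addr for e in entries]
    # index of the last entry that starts at or before addr
    i = bisect.bisect_right(starts, addr) - 1
    # only accept the entry if addr falls inside [addr, addr+size)
    if i >= 0 and addr < entries[i].addr + entries[i].size:
        return entries[i]
    return None

entries = [
    Entry('foo', 0x1000, 0x20),
    Entry('bar', 0x1020, 0x10),
]
# a Thumb symbol has bit 0 set, so 0x1021 instead of 0x1020
sym = Sym('bar', 0x1021)
match = find_by_addr(entries, sym.addr)
```

Because the check is a range test rather than an exact match, the
off-by-one from the thumb bit still lands inside the right entry.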

Also why does DW_AT_high_pc store the _size_ of the function?? Why isn't
it, idunno, the _high_pc_? I get that the size takes up less space when
leb128 encoding, but surely there could have been a better name?
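
For what it's worth, my reading of the DWARF 4 spec is that
DW_AT_high_pc can take either form: an address one past the end of the
function, or a constant offset from DW_AT_low_pc. A hedged sketch of
normalizing both to a size, where `is_offset` is a hypothetical flag
the caller would have to derive from the attribute's form (how that
shows up in a given tool's output is an assumption):

```python
def end_addr(low_pc, high_pc, is_offset):
    # constant-class high_pc is an offset from low_pc,
    # address-class high_pc is already the (exclusive) end address
    return low_pc + high_pc if is_offset else high_pc

def func_size(low_pc, high_pc, is_offset):
    # size in bytes, regardless of how high_pc was encoded
    return end_addr(low_pc, high_pc, is_offset) - low_pc

# both encodings describe the same 0x20-byte function at 0x1000
size_from_offset = func_size(0x1000, 0x20, True)
size_from_addr = func_size(0x1000, 0x1020, False)
```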
Christopher Haster
2024-12-05 19:28:07 -06:00
parent eb09865868
commit 02ccbdfed2
4 changed files with 211 additions and 11 deletions


@@ -393,6 +393,24 @@ class DwarfEntry:
         else:
             return None
 
+    @ft.cached_property
+    def addr(self):
+        if (self.tag == 'DW_TAG_subprogram'
+                and 'DW_AT_low_pc' in self):
+            return int(self['DW_AT_low_pc'], 0)
+        else:
+            return None
+
+    @ft.cached_property
+    def size(self):
+        if (self.tag == 'DW_TAG_subprogram'
+                and 'DW_AT_high_pc' in self):
+            # this looks wrong, but high_pc does store the size,
+            # for whatever reason
+            return int(self['DW_AT_high_pc'], 0)
+        else:
+            return None
+
     def info(self, tags=None):
         # recursively flatten children
         def flatten(entry):
@@ -412,10 +430,42 @@ class DwarfInfo:
         self.entries = entries
 
     def get(self, k, d=None):
-        # allow lookup by both offset and dwarf name
-        if not isinstance(k, str):
+        # allow lookup by offset, symbol, or dwarf name
+        if not isinstance(k, str) and not hasattr(k, 'addr'):
             return self.entries.get(k, d)
 
+        elif hasattr(k, 'addr'):
+            import bisect
+
+            # organize by address
+            if not hasattr(self, '_by_addr'):
+                # sort and keep largest/first when duplicates
+                entries = [entry
+                        for entry in self.entries.values()
+                        if entry.addr is not None
+                            and entry.size is not None]
+                entries.sort(key=lambda x: (x.addr, -x.size))
+
+                by_addr = []
+                for entry in entries:
+                    if (len(by_addr) == 0
+                            or by_addr[-1].addr != entry.addr):
+                        by_addr.append(entry)
+                self._by_addr = by_addr
+
+            # find entry by range
+            i = bisect.bisect(self._by_addr, k.addr,
+                    key=lambda x: x.addr)
+            # check that we're actually in this entry's size
+            if (i > 0
+                    and k.addr
+                        < self._by_addr[i-1].addr
+                            + self._by_addr[i-1].size):
+                return self._by_addr[i-1]
+            else:
+                # fallback to lookup by name
+                return self.get(k.name, d)
+
         else:
             # organize entries by name
             if not hasattr(self, '_by_name'):
@@ -548,7 +598,7 @@ def collect(obj_paths, *,
         # find best matching dwarf entry, this may be slightly different
         # due to optimizations
-        entry = info.get(sym.name)
+        entry = info.get(sym)
 
         # if we have no file guess from obj path
         if entry is not None and 'DW_AT_decl_file' in entry: