gdb/dwarf: write offset to parent entry for DW_IDX_parent

New in v2:

 - add doc
 - fix computation of offset in entry pool

Due to a mistake in the DWARF 5 spec, the way that GDB interprets
DW_IDX_parent when generating and reading .debug_names is not correct.

In Section 6.1.1.2, the parent index entry attribute is described as:

  Parent debugging information entry, a reference to the index entry for
  the parent. This is represented as the offset of the entry relative to
  the start of the entry pool.

But in Table 6.1, DW_IDX_parent is described as:

  Index of name table entry for parent

These two contradict each other.  The former is the correct one and the
latter is an unfortunate leftover from an earlier version of the
proposal, according to [1].  It does make sense, because pointing to a
name table entry is ambiguous, while poiting to an index entry directly
is not.  Unfortunately, GDB implemented pointing to a name table entry.

Changes on the writer side:

 - For each written pool entry, remember the offset within the pool.

 - Change the DW_IDX_parent form to DW_FORM_data4.

   Using DW_FORM_udata isn't an option, because we don't know the actual
   value when doing the first pass of writing the pool (see next point),
   so we wouldn't know how many bytes to reserve, if we used a
   variable-size encoding.

   Using a fixed 4 bytes encoding would be an issue if the entry pool
   was larger than 4 GiB, but that seems unlikely.

   Note that clang uses DW_FORM_ref4 for this, but I'm not sure it is
   appropriate, since forms of the reference class are specified as
   referring "to one of the debugging information entries that describe
   the program".  Since we're not referring to a DIE, I decided to stay
   with a form of the "constant" class.  I think that readers will be
   able to understand either way.

 - Write a dummy 4 byte number when writing the pool, then patch those
   values later.  This is needed because parents can appear before their
   children in the pool (there's no way to ensure that parents always
   appear before their children), so we might now know at first what
   value to put in.

 - Add a `write_uint` method to `class data_buf` to support that use
   case of patching a value in the middle of the data buffer.

 - Simplify the type of `m_name_to_value_set`, we no longer need to
   track the index at which a name will be written at.

 - Produce a new augmentation string, "GDB3", to be able to distinguish
   "old" and "new" indexes.  It would be possible for a reader to
   distinguish the two semantics of DW_IDX_parent using the form.
   However, current versions of GDB don't do that, so they would be
   confused trying to read a new index.  I think it is preferable to use
   a new augmentation string so that they will reject a new index
   instead.

Changes on the reader side:

 - Track the GDB augmentation version, in addition to whether the
   augmentation string indicates the index was produced by GDB.

 - When reading index entries, maintain a "pool offset" -> "cooked index
   entry" mapping, to be able to find parents by pool offset.

 - When resolving parents, keep the existing behavior of finding parents
   by name table index if the augmentation string is "GDB2.  Otherwise,
   look up parents by pool offset.  This assumes that .debug_names from
   other producers (if/when we add support for reading them) use pool
   offsets for DW_IDX_parent.  This at least what clang does.

 - Simplify augmentation string comparison a bit by using array views.

Update the "Extensions to ‘.debug_names’" section of the documentation
to reflect the new augmentation string version.

Tested by:

 - manually producing executables with "GDB2" and "GDB3" .debug_names
   sections and reading them.
 - running the testsuite with the cc-with-debug-names board

[1] https://lists.dwarfstd.org/pipermail/dwarf-discuss/2025-January/002618.html

Change-Id: I265fa38070b86ef320e0a972c300d1d755735d8d
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Approved-By: Tom Tromey <tom@tromey.com>
This commit is contained in:
Simon Marchi
2025-02-09 00:51:01 -05:00
committed by Simon Marchi
parent 9044044c27
commit ad6dde5aaa
4 changed files with 205 additions and 72 deletions

View File

@@ -22976,16 +22976,48 @@ The DWARF specification documents an optional index section called
section. However, in order to work with @value{GDBN}, some extensions
were necessary.
@value{GDBN} uses the augmentation string @samp{GDB2}. Earlier
versions used the string @samp{GDB}, but these versions of the index
are no longer supported.
@value{GDBN} uses an augmentation string to specify which extensions
are in use and to allow support of backwards-incompatible changes in
this functionality. The augmentation string has the form
@samp{GDB@var{n}}, where @var{n} is an integral version number of the
extensions, which is incremented when the extensions are added or
modified. The smallest @var{n} is 2; earlier versions of
@value{GDBN} used just @samp{GDB} with no version number, but these
versions of the index are no longer supported.
Here is a list of augmentation string versions along with the changes
introduced with each version, compared to the previous version.
@table @samp
@item GDB2
Specifies the use of attributes @code{DW_IDX_GNU_internal},
@code{DW_IDX_GNU_main}, @code{DW_IDX_GNU_language} and
@code{DW_IDX_GNU_linkage_name}, described below.
@item GDB3
Changes the semantic of the @code{DW_IDX_parent} attribute.
With @samp{GDB2}, @code{DW_IDX_parent} provided an offset into the name
table. With @samp{GDB3}, it now provides an offset to the index entry
of the parent, relative to the start of the entry pool region.
@end table
@value{GDBN} produces indexes with the augmentation string @samp{GDB3}.
@value{GDBN} can read indexes with augmentation strings @samp{GDB2} or
@samp{GDB3}. @value{GDBN} does not support reading indexes with any
other augmentation strings.
@value{GDBN} does not use the specified hash table. Therefore,
because this hash table is optional, @value{GDBN} also does not write
it.
@value{GDBN} also generates and uses some extra index attributes:
@value{GDBN} generates and uses the following non-standard index
attributes:
@table @code
@item DW_IDX_GNU_internal
This has the value @samp{0x2000}. It is a flag that, when set,
indicates that the associated entry has @code{static} linkage.
@@ -23002,6 +23034,7 @@ indicating the language of the associated entry.
This has the value @samp{0x2004}. It is a flag that, when set,
indicates that the associated entry is a linkage name, and not a
source name.
@end table
@node Symbol Errors

View File

@@ -130,6 +130,15 @@ public:
::store_unsigned_integer (grow (len), len, byte_order, val);
}
/* Accept a host-format integer in VAL and write it in the buffer at offset
OFFSET as a target-format integer which is LEN bytes long. */
void write_uint (size_t offset, size_t len, bfd_endian byte_order,
ULONGEST val)
{
gdb_assert (offset + len <= m_vec.size ());
::store_unsigned_integer (&m_vec[offset], len, byte_order, val);
}
/* Copy VALUE to the end of the buffer, little-endian. */
void append_offset (offset_type value)
{
@@ -675,10 +684,7 @@ public:
if (entry->lang == language_ada && entry->tag == DW_TAG_namespace)
return;
const auto insertpair
= m_name_to_value_set.try_emplace (c_str_view (entry->name));
entry_list &elist = insertpair.first->second;
elist.entries.push_back (entry);
m_name_to_value_set[entry->name].emplace_back (entry);
}
/* Build all the tables. All symbols must be already inserted.
@@ -692,25 +698,16 @@ public:
m_name_table_string_offs.reserve (name_count);
m_name_table_entry_offs.reserve (name_count);
/* The name table is indexed from 1. The numbers are needed here
so that parent entries can be handled correctly. */
int next_name = 1;
for (auto &item : m_name_to_value_set)
item.second.index = next_name++;
/* The next available abbrev number. */
int next_abbrev = 1;
for (auto &item : m_name_to_value_set)
for (auto &[name, these_entries] : m_name_to_value_set)
{
const c_str_view &name = item.first;
entry_list &these_entries = item.second;
/* Sort the items within each bucket. This ensures that the
generated index files will be the same no matter the order in
which symbols were added into the index. */
std::sort (these_entries.entries.begin (),
these_entries.entries.end (),
std::sort (these_entries.begin (),
these_entries.end (),
[] (const cooked_index_entry *a,
const cooked_index_entry *b)
{
@@ -730,7 +727,7 @@ public:
(m_debugstrlookup.lookup (name.c_str ())); /* ??? */
m_name_table_entry_offs.push_back_reorder (m_entry_pool.size ());
for (const cooked_index_entry *entry : these_entries.entries)
for (const cooked_index_entry *entry : these_entries)
{
unit_kind kind = (entry->per_cu->is_debug_types
? unit_kind::tu
@@ -778,7 +775,7 @@ public:
if (parent != nullptr)
{
m_abbrev_table.append_unsigned_leb128 (DW_IDX_parent);
m_abbrev_table.append_unsigned_leb128 (DW_FORM_udata);
m_abbrev_table.append_unsigned_leb128 (DW_FORM_data4);
}
/* Terminate attributes list. */
@@ -786,6 +783,14 @@ public:
m_abbrev_table.append_unsigned_leb128 (0);
}
/* Record the offset in the pool at which this entry will
reside. */
const auto offset_inserted
= (m_entry_pool_offsets.emplace (entry, m_entry_pool.size ())
.second);
gdb_assert (offset_inserted);
/* Write the entry to the pool. */
m_entry_pool.append_unsigned_leb128 (idx);
const auto it = m_cu_index_htab.find (entry->per_cu);
@@ -800,11 +805,11 @@ public:
if (parent != nullptr)
{
c_str_view par_name (parent->name);
auto name_iter = m_name_to_value_set.find (par_name);
gdb_assert (name_iter != m_name_to_value_set.end ());
gdb_assert (name_iter->second.index != 0);
m_entry_pool.append_unsigned_leb128 (name_iter->second.index);
m_offsets_to_patch.emplace_back (m_entry_pool.size (), parent);
/* Write a dummy number, this gets patched later. */
m_entry_pool.append_uint (4, m_dwarf5_byte_order,
0xfafafafa);
}
}
@@ -814,6 +819,15 @@ public:
/* Terminate tags list. */
m_abbrev_table.append_unsigned_leb128 (0);
/* Write the parent offset values. */
for (const auto &[reloc_offset, parent] : m_offsets_to_patch)
{
const auto parent_offset_it = m_entry_pool_offsets.find (parent);
gdb_assert (parent_offset_it != m_entry_pool_offsets.cend ());
m_entry_pool.write_uint (reloc_offset, 4, m_dwarf5_byte_order,
parent_offset_it->second);
}
}
/* Return .debug_names names count. This must be called only after
@@ -1059,15 +1073,26 @@ private:
offset_vec_tmpl<OffsetSize> m_name_table_entry_offs;
};
struct entry_list
{
unsigned index = 0;
std::vector<const cooked_index_entry *> entries;
};
/* Store the index entries for each name.
/* Store value of each symbol. Note that we rely on the sorting
behavior of map to make the output stable. */
std::map<c_str_view, entry_list> m_name_to_value_set;
Note that we rely on the sorting behavior of map to make the output
stable. */
std::map<c_str_view, std::vector<const cooked_index_entry *>>
m_name_to_value_set;
/* Offset at which each entry is written in the entry pool. */
gdb::unordered_map<const cooked_index_entry *, offset_type>
m_entry_pool_offsets;
/* The locations where we need to patch offset to entries.
The first element of the pair is the offset into the pool that needs to
be patched.
The second element is the entry the offset to which needs to be
patched in. */
std::vector<std::pair<offset_type, const cooked_index_entry *>>
m_offsets_to_patch;
const bfd_endian m_dwarf5_byte_order;
dwarf_tmpl<uint32_t> m_dwarf32;
@@ -1406,7 +1431,7 @@ write_debug_names (dwarf2_per_bfd *per_bfd, cooked_index *table,
const offset_type bytes_of_header
= ((dwarf5_is_dwarf64 ? 12 : 4)
+ 2 + 2 + 7 * 4
+ sizeof (dwarf5_augmentation));
+ sizeof (dwarf5_augmentation_3));
size_t expected_bytes = 0;
expected_bytes += bytes_of_header;
expected_bytes += cu_list.size ();
@@ -1458,9 +1483,9 @@ write_debug_names (dwarf2_per_bfd *per_bfd, cooked_index *table,
/* augmentation_string_size - The size in bytes of the augmentation
string. This value is rounded up to a multiple of 4. */
static_assert (sizeof (dwarf5_augmentation) % 4 == 0);
header.append_uint (4, dwarf5_byte_order, sizeof (dwarf5_augmentation));
header.append_array (dwarf5_augmentation);
static_assert (sizeof (dwarf5_augmentation_3) % 4 == 0);
header.append_uint (4, dwarf5_byte_order, sizeof (dwarf5_augmentation_3));
header.append_array (dwarf5_augmentation_3);
gdb_assert (header.size () == bytes_of_header);

View File

@@ -74,7 +74,15 @@ struct mapped_debug_names_reader
bfd *abfd = nullptr;
bfd_endian dwarf5_byte_order {};
bool dwarf5_is_dwarf64 = false;
/* True if the augmentation string indicates the index was produced by
GDB. */
bool augmentation_is_gdb = false;
/* If AUGMENTATION_IS_GDB is true, this indicates the version. Otherwise,
this value is meaningless. */
unsigned int gdb_augmentation_version = 0;
uint8_t offset_size = 0;
uint32_t cu_count = 0;
uint32_t tu_count = 0, bucket_count = 0, name_count = 0;
@@ -106,7 +114,25 @@ struct mapped_debug_names_reader
std::unordered_map<ULONGEST, index_val> abbrev_map;
std::unique_ptr<cooked_index_shard> shard;
/* Maps entry pool offsets to cooked index entries. */
gdb::unordered_map<ULONGEST, cooked_index_entry *>
entry_pool_offsets_to_entries;
/* Cooked index entries for which the parent needs to be resolved.
The second value of the pair is the DW_IDX_parent value. Its meaning
depends on the augmentation string:
- GDB2: an index in the name table
- GDB3: an offset offset into the entry pool */
std::vector<std::pair<cooked_index_entry *, ULONGEST>> needs_parent;
/* All the cooked index entries created, in the same order and groups as
listed in the name table.
The size of the outer vector is equal to the number of entries in the name
table (NAME_COUNT). */
std::vector<std::vector<cooked_index_entry *>> all_entries;
};
@@ -121,6 +147,7 @@ mapped_debug_names_reader::scan_one_entry (const char *name,
std::optional<ULONGEST> &parent)
{
unsigned int bytes_read;
const auto offset_in_entry_pool = entry - entry_pool;
const ULONGEST abbrev = read_unsigned_leb128 (abfd, entry, &bytes_read);
entry += bytes_read;
if (abbrev == 0)
@@ -239,8 +266,12 @@ mapped_debug_names_reader::scan_one_entry (const char *name,
/* Skip if we couldn't find a valid CU/TU index. */
if (per_cu != nullptr)
*result = shard->add (die_offset, (dwarf_tag) indexval.dwarf_tag, flags,
lang, name, nullptr, per_cu);
{
*result = shard->add (die_offset, (dwarf_tag) indexval.dwarf_tag, flags,
lang, name, nullptr, per_cu);
entry_pool_offsets_to_entries.emplace (offset_in_entry_pool, *result);
}
return entry;
}
@@ -296,19 +327,43 @@ mapped_debug_names_reader::scan_all_names ()
scan_entries (i, name, entry);
}
/* Now update the parent pointers for all entries. This has to be
done in a funny way because DWARF specifies the parent entry to
point to a name -- but we don't know which specific one. */
for (auto [entry, parent_idx] : needs_parent)
/* Resolve the parent pointers for all entries that have a parent.
If the augmentation string is "GDB2", the DW_IDX_parent value is an index
into the name table. Since there may be multiple index entries associated
to that name, we have a little heuristic to figure out which is the right
one.
Otherwise, the DW_IDX_parent value is an offset into the entry pool, which
is not ambiguous. */
for (auto &[entry, parent_val] : needs_parent)
{
/* Name entries are indexed from 1 in DWARF. */
std::vector<cooked_index_entry *> &entries = all_entries[parent_idx - 1];
for (const auto &parent : entries)
if (parent->lang == entry->lang)
{
entry->set_parent (parent);
break;
}
if (augmentation_is_gdb && gdb_augmentation_version == 2)
{
/* Name entries are indexed from 1 in DWARF. */
std::vector<cooked_index_entry *> &entries
= all_entries[parent_val - 1];
for (const auto &parent : entries)
if (parent->lang == entry->lang)
{
entry->set_parent (parent);
break;
}
}
else
{
const auto parent_it
= entry_pool_offsets_to_entries.find (parent_val);
if (parent_it == entry_pool_offsets_to_entries.cend ())
{
complaint (_ ("Parent entry not found for .debug_names entry"));
continue;
}
entry->set_parent (parent_it->second);
}
}
}
@@ -404,16 +459,6 @@ check_signatured_type_table_from_debug_names
/* DWARF-5 debug_names reader. */
/* The old, no-longer-supported GDB augmentation. */
static const gdb_byte old_gdb_augmentation[]
= { 'G', 'D', 'B', 0 };
static_assert (sizeof (old_gdb_augmentation) % 4 == 0);
/* DWARF-5 augmentation string for GDB's DW_IDX_GNU_* extension. This
must have a size that is a multiple of 4. */
const gdb_byte dwarf5_augmentation[8] = { 'G', 'D', 'B', '2', 0, 0, 0, 0 };
static_assert (sizeof (dwarf5_augmentation) % 4 == 0);
/* A helper function that reads the .debug_names section in SECTION
and fills in MAP. FILENAME is the name of the file containing the
section; it is used for error reporting.
@@ -525,18 +570,24 @@ read_debug_names_from_section (dwarf2_per_objfile *per_objfile,
addr += 4;
augmentation_string_size += (-augmentation_string_size) & 3;
if (augmentation_string_size == sizeof (old_gdb_augmentation)
&& memcmp (addr, old_gdb_augmentation,
sizeof (old_gdb_augmentation)) == 0)
const auto augmentation_string
= gdb::make_array_view (addr, augmentation_string_size);
if (augmentation_string == gdb::make_array_view (dwarf5_augmentation_1))
{
warning (_(".debug_names created by an old version of gdb; ignoring"));
return false;
}
map.augmentation_is_gdb = ((augmentation_string_size
== sizeof (dwarf5_augmentation))
&& memcmp (addr, dwarf5_augmentation,
sizeof (dwarf5_augmentation)) == 0);
else if (augmentation_string == gdb::make_array_view (dwarf5_augmentation_2))
{
map.augmentation_is_gdb = true;
map.gdb_augmentation_version = 2;
}
else if (augmentation_string == gdb::make_array_view (dwarf5_augmentation_3))
{
map.augmentation_is_gdb = true;
map.gdb_augmentation_version = 3;
}
if (!map.augmentation_is_gdb)
{
@@ -544,6 +595,7 @@ read_debug_names_from_section (dwarf2_per_objfile *per_objfile,
return false;
}
/* Skip past augmentation string. */
addr += augmentation_string_size;
/* List of CUs */

View File

@@ -22,7 +22,30 @@
struct dwarf2_per_objfile;
extern const gdb_byte dwarf5_augmentation[8];
/* DWARF-5 augmentation strings.
They must have a size that is a multiple of 4.
"GDB" is the old, no-longer-supported GDB augmentation.
The "GDB2" augmentation string specifies the use of the DW_IDX_GNU_*
attributes.
The meaning of the "GDB3" augmentation string is identical to "GDB2", except
for the meaning of DW_IDX_parent. With "GDB2", DW_IDX_parent represented an
index in the name table. With "GDB3", it represents an offset into the entry
pool. */
constexpr gdb_byte dwarf5_augmentation_1[4] = { 'G', 'D', 'B', 0 };
static_assert (sizeof (dwarf5_augmentation_1) % 4 == 0);
constexpr gdb_byte dwarf5_augmentation_2[8]
= { 'G', 'D', 'B', '2', 0, 0, 0, 0 };
static_assert (sizeof (dwarf5_augmentation_2) % 4 == 0);
constexpr gdb_byte dwarf5_augmentation_3[8]
= { 'G', 'D', 'B', '3', 0, 0, 0, 0 };
static_assert (sizeof (dwarf5_augmentation_3) % 4 == 0);
/* Read .debug_names. If everything went ok, initialize the "quick"
elements of all the CUs and return true. Otherwise, return false. */