mirror of
https://github.com/bminor/binutils-gdb.git
synced 2025-11-16 12:34:43 +00:00
This commit removes the attempted de-duplication of types when building the gdb-index. This commit is the natural extension of this earlier commit: commitaef36dee93Date: Sun Aug 13 14:08:06 2023 +0200 [gdb/symtab] Don't deduplicate variables in gdb-index Which removed the de-duplication of variables. It is worth reading the earlier commit as all the justifications for that patch also apply to this one. Currently, when building the gdb-index we sort the type entries, moving declarations to the end of the entry list, and non-declarations to the front. Then within each group, declarations, and non-declarations, the index entries are sorted by CU offset. We then emit the first entry for any given type name. There are two problems with this. First, a non-declaration entry could be a definition, but it could also be a typedef. Now sure, a typedef is a type definition, but not necessarily a useful one. If we have a header file that contains: typedef struct foo_t foo_t; And a CU which makes use of 'foo_t', then the CU will include both a typedef and a type declaration. The target of the typedef will be the declaration. But notice, the CU will not include a type definition. If we have two CUs, one which only sees the above typedef and declaration, and another which sees the typedef and an actual type definition, then the final list of entries for this type's name will be: 1. A typedef entry that points at the declaration. 2. A typedef entry that points at the definition. 3. A definition. 4. A declaration. Now (4) will get sorted to the end of the entry list. But the order of (1), (2), and (3) will depend on the CU offset. If the CU which containing the typedef and declaration has the smallest offset, then (1) will be sorted to the front of the list of entries for this type name. Due to the de-duplication code this means that only (1) will be added to the gdb-index. After GDB starts and parses the index, if a user references 'foo_t' GDB will look in the index and find just (1). GDB loads the CU containing (1) and finds both the typedef and the declaration. But GDB does not find the full type definition. As a result GDB will display 'foo_t' as an incomplete type. This differs from the behaviour when no index is used. With no index GDB expands the first CU containing 'foo_t', finds the typedef and type declaration, decides that this is not good enough and carries on. GDB will then expand the second CU and find the type's definition, GDB now has a full understanding of the type, and can print the type correctly. We could solve this problem by marking typedefs as a distinct sub-category of types, just as we do with declarations. Then we could sort definitions to the front of the list, then typedefs, and finally, declarations after that. This would, I think, mean that we always prefer emitting a definition for a type, which would resolve this first problem, or at least, it would resolve it well enough, but it wouldn't fix the second problem. The second problem is that the Python API and the 'info types' command can be used to query all type symbols. As such, GDB needs to be able to find all the CUs which contain a given type. Especially as it is possible that a type might be defined differently within different CUs. NOTE: Obviously a program doing this (defining a type differently in different CUs) would need to be mindful of the One Definition Rule, but so long as the type doesn't escape outside of a single CU then reusing a type name isn't, as I understand it, wrong. And even if it is, the fact that it compiles, and could be a source of bugs, means (in my opinion) that GDB should handle this case to enable debugging of it. Even something as simple as 'info types ....' relies on GDB being able to find multiple entries for a given type in different CUs. If the index only contains a single type entry, then this means GDB will see different things depending on which CUs happen to have been expanded. Given all of the above, I think that any attempt to remove type entries from the gdb-index is unsafe and can result in GDB behaving differently when using the gdb-index compared to using no index. The solution is to remove the de-duplication code, which is what this patch does. Now that we no longer need to sort declarations to the end of the entry list, I've removed all the code related to the special use of GDB_INDEX_SYMBOL_KIND_UNUSED5 (which is how we marked declarations), this cleans things up a little bit. I've also renamed some of the functions away from minimize, now that there's no minimization being done. A problem was revealed by this change. When running the test gdb.cp/stub-array-size.exp with the --target_board=cc-with-gdb-index, I was seeing a failure using gcc 15.1.0. This test has two CUs, and a type 'A'. The test description says: Test size of arrays of stubbed types (structures where the full definition is not immediately available). Which I don't really understand given the test's source code. The type 'A' is defined in a header, which is included in both CUs. However, the test description does seem to be accurate; in one CU the type looks like this: <1><4a>: Abbrev Number: 8 (DW_TAG_structure_type) <4b> DW_AT_name : A <4d> DW_AT_declaration : 1 <4d> DW_AT_sibling : <0x6d> <2><51>: Abbrev Number: 9 (DW_TAG_subprogram) <52> DW_AT_external : 1 <52> DW_AT_name : ~A <55> DW_AT_decl_file : 2 <56> DW_AT_decl_line : 20 <57> DW_AT_decl_column : 11 <58> DW_AT_linkage_name: (indirect string, offset: 0x103): _ZN1AD4Ev <5c> DW_AT_virtuality : 1 (virtual) <5d> DW_AT_containing_type: <0x4a> <61> DW_AT_declaration : 1 <61> DW_AT_object_pointer: <0x66> <65> DW_AT_inline : 0 (not inlined) <3><66>: Abbrev Number: 10 (DW_TAG_formal_parameter) <67> DW_AT_type : <0x8c> <6b> DW_AT_artificial : 1 <3><6b>: Abbrev Number: 0 <2><6c>: Abbrev Number: 0 while in the second CU, the type looks like this: <1><178>: Abbrev Number: 4 (DW_TAG_structure_type) <179> DW_AT_name : A <17b> DW_AT_byte_size : 8 <17c> DW_AT_decl_file : 2 <17d> DW_AT_decl_line : 18 <17e> DW_AT_decl_column : 8 <17f> DW_AT_containing_type: <0x178> <183> DW_AT_sibling : <0x1ac> <2><187>: Abbrev Number: 5 (DW_TAG_member) <188> DW_AT_name : (indirect string, offset: 0x19e): _vptr.A <18c> DW_AT_type : <0x1be> <190> DW_AT_data_member_location: 0 <191> DW_AT_artificial : 1 <2><191>: Abbrev Number: 6 (DW_TAG_subprogram) <192> DW_AT_external : 1 <192> DW_AT_name : ~A <195> DW_AT_decl_file : 1 <196> DW_AT_decl_line : 20 <197> DW_AT_decl_column : 1 <198> DW_AT_linkage_name: (indirect string, offset: 0x103): _ZN1AD4Ev <19c> DW_AT_virtuality : 1 (virtual) <19d> DW_AT_containing_type: <0x178> <1a1> DW_AT_declaration : 1 <1a1> DW_AT_object_pointer: <0x1a5> <3><1a5>: Abbrev Number: 7 (DW_TAG_formal_parameter) <1a6> DW_AT_type : <0x1cd> <1aa> DW_AT_artificial : 1 <3><1aa>: Abbrev Number: 0 <2><1ab>: Abbrev Number: 0 So, for reasons that I don't understand, the type, despite (as far as I can see) having its full definition available, is recorded only as declared in one CU. The test then performs some actions that rely on 'sizeof(A)' and expects GDB to correctly figure out the size. This requires GDB to find, and expand the CU containing the real definition of 'A'. Prior to this patch GDB would sort the two type entries for 'A', placing the declaration second, and then record only one entry, the definition. When it came to expansion there was only one thing to expand, and this is the declaration we needed. It happens that in this test the definition is in the second CU, that is, the CU with the biggest offset. This means that, if all index entries were considered equal, the definition entry would be second. However, currently, due to the way GDB forces definitions to the front, the entry for the second CU, the definition, is placed first in the index, and with de-duplication, this is the only entry added to the index. After this patch, both the declaration and the definition are placed in the index, and as the declaration is in the CU at offset 0, the declaration is added first to the index. This should be fine. When looking for 'A' GDB should expand the CU containing the declaration, see that all we have is a declaration, and so continue, next expanding the definition, at which point we're done. However, in read-gdb-index.c, in the function mapped_gdb_index::build_name_components, there is a work around for gold bug PR gold/15646. Ironically, the bug here is that gold was not removing duplicate index entries, and it is noted that this has a performance impact on GDB. A work around for this was added to GDB in commit: commit8943b87476Date: Tue Nov 12 09:43:17 2013 -0800 Work around gold/15646. A test for this was added in: commit40d22035a7Date: Tue May 26 11:35:32 2020 +0200 [gdb/testsuite] Add test-case gold-gdb-index.exp And the fix was tweaked in commit: commitf030440daaDate: Thu May 28 17:26:22 2020 +0200 [gdb/symtab] Make gold index workaround more precise The problem specifically called out in the bug report is that namespaces can appear in multiple CUs, and that trying to complete 'ns::misspelled' would expand every CU containing namespace 'ns' due to the duplicate 'ns' type symbols. The work around that was added in8943b87476was to ignore duplicate global symbols when expanding entries from the index. In commitf030440daathis work around was restricted to only ignore duplicate type entries. This restriction was required to allow the earlier de-duplication patchaef36dee93to function correctly. Now that I'm taking the work started inaef36dee93to its logical conclusion, and allowing duplicate type entries, the work around of ignoring duplicate global type symbols is no longer needed, and can be removed. The associated test for this, added in40d22035a7, is also removed in this commit. To be clear; the performance issue mentioned in PR gold/15646 is now back again. But my claim is that gold was right all along to include the duplicate index entries, and any performance hit we see as a result, though unfortunate, is just a consequence of doing it right. That doesn't mean there's not room for optimisation and improvement in the future, though I don't have any immediate ideas, or plans in this area. It's just we can't throw out a bunch of index entries that are critical, and claim this as a performance optimisation. I am seeing some failure with this patch when using the board file dwarf5-fission-debug-types. These failures all have the error: DWARF Error: wrong unit_type in unit header (is DW_UT_skeleton, should be DW_UT_type) [in module ....] However, I ran the whole testsuite with this board, and this error crops up often, so I don't think this is something specific to my patch, so I'm choosing to ignore this. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=15646 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=15035 Approved-By: Tom Tromey <tom@tromey.com>