After this series, expand_symtabs_matching is now misnamed. This
patch renames it, renames some associated types, and also fixes up
some comments that I previously missed.
Acked-By: Simon Marchi <simon.marchi@efficios.com>
This updates the copyright headers to include 2025. I did this by
running gdb/copyright.py and then manually modifying a few files as
noted by the script.
Approved-By: Eli Zaretskii <eliz@gnu.org>
The cooked index worker maintains the state for the various state
transitions in the scanner. It is held by the cooked_index while
scanning is in progress, then deleted once this has completed.
I noticed that none of the arguments to cooked_index::done_reading
were really needed -- the cooked_index already has access to the
worker should it need it. Removing these parameters makes the code a
bit simpler and also cleans up some confusing code around the use of
the deferred warnings object.
Regression tested on x86-64 Fedora 40.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
This updates the cooked_index comment with some notes about object
lifetimes, in an attempt to make navigating this code a bit simpler.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
This moves cooked_index_shard to a couple of new files,
dwarf2/cooked-index-shard.[ch]. The rationale is the same as the
previous patch: cooked-index.h had to be split to enable other
cleanups.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
This moves cooked_index_entry and some related helper code to a couple
of new files, dwarf2/cooked-index-entry.[ch].
The main rationale for this is that in order to finish this series and
remove "cooked_index_worker::result_type", I had to split
cooked-index.h into multiple parts to avoid circular includes.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Following the previous patch, this parameter is now unused. Remove it.
Change-Id: I7e96a3ba61ad9a0d6b64f9129aeeb9a8f3da22a7
Approved-By: Tom Tromey <tom@tromey.com>
Add a few -Wunused-* diagnostic flags that look useful. Some are known
to gcc, some to clang, some to both. Fix the fallouts.
-Wunused-const-variable=1 is understood by gcc, but not clang.
-Wunused-const-variable would be understood by both, but for gcc at
least it would flag the unused const variables in headers. This doesn't
make sense to me, because as soon as one source file includes a header
but doesn't use a const variable defined in that header, it's an error.
With `=1`, gcc only warns about unused const variables in the main source
file. It's not a big deal that clang doesn't understand it though: any
instance of that problem will be flagged by any gcc build.
Change-Id: Ie20d99524b3054693f1ac5b53115bb46c89a5156
Approved-By: Tom Tromey <tom@tromey.com>
The cooked index needs to allocate names in some cases -- when
canonicalizing or when synthesizing Ada package names. This process
currently uses a vector of unique_ptrs to manage the memory.
Another series I'm writing adds another spot where this allocation
must be done, and examining the result showed that certain names were
allocated multiple times.
To clean this up, this patch introduces a string cache object and
changes the cooked indexer to use it. I considered using bcache here,
but bcache doesn't work as nicely with string_view -- because bcache
is fundamentally memory-based, a temporary copy of the contents must
be made to ensure that bcache can see the trailing \0. Furthermore,
writing a custom class lets us avoid another copy when canonicalizing
C++ names.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
I accidentally pushed my work-in-progress branch... revert that. Sorry
for the noise :(.
The list of commits reverted are:
ae2a50a9ae attempt to revamp to the CU/TU list
e9386435c9 gdb/dwarf: print DWARF CUs/TUs in "maint print objfiles"
6cbd64aa3e gdb/dwarf: add dwarf_source_language_name
32a187da76 libiberty: move DW_LANG_* definitions to dwarf2.def
b3fa38aef5 gdb/dwarf: move index unit vectors to debug names reader and use them
30ba744189 gdb/dwarf: track comp and type units count
bedb4e09f2 gdb/dwarf: remove unnecessary braces
b4f18de12c gdb/dwarf: use ranged for loop in some pots
Change-Id: I80aed2847025f5b15c16c997680783b39858a703
This was useful to me, to debug some problems.
Before printing cooked index entries, print a list of CUs and TUs. The
information printed for each is a bit arbitrary: I took a look at the
types and printed what seemed relevant.
An example of output for a CU:
[0] ((dwarf2_per_cu_data *) 0x50f000007840)
type: DW_UT_compile
offset: 0x0
size: 0x1bff
artificial: false
GDB lang: c++
DWARF lang: DW_LANG_C_plus_plus
And for a TU:
[2] ((signatured_type *) 0x511000040000)
type: DW_UT_type
offset: 0x0
size: 0x94
signature: 0x2e966c0dc94b065b
I moved the call to cooked_index_functions::wait before printing the
CU/TU list, otherwise trying to call "maint print objfiles" quickly,
like this, would lead to an internal error:
$ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.dwarf2/struct-with-sig/struct-with-sig -ex "maint print objfiles"
This is because dwarf2_per_cu_data::m_unit_type was not yet set, when
trying to read it. Waiting for the index to be built ensures that it is
set, since setting the unit type is done as a side-effect somewhere.
Change-Id: Ic810ec3bb4d3f5abb481cf1cee9b2954ff4f0874
I found a small bug coming from a couple of recent patches of mine for
cooked_index_entry::full_name.
First, commit aab26529b3 (Add "Ada linkage" mode to
cooked_index_entry::full_name) added a small hack to optionally
compute the Ada linkage name.
Then, commit aab2ac34d7 (Avoid excessive CU expansion on failed
matches) changed the relevant expand_symtabs_matching implementation
to use this feature.
However, the feature was used unconditionally, causing a bad side
effect: the non-canonical name is now used for all languages, not just
Ada. But, for C++ this is wrong.
Furthermore, consider the declaration of full_name:
const char *full_name (struct obstack *storage,
bool for_main = false,
bool for_ada_linkage = false,
const char *default_sep = nullptr) const;
... and then consider this call in cooked_index::dump:
gdb_printf (" qualified: %s\n",
entry->full_name (&temp_storage, false, "::"));
Oops! The "::" is silently converted to 'true' here.
To fix both of these problems, this patch changes full_name to accept
a flags enum rather than booleans. This avoids the type-safety
problem.
Then, full_name is changed to remove the "Ada" flag when the entry is
not in fact an Ada symbol.
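For illustration, the pitfall and both fixes can be reproduced outside
gdb. The sketch below is hypothetical: full_name_bools, name_flag and
toy_entry are made-up names, and the strings returned are placeholders
rather than real symbol names.

```cpp
#include <string>

// Hypothetical stand-in for the old interface: two bools followed by a
// separator.  A "const char *" passed in the second position converts
// implicitly to bool, so the separator is lost.
std::string
full_name_bools (bool for_main = false, bool for_ada_linkage = false,
                 const char *default_sep = nullptr)
{
  (void) for_main;
  std::string result = for_ada_linkage ? "linkage" : "canonical";
  return result + (default_sep != nullptr ? default_sep : "<none>");
}

// Hypothetical replacement: a scoped enum is never created implicitly
// from a pointer, so a misplaced argument fails to compile.
enum class name_flag : unsigned
{
  none = 0,
  for_main = 1u << 0,
  for_ada_linkage = 1u << 1,
};

constexpr name_flag
operator| (name_flag a, name_flag b)
{
  return static_cast<name_flag> (static_cast<unsigned> (a)
                                 | static_cast<unsigned> (b));
}

constexpr bool
has_flag (name_flag flags, name_flag bit)
{
  return (static_cast<unsigned> (flags) & static_cast<unsigned> (bit)) != 0;
}

struct toy_entry
{
  bool is_ada;

  std::string full_name (name_flag flags = name_flag::none,
                         const char *default_sep = nullptr) const
  {
    // Second fix: the Ada flag is ignored for non-Ada entries.
    bool linkage = is_ada && has_flag (flags, name_flag::for_ada_linkage);
    std::string result = linkage ? "linkage" : "canonical";
    return result + (default_sep != nullptr ? default_sep : "<none>");
  }
};
```

With the bool version, passing a separator as the second argument
selects the linkage form and drops the separator. With the enum
version, the equivalent call is rejected by the compiler.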
Regression tested on x86-64 Fedora 40.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
cooked_index_storage is currently declared in `cooked-index.h` and
implemented in `read.c`. Move all that to new
`cooked-index-storage.{h,c}` files.
Change-Id: I2a07eb446d8a07b15c5664dfe01e3a820cdd45be
Approved-By: Tom Tromey <tom@tromey.com>
Unfortunately, due to some details of how the Ada support in gdb
currently works, the DWARF reader will still have to synthesize some
"full name" entries after the cooked index has been constructed.
You can see one particular finding related to this in:
https://sourceware.org/bugzilla/show_bug.cgi?id=32142
This patch adds a new flag to cooked_index_entry::full_name to enable
the construction of these names.
I hope to redo this part of the Ada support eventually, so that this
code can be removed and the full-name entries simply not created.
handle_gnat_encoded_entry might create synthetic cooked index entries
for Ada packages. These aren't currently kept in m_entries, but it
seems to me that they should be, particularly because a forthcoming
GNAT will emit explicit DW_TAG_module for these names -- with this
change, the indexes will be roughly equivalent regardless of which
compiler was used.
Currently, gdb will synthesize DW_TAG_module entries for Ada names.
These entries are treated specially by the index writer.
When GNAT starts emitting DW_TAG_module, the special case will be
incorrect, because there will be non-synthetic DW_TAG_module entries
in the index.
This patch arranges to mark the synthetic entries and changes the
index writer to follow.
I think debug-names-tu.exp.tcl only passes by accident -- the type
unit does not have a language, which gdb essentially requires.
This isn't noticeable right now because the type unit in question is
expanded in one phase and then the symbol found in another. However,
I'm working on a series that would regress this.
This patch partially fixes the problem by correcting the test case,
adding the language to the TU.
However, it then goes a bit further and arranges for this information
not to be written to .debug_names. Whether or not a type should be
considered "static" seems like something that is purely internal to
gdb, so this patch has the entry-creation function apply the
appropriate transform.
It also may make sense to change the "debug_names" proc in the test
suite to process attributes more like the ordinary "cu" proc does.
This scratches an itch I had for a while. I don't know why this struct
type has "data" in its name. Others like "dwarf2_per_objfile" and
"dwarf2_per_bfd" don't. The primary job of a structure is to hold data,
there's no need to specify it. It also makes the name a bit shorter,
which is always nice.
Rename related types too.
Change-Id: Ifb63195ff105809fc15b502f639c0bb4d18a675e
Approved-By: Tom Tromey <tom@tromey.com>
Reviewed-By: Guinevere Larsen <guinevere@redhat.com>
All users of these typedefs use them inside a gdb::function_view. Move
the gdb::function_view in the typedefs themselves. This shortens the
types in function signatures and helps with readability, IMO.
Rename them to remove the `_ftype` suffix: this suffix is not as
relevant in C++ as it was in C. With function_view, the caller can pass
more than just a simple "function". Anyway, I think it's clearer to
name them after the role the callback has (listener, matcher, etc).
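A minimal sketch of the shape of this change, using std::function as
a stand-in for gdb::function_view (which, unlike std::function, is
non-owning). The names toy_name_matcher_ftype, toy_name_matcher and
count_matches are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Old style: the typedef names a bare function type, so every user has
// to spell the wrapper, e.g. std::function<toy_name_matcher_ftype>.
typedef bool (toy_name_matcher_ftype) (const std::string &name);

// New style: the wrapper is part of the typedef, and the name describes
// the role of the callback.
using toy_name_matcher = std::function<toy_name_matcher_ftype>;

// Count the names accepted by MATCHER.
int
count_matches (const std::vector<std::string> &names,
               const toy_name_matcher &matcher)
{
  int count = 0;
  for (const std::string &name : names)
    if (matcher (name))
      ++count;
  return count;
}
```

A caller can pass a capturing lambda here, which is part of why the
`_ftype` suffix no longer describes the type well.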
Adjust some related comments.
Change-Id: Iaf9f8ede68b51ea9e4d954792e8eb90def8659a6
Approved-By: Tom Tromey <tom@tromey.com>
Throughout gdb/dwarf2, use `*_up` typedefs. Add a few missing typedefs,
and move some so they are, ideally, just after the corresponding class.
Change-Id: Iab5cd8fc2e9989d4bd8d4868586703c2312f254f
Approved-By: Tom Tromey <tom@tromey.com>
A few more changes in the spirit of 8e745eac7d ("gdb/dwarf: rename
cooked_index::m_vector to m_shards"). I think it's clearer if the term
"index" is reserved for the whole thing, while "shard" or "index shard"
are used for the parts.
Change-Id: I457bb0016a70f3f9918f4a3c3977262a7801705b
Approved-By: Tom Tromey <tom@tromey.com>
I think that is clearer and helps readability.
Rename a few iteration variables from "index" or "idx" to "shard". In
my mental model, the "index" is the whole thing, so it's confusing to
use that word when referring to shards.
Change-Id: I208cb839e873c514d1f8eae250d4a16f31016148
Approved-By: Tom Tromey <tom@tromey.com>
I find this typedef to be confusing. The name is a bit too generic, so
it's not clear what it represents. When using the typedef for a
cooked_index_shard unique pointer, I think that spelling out the vector
type is not overly long.
Change-Id: I99fdab5cd925c37c3835b466ce40ec9c1ec7209d
Approved-By: Tom Tromey <tom@tromey.com>
New in v2:
- install address map in a single shard
- update test gdb.mi/mi-sym-info.exp to cope with the fact that
different symbols could be returned when using --max-results
When playing with the .debug_names reader, I noticed it was
significantly slower than the DWARF scanner. Using a "performance"
build of GDB (with optimization, no runtime sanitizer enabled, etc), I
measure with the following command on a rather large debug info file
(~4 GB):
$ time ./gdb -q -nx --data-directory=data-directory <binary> -iex 'maint set dwarf sync on' -batch
This measures the time it takes for GDB to build the cooked index (plus
some startup and exit overhead). I have a version of the binary without
.debug_names and a version with .debug_names added using gdb-add-index.
The results are:
- without .debug_names: 7.5 seconds
- with .debug_names: 24 seconds
This is a bit embarrassing, given that the purpose of .debug_names is to
accelerate things :). The reason is that the .debug_names processing is
not parallelized at all, while the DWARF scanner is heavily
parallelized.
The process of creating the cooked index from .debug_names is roughly in
two steps:
1. scanning of .debug_names and creation of cooked index entries (see
mapped_debug_names_reader::scan_all_names)
2. finalization of the index, name canonicalization and sorting of the
entries (see cooked_index::set_contents).
This patch grabs a low hanging fruit by creating multiple cooked index
shards instead of a single one during step one. Just doing this allows
the second step of the processing to be automatically parallelized, as
each shard is sent to a separate thread to be finalized.
With this patch, I get:
- without .debug_names: 7.5 seconds
- with .debug_names: 9.7 seconds
Not as fast as we'd like, but it's an improvement.
The process of scanning .debug_names could also be parallelized to shave
off a few seconds. My profiling shows that out of those ~10 seconds of
execution, about 6 are inside scan_all_names. Assuming perfect
parallelization with 8 threads, those 6 seconds would shrink to about
0.75, so at best we could shave about 5 seconds from that time, which
sounds interesting. I gave it a shot, but it's a much more intrusive
change, and I'm not sure if I will finish it.
This patch caused some regressions in gdb.mi/mi-sym-info.exp with the
cc-with-debug-names board, in the test about the `--max-results` switch.
It appears that this test relies on the specific symbols returned when
using `--max-results`. As far as I know, we don't guarantee which
specific symbols are returned, so any of the matching symbols could be
returned.
The round robin method used in this patch to assign index entries to
shards ends up somewhat randomizing which CU gets expanded first during
the symbol search, and therefore which order they appear in the
objfile's CU list, and therefore which one gets searched first.
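To make the round robin assignment concrete, here is a small sketch.
It is not the reader's actual code: toy_shard, distribute_round_robin
and finalize_shard are hypothetical names, entries are plain strings,
and finalization is reduced to a sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a cooked index shard: just a list of names.
using toy_shard = std::vector<std::string>;

// Assign NAMES to N_SHARDS shards in round robin order.  N_SHARDS must
// be greater than zero.
std::vector<toy_shard>
distribute_round_robin (const std::vector<std::string> &names,
                        std::size_t n_shards)
{
  std::vector<toy_shard> shards (n_shards);
  std::size_t next = 0;
  for (const std::string &name : names)
    {
      shards[next].push_back (name);
      next = (next + 1) % n_shards;
    }
  return shards;
}

// Finalize one shard.  Each shard is independent of the others, which
// is what lets this step run on a separate thread per shard.
void
finalize_shard (toy_shard &shard)
{
  std::sort (shard.begin (), shard.end ());
}
```

Note how neighbouring entries in the input end up in different shards,
which is why the order in which CUs get expanded is no longer tied to
the order of the names in the section.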
I meditated on whether keeping compunits sorted within objfiles would
help make things more stable and predictable. It would somewhat, but it
wouldn't remove all sources of randomness. It would still be possible for
a call to `expand_symtabs_matching` to stop on the first hit. Which
compunit gets expanded then would still be dependent on the specific
`quick_symbol_functions` internal details / implementation.
Commit 5b99c5718f ("[gdb/testsuite] Fix various issues in
gdb.mi/mi-sym-info.exp") had already started to make the test a bit more
flexible in terms of which symbols it accepts, but with this patch, I
think it's possible to get wildly varying results. I therefore modified
the test to count the number of returned symbols, but not expect any
specific symbol.
Change-Id: Ifd39deb437781f72d224ec66daf6118830042941
Approved-By: Tom Tromey <tom@tromey.com>
The following patch makes the .debug_names reader create multiple cooked
index shards, only one of them having an address map. The others will
have a nullptr address map.
Change the code using cooked_index_shard::m_addrmap to account for the
fact that it can be nullptr.
Change-Id: Id05b974e661d901dd43bb5ecb3a8fcfc15abc7ed
Approved-By: Tom Tromey <tom@tromey.com>
The cooked index "start_reading" method can only be called after the
dwarf2_per_bfd "index_table" member is set. This patch refactors this
code a little to centralize this constraint, adding a new
dwarf2_per_bfd::start_reading method and another (virtual) method to
dwarf_scanner_base.
This removes some casts, but is also useful to support another
series I'm working on where the .gdb_index is rewritten.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
It can never return nullptr, so return a reference instead of a pointer.
Change-Id: Ibc6f16eb74dc16059152982600ca9f426d7f80a4
Approved-By: Tom Tromey <tom@tromey.com>
Make `abbrev_table_cache::find` const, make it return a pointer to
`const abbrev_table`, adjust the fallouts.
Make `cooked_index_storage::get_abbrev_table_cache` const, make it return
a pointer to const `abbrev_table_cache`.
Change-Id: If63b4b3a4c253f3bd640b13bce4a854eb2d75ece
Approved-By: Tom Tromey <tom@tromey.com>
This cache holds `abbrev_table` objects, so I think it's clearer and
more consistent to name it `abbrev_table_cache`. Rename it and
everything that goes along with it.
Change-Id: I43448c0aa538dd2c3ae5efd2f7b3e7b827409d8c
Approved-By: Tom Tromey <tom@tromey.com>
While looking at the cooked index entry for local variable l4 of function test
in test-case gdb.fortran/logical.exp:
...
$ gdb -q -batch outputs/gdb.fortran/logical/logical \
-ex "maint print objfiles"
...
[9] ((cooked_index_entry *) 0x7fc6e0003010)
name: l4
canonical: l4
qualified: l4
DWARF tag: DW_TAG_variable
flags: 0x2 [IS_STATIC]
DIE offset: 0x17c
parent: ((cooked_index_entry *) 0x7fc6e0002f20) [test]
...
I noticed that while the entry does have a parent, that's not reflected in the
qualified name.
This makes it harder to write test-cases that check the parent of a cooked
index entry.
This is due to the implementation of full_name, which skips printing
parents if the language does not specify an appropriate separator.
Fix this by using "::" as default separator, getting us instead:
...
[9] ((cooked_index_entry *) 0x7f94ec0040c0)
name: l4
canonical: l4
qualified: test::l4
DWARF tag: DW_TAG_variable
flags: 0x2 [IS_STATIC]
DIE offset: 0x17c
parent: ((cooked_index_entry *) 0x7f94ec003fd0) [test]
...
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
I noticed when running test-case gdb.ada/info_exc.exp with glibc debug info
installed, that the "info exceptions" command that lists all Ada exceptions
also expands non-Ada CUs, which includes CUs in
/lib64/ld-linux-x86-64.so.2 and /lib64/libc.so.6.
Fix this by:
- adding a new lang_matcher parameter to the expand_symtabs_matching
function, and
- using that new parameter in the expand_symtabs_matching call in
ada_add_global_exceptions.
The new parameter is a hint, meaning implementations are free to ignore it and
expand CUs with any language. This is the case for partial symtabs, I'm not
sure whether it makes sense to implement support for this there.
Conversely, when processing a CU with language C and name "<artificial>"
(as produced by GCC LTO), the CU may not really have a single language and we
should ignore the lang_matcher. See also commit d2f6771173
("Fix 'catch exception' with -flto").
Now that we have lang_matcher available, also use it to limit name splitting
styles and symbol matchers to those applicable to the matched languages.
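The hint semantics described above can be sketched as follows. This
is an illustration under assumptions, not the code in the patch:
toy_language, toy_cu, toy_lang_matcher and should_consider_cu are
hypothetical names, and std::function stands in for
gdb::function_view.

```cpp
#include <functional>
#include <string>

enum class toy_language { c, cplus, ada };

// Hypothetical stand-in for a compilation unit.
struct toy_cu
{
  std::string name;
  toy_language lang;
};

using toy_lang_matcher = std::function<bool (toy_language)>;

// Return true if CU should be considered for expansion.
bool
should_consider_cu (const toy_cu &cu, const toy_lang_matcher &lang_matcher)
{
  // No matcher supplied: every language is acceptable.
  if (!lang_matcher)
    return true;

  // A C CU named "<artificial>" (as produced by GCC LTO) may not really
  // have a single language, so the matcher is ignored for it.
  if (cu.lang == toy_language::c && cu.name == "<artificial>")
    return true;

  return lang_matcher (cu.lang);
}
```

An implementation that ignores the hint entirely would simply return
true in all cases, which is still correct, only slower.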
Without this patch we have (with a gdb build with -O0):
...
$ time gdb -q -batch -x outputs/gdb.ada/info_exc/gdb.in.1 > /dev/null
real 0m1.866s
user 0m2.089s
sys 0m0.120s
...
and with this patch we have:
...
$ time gdb -q -batch -x outputs/gdb.ada/info_exc/gdb.in.1 > /dev/null
real 0m0.469s
user 0m0.777s
sys 0m0.051s
...
Or, to put it in terms of number of CUs, we have 1853 CUs:
...
$ gdb -q -batch -readnow outputs/gdb.ada/info_exc/foo \
-ex start \
-ex "maint info symtabs" \
| grep -c " name "
1853
...
Without this patch, we have:
...
$ gdb -q -batch outputs/gdb.ada/info_exc/foo \
-ex start \
-ex "info exceptions" \
-ex "maint info symtabs" \
| grep -c " name "
1393
...
so ~75% of the CUs are expanded, and with this patch we have:
...
$ gdb <same-as-above>
20
...
so ~1% of the CUs are expanded.
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
PR symtab/32182
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32182
This fixes a couple of comments in dwarf2/cooked-index.h.
The comment by cooked_index_entry::canonical mentions C++, but this
field can also be different from 'name' in other situations. Rather
than enumerate the cases here (which doesn't seem important), make the
text a little less specific.
Also, cooked_index_entry::write_scope doesn't document its "for_main"
parameter -- and it is misnamed in the prototype as well.
Reviewed-By: Tom de Vries <tdevries@suse.de>
This changes cooked_index_shard::handle_gnat_encoded_entry to modify
the incoming entry itself, and to return void rather than a new name.
This simplifies the caller a little, which is convenient for a
different series I am working on.
Approved-By: Tom de Vries <tdevries@suse.de>
Cleanup includes in dwarf2/*.
1. Add the necessary includes so that clangd reports no errors when
opening header files. This ensures that header files include what
they use.
2. Remove all includes reported as unused by clangd (except
gdb-safe-ctype.h, which I think does some magic that affects what
follows).
Build-tested with --enable-threading set to "yes" and "no", since there are some
portions of code gated by `#ifdef CXX_STD_THREAD`.
Change-Id: I21debffcd7c2caf90f08e1e0fbba3ce30422d042
Approved-By: Tom Tromey <tom@tromey.com>
This is a simple find / replace from "struct bound_minimal_symbol" to
"bound_minimal_symbol", to make things shorter and more consistent
throughout. In some cases, move variable declarations where first used.
Change-Id: Ica4af11c4ac528aa842bfa49a7afe8fe77a66849
Reviewed-by: Keith Seitz <keiths@redhat.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
dwarf2_per_bfd::index_addrmap is only used by the .gdb_index reader,
so this field can be moved to mapped_gdb_index instead. Then,
cooked_index_functions::find_per_cu can be removed in favor of a
method on the index object.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31821
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Tom de Vries pointed out that the combination of sharding,
multi-threading, and per-CU "racing" means that sometimes a cross-CU
DIE reference might not be correctly resolved. However, it's
important to handle this correctly, due to some unfortunate aspects of
DWARF.
This patch implements this by arranging to preserve each worker's DIE
map through the end of index finalization. The extra data is
discarded when finalization is done. This approach also allows the
parent name resolution to be sharded, by integrating it into the
existing entry finalization loop.
In an earlier review, I remarked that addrmap couldn't be used here.
However, I was mistaken. A *mutable* addrmap cannot be used, as those
are based on splay trees and restructure the tree even during lookups
(and thus aren't thread-safe). A fixed addrmap, on the other hand, is
just a vector and is thread-safe.
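To illustrate why a fixed map is safe to share between threads, here
is a minimal sketch of a lookup over a sorted vector of transitions.
It is not gdb's addrmap_fixed: toy_transition and toy_fixed_addrmap
are hypothetical names, and the mapped value is a plain int where -1
means "nothing mapped".

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// From ADDR up to the next transition, the map yields VALUE.
struct toy_transition
{
  std::uint64_t addr;
  int value;
};

class toy_fixed_addrmap
{
public:
  explicit toy_fixed_addrmap (std::vector<toy_transition> transitions)
    : m_transitions (std::move (transitions))
  {
    std::sort (m_transitions.begin (), m_transitions.end (),
               [] (const toy_transition &a, const toy_transition &b)
               { return a.addr < b.addr; });
  }

  // Binary search.  This is a const method that never modifies the
  // vector, so concurrent lookups do not interfere with each other.
  // A splay tree, by contrast, rotates nodes on every lookup.
  int find (std::uint64_t addr) const
  {
    auto it = std::upper_bound (m_transitions.begin (),
                                m_transitions.end (), addr,
                                [] (std::uint64_t a, const toy_transition &t)
                                { return a < t.addr; });
    if (it == m_transitions.begin ())
      return -1;
    return std::prev (it)->value;
  }

private:
  std::vector<toy_transition> m_transitions;
};
```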
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30846
This changes the DIE range map from a raw addrmap to a custom class.
A new type is used to represent the ranges, in an attempt to gain a
little type safety as well.
Note that the new code includes a map-of-maps type. This is not used
yet, but will be used in the next patch.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
This patch makes allocate_on_obstack a little bit safer, by enforcing
the rule that objects allocated on an obstack must have a trivial
destructor.
The static assert is done in a method -- doing it inside the class
itself won't work because the class is incomplete at that point.
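The placement of the assert can be shown with a small sketch. This is
an assumption-laden illustration, not gdb's allocate_on_obstack:
toy_arena stands in for an obstack, and allocate_on_arena and
toy_symbol are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Toy arena standing in for an obstack: memory is released in bulk and
// no destructors are ever run on the objects placed in it.
class toy_arena
{
public:
  void *allocate (std::size_t size)
  {
    m_blocks.push_back (std::make_unique<unsigned char[]> (size));
    return m_blocks.back ().get ();
  }

private:
  std::vector<std::unique_ptr<unsigned char[]>> m_blocks;
};

template<typename T>
struct allocate_on_arena
{
  void *operator new (std::size_t size, toy_arena &arena)
  {
    // T is complete here, because the body of a member function of a
    // class template is only instantiated when the function is used.
    // The same assert at class scope would see an incomplete T while
    // "struct toy_symbol : allocate_on_arena<toy_symbol>" is parsed.
    static_assert (std::is_trivially_destructible<T>::value,
                   "arena-allocated types must be trivially destructible");
    return arena.allocate (size);
  }

  // Matching placement delete, only called if a constructor throws.
  void operator delete (void *, toy_arena &) {}

  // The arena owns the memory, so the usual delete does nothing.
  void operator delete (void *) {}
};

struct toy_symbol : allocate_on_arena<toy_symbol>
{
  explicit toy_symbol (int v) : value (v) {}

  int value;
};
```

Adding a member with a non-trivial destructor (a std::string, say) to
toy_symbol would then make the first `new (arena) toy_symbol (...)`
fail to compile.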
There are a few spots in the tree that use 'addrmap' where only an
addrmap_fixed will ever really be seen. This patch changes this code
to use the more specific type.
The background DWARF reader changes introduced a race when writing to
the index cache. The problem here is that constructing the
index_cache_store_context object should only happen on the main
thread, to ensure that the various value captures do not race.
This patch adds an assert to the constructor to that effect, and then
arranges for this object to be constructed by the cooked_index_worker
constructor -- which is only invoked on the main thread.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31262
This changes index_cache_store_context to also capture the per-BFD
object when it is constructed. This is used when storing to the
cache, and this approach makes the code a little simpler.
This changes quick_symbol_functions::lookup_global_symbol_language to
accept domain_search_flags rather than just a domain_enum, and fixes
up the fallout.
To avoid introducing any regressions, any code passing VAR_DOMAIN now
uses SEARCH_VFT.
That is, no visible changes should result from this patch. However,
it sets the stage to refine some searches later on.
I noticed that cooked_index_worker::start_reading isn't really needed.
This patch removes it, and also removes the SCOPED_EXIT, in favor of a
direct call.