Commit Graph

54 Commits

Author SHA1 Message Date
Nick Alcock
b58e5ee6d8 libctf: archive: endianness-flipping and range-checking
This does endianness-flipping just like CTF dicts, flipping aggressively on
open, taking advantage of the archive's mmapped nature to flip all the size
words before each archive member as well.

The range checking verifies non-overlappingness of archive sections and
non-overrunning: it does not verify that archive members don't overlap,
because any such overlap would almost certainly fail at open time anyway
(due to the prefixed size word if nothing else).
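
The two checks described above (no section overrunning the archive, no two sections overlapping) can be sketched as follows.  This is only an illustration under assumed names: struct section and sections_valid are hypothetical and are not taken from libctf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one archive section: byte offset and length.  */
struct section
{
  uint64_t off;
  uint64_t len;
};

/* Return 1 if every section lies within FILE_SIZE bytes and no two
   sections overlap, 0 otherwise.  Quadratic, which is acceptable for a
   handful of sections.  */
static int
sections_valid (const struct section *s, size_t n, uint64_t file_size)
{
  assert (s != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      /* Overrun check, written so that off + len cannot overflow.  */
      if (s[i].off > file_size || s[i].len > file_size - s[i].off)
        return 0;

      /* Every earlier section has already passed the overrun check.  */
      for (size_t j = 0; j < i; j++)
        {
          uint64_t i_end = s[i].off + s[i].len;
          uint64_t j_end = s[j].off + s[j].len;

          /* Two ranges overlap if each starts before the other ends.  */
          if (s[i].off < j_end && s[j].off < i_end)
            return 0;
        }
    }
  return 1;
}
```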

This dug up a bug in v1 archives, where the size word included the length
of *the size word itself*: we correspondingly reduce that size if v1
archives are encountered (and fail if the result underflows).
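
A minimal sketch of the v1 size correction, assuming the size word is a uint64_t (as the "sizeof (uint64_t)" wording in the v2 commit suggests).  The function name and return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the size word STORED read from in front of
   an archive member, store the member's real length in *ACTUAL.  In v1
   archives the stored value also counted the size word itself.
   Returns 0 on success, -1 if a v1 size would underflow.  */
static int
member_length (int is_v1, uint64_t stored, uint64_t *actual)
{
  assert (actual != NULL);

  if (!is_v1)
    {
      *actual = stored;
      return 0;
    }

  if (stored < sizeof (uint64_t))
    return -1;			/* Underflow: corrupt archive.  */

  *actual = stored - sizeof (uint64_t);
  return 0;
}
```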
2025-05-28 15:53:52 +01:00
Nick Alcock
16e0dd9aab libctf: archive: format v2
This commit does a bunch of things, all tangled together tightly enough that
disentangling them seemed not to be worth doing.

The biggest is a new archive format, v2, identified by a magic number which
is one higher than the v1 format's magic number.  As usual with libctf we
can only write out the new format, but can still read the old one.

The new format has multiple improvements over the old:

 - It is written native-endian and aggressively endian-swapped at open time,
   just like CTF and BTF dicts; format v1 was little-endian, necessitating
   byteswapping all over the place at read and write time rather than
   localized in one pair of functions at read time.

 - The modent array of name-offset -> archive-offset mappings for the CTF
   archives is explicitly pointed at via a new ctfa_modents header member
   rather than just starting after the end of the header.

 - The length that prepends each archive member actually indicates its
   length rather than always being sizeof (uint64_t) bytes too high (this
   was an outright bug)

 - There is a new shared properties table which in future we may be able to
   use to unify common values from the constituent CTF headers, reducing the
   size overhead of these (repeated, uncompressed) entities.  Right now it
   only contains one value, parent_name, which is the parent dict name if
   one is common across all dicts in the archive (always true for any
   archives derived from ctf_link()).  This is used to let
   ctf_archive_next() et al reliably open dicts in the archive even if they
   are child BTF dicts (which do not contain a header name).

   The properties table shares its property names with the CTF members,
   and uses the same format (and shared code) for the property values as for
   CTF archive members: length-prepended.  The archive members and
   name->value table ("modents") use distinct tables for properties and CTF
   dicts, to ensure they are spatially separated in the file, to maximize
   compressibility if we end up with a lot of properties and people compress
   the whole thing.
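
The first item in the list above (write native-endian, flip once at open time) can be sketched like this.  Everything here is an assumption made for illustration: the magic value is a placeholder, the header has only three made-up fields, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder magic number, NOT the real archive magic.  */
#define ARC_MAGIC_V2 UINT64_C (0x0123456789abcdef)

/* Hypothetical cut-down header.  */
struct arc_header
{
  uint64_t magic;
  uint64_t ndicts;
  uint64_t modents;
};

/* Reverse the byte order of a 64-bit value.  */
static uint64_t
bswap64 (uint64_t v)
{
  v = ((v & UINT64_C (0x00ff00ff00ff00ff)) << 8)
    | ((v >> 8) & UINT64_C (0x00ff00ff00ff00ff));
  v = ((v & UINT64_C (0x0000ffff0000ffff)) << 16)
    | ((v >> 16) & UINT64_C (0x0000ffff0000ffff));
  return (v << 32) | (v >> 32);
}

/* Flip the header in place if it was written on a machine of the other
   endianness.  Returns 0 if already native, 1 if flipped, -1 if the
   magic number is not recognised either way round.  */
static int
flip_header_on_open (struct arc_header *h)
{
  assert (h != NULL);

  if (h->magic == ARC_MAGIC_V2)
    return 0;

  if (bswap64 (h->magic) != ARC_MAGIC_V2)
    return -1;

  h->magic = bswap64 (h->magic);
  h->ndicts = bswap64 (h->ndicts);
  h->modents = bswap64 (h->modents);
  return 1;
}
```

After this one call every later reader can use the fields directly, which is the point of localizing the swapping at open time.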

We can also restrict various old bug-workaround kludges that only apply to
dicts found in v1 archives: in particular, we needed to dig out the preamble
of some CTF dicts without opening them to figure out whether they used the
.dynstr or .strtab sections: this whole bug workaround is now unnecessary
for v2 and above.

There are other changes for readability and consistency:

 - The archive wrapper data structure, known outside ctf-archive.c as
   ctf_archive_t, is now consistently referred to inside ctf-archive.c as
   'struct ctf_archive_internal' and given the parameter name 'arci' rather
   than sometimes using ctf_archive_t and sometimes using 'wrapper' or 'arc'
   as parameter names.  The archive itself is always called 'struct
   ctf_archive' to emphasise that it is *not* a ctf_archive_t.
   ctf_archive_t remains the public typedef: the fact that it's not actually
   the same thing as the archive file format is an internal implementation
   detail.

 - We keep the archive header around in a new ctfi_hdr member, distinct
   from the actual archive itself, to make upgrading from v1 and cross-
   endianness support easier.  The archive itself is now kept as a char *
   and used only to root pointer arithmetic.
2025-05-28 15:11:37 +01:00
Nick Alcock
a9f8ddc5ae libctf: archive, open: when opening, always set errp to something
ctf_arc_import_parent, called by the cached-opening machinery used by
ctf_archive_next and archive-wide lookup functions like
ctf_arc_lookup_symbol, has an err-pointer parameter like all other opening
functions.  Unfortunately it unconditionally initializes it whenever
provided, even if there was no error, which can lead to its being
initialized to an uninitialized value.  This is not technically an
API-contract violation, since we don't define what happens to the error
value except when an error happens, but it is still unpleasant.

Initialize it only when there is an actual error, so we never initialize it
to an uninitialized value.

While we're at it, improve all the opening pathways: on success, set errp to
0, rather than leaving it what it was, reducing the likelihood of
uninitialized error param returns in callers too.  (This is inconsistent
with the treatment of ctf_errno(), but the err value being a parameter
passed in from outside makes the divergence acceptable: in open functions,
you're never going to be overwriting some old error value someone might want
to keep around across multiple calls, some of which are successful and some
of which are not.)
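
The convention for opening functions described above looks roughly like this in the abstract.  demo_open, struct demo_dict and DEMO_ENOENT are hypothetical stand-ins, not libctf names.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ENOENT 2		/* Hypothetical error code.  */

struct demo_dict
{
  int id;
};

/* Hypothetical opening function.  If ERRP is provided it is always
   written: 0 on success, an error code on failure.  The caller never
   sees a stale or uninitialized value.  */
static struct demo_dict *
demo_open (int id, int *errp)
{
  static struct demo_dict the_dict;

  if (id < 0)
    {
      if (errp != NULL)
	*errp = DEMO_ENOENT;
      return NULL;
    }

  if (errp != NULL)
    *errp = 0;
  the_dict.id = id;
  return &the_dict;
}

/* Example caller: the error variable is deliberately left
   uninitialized, which is safe only because demo_open always sets it.  */
static int
demo_try_open (int id)
{
  int err;
  struct demo_dict *d = demo_open (id, &err);

  assert ((d != NULL) == (err == 0));
  return err;
}
```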

Soup up existing tests to verify all this.

Thanks to Bruce McCulloch for the original patch, and Stephen Brennan for
the report.

libctf/
	PR libctf/32903
	* ctf-archive.c (ctf_arc_open_internal): Zero errp on success.
	(ctf_dict_open_sections): Zero errp at the start.
	(ctf_arc_import_parent): Initialize err.
	* ctf-open.c (ctf_bufopen): Zero errp at the start.
	* testsuite/libctf-lookup/add-to-opened.c: Make sure one-element
	archive opens update errp.
	* testsuite/libctf-writable/ctf-compressed.c: Make sure real archive
	opens update errp.
2025-05-20 14:34:55 +01:00
Nick Alcock
918e356b18 libctf: archive: allow opening BTF dicts in archives (not for upstreaming)
BTF dicts are normally suppressed in archives, but it is possible
to create them with enough cunning.  If such an archive is
encountered, the BTF dicts in it have no parent name, which
means that ctf_arc_import_parent (used by ctf_dict_open_cached,
ctf_archive_next, and all the ctf_arc_lookup functions) fails
to figure out what parent to import, and fails.

Kludge around it by relying on our secret knowledge that ctf_link_write
always emits the parent dict into the archive first.  If no name is set,
import the parent dict for now.  (Before upstreaming, a new archive format
with a dedicated parent dict field will turn up, obviating this kludge.)
2025-04-25 21:23:08 +01:00
Nick Alcock
88f2c13d1c libctf: archive: fix ctf_dict_open_cached error handling
We were misreporting a failure to ctf_dict_open the dict as
an out-of-memory error.
2025-04-25 21:23:08 +01:00
Nick Alcock
f782340ba5 libctf, serialize: preparatory steps
The new serializer is quite a lot more customizable than the old, because it
can write out BTF as well as CTF: you can ask to write out BTF or fail,
write out CTF if required to avoid information loss, otherwise BTF, or
always write out CTF.
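
The three writeout modes listed above reduce to a small decision table.  The enum and function names below are hypothetical, chosen only to mirror the wording of this message; they are not the identifiers the serializer uses.

```c
#include <assert.h>

/* Hypothetical names for the three modes described above.  */
enum write_mode
{
  WRITE_BTF_OR_FAIL,		/* Write BTF, or fail.  */
  WRITE_CTF_IF_NEEDED,		/* CTF only to avoid information loss.  */
  WRITE_CTF_ALWAYS		/* Always CTF.  */
};

enum out_format
{
  OUT_ERROR,
  OUT_BTF,
  OUT_CTF
};

/* BTF_POSSIBLE is the (expensive to compute) answer to "can this dict
   be written as BTF without losing information?".  */
static enum out_format
choose_format (enum write_mode mode, int btf_possible)
{
  assert (btf_possible == 0 || btf_possible == 1);

  switch (mode)
    {
    case WRITE_BTF_OR_FAIL:
      return btf_possible ? OUT_BTF : OUT_ERROR;
    case WRITE_CTF_IF_NEEDED:
      return btf_possible ? OUT_BTF : OUT_CTF;
    case WRITE_CTF_ALWAYS:
      return OUT_CTF;
    }
  return OUT_ERROR;
}
```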

Callers often need to find out whether a dict could be written out as BTF
before deciding how to write it out (because a dict can never be written out
as BTF if it is compressed, a caller might well want to ask if there is
anything else that prevents BTF writeout -- say, slices, conflicting types,
or CTF_K_BIG -- before deciding whether to compress it).  GNU ld will do
this whenever it is passed only BTF sections on the input.

Figuring out whether a dict can be written out as BTF is quite expensive: we
have to traverse all the types and check them, including every member of
every struct.  So we'd rather do that work only once.  This means making a
lot of state once private to ctf_preserialize public enough that another
function can initialize it; and since the whole API is available after
calling this function and before serializing, we should probably arrange
that if we do things we know will invalidate the results of all this
checking, we are forced to do it again.

This commit does that, moving all the existing serialization state into a
new ctf_serialize_t and adding to it.  Several functions grow force_ctf
arguments that allow the caller to force CTF emission even if the type
section looks BTFish: the writeout code and archive creation use this to
force CTF emission if we are compressing, and archive creation uses it
to force CTF emission if a CTF multi-member archive is in use, because
BTF doesn't support archives at all so there's no point maintaining
BTF compatibility in that case.  The ctf_write* functions gain support for
writing out BTF headers as well as CTF, depending on whether what was
ultimately written out was actually BTF or not.

Even more than most commits in this series, there is no way this is
going to compile right now: we're in the middle of a major transition,
completed in the next few commits.
2025-04-25 18:07:44 +01:00
Nick Alcock
b5d3790c66 libctf: consecutive ctf_id_t assignment
This change modifies type ID assignment in CTF so that it works like BTF:
rather than flipping the high bit on for types in child dicts, types ascend
directly from IDs in the parent to IDs in the child, without interruption
(so type 0x4 in the parent is immediately followed by 0x5 in all children).

Doing this while retaining useful semantics for modification of parents is
challenging.  By definition, child type IDs are not known until the parent
is written out, but we don't want to find ourselves constrained to adding
types to the parent in one go, followed by all child types: that would make
the deduplicator a nightmare and would frankly make the entire ctf_add*()
interface next to useless: all existing clients that add types at all
add types to both parents and children without regard for ordering, and
breaking that would probably necessitate redesigning all of them.

So we have to be a little cleverer.

We approach this the same way as we approach strings in the recent refs
rework: if a parent has children attached (or has ever had them attached
since it was created or last read in), any new types created in the parent
are assigned provisional IDs starting at the very top of the type space and
working down.  (Their indexes in the internal libctf arrays remain
unchanged, so we don't suddenly need multigigabyte indexes!).  At writeout
(preserialization) time, we traverse the type table (and all other tables
containing type IDs) and assign refs to every type ID in exactly the same
way we assign refs to every string offset (just a different set of refs --
we don't want to update type IDs with string offset values!).

For a parent dict with children, these refs are real entities in memory:
pointers to the memory locations where type IDs are stored, tracked in the
DTD of each type.  As we traverse the type table, we assign real IDs to each
type (by simple incrementation), storing those IDs in a new dtd_final_type
field in the DTD for each type.  Once the type table and all other tables
containing type IDs are fully traversed, we update all the refs and
overwrite the IDs currently residing in each with the final IDs for each
type.
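
A toy model of the scheme in the last two paragraphs: provisional IDs counting down from the top of the type space, refs remembering where each ID was written, and a final pass that assigns real IDs by incrementation and then patches every ref.  All names and sizes are hypothetical, and refs are kept in one flat list here rather than in each type's DTD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TYPE_ID UINT32_C (0xfffffffe)	/* Hypothetical top of ID space.  */
#define MAX_TYPES 8
#define MAX_REFS 16

struct demo_type
{
  uint32_t provisional_id;
  uint32_t final_id;		/* 0 until assigned.  */
};

struct demo_ref
{
  uint32_t *where;		/* Slot in the buffer being serialized.  */
  struct demo_type *target;
};

struct demo_dict
{
  struct demo_type types[MAX_TYPES];
  size_t ntypes;
  struct demo_ref refs[MAX_REFS];
  size_t nrefs;
};

/* Add a type: provisional IDs start at the top and work down.  */
static struct demo_type *
demo_add_type (struct demo_dict *d)
{
  assert (d->ntypes < MAX_TYPES);

  struct demo_type *t = &d->types[d->ntypes];
  t->provisional_id = MAX_TYPE_ID - (uint32_t) d->ntypes;
  t->final_id = 0;
  d->ntypes++;
  return t;
}

/* Write TARGET's current ID to *WHERE and remember the location.  */
static void
demo_add_ref (struct demo_dict *d, uint32_t *where, struct demo_type *target)
{
  assert (d->nrefs < MAX_REFS);

  *where = target->provisional_id;
  d->refs[d->nrefs].where = where;
  d->refs[d->nrefs].target = target;
  d->nrefs++;
}

/* Assign final IDs starting at FIRST_ID, then patch all refs.  The
   patching happens last so that forward references also work.  */
static void
demo_finalize (struct demo_dict *d, uint32_t first_id)
{
  for (size_t i = 0; i < d->ntypes; i++)
    d->types[i].final_id = first_id + (uint32_t) i;

  for (size_t r = 0; r < d->nrefs; r++)
    *d->refs[r].where = d->refs[r].target->final_id;
}
```

Note that demo_finalize only touches the buffer slots and final_id: the provisional ID stored in each type is left alone, matching the description of the in-memory representation.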

That fixes up IDs in the parent dict itself (including forward references in
structs and the like: that's why the ref updates only happen at the end);
but what about child dicts' references, both to parent types and to their
own?  We add armouring to enforce that parent dicts are always serialized
before their children (which ctf-link.c already does, because it's a
precondition for strtab deduplication), and then arrange that when a ref is
added to a type whose ID has been assigned (has a dtd_final_type), we just
immediately do an update rather than storing a ref for later updating.
Since the parent is already serialized, all parent type IDs have a
dtd_final_type by this point, and all parent IDs in the children are
properly updated. The child types can now be renumbered now that we know the
number of types in the parent, and their refs updated identically to what
was just done with the parent.

One wrinkle: before the child refs are updated, while we are working over
the child's type section, the type IDs in the child start from 1 (or
something like that), which might seem to overlap the parent IDs.  But this
is not the case: when you serialize the parent, the IDs written out to disk
are changed, but the only change to the representation in memory is that we
remember a dtd_final_type for each type (and use it to update all the child
type refs): its ID in memory is the same as it always was, a nonoverlapping
provisional ID higher than any other valid ID.  We enforce all of this by
asserting that when you add a ref to a type, the memory location that is
modified must be in the buffer being serialized: the code will not let you
accidentally modify the actual DTDs in memory.

We track the number of types in the parent in a new CTFv4 (not BTF) header
field (the dumper is updated): we will also use this to open CTFv3 child
dicts without change by simply declaring for them that the parent dict has
2^31 types in it (or 2^15, for v2 and below): the IDs in the children then
naturally come out right with no other changes needed.  (Right now, opening
CTFv3 child dicts requires extra compatibility code that has not been
written, but that code will no longer need to worry about type ID
differences.)
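
To see why declaring a 2^31-type parent makes CTFv3 child IDs come out right, compare the two numbering schemes.  This sketch assumes 1-based type indexes within a dict and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Old CTFv3 scheme as described above: child types have the high bit
   of the 32-bit ID turned on.  */
static uint32_t
old_v3_child_id (uint32_t index)
{
  assert (index >= 1 && index < UINT32_C (0x80000000));
  return index | UINT32_C (0x80000000);
}

/* New scheme: child IDs follow directly after the parent's types.  */
static uint32_t
child_id (uint32_t parent_ntypes, uint32_t index)
{
  assert (index >= 1);
  return parent_ntypes + index;
}
```

With four types in the parent, child_id (4, 1) is 5, as in the opening paragraph.  With a declared parent size of 0x80000000, child_id (0x80000000, n) equals old_v3_child_id (n) for every valid n, so the old IDs are simply a special case of the new ones.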

Various things are newly forbidden:

 - you cannot ctf_import() a child into a parent if you already ctf_add()ed
   types to the child, because all its IDs would change (and since you
   already cannot ctf_add() types to a child that hasn't had its parent
   imported, this in practice means only that ctf_create() must be followed
   immediately by a ctf_import() if this is a new child, which all sane
   clients were doing anyway).

 - You cannot import a child into a parent which has the wrong number of
   (non-provisional) types, again because all its IDs would be wrong:
   because parents only add types in the provisional space if children are
   attached to it, this would break the not unknown case of opening an
   archive, adding types to the parent, and only then importing children
   into it, so we add a special case: archive members which are not children
   in an archive with more than one member always pretend to have at least
   one child, so type additions in them are always provisional even before
   you ctf_import anything. In practice, this does exactly what we want,
   since all archives so far are created by the linker and have one parent
   and N children of that parent.

Because this introduces huge gaps between index and type ID for provisional
types, some extra assertions are added to ensure that the internal
ctf_type_to_index() is only ever called on types in the current dict (never
a parent dict): before now, this was just taken on trust, and it was often
wrong (which at best led to wrong results, as wrong array indexes were used,
and at worst to a buffer overflow). When hash debugging is on (suggesting
that the user doesn't mind expensive checks), every ctf_type_to_index()
triggers a ctf_index_to_type() to make sure that the operations are proper
inverses.

Lots and lots of tests are added to verify that assignment works and that
updating of every type kind works fine -- existing tests suffice for
type IDs in the variable and symtypetab sections.

The ld-ctf tests get a bunch of largely display-based updates: various
tests refer to 0x8... type IDs, which no longer exist, and because the
IDs are shorter all the spacing and alignment has changed.
2025-03-16 15:25:27 +00:00
Nick Alcock
beccf36b88 libctf: move string deduplication into ctf-archive
This means that any archive containing dicts can get its strings dedupped
together, rather than only those that are ctf_linked.

(For now, we are still constrained to ctf_linked archives, since fixing that
requires further changes to ctf_dedup_strings: but this gives us the first
half of what is necessary.)

libctf/
	* ctf-link.c (ctf_link_write): Move string dedup into...
	* ctf-archive.c (ctf_arc_preserialize): ... this new function.
	(ctf_arc_write_fd): Call it.
2025-02-28 15:13:24 +00:00
Nick Alcock
30cced0da6 libctf, archive, link: fix parent importing
We are about to move to a regime where there are very few things you can do
with most dicts before you ctf_import them.  So emit a warning if
ctf_archive_next()'s convenience ctf_import of parents fails.  Rip out the
buggy code in ctf_link_deduplicating_open_inputs which opened the parent by
hand (with a hardwired name), and instead rely on ctf_archive_next to do it
for us (which also means we don't end up opening it twice, once in
ctf_archive_next, once in ctf_link_deduplicating_open_inputs).

While we're there, arrange to close the inputs we already opened if opening
of some inputs fails, rather than leaking them.  (There are still some leaks
here, so add a comment to remind us to clean them up later.)

libctf/
	* ctf-archive.c (ctf_arc_import_parent): Emit a warning if importing
	fails.
	* ctf-link.c (ctf_link_deduplicating_open_inputs): Rely on the
        ctf_archive_next to open parent dicts.
2025-02-28 14:47:24 +00:00
Alan Modra
e8e7cf2abe Update year range in copyright notice of binutils files 2025-01-01 18:29:57 +10:30
Nick Alcock
86fd34fde1 libctf: fix ctf_archive_count return value on big-endian
This failed to properly byteswap its return value.

The ctf_archive format predates the idea of "just write natively and
flip on open", and byteswaps all over the place.  It's too easy to
forget one.  The next revision of the archive format (not versioned,
so we just tweak the magic number instead) should be native-endianned
like the dicts inside it are.
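
For illustration, one host-independent way to read a little-endian 64-bit field is to assemble it a byte at a time.  This is a sketch of the general technique only, with a hypothetical function name; it is not how ctf-archive.c does it (which, as noted above, byteswaps).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit little-endian value from P, independent of the byte
   order of the machine we are running on.  */
static uint64_t
read_le64 (const unsigned char *p)
{
  uint64_t v = 0;

  assert (p != NULL);

  /* The most significant byte is stored last, so start from there.  */
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | p[i];

  return v;
}
```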

libctf/
	* ctf-archive.c (ctf_archive_count): Byteswap return value.
2024-07-31 21:10:06 +01:00
Nick Alcock
36c771b179 libctf: fix CTF dict compression
Commit 483546ce4f ("libctf: make ctf_serialize() actually serialize")
accidentally broke dict compression.  There were two bugs:

 - ctf_arc_write_one_ctf was still making its own decision about
   whether to compress the dict via direct ctf_size comparison, which is
   unfortunate because now that it no longer calls ctf_serialize itself,
   ctf_size is always zero when it does this: it should let the writing
   functions decide on the threshold, which they contain code to do which is
   simply not used for lack of one trivial wrapper to write to an fd and
   also provide a compression threshold

 - ctf_write_mem, the function underlying all writing as of the commit
   above, was calling zlib's compressBound and avoiding compression if this
   returned a value larger than the input.  Unfortunately compressBound does
   not do a trial compression and determine whether the result is
   compressible: it just adds zlib header sizes to the value passed in, so
   our test would *always* have concluded that the value was incompressible!
   Avoid by simply always compressing if the raw size is larger than the
   threshold: zlib is quite clever enough to avoid actually compressing
   if the data is incompressible.
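
The second bug is easiest to see side by side.  bound_stub below is a stand-in for zlib's compressBound: the only property it shares with the real function is the one that matters here, namely that the returned bound is always larger than the input length.  The "+ 13" is illustrative and is not zlib's formula; the other names are hypothetical too.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for compressBound: an upper bound on the compressed size,
   always greater than LEN.  Not zlib's actual formula.  */
static size_t
bound_stub (size_t len)
{
  assert (len <= (size_t) -1 - 13);
  return len + 13;
}

/* The broken test: refuse to compress whenever the bound exceeds the
   input length, which is always.  */
static int
old_should_compress (size_t len, size_t threshold)
{
  return len > threshold && bound_stub (len) <= len;
}

/* The fixed test: compress whenever the raw size is larger than the
   threshold, and leave incompressible data to the compressor.  */
static int
new_should_compress (size_t len, size_t threshold)
{
  return len > threshold;
}
```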

Add a testcase for this.

libctf/
	* ctf-impl.h (ctf_write_thresholded): New...
	* ctf-serialize.c (ctf_write_thresholded): ... defined here,
        a wrapper around...
        (ctf_write_mem): ... this.  Don't check compressibility.
	(ctf_compress_write): Reimplement as a ctf_write_thresholded
        wrapper.
	(ctf_write): Likewise.
	* ctf-archive.c (arc_write_one_ctf): Just call
        ctf_write_thresholded rather than trying to work out whether
        to compress.
	* testsuite/libctf-writable/ctf-compressed.*: New test.
2024-07-31 21:02:05 +01:00
Nick Alcock
2fa4b6e6df libctf, include: new functions for looking up enumerators
Three new functions for looking up the enum type containing a given
enumeration constant, and optionally that constant's value.

The simplest, ctf_lookup_enumerator, looks up a root-visible enumerator by
name in one dict: if the dict contains multiple such constants (which is
possible for dicts created by older versions of the libctf deduplicator),
ECTF_DUPLICATE is returned.

The next simplest, ctf_lookup_enumerator_next, is an iterator which returns
all enumerators with a given name in a given dict, whether root-visible or
not.

The most elaborate, ctf_arc_lookup_enumerator_next, finds all
enumerators with a given name across all dicts in an entire CTF archive,
whether root-visible or not, starting looking in the shared parent dict;
opened dicts are cached (as with all other ctf_arc_*lookup functions) so
that repeated use does not incur repeated opening costs.

All three of these return enumerator values as int64_t: unfortunately, API
compatibility concerns prevent us from doing the same with the other older
enum-related functions, which all return enumerator constant values as ints.
We may be forced to add symbol-versioning compatibility aliases that fix the
other functions in due course, bumping the soname for platforms that do not
support such things.

ctf_arc_lookup_enumerator_next is implemented as a nested ctf_archive_next
iterator, and inside that, a nested ctf_lookup_enumerator_next iterator
within each dict.  To aid in this, add support to ctf_next_t iterators for
iterators that are implemented in terms of two simultaneous nested iterators
at once.  (It has always been possible for callers to use as many nested or
semi-overlapping ctf_next_t iterators as they need, which is one of the
advantages of this style over the _iter style that calls a function for each
thing iterated over: the iterator change here permits *ctf_next_t iterators
themselves* to be implemented by iterating using multiple other iterators as
part of their internal operation, transparently to the caller.)
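
The shape of an iterator built from nested iterators can be shown with a self-contained toy.  Nothing below is libctf API: the "archive" is an array of "dicts", each an array of name/value pairs, and all type and function names are hypothetical.  The caller sees a single iterator, while internally an outer position (which dict) and an inner position (which entry) advance together.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct entry { const char *name; int64_t value; };
struct dict { const struct entry *entries; size_t nentries; };
struct archive { const struct dict *dicts; size_t ndicts; };

/* Iterator state: outer and inner positions.  */
struct lookup_next
{
  size_t dict_i;
  size_t entry_i;
};

/* Inner iterator: next entry called NAME in D, starting at *POS.  */
static int
dict_lookup_next (const struct dict *d, const char *name, size_t *pos,
		  int64_t *value)
{
  while (*pos < d->nentries)
    {
      const struct entry *e = &d->entries[(*pos)++];

      if (strcmp (e->name, name) == 0)
	{
	  *value = e->value;
	  return 1;
	}
    }
  return 0;
}

/* Outer iterator.  Returns 1 and sets *VALUE for each match, 0 at the
   end (freeing the iterator), -1 on allocation failure.  */
static int
arc_lookup_next (const struct archive *arc, const char *name,
		 struct lookup_next **it, int64_t *value)
{
  assert (it != NULL && value != NULL);

  if (*it == NULL)
    {
      *it = calloc (1, sizeof (**it));
      if (*it == NULL)
	return -1;
    }

  while ((*it)->dict_i < arc->ndicts)
    {
      if (dict_lookup_next (&arc->dicts[(*it)->dict_i], name,
			    &(*it)->entry_i, value))
	return 1;

      /* Inner iterator exhausted: move to the next dict.  */
      (*it)->dict_i++;
      (*it)->entry_i = 0;
    }

  free (*it);
  *it = NULL;
  return 0;
}
```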

Also add a testcase that tests all these functions (which is fairly easy
because ctf_arc_lookup_enumerator_next is implemented in terms of
ctf_lookup_enumerator_next) in addition to enumeration addition in
ctf_open()ed dicts, ctf_add_enumerator duplicate enumerator addition, and
conflicting enumerator constant deduplication.

include/
	* ctf-api.h (ctf_lookup_enumerator): New.
	(ctf_lookup_enumerator_next): Likewise.
	(ctf_arc_lookup_enumerator_next): Likewise.

libctf/
	* libctf.ver: Add them.
	* ctf-impl.h (ctf_next_t) <ctn_next_inner>: New.
	* ctf-util.c (ctf_next_copy): Copy it.
        (ctf_next_destroy): Destroy it.
	* ctf-lookup.c (ctf_lookup_enumerator): New.
	(ctf_lookup_enumerator_next): New.
	* ctf-archive.c (ctf_arc_lookup_enumerator_next): New.
	* testsuite/libctf-lookup/enumerator-iteration.*: New test.
	* testsuite/libctf-lookup/enum-ctf-2.c: New test CTF, used by the
	  above.
2024-06-18 13:20:32 +01:00
Nick Alcock
e3cd566075 libctf: fix dict leak on archive-wide symbol lookup error path
If a lookup fails for a reason unrelated to a lack of type data for this
symbol, we return with an error; but we fail to close the dict we opened
most recently, which is leaked.

libctf/
	* ctf-archive.c (ctf_arc_lookup_sym_or_name): Close dict.
2024-06-18 13:20:32 +01:00
Nick Alcock
2dd3fd0de4 libctf: ctf_archive_iter: fix tiny leak
If iteration fails because opening a dict has failed, ctf_archive_next does
not destroy the iterator, so the caller can keep going and try to open other
dicts further into the archive.  ctf_archive_iter just returns, though, so
it should free the iterator rather than leaking it.

libctf/
	* ctf-archive.c (ctf_archive_iter): Don't leak the iterator on
	failure.
2024-05-17 12:58:17 +01:00
Nick Alcock
61914bb699 libctf: failure to open parent dicts that exist should be an error
CTF archive member opening (via ctf_arc_open_by_name, ctf_archive_iter, et
al) attempts to be helpful and auto-open and import any needed parent dict
in the same archive.  But if this fails, the error is not reported but
simply discarded, and you silently get back a dict with no parent, that
*you* suddenly have to remember to import.

This is not helpful behaviour: if the parent is corrupted or we run out of
memory or something, the caller is going to want to know!  Split it in two:
if the dict cites a parent that doesn't exist at all (a lot of historic
dicts name "PARENT" as their parent, even when they're not even children, or
perhaps the parent dict is stored separately and you plan to manually
associate it), we skip it as now, but if the import fails with an actual
error other than ECTF_ARNNAME, return the error and fail the open.
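
The split described above amounts to a three-way decision on the result of opening the parent.  In this sketch DEMO_ERR_NOT_FOUND plays the role of ECTF_ARNNAME; both error codes and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ERR_NOT_FOUND 1	/* Stands in for ECTF_ARNNAME.  */
#define DEMO_ERR_CORRUPT 2	/* Any other failure.  */

/* PARENT_ERR is the result of trying to open the named parent (0 for
   success).  Sets *IMPORTED to whether a parent is now available.
   Returns 0 if the child open should go ahead, -1 if it should fail,
   in which case *ERRP (if provided) receives the error.  */
static int
parent_import_result (int parent_err, int *imported, int *errp)
{
  assert (imported != NULL);

  *imported = 0;

  if (parent_err == 0)
    {
      *imported = 1;
      return 0;
    }

  /* No such parent in this archive: carry on without one, as before.  */
  if (parent_err == DEMO_ERR_NOT_FOUND)
    return 0;

  /* A real error: report it rather than silently dropping the parent.  */
  if (errp != NULL)
    *errp = parent_err;
  return -1;
}
```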

libctf/
	* ctf-archive.c (ctf_arc_import_parent):  Return failure if
        parent opening fails for reasons other than nonexistence.
	(ctf_dict_open_sections): Adjust.
2024-05-17 12:58:17 +01:00
Nick Alcock
483546ce4f libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.

It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.

Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.

... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t.  This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly.  This simplifies most of its callers
significantly.

(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)

This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.

libctf/

	* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
	(ctf_type_sect_size): Add static type sizes too.
	(ctf_serialize): Return the new dict rather than updating the
	existing dict.  No longer fail for dicts with static types;
	copy them onto the start of the new types table.
	(ctf_gzwrite): Actually serialize before gzwriting.
	(ctf_write_mem): Improve forced (test-mode) endian-flipping:
	flip dicts even if they are too small to be compressed.
	Improve confusing variable naming.
	* ctf-archive.c (arc_write_one_ctf): Don't bother to call
	ctf_serialize: both the functions we call do so.
	* ctf-string.c (ctf_str_create_atoms): Drop serializing case
	(atoms arg).
	* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
	longer bother with syn_ext_strtab or forced atoms table,
	serialization no longer needs them.
	* ctf-create.c (ctf_create): Call ctf_bufopen directly.
	* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Likewise.
	(ctf_serialize): Adjust.
	* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
	this is supposed to work.
2024-04-19 16:14:47 +01:00
Nick Alcock
ca01922784 libctf: don't leak the symbol name in the name->type cache
This cache replaced a cache of symbol index->ctf_id_t. That cache was
just an array, so it could get away with just being free()d, but the
ctfi_symnamedicts cache that replaced it is a full dynhash with a
dynamically-allocated string as the key.  As such, it needs freeing with
ctf_dynhash_destroy(), not just free(), or we leak parts of the
underlying hashtab, and all the keys.

libctf/ChangeLog:

	* ctf-archive.c (ctf_arc_flush_caches): Fix leak.
2024-04-19 16:14:45 +01:00
Alan Modra
fd67aa1129 Update year range in copyright notice of binutils files
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:

1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
   author I haven't committed, 'Kalray SA.', to cover gas testsuite
   files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
2024-01-04 22:58:12 +10:30
Alan Modra
d664a6aad2 libctf: unused variable
* ctf-archive.c (arc_mmap_writeout): Delete unused variable.
2023-03-20 16:06:40 +10:30
Alan Modra
027333da75 ctf segfaults
PR 30228
	PR 30229
	* ctf-open.c (ctf_bufopen_internal): Check for NULL cts_data.
	* ctf-archive.c (ctf_arc_bufpreamble, ctf_arc_bufopen): Likewise.
2023-03-19 22:19:19 +10:30
Alan Modra
d87bef3a7b Update year range in copyright notice of binutils files
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2023-01-01 21:50:11 +10:30
Alan Modra
a2c5833233 Update year range in copyright notice of binutils files
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.

The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
2022-01-02 12:04:28 +10:30
Nick Alcock
eefe721ead libctf: fix GNU style for do {} while
It's formatted like this:

do
  {
    ...
  }
while (...);

Not like this:

do
 {
    ...
  } while (...);

or this:

do {
  ...
} while (...);

We used both in various places in libctf.  Fixing it necessitated some
light reindentation.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-archive.c (ctf_archive_next): GNU style fix for do {} while.
	* ctf-dedup.c (ctf_dedup_rhash_type): Likewise.
	(ctf_dedup_rwalk_one_output_mapping): Likewise.
	* ctf-dump.c (ctf_dump_format_type): Likewise.
	* ctf-lookup.c (ctf_symbol_next): Likewise.
	* swap.h (swap_thing): Likewise.
2021-03-18 12:37:55 +00:00
Nick Alcock
ac36e134d9 libctf: reimplement many _iter iterators in terms of _next
Ever since the generator-style _next iterators were introduced, there
have been separate implementations of the functional-style _iter
iterators that do the same thing as _next.

This is annoying and adds more dependencies on the internal guts of the
file format.  Rip them all out and replace them with the corresponding
_next iterators.  Only ctf_archive_raw_iter and ctf_label_iter survive,
the former because there is no access to the raw binary data of archives
via any _next iterator, and the latter because ctf_label_next hasn't
been implemented (because labels are currently not used for anything).

Tested by reverting the change (already applied) that reimplemented
ctf_member_iter in terms of ctf_member_next, then verifying that the
_iter and _next iterators produced the same results for every iterable
entity within a large type archive.
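
The general shape of that reimplementation can be sketched as follows. This is a minimal illustration, not libctf code: the toy_* names, the int-array "dict", and the cursor-style iterator state are all hypothetical stand-ins. The point is only that the callback-style function contains no knowledge of the underlying storage and simply drives the generator-style one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container: not a real libctf type.  */
typedef struct toy_dict
{
  const int *vals;
  size_t nvals;
} toy_dict_t;

typedef int toy_visit_f (int val, void *arg);

/* Generator-style iterator: store the next value in *VALP and return 0,
   or return -1 once the values are exhausted.  *CURSOR must start at 0.  */
int
toy_value_next (const toy_dict_t *d, size_t *cursor, int *valp)
{
  if (*cursor >= d->nvals)
    return -1;
  *valp = d->vals[(*cursor)++];
  return 0;
}

/* Callback-style iterator, written purely in terms of toy_value_next.
   Stops early and returns the callback's value if that is nonzero.  */
int
toy_value_iter (const toy_dict_t *d, toy_visit_f *func, void *arg)
{
  size_t cursor = 0;
  int val;

  assert (d != NULL && func != NULL);
  while (toy_value_next (d, &cursor, &val) == 0)
    {
      int rc;

      if ((rc = func (val, arg)) != 0)
        return rc;
    }
  return 0;
}

/* Example callback: accumulate a running sum into the int that ARG
   points at.  */
int
toy_sum_cb (int val, void *arg)
{
  *(int *) arg += val;
  return 0;
}
```

Calling toy_value_iter with toy_sum_cb visits every value once, in the same order that repeated calls to toy_value_next would return them.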

libctf/ChangeLog
2021-03-02  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-types.c (ctf_member_iter): Move 'rc' to an inner scope.
	(ctf_enum_iter): Reimplement in terms of ctf_enum_next.
	(ctf_type_iter): Reimplement in terms of ctf_type_next.
	(ctf_type_iter_all): Likewise.
	(ctf_variable_iter): Reimplement in terms of ctf_variable_next.
	* ctf-archive.c (ctf_archive_iter_internal): Remove.
	(ctf_archive_iter): Reimplement in terms of ctf_archive_next.
2021-03-02 15:09:18 +00:00
Nick Alcock
eaa2913a7a libctf: ctf_archive_next should set the parent name consistently
The top level of CTF containers is a "CTF archive", which contains a
collection of named members (each a CTF dictionary).  In the serialized
file format, this is optional and skipped if the archive would have only
one member, as when no ambiguous types are present: so it is commonplace
to have a simple ctf_dict_t written out, with no archive container
wrapped around it.

But, unlike ctf_archive_iter, ctf_archive_next didn't quite handle this
case right.  It should set the name of this fake "member" to
_CTF_SECTION, i.e. ".ctf", but it was failing to do so, so callers got
an uninitialized variable back instead and were understandably confused.

So set the name properly.

libctf/ChangeLog
2021-03-02  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-archive.c (ctf_archive_next): Set the name of parents in
	single-member archives.
2021-03-02 15:08:03 +00:00
Nick Alcock
f4f60336da libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number.  But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).

This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.

This is unhelpful and pointlessly inefficient.

So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.

To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name.  This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.

The actual name->index lookup is done by ctf_lookup_symbol_idx.  We do
not anticipate this lookup to be as heavily used as ld.so symbol lookup
by many orders of magnitude, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity.  Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache.  To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache.  This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict.  ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.

(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all.  We can
fix this later by using the archive caching machinery more
aggressively.)
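
The "linear search, caching as we go" strategy can be sketched roughly like this. It is an illustration under stated assumptions, not the libctf implementation: the toy_* names, the fixed-size probing table, and the steps counter are all hypothetical, and the real code keeps its cache in a libctf hash table rather than in an array like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SLOTS 64      /* Power of two, larger than any toy symtab.  */

typedef struct toy_symtab
{
  const char **names;           /* Symbol names, in symbol-index order.  */
  size_t nsyms;
  size_t scanned;               /* Entries [0, scanned) are already cached.  */
  size_t steps;                 /* Scan steps taken so far (illustration only).  */
  struct { const char *name; size_t idx; } cache[TOY_CACHE_SLOTS];
} toy_symtab_t;

static size_t
toy_hash (const char *s)
{
  size_t h = 5381;

  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h & (TOY_CACHE_SLOTS - 1);
}

/* Record NAME -> IDX.  The first definition of a name wins.  */
static void
toy_cache_add (toy_symtab_t *st, const char *name, size_t idx)
{
  size_t slot = toy_hash (name);

  while (st->cache[slot].name != NULL)
    {
      if (strcmp (st->cache[slot].name, name) == 0)
        return;
      slot = (slot + 1) & (TOY_CACHE_SLOTS - 1);
    }
  st->cache[slot].name = name;
  st->cache[slot].idx = idx;
}

static int
toy_cache_find (const toy_symtab_t *st, const char *name, size_t *idxp)
{
  size_t slot = toy_hash (name);

  while (st->cache[slot].name != NULL)
    {
      if (strcmp (st->cache[slot].name, name) == 0)
        {
          *idxp = st->cache[slot].idx;
          return 1;
        }
      slot = (slot + 1) & (TOY_CACHE_SLOTS - 1);
    }
  return 0;
}

/* Return the index of NAME, or (size_t) -1 if it is not in the symtab.  */
size_t
toy_lookup_symbol_idx (toy_symtab_t *st, const char *name)
{
  size_t idx;

  assert (st->nsyms < TOY_CACHE_SLOTS);
  if (toy_cache_find (st, name, &idx))
    return idx;

  /* Resume the scan where the previous one stopped, caching every
     name passed on the way.  */
  while (st->scanned < st->nsyms)
    {
      size_t i = st->scanned++;

      st->steps++;
      toy_cache_add (st, st->names[i], i);
      if (strcmp (st->names[i], name) == 0)
        return i;
    }
  return (size_t) -1;
}
```

Because the scan position is remembered, each symtab entry is visited at most once no matter how many lookups are made. Sharing one such structure between all dicts in an archive is what the crossdict cache is described as achieving.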

In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive.  We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol.  This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more so isn't worth caching.  (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)

We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.

include/ChangeLog
2021-02-17  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (ctf_arc_lookup_symbol_name): New.
	(ctf_lookup_by_symbol_name): Likewise.

libctf/ChangeLog
2021-02-17  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
	<ctf_symhash_latest>: Likewise.
	(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
	<ctfi_symnamedicts>: New.
	<ctfi_syms>: Remove.
	(ctf_lookup_symbol_name): Remove.
	* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
	parent properly.  Make static.
	(ctf_lookup_symbol_idx): New, linear search for the symbol name,
	cached in the crossdict cache's ctf_symhash (if available), or
	this dict's (otherwise).
	(ctf_try_lookup_indexed): Allow the symname to be passed in.
	(ctf_lookup_by_symbol): Turn into a wrapper around...
	(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
	using ctf_lookup_symbol_idx in non-writable dicts.  Special-case
	name lookup in dynamic dicts without reported symbols, which have
	no symtab or dynsymidx but where name lookup should still work.
	(ctf_lookup_by_symbol_name): New, another wrapper.
	* ctf-archive.c (enosym): Note that this is present in
	ctfi_symnamedicts too.
	(ctf_arc_close): Adjust for removal of ctfi_syms.  Free the
	ctfi_symnamedicts.
	(ctf_arc_flush_caches): Likewise.
	(ctf_dict_open_cached): Memoize the first cached dict in the
	crossdict cache.
	(ctf_arc_lookup_symbol): Turn into a wrapper around...
	(ctf_arc_lookup_sym_or_name): ... this.  No longer cache
	ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
	still cache the dicts those lookups succeed in).  Add
	lookup-by-name support, with dicts of successful lookups cached in
	ctfi_symnamedicts.  Refactor the caching code a bit.
	(ctf_arc_lookup_symbol_name): New, another wrapper.
	* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
	* libctf.ver (LIBCTF_1.2): New version.  Add
	ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
	* testsuite/libctf-lookup/enum-symbol.c (main): Use
	ctf_arc_lookup_symbol rather than looking up the name ourselves.
	Fish it out repeatedly, to make sure that symbol caching isn't
	broken.
	(symidx_64): Remove.
	(symidx_32): Remove.
	* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
	in an unlinked object file (indexed symtypetab sections only).
	* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
	(try_maybe_reporting): Check symbol types via
	ctf_lookup_by_symbol_name as well as ctf_symbol_next.
	* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
	lookups in a multi-dict archive.
2021-02-20 16:37:08 +00:00
Nick Alcock
8769046e5a libctf: remove outdated comment about parent dict importing
Parent dicts are nowadays imported automatically in most situations, so
the comment in ctf_archive_iter warning people that they need to import
parents by hand is wrong.  Remove it.

libctf/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-archive.c (ctf_archive_iter): Remove outdated comment.
2021-01-05 14:53:40 +00:00
Alan Modra
250d07de5c Update year range in copyright notice of binutils files
2021-01-01 10:31:05 +10:30
Nick Alcock
53651de80f libctf, include: support foreign-endianness symtabs with CTF
The CTF symbol lookup machinery added recently has one deficit: it
assumes the symtab is in the machine's native endianness.  This is
always true when the linker is writing out symtabs (because cross
linkers byteswap symbols only after libctf has been called on them), but
may be untrue in the cross case when the linker or another tool
(objdump, etc) is reading them.

Unfortunately the easy way to model this to the caller, as an endianness
field in the ctf_sect_t, is precluded because doing so would change the
size of the ctf_sect_t, which would be an ABI break.  So, instead, allow
the endianness of the symtab to be set after open time, by calling one
of the two new API functions ctf_symsect_endianness (for ctf_dict_t's)
or ctf_arc_symsect_endianness (for entire ctf_archive_t's).  libctf
calls these functions automatically for objects opened via any of the
BFD-aware mechanisms (ctf_bfdopen, ctf_bfdopen_ctfsect, ctf_fdopen,
ctf_open, or ctf_arc_open), but the various mechanisms that just take
raw ctf_sect_t's will assume the symtab is in native endianness and need
a later call to ctf_*symsect_endianness to adjust it if needed.  (This
call is basically free if the endianness is actually native: it only
costs anything if the symtab endianness was previously guessed wrong,
and there is a symtab, and we are using it directly rather than using
symtab indexing.)

Obviously, calling ctf_lookup_by_symbol or ctf_symbol_next before the
symtab endianness is correctly set will probably give wrong answers --
but you can set it at any time as long as it is before then.
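
The underlying idea (swap a symbol's fields only when the symtab's byte order differs from the host's) can be sketched as below. This is a hedged illustration: toy_sym_t is a cut-down, hypothetical record rather than the real Elf32_Sym or Elf64_Sym layout, and none of the toy_* helpers are libctf functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down symbol record: not the real ELF layout.  */
typedef struct toy_sym
{
  uint32_t st_name;
  uint32_t st_value;
  uint16_t st_shndx;
} toy_sym_t;

static uint16_t
toy_swap16 (uint16_t v)
{
  return (uint16_t) ((v << 8) | (v >> 8));
}

static uint32_t
toy_swap32 (uint32_t v)
{
  return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8)
         | ((v >> 8) & 0xff00u) | (v >> 24);
}

/* Return 1 if this host is little-endian, 0 otherwise.  */
int
toy_host_little_endian (void)
{
  const uint16_t probe = 1;

  return *(const unsigned char *) &probe == 1;
}

/* Convert SRC, stored in the symtab's byte order, to host byte order.
   This is a plain copy when the two byte orders already agree.  */
toy_sym_t
toy_sym_to_host (const toy_sym_t *src, int symtab_little_endian)
{
  toy_sym_t dst = *src;

  assert (symtab_little_endian == 0 || symtab_little_endian == 1);
  if (symtab_little_endian != toy_host_little_endian ())
    {
      dst.st_name = toy_swap32 (dst.st_name);
      dst.st_value = toy_swap32 (dst.st_value);
      dst.st_shndx = toy_swap16 (dst.st_shndx);
    }
  return dst;
}
```

In the native-endian case the function does nothing beyond the copy, which matches the commit's claim that setting the endianness costs almost nothing when the original guess was right.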

include/ChangeLog
2020-11-23  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h: Style nit: remove () on function names in comments.
	(ctf_sect_t): Mention endianness concerns.
	(ctf_symsect_endianness): New declaration.
	(ctf_arc_symsect_endianness): Likewise.

libctf/ChangeLog
2020-11-23  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dict_t) <ctf_symtab_little_endian>: New.
	(struct ctf_archive_internal) <ctfi_symsect_little_endian>: Likewise.
	* ctf-create.c (ctf_serialize): Adjust for new field.
	* ctf-open.c (init_symtab): Note the semantics of repeated calls.
	(ctf_symsect_endianness): New.
	(ctf_bufopen_internal): Set ctf_symtab_little_endian suitably for
	the native endianness.
	(_Static_assert): Moved...
	(swap_thing): ... with this...
	* swap.h: ... to here.
	* ctf-util.c (ctf_elf32_to_link_sym): Use it, byteswapping the
	Elf32_Sym if the ctf_symtab_little_endian demands it.
	(ctf_elf64_to_link_sym): Likewise swap the Elf64_Sym if needed.
	* ctf-archive.c (ctf_arc_symsect_endianness): New, set the
	endianness of the symtab used by the dicts in an archive.
	(ctf_archive_iter_internal): Initialize to unknown (assumed native,
	do not call ctf_symsect_endianness).
	(ctf_dict_open_by_offset): Call ctf_symsect_endianness if need be.
	(ctf_dict_open_internal): Propagate the endianness down.
	(ctf_dict_open_sections): Likewise.
	* ctf-open-bfd.c (ctf_bfdopen_ctfsect): Get the endianness from the
	struct bfd and pass it down to the archive.
	* libctf.ver: Add ctf_symsect_endianness and
	ctf_arc_symsect_endianness.
2020-11-25 19:11:35 +00:00
Nick Alcock
2c78e92523 libctf, include: CTF-archive-wide symbol lookup
CTF archives may contain multiple dicts, each of which contain many
types and possibly a bunch of symtypetab entries relating to those
types: each symtypetab entry is going to appear in exactly one dict,
with the corresponding entries in the other dicts empty (either pads, or
indexed symtypetabs that do not mention that symbol).  But users of
libctf usually want to get back the type associated with a symbol
without having to dig around to find out which dict that type might be
in.

This adds machinery to do that -- and since you probably want to do it
repeatedly, it adds internal caching to the ctf-archive machinery so
that iteration over archives via ctf_archive_next and repeated symbol
lookups do not have to repeatedly reopen the archive.  (Iteration using
ctf_archive_iter will gain caching soon.)

Two new API functions:

ctf_dict_t *
ctf_arc_lookup_symbol (ctf_archive_t *arc, unsigned long symidx,
		       ctf_id_t *typep, int *errp);

This looks up the symbol with index SYMIDX in the archive ARC, returning
the dictionary in which it resides and optionally the type index as
well.  Errors are returned in ERRP.  The dict should be
ctf_dict_close()d when done, but is also cached inside the ctf_archive
so that the open cost is only paid once.  The result of the symbol
lookup is also cached internally, so repeated lookups of the same symbol
are nearly free.

void ctf_arc_flush_caches (ctf_archive_t *arc);

Flush all the caches. Done at close time, but also available as an API
function if users want to do it by hand.
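
A toy model of the archive-wide lookup may make the caching easier to picture. Everything here is an assumption made for illustration: the toy_* types, the fixed array sizes, and the convention that a type ID of 0 means "no entry" are not libctf's, and this sketch caches only which dict a symbol was found in.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSYMS 3
#define TOY_NDICTS 2

typedef struct toy_dict
{
  const char *name;
  unsigned long symtypes[TOY_NSYMS];  /* 0 means "no entry for this symbol".  */
} toy_dict_t;

typedef struct toy_archive
{
  toy_dict_t dicts[TOY_NDICTS];
  toy_dict_t *symdicts[TOY_NSYMS];    /* Cache: dict each symbol was found in.  */
  size_t dict_probes;                 /* Dicts searched so far (illustration).  */
} toy_archive_t;

/* Look up symbol SYMIDX across every dict in ARC.  Return the dict that
   holds it, storing its type in *TYPEP if TYPEP is non-NULL, or return
   NULL if no dict has an entry for that symbol.  */
toy_dict_t *
toy_arc_lookup_symbol (toy_archive_t *arc, unsigned long symidx,
                       unsigned long *typep)
{
  size_t i;
  toy_dict_t *found;

  assert (arc != NULL);
  if (symidx >= TOY_NSYMS)
    return NULL;

  /* Consult the cache first, and search the dicts only on a miss.  */
  found = arc->symdicts[symidx];
  for (i = 0; found == NULL && i < TOY_NDICTS; i++)
    {
      arc->dict_probes++;
      if (arc->dicts[i].symtypes[symidx] != 0)
        found = arc->symdicts[symidx] = &arc->dicts[i];
    }

  if (found != NULL && typep != NULL)
    *typep = found->symtypes[symidx];
  return found;
}
```

Unlike the real code, which the ChangeLog below says has an enosym flag entry for "symbol not present", this sketch does not cache failed lookups, so a missing symbol is searched for again on every call.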

include/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (ctf_arc_lookup_symbol): New.
	(ctf_arc_flush_caches): Likewise.
	* ctf.h: Document new auto-ctf_import behaviour.

libctf/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (struct ctf_archive_internal) <ctfi_dicts>: New, dicts
	the archive machinery has opened and cached.
	<ctfi_symdicts>: New, cache of dicts containing symbols looked up.
	<ctfi_syms>: New, cache of types of symbols looked up.
	* ctf-archive.c (ctf_arc_close): Free them on close.
	(enosym): New, flag entry for 'symbol not present'.
	(ctf_arc_import_parent): New, automatically import the parent from
	".ctf" if this is a child in an archive and ".ctf" is present.
	(ctf_dict_open_sections): Use it.
	(ctf_archive_iter_internal): Likewise.
	(ctf_cached_dict_close): New, thunk around ctf_dict_close.
	(ctf_dict_open_cached): New, open and cache a dict.
	(ctf_arc_flush_caches): New, flush the caches.
	(ctf_arc_lookup_symbol): New, look up a symbol in (all members of)
	an archive, and cache the lookup.
	(ctf_archive_iter): Note the new caching behaviour.
	(ctf_archive_next): Use ctf_dict_open_cached.
	* libctf.ver: Add ctf_arc_lookup_symbol and ctf_arc_flush_caches.
2020-11-20 13:34:11 +00:00
Nick Alcock
3d16b64e28 bfd, include, ld, binutils, libctf: CTF should use the dynstr/sym
This is embarrassing.

The whole point of CTF is that it remains intact even after a binary is
stripped, providing a compact mapping from symbols to types for
everything in the externally-visible interface of an ELF object: it has
connections to the symbol table for that purpose, and to the string
table to avoid duplicating symbol names.  So it's a shame that the hooks
I implemented last year served to hook it up to the .symtab and .strtab,
which obviously disappear on strip, leaving any accompanying CTF
dict containing references to strings (and, soon, symbols) which don't
exist any more because their containing strtab has been vaporized.  The
original Solaris design used .dynsym and .dynstr (well, actually,
.ldynsym, which has more symbols) which do not disappear. So should we.

Thankfully the work we did before serves as guide rails, and adjusting
things to use the .dynstr and .dynsym was fast and easy.  The only
annoyance is that the dynsym is assembled inside elflink.c in a fairly
piecemeal fashion, so that the easiest way to get the symbols out was to
hook in before every call to swap_symbol_out (we also leave in a hook in
front of symbol additions to the .symtab because it seems plausible that
we might want to hook them in future too: for now that hook is unused).
We adjust things so that rather than being offered a whole hash table of
symbols at once, libctf is now given symbols one at a time, with st_name
indexes already resolved and pointing at their final .dynstr offsets:
it's now up to libctf to resolve these to names as needed using the
strtab info we pass it separately.

Some bits might be contentious.  The ctf_new_dynstr callback takes an
elf_internal_sym, and this remains an elf_internal_sym right down
through the generic emulation layers into ldelfgen.  This is no worse
than the elf_sym_strtab we used to pass down, but in the future when we
gain non-ELF CTF symtab support we might want to lower the
elf_internal_sym to some other representation (perhaps a
ctf_link_symbol) in bfd or in ldlang_ctf_new_dynsym.  We rename the
'apply_strsym' hooks to 'acquire_strings' instead, because they no longer
have anything to do with symbols.

There are some API changes to pieces of API which are technically public
but actually totally unused by anything and/or unused by anything but ld
so they can change freely: the ctf_link_symbol gains new fields to allow
symbol names to be given as strtab offsets as well as strings, and a
symidx so that the symbol index can be passed in.  ctf_link_shuffle_syms
loses its callback parameter: the idea now is that linkers call the new
ctf_link_add_linker_symbol for every symbol in .dynsym, feed in all the
strtab entries with ctf_link_add_strtab, and then a call to
ctf_link_shuffle_syms will apply both and arrange to use them to reorder
the CTF symtab at CTF serialization time (which is coming in the next
commit).

Inside libctf we have a new preamble flag CTF_F_DYNSTR which is always
set in v3-format CTF dicts from this commit forwards: CTF dicts without
this flag are associated with .strtab like they used to be, so that old
dicts' external strings don't turn to garbage when loaded by new libctf.
Dicts with this flag are associated with .dynstr and .dynsym instead.
(The flag is not the next in sequence because this commit was written
quite late: the missing flags will be filled in by the next commit.)

Tests forthcoming in a later commit in this series.

bfd/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* elflink.c (elf_finalize_dynstr): Call examine_strtab after
	dynstr finalization.
	(elf_link_swap_symbols_out): Don't call it here.  Call
	ctf_new_symbol before swap_symbol_out.
	(elf_link_output_extsym): Call ctf_new_dynsym before
	swap_symbol_out.
	(bfd_elf_final_link): Likewise.
	* elf.c (swap_out_syms): Pass in bfd_link_info.  Call
	ctf_new_symbol before swap_symbol_out.
	(_bfd_elf_compute_section_file_positions): Adjust.

binutils/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* readelf.c (dump_section_as_ctf): Use .dynsym and .dynstr, not
	.symtab and .strtab.

include/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* bfdlink.h (struct elf_sym_strtab): Replace with...
	(struct elf_internal_sym): ... this.
	(struct bfd_link_callbacks) <examine_strtab>: Take only a
	symstrtab argument.
	<ctf_new_symbol>: New.
	<ctf_new_dynsym>: Likewise.
	* ctf-api.h (struct ctf_link_sym) <st_symidx>: New.
	<st_nameidx>: Likewise.
	<st_nameidx_set>: Likewise.
	(ctf_link_iter_symbol_f): Removed.
	(ctf_link_shuffle_syms): Remove most parameters, just takes a
	ctf_dict_t now.
	(ctf_link_add_linker_symbol): New, split from
	ctf_link_shuffle_syms.
	* ctf.h (CTF_F_DYNSTR): New.
	(CTF_F_MAX): Adjust.

ld/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ldelfgen.c (struct ctf_strsym_iter_cb_arg): Rename to...
	(struct ctf_strtab_iter_cb_arg): ... this, changing fields:
	<syms>: Remove.
	<symcount>: Remove.
	<symstrtab>: Rename to...
	<strtab>: ... this.
	(ldelf_ctf_strtab_iter_cb): Adjust.
	(ldelf_ctf_symbols_iter_cb): Remove.
	(ldelf_new_dynsym_for_ctf): New, tell libctf about a single
	symbol.
	(ldelf_examine_strtab_for_ctf): Rename to...
	(ldelf_acquire_strings_for_ctf): ... this, only doing the strtab
	portion and not symbols.
	* ldelfgen.h: Adjust declarations accordingly.
	* ldemul.c (ldemul_examine_strtab_for_ctf): Rename to...
	(ldemul_acquire_strings_for_ctf): ... this.
	(ldemul_new_dynsym_for_ctf): New.
	* ldemul.h: Adjust declarations accordingly.
	* ldlang.c (ldlang_ctf_apply_strsym): Rename to...
	(ldlang_ctf_acquire_strings): ... this.
	(ldlang_ctf_new_dynsym): New.
	(lang_write_ctf): Call ldemul_new_dynsym_for_ctf with NULL to do
	the actual symbol shuffle.
	* ldlang.h (struct elf_strtab_hash): Adjust accordingly.
	* ldmain.c (bfd_link_callbacks): Wire up new/renamed callbacks.

libctf/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-link.c (ctf_link_shuffle_syms): Adjust.
	(ctf_link_add_linker_symbol): New, unimplemented stub.
	* libctf.ver: Add it.
	* ctf-create.c (ctf_serialize): Set CTF_F_DYNSTR on newly-serialized
	dicts.
	* ctf-open-bfd.c (ctf_bfdopen_ctfsect): Check for the flag: open the
	symtab/strtab if not present, dynsym/dynstr otherwise.
	* ctf-archive.c (ctf_arc_bufpreamble): New, get the preamble from
	some arbitrary member of a CTF archive.
	* ctf-impl.h (ctf_arc_bufpreamble): Declare it.
2020-11-20 13:34:07 +00:00
Nick Alcock
ae41200ba8 libctf, include, binutils, gdb: rename CTF-opening functions
The functions that return ctf_dict_t's given a ctf_archive_t and a name
are very clumsily named.  It sounds like they return *archives*, not
dictionaries, and the names are very long and clunky.  Why do we
have a ctf_arc_open_by_name when it opens a dictionary, not an archive,
and when there is no way to open a dictionary in any other way?  The
answer is purely internal: the function is located in ctf-archive.c,
and everything in there was called ctf_arc_*, and there is another
way to open a dict (by offset in the archive), that is internal to
ctf-archive.c and that nothing else can call.

This is clearly bad naming. The internal organization of the source tree
should not dictate public API names!

So rename things (keeping the old, bad names for compatibility), and
adjust all users.  You now open a dict using ctf_dict_open, and
open it giving ELF sections via ctf_dict_open_sections.
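
The compatibility approach (new name does the work, old name survives as a trivial wrapper) looks roughly like this. The toy_* functions and types are hypothetical stand-ins that only mirror the naming pattern; they are not the libctf functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_dict
{
  const char *name;
} toy_dict_t;

typedef struct toy_archive
{
  toy_dict_t *members;
  size_t nmembers;
} toy_archive_t;

/* New, clearer name: open the dict called NAME in ARC, or return NULL
   if there is no such member.  */
toy_dict_t *
toy_dict_open (const toy_archive_t *arc, const char *name)
{
  size_t i;

  assert (arc != NULL && name != NULL);
  for (i = 0; i < arc->nmembers; i++)
    if (strcmp (arc->members[i].name, name) == 0)
      return &arc->members[i];
  return NULL;
}

/* Old name, kept only so that existing callers still compile and link.  */
toy_dict_t *
toy_arc_open_by_name (const toy_archive_t *arc, const char *name)
{
  return toy_dict_open (arc, name);
}
```

Keeping the old symbol as a real function, rather than a macro, means already-compiled callers continue to link against it.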

binutils/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* objdump.c (dump_ctf): Use ctf_dict_open, not
	ctf_arc_open_by_name.
	* readelf.c (dump_section_as_ctf): Likewise.

gdb/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctfread.c (elfctf_build_psymtabs): Use ctf_dict_open, not
	ctf_arc_open_by_name.

include/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (ctf_arc_open_by_name): Rename to...
	(ctf_dict_open): ... this, keeping compatibility function.
	(ctf_arc_open_by_name_sections): Rename to...
	(ctf_dict_open_sections): ... this, keeping compatibility function.

libctf/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-archive.c (ctf_arc_open_by_offset): Rename to...
	(ctf_dict_open_by_offset): ... this.  Adjust callers.
	(ctf_arc_open_by_name_internal): Rename to...
	(ctf_dict_open_internal): ... this.  Adjust callers.
	(ctf_arc_open_by_name_sections): Rename to...
	(ctf_dict_open_sections): ... this, keeping compatibility function.
	(ctf_arc_open_by_name): Rename to...
	(ctf_dict_open): ... this, keeping compatibility function.
	* libctf.ver: New functions added.
	* ctf-link.c (ctf_link_one_input_archive): Adjusted accordingly.
	(ctf_link_deduplicating_open_inputs): Likewise.
2020-11-20 13:34:05 +00:00
Nick Alcock
139633c307 libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file".  Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago.  So the term "CTF file"
refers to something that is never a file!  This is at best confusing.

The type has also historically been known as a "CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.

So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead.  Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.

Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).

binutils/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
	(dump_ctf_archive_member): Likewise.
	(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
	* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
	(dump_ctf_archive_member): Likewise.
	(dump_section_as_ctf): Likewise.  Use ctf_dict_close, not
	ctf_file_close.

gdb/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
	(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.

include/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (ctf_file_t): Rename to...
	(ctf_dict_t): ... this.  Keep ctf_file_t around for compatibility.
	(struct ctf_file): Likewise rename to...
	(struct ctf_dict): ... this.
	(ctf_file_close): Rename to...
	(ctf_dict_close): ... this, keeping compatibility function.
	(ctf_parent_file): Rename to...
	(ctf_parent_dict): ... this, keeping compatibility function.
	All callers adjusted.
	* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
	(struct ctf_archive) <ctfa_nfiles>: Rename to...
	<ctfa_ndicts>: ... this.

ld/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ldlang.c (ctf_output): This is a ctf_dict_t now.
	(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
	(ldlang_open_ctf): Adjust comment.
	(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
	* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
	ctf_dict_t.  Change opaque declaration accordingly.
	* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
	* ldemul.h (examine_strtab_for_ctf): Likewise.
	(ldemul_examine_strtab_for_ctf): Likewise.
	* ldemul.c (ldemul_examine_strtab_for_ctf): Likewise.

libctf/ChangeLog
2020-11-20  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
	adjusted.
	(ctf_fileops): Rename to...
	(ctf_dictops): ... this.
	(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
	<cd_id_to_dict_t>: ... this.
	(ctf_file_t): Fix outdated comment.
	<ctf_fileops>: Rename to...
	<ctf_dictops>: ... this.
	(struct ctf_archive_internal) <ctfi_file>: Rename to...
	<ctfi_dict>: ... this.
	* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
	Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
	Rename ctf_file_close to ctf_dict_close.  All users adjusted.
	* ctf-create.c: Likewise.  Refer to CTF dicts, not CTF containers.
	(ctf_bundle_t) <ctb_file>: Rename to...
	<ctb_dict>: ... this.
	* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
	* ctf-dedup.c: Likewise.  Rename ctf_file_close to
	ctf_dict_close. Refer to CTF dicts, not CTF containers.
	* ctf-dump.c: Likewise.
	* ctf-error.c: Likewise.
	* ctf-hash.c: Likewise.
	* ctf-inlines.h: Likewise.
	* ctf-labels.c: Likewise.
	* ctf-link.c: Likewise.
	* ctf-lookup.c: Likewise.
	* ctf-open-bfd.c: Likewise.
	* ctf-string.c: Likewise.
	* ctf-subr.c: Likewise.
	* ctf-types.c: Likewise.
	* ctf-util.c: Likewise.
	* ctf-open.c: Likewise.
	(ctf_file_close): Rename to...
	(ctf_dict_close): ...this.
	(ctf_file_close): New trivial wrapper around ctf_dict_close, for
	compatibility.
	(ctf_parent_file): Rename to...
	(ctf_parent_dict): ... this.
	(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
	compatibility.
	* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 13:34:04 +00:00
Nick Alcock
926c9e7665 libctf, binutils, include, ld: gettextize and improve error handling
This commit follows on from the earlier commit "libctf, ld, binutils:
add textual error/warning reporting for libctf" and converts every error
in libctf that was reported using ctf_dprintf to use ctf_err_warn
instead, gettextizing them in the process, using N_() where necessary to
avoid doing gettext calls unless an error message is actually generated,
and rephrasing some error messages for ease of translation.

This requires a slight change in the ctf_errwarning_next API: this API
is public but has not been in a release yet, so can still change freely.
The problem is that many errors are emitted at open time (whether
opening of a CTF dict, or opening of a CTF archive): the former of these
throws away its incompletely-initialized ctf_file_t rather than return
it, and the latter has no ctf_file_t at all. So errors and warnings
emitted at open time cannot be stored in the ctf_file_t, and have to go
elsewhere.

We put them in a static local in ctf-subr.c (which is not very
thread-safe: a later commit will improve things here): ctf_err_warn with
a NULL fp adds to this list, and the public interface
ctf_errwarning_next with a NULL fp retrieves from it.

We need a slight exception from the usual iterator rules in this case:
with a NULL fp, there is nowhere to store the ECTF_NEXT_END "error"
which signifies the end of iteration, so we add a new err parameter to
ctf_errwarning_next which is used to report such iteration-related
errors.  (If an fp is provided -- i.e., if not reporting open errors --
this is optional, but even if it's optional it's still an API change.
This is actually useful from a usability POV as well, since
ctf_errwarning_next is usually called when there's been an error, so
overwriting the error code with ECTF_NEXT_END is not very helpful!
So, unusually, ctf_errwarning_next now uses the passed fp for its
error code *only* if no errp pointer is passed in, and leaves it
untouched otherwise.)
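
The iterator rule described above can be modelled with a small sketch. All names here are hypothetical (including TOY_NEXT_END, which merely stands in for ECTF_NEXT_END), the cursor replaces libctf's real iterator state, and the fixed-size message array is a simplification.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NEXT_END 1000       /* Hypothetical stand-in for ECTF_NEXT_END.  */
#define TOY_MAX_MSGS 4

typedef struct toy_dict
{
  const char *msgs[TOY_MAX_MSGS];
  size_t nmsgs;
  int err;                      /* Last error recorded on this dict.  */
} toy_dict_t;

/* Messages emitted when no dict is available, e.g. at open time.  */
static toy_dict_t toy_open_errs;

/* Record MSG against FP, or against the open-time list if FP is NULL.  */
void
toy_err_warn (toy_dict_t *fp, const char *msg)
{
  toy_dict_t *dst = fp != NULL ? fp : &toy_open_errs;

  if (dst->nmsgs < TOY_MAX_MSGS)
    dst->msgs[dst->nmsgs++] = msg;
}

/* Return the next message for FP (or the next open-time message if FP
   is NULL), or NULL at the end of iteration.  *CURSOR must start at 0.  */
const char *
toy_errwarning_next (toy_dict_t *fp, size_t *cursor, int *errp)
{
  toy_dict_t *src = fp != NULL ? fp : &toy_open_errs;

  assert (cursor != NULL);
  if (*cursor < src->nmsgs)
    return src->msgs[(*cursor)++];

  /* End of iteration: prefer ERRP, so that FP's real error code is
     not overwritten.  */
  if (errp != NULL)
    *errp = TOY_NEXT_END;
  else if (fp != NULL)
    fp->err = TOY_NEXT_END;
  return NULL;
}
```

A caller that has just seen a failure would pass a non-NULL errp, drain the messages, and still find the original error code on the dict afterwards.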

ld, objdump and readelf are adapted to call ctf_errwarning_next with a
NULL fp to report open errors where appropriate.

The ctf_err_warn API also has to change, gaining a new error-number
parameter which is used to add the error message corresponding to that
error number into the debug stream when LIBCTF_DEBUG is enabled:
changing this API is easy at this point since we are already touching
all existing calls to gettextize them.  We need this because the debug
stream should contain the errno's message, but the error reported in the
error/warning stream should *not*, because the caller will probably
report it themselves at failure time regardless, and reporting it in
every error message that leads up to it leads to a ridiculous chattering
on failure, which is likely to end up as ridiculous chattering on stderr
(trimmed a bit):

CTF error: `ld/testsuite/ld-ctf/A.c (0): lookup failure for type 3: flags 1: The parent CTF dictionary is unavailable'
CTF error: `ld/testsuite/ld-ctf/A.c (0): struct/union member type hashing error during type hashing for type 80000001, kind 6: The parent CTF dictionary is unavailable'
CTF error: `deduplicating link variable emission failed for ld/testsuite/ld-ctf/A.c: The parent CTF dictionary is unavailable'
ld/.libs/lt-ld-new: warning: CTF linking failed; output will have no CTF section: `The parent CTF dictionary is unavailable'

We only need to be told that the parent CTF dictionary is unavailable
*once*, not over and over again!

errmsgs are still emitted on warning generation, because warnings do not
usually lead to a failure propagated up to the caller and reported
there.
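
The decoration rule can be summarized as a small formatting helper.  This
is a hedged sketch, not the real ctf_err_warn: sk_format_msg and its
parameters are hypothetical, and the errmsg is passed in as a string rather
than looked up from an error number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format MSG for either the debug stream or the
   error/warning stream.  ERRMSG is the text for the error number, or
   NULL if there is none.  Returns whatever snprintf returns.  */
int
sk_format_msg (char *buf, size_t len, const char *msg, const char *errmsg,
	       int is_warning, int for_debug)
{
  assert (buf != NULL && msg != NULL);

  /* Debug output always carries the errmsg; so do warnings, which are
     not normally followed by a failure reported by the caller.  Errors
     in the error/warning stream are left undecorated.  */
  if (errmsg != NULL && (for_debug || is_warning))
    return snprintf (buf, len, "%s: %s", msg, errmsg);
  return snprintf (buf, len, "%s", msg);
}
```

With this shape, the errmsg appears once (from the caller's own failure
report) rather than on every error line leading up to it.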

Debug-stream messages are not translated.  If translation is turned on,
there will be a mixture of English and translated messages in the debug
stream, but rather that than burden the translators with debug-only
output.

binutils/ChangeLog
2020-08-27  Nick Alcock  <nick.alcock@oracle.com>

	* objdump.c (dump_ctf_archive_member): Move error-
	reporting...
	(dump_ctf_errs): ... into this separate function.
	(dump_ctf): Call it on open errors.
	* readelf.c (dump_ctf_archive_member): Move error-
	reporting...
	(dump_ctf_errs): ... into this separate function.  Support
	calls with NULL fp. Adjust for new err parameter to
	ctf_errwarning_next.
	(dump_section_as_ctf): Call it on open errors.

include/ChangeLog
2020-08-27  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (ctf_errwarning_next): New err parameter.

ld/ChangeLog
2020-08-27  Nick Alcock  <nick.alcock@oracle.com>

	* ldlang.c (lang_ctf_errs_warnings): Support calls with NULL fp.
	Adjust for new err parameter to ctf_errwarning_next.  Only
	check for assertion failures when fp is non-NULL.
	(ldlang_open_ctf): Call it on open errors.
	* testsuite/ld-ctf/ctf.exp: Always use the C locale to avoid
	breaking the diags tests.

libctf/ChangeLog
2020-08-27  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-subr.c (open_errors): New list.
	(ctf_err_warn): Calls with NULL fp append to open_errors.  Add err
	parameter, and use it to decorate the debug stream with errmsgs.
	(ctf_err_warn_to_open): Splice errors from a CTF dict into the
	open_errors.
	(ctf_errwarning_next): Calls with NULL fp report from open_errors.
	New err param to report iteration errors (including end-of-iteration)
	when fp is NULL.
	(ctf_assert_fail_internal): Adjust ctf_err_warn call for new err
	parameter: gettextize.
	* ctf-impl.h (ctfo_get_vbytes): Add ctf_file_t parameter.
	(LCTF_VBYTES): Adjust.
	(ctf_err_warn_to_open): New.
	(ctf_err_warn): Adjust.
	(ctf_bundle): Used in only one place: move...
	* ctf-create.c: ... here.
	(enumcmp): Use ctf_err_warn, not ctf_dprintf, passing the err number
	down as needed.  Don't emit the errmsg.  Gettextize.
	(membcmp): Likewise.
	(ctf_add_type_internal): Likewise.
	(ctf_write_mem): Likewise.
	(ctf_compress_write): Likewise.  Report errors writing the header or
	body.
	(ctf_write): Likewise.
	* ctf-archive.c (ctf_arc_write_fd): Use ctf_err_warn, not
	ctf_dprintf, and gettextize, as above.
	(ctf_arc_write): Likewise.
	(ctf_arc_bufopen): Likewise.
	(ctf_arc_open_internal): Likewise.
	* ctf-labels.c (ctf_label_iter): Likewise.
	* ctf-open-bfd.c (ctf_bfdclose): Likewise.
	(ctf_bfdopen): Likewise.
	(ctf_bfdopen_ctfsect): Likewise.
	(ctf_fdopen): Likewise.
	* ctf-string.c (ctf_str_write_strtab): Likewise.
	* ctf-types.c (ctf_type_resolve): Likewise.
	* ctf-open.c (get_vbytes_common): Likewise. Pass down the ctf dict.
	(get_vbytes_v1): Pass down the ctf dict.
	(get_vbytes_v2): Likewise.
	(flip_ctf): Likewise.
	(flip_types): Likewise. Use ctf_err_warn, not ctf_dprintf, and
	gettextize, as above.
	(upgrade_types_v1): Adjust calls.
	(init_types): Use ctf_err_warn, not ctf_dprintf, as above.
	(ctf_bufopen_internal): Likewise. Adjust calls. Transplant errors
	emitted into individual dicts into the open errors if this turns
	out to be a failed open in the end.
	* ctf-dump.c (ctf_dump_format_type): Adjust ctf_err_warn for new err
	argument.  Gettextize.  Don't emit the errmsg.
	(ctf_dump_funcs): Likewise.  Collapse err label into its only case.
	(ctf_dump_type): Likewise.
	* ctf-link.c (ctf_create_per_cu): Adjust ctf_err_warn for new err
	argument.  Gettextize.  Don't emit the errmsg.
	(ctf_link_one_type): Likewise.
	(ctf_link_lazy_open): Likewise.
	(ctf_link_one_input_archive): Likewise.
	(ctf_link_deduplicating_count_inputs): Likewise.
	(ctf_link_deduplicating_open_inputs): Likewise.
	(ctf_link_deduplicating_close_inputs): Likewise.
	(ctf_link_deduplicating): Likewise.
	(ctf_link): Likewise.
	(ctf_link_deduplicating_per_cu): Likewise. Add some missed
	ctf_set_errnos to obscure error cases.
	* ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_err_warn for new
	err argument.  Gettextize.  Don't emit the errmsg.
	(ctf_dedup_populate_mappings): Likewise.
	(ctf_dedup_detect_name_ambiguity): Likewise.
	(ctf_dedup_init): Likewise.
	(ctf_dedup_multiple_input_dicts): Likewise.
	(ctf_dedup_conflictify_unshared): Likewise.
	(ctf_dedup): Likewise.
	(ctf_dedup_rwalk_one_output_mapping): Likewise.
	(ctf_dedup_id_to_target): Likewise.
	(ctf_dedup_emit_type): Likewise.
	(ctf_dedup_emit_struct_members): Likewise.
	(ctf_dedup_populate_type_mapping): Likewise.
	(ctf_dedup_populate_type_mappings): Likewise.
	(ctf_dedup_emit): Likewise.
	(ctf_dedup_hash_type): Likewise. Fix a bit of messed-up error
	status setting.
	(ctf_dedup_rwalk_one_output_mapping): Likewise. Don't hide
	unknown-type-kind messages (which signify file corruption).
2020-08-27 13:15:43 +01:00
Nick Alcock
4533ed564d libctf, binutils: fix big-endian libctf archive opening
The recent commit "libctf, binutils: support CTF archives like objdump"
broke opening of CTF archives on big-endian platforms.

This didn't affect anyone much before now because the linker never
emitted CTF archives because it wasn't detecting ambiguous types
properly: now it does, and this bug becomes obvious.

Fix trivial.
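
For illustration only, the shape of such a fix might look like the sketch
below.  It assumes the on-disk magic is stored little-endian; SK_ARC_MAGIC
is a made-up value and all sk_* names are hypothetical, not the code in
ctf-archive.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value, for illustration only.  */
#define SK_ARC_MAGIC UINT64_C (0x0123456789abcdef)

/* Reverse the byte order of V.  */
uint64_t
sk_bswap64 (uint64_t v)
{
  v = ((v & UINT64_C (0x00ff00ff00ff00ff)) << 8)
      | ((v >> 8) & UINT64_C (0x00ff00ff00ff00ff));
  v = ((v & UINT64_C (0x0000ffff0000ffff)) << 16)
      | ((v >> 16) & UINT64_C (0x0000ffff0000ffff));
  return (v << 32) | (v >> 32);
}

/* Nonzero if this host stores the most significant byte first.  */
int
sk_host_is_big_endian (void)
{
  const uint16_t probe = 1;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first == 0;
}

/* Convert a little-endian on-disk word to host order.  */
uint64_t
sk_le64_to_host (uint64_t raw)
{
  return sk_host_is_big_endian () ? sk_bswap64 (raw) : raw;
}

/* Nonzero if BUF starts with the (little-endian) archive magic.  */
int
sk_is_archive (const unsigned char *buf, size_t len)
{
  uint64_t raw;

  assert (buf != NULL);
  if (len < sizeof (raw))
    return 0;
  memcpy (&raw, buf, sizeof (raw));
  return sk_le64_to_host (raw) == SK_ARC_MAGIC;
}
```

Comparing the raw word against the magic without the conversion would
succeed on little-endian hosts and fail on big-endian ones, which is the
symptom described above.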

libctf/
	* ctf-archive.c (ctf_arc_bufopen): Endian-swap the archive magic
	number if needed.
2020-07-22 18:05:32 +01:00
Nick Alcock
ac2ff76030 libctf, archive: fix bad error message
Get the function name right.

libctf/
	* ctf-archive.c (ctf_arc_bufopen): Fix message.
2020-07-22 18:02:18 +01:00
Nick Alcock
d50c08025d libctf, open: fix opening CTF in binaries with no symtab
This is a perfectly possible case, and half of ctf_bfdopen_ctfsect
handled it fine.  The other half hit a divide by zero or two before we
got that far, and had no code path to load the strtab from anywhere
in the absence of a symtab to point at it in any case.

So, as a fallback, if there is no symtab, try loading ".strtab"
explicitly by name, like we used to before we started looking for the
strtab the symtab used.

Of course, such a strtab is not kept hold of by BFD, so this means we
have to bring back the code to possibly explicitly free the strtab that
we read in.
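
A rough sketch of the two pieces involved: a count that cannot divide by
zero, and a fallback that records who must free the strtab.  The sk_sect
structure and both functions are hypothetical stand-ins, not the
ctf_bfdopen_ctfsect implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section descriptor; not libctf's ctf_sect_t.  */
struct sk_sect
{
  const void *data;
  size_t size;
  size_t entsize;
};

/* Number of entries in SECT, or 0 if the section is absent or claims a
   zero entry size (which would otherwise divide by zero).  */
size_t
sk_sect_count (const struct sk_sect *sect)
{
  if (sect == NULL || sect->data == NULL || sect->entsize == 0)
    return 0;
  return sect->size / sect->entsize;
}

/* Pick the string table: the one the symtab points at if there is a
   symtab, else the one found by name.  *MUST_FREE is set when the
   fallback was used, since nothing else keeps hold of that copy.  */
const struct sk_sect *
sk_pick_strtab (const struct sk_sect *symtab, const struct sk_sect *linked,
		const struct sk_sect *by_name, int *must_free)
{
  assert (must_free != NULL);
  *must_free = 0;
  if (symtab != NULL && linked != NULL)
    return linked;
  if (by_name != NULL)
    *must_free = 1;
  return by_name;
}
```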

libctf/
	* ctf-impl.h (struct ctf_archive_internal) <ctfi_free_strsect>
	New.
	* ctf-open-bfd.c (ctf_bfdopen_ctfsect): Explicitly open a strtab
	if the input has no symtab, rather than dividing by
	zero. Arrange to free it later via ctfi_free_strsect.
	* ctf-archive.c (ctf_new_archive_internal): Do not
	ctfi_free_strsect by default.
	(ctf_arc_close): Possibly free it here.
2020-07-22 18:02:18 +01:00
Nick Alcock
688d28f621 libctf, next: introduce new class of easier-to-use iterators
The libctf machinery currently only provides one way to iterate over its
data structures: ctf_*_iter functions that take a callback and an arg
and repeatedly call it.

This *works*, but if you are doing a lot of iteration it is really quite
inconvenient: you have to package up your local variables into
structures over and over again and spawn lots of little functions even
if it would be clearer in a single run of code.  Look at ctf-string.c
for an extreme example of how unreadable this can get, with
three-line-long functions proliferating wildly.

The deduplicator takes this to the Nth level. It iterates over a whole
bunch of things: if we'd had to use _iter-class iterators for all of
them there would be twenty additional functions in the deduplicator
alone, for no other reason than that the iterator API requires it.

Let's do something better. strtok_r gives us half the design: generators
in a number of other languages give us the other half.

The *_next API allows you to iterate over CTF-like entities in a single
function using a normal while loop. e.g. here we are iterating over all
the types in a dict:

ctf_next_t *i = NULL;
int hidden;
ctf_id_t id;

while ((id = ctf_type_next (fp, &i, &hidden, 1)) != CTF_ERR)
  {
    /* do something with 'hidden' and 'id' */
  }
if (ctf_errno (fp) != ECTF_NEXT_END)
    /* iteration error */

Here we are walking through the members of a struct with CTF ID
'struct_type':

ctf_next_t *i = NULL;
ssize_t offset;
const char *name;
ctf_id_t membtype;

while ((offset = ctf_member_next (fp, struct_type, &i, &name,
                                  &membtype)) >= 0)
  {
    /* do something with offset, name, and membtype */
  }
if (ctf_errno (fp) != ECTF_NEXT_END)
    /* iteration error */

Like every other while loop, this means you have access to all the local
variables outside the loop while inside it, with no need to tiresomely
package things up in structures, move the body of the loop into a
separate function, etc, as you would with an iterator taking a callback.

ctf_*_next allocates 'i' for you on first entry (when it must be NULL),
and frees and NULLs it and returns a _next-dependent flag value when the
iteration is over: the fp errno is set to ECTF_NEXT_END when the
iteration ends normally.  If you want to exit early, call
ctf_next_destroy on the iterator.  You can copy iterators using
ctf_next_copy, which copies their current iteration position so you can
remember loop positions and go back to them later (or ctf_next_destroy
them if you don't need them after all).
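
The allocate-on-first-call, free-on-end lifecycle can be shown with a toy
iterator over an array of non-negative ints.  This is a minimal sketch
assuming a much simpler state than the real ctf_next_t; sk_list,
sk_iter and the SK_ITER_* codes are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_ITER_END 1		/* Plays the role of ECTF_NEXT_END.  */
#define SK_ITER_NOMEM 2

struct sk_iter
{
  size_t pos;
};

struct sk_list
{
  const int *vals;		/* Assumed non-negative.  */
  size_t n;
  int err;			/* Plays the role of the fp errno.  */
};

/* Return the next value in LIST, or -1 at end of iteration or on error.
   *IT must be NULL on first entry: it is allocated here, and freed and
   reset to NULL when the iteration ends.  */
int
sk_list_next (struct sk_list *list, struct sk_iter **it)
{
  struct sk_iter *i;

  assert (list != NULL && it != NULL);
  i = *it;
  if (i == NULL)
    {
      if ((i = calloc (1, sizeof (*i))) == NULL)
	{
	  list->err = SK_ITER_NOMEM;
	  return -1;
	}
      *it = i;
    }

  if (i->pos < list->n)
    return list->vals[i->pos++];

  free (i);
  *it = NULL;
  list->err = SK_ITER_END;
  return -1;
}

/* Abandon an iteration early.  */
void
sk_iter_destroy (struct sk_iter *i)
{
  free (i);
}
```

A caller loops with while ((v = sk_list_next (&list, &it)) >= 0) and then
checks that list.err is SK_ITER_END to tell a normal end from a failure.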

Each _next function returns an always-likely-to-be-useful property of
the thing being iterated over, and takes pointers to parameters for the
others: with very few exceptions all those parameters can be NULLs if
you're not interested in them, so e.g. you can iterate over only the
offsets of members of a structure this way:

while ((offset = ctf_member_next (fp, struct_id, &i, NULL, NULL)) >= 0)

If you pass an iterator in use by one iteration function to another one,
you get the new error ECTF_NEXT_WRONGFUN back; if you try to change
ctf_file_t in mid-iteration, you get ECTF_NEXT_WRONGFP back.
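
One way such misuse checks could work is for the iterator to remember the
function and container it was first used with.  The sketch below assumes
exactly that; the sk_* names and status codes are hypothetical, and the two
dummy functions exist only to give distinct addresses to compare.

```c
#include <assert.h>
#include <stddef.h>

enum sk_status { SK_OK = 0, SK_WRONGFUN, SK_WRONGFP };

typedef int (*sk_fun_t) (void);

struct sk_next
{
  sk_fun_t fun;			/* Iteration function in use, if any.  */
  const void *container;	/* Container being iterated over.  */
};

/* Dummy iteration functions, present only so they can be told apart.  */
int sk_member_fun (void) { return 1; }
int sk_enum_fun (void) { return 2; }

/* Bind I to FUN and CONTAINER on first use; afterwards, complain if
   either differs from what was recorded.  */
enum sk_status
sk_next_check (struct sk_next *i, sk_fun_t fun, const void *container)
{
  assert (i != NULL && fun != NULL);
  if (i->fun == NULL)
    {
      i->fun = fun;
      i->container = container;
      return SK_OK;
    }
  if (i->fun != fun)
    return SK_WRONGFUN;
  if (i->container != container)
    return SK_WRONGFP;
  return SK_OK;
}
```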

Internally the ctf_next_t remembers the iteration function in use,
various sizes and increments useful for almost all iterations, then
uses unions to overlap the actual entities being iterated over to keep
ctf_next_t size down.

Iterators available in the public API so far (all tested in actual use
in the deduplicator):

/* Iterate over the members of a STRUCT or UNION, returning each member's
   offset and optionally name and member type in turn.  On end-of-iteration,
   returns -1.  */
ssize_t
ctf_member_next (ctf_file_t *fp, ctf_id_t type, ctf_next_t **it,
                 const char **name, ctf_id_t *membtype);

/* Iterate over the members of an enum TYPE, returning each enumerand's
   NAME or NULL at end of iteration or error, and optionally passing
   back the enumerand's integer VALue.  */
const char *
ctf_enum_next (ctf_file_t *fp, ctf_id_t type, ctf_next_t **it,
              int *val);

/* Iterate over every type in the given CTF container (not including
   parents), optionally including non-user-visible types, returning
   each type ID and optionally the hidden flag in turn. Returns CTF_ERR
   on end of iteration or error.  */
ctf_id_t
ctf_type_next (ctf_file_t *fp, ctf_next_t **it, int *flag,
               int want_hidden);

/* Iterate over every variable in the given CTF container, in arbitrary
   order, returning the name and type of each variable in turn.  The
   NAME argument is not optional.  Returns CTF_ERR on end of iteration
   or error.  */
ctf_id_t
ctf_variable_next (ctf_file_t *fp, ctf_next_t **it, const char **name);

/* Iterate over all CTF files in an archive, returning each dict in turn as a
   ctf_file_t, and NULL on error or end of iteration.  It is the caller's
   responsibility to close it.  Parent dicts may be skipped.  Regardless of
   whether they are skipped or not, the caller must ctf_import the parent if
   need be.  */
ctf_file_t *
ctf_archive_next (const ctf_archive_t *wrapper, ctf_next_t **it,
                  const char **name, int skip_parent, int *errp);

ctf_label_next is prototyped but not implemented yet.

include/
	* ctf-api.h (ECTF_NEXT_END): New error.
	(ECTF_NEXT_WRONGFUN): Likewise.
	(ECTF_NEXT_WRONGFP): Likewise.
	(ECTF_NERR): Adjust.
	(ctf_next_t): New.
	(ctf_next_create): New prototype.
	(ctf_next_destroy): Likewise.
	(ctf_next_copy): Likewise.
	(ctf_member_next): Likewise.
	(ctf_enum_next): Likewise.
	(ctf_type_next): Likewise.
	(ctf_label_next): Likewise.
	(ctf_variable_next): Likewise.

libctf/
	* ctf-impl.h (ctf_next): New.
	(ctf_get_dict): New prototype.
	* ctf-lookup.c (ctf_get_dict): New, split out of...
	(ctf_lookup_by_id): ... here.
	* ctf-util.c (ctf_next_create): New.
	(ctf_next_destroy): New.
	(ctf_next_copy): New.
	* ctf-types.c (includes): Add <assert.h>.
	(ctf_member_next): New.
	(ctf_enum_next): New.
	(ctf_type_iter): Document the lack of iteration over parent
	types.
	(ctf_type_next): New.
	(ctf_variable_next): New.
	* ctf-archive.c (ctf_archive_next): New.
	* libctf.ver: Add new public functions.
2020-07-22 17:57:50 +01:00
Nick Alcock
9c23dfa5aa libctf: add ctf_archive_count
Another count that was otherwise unavailable without doing expensive
operations.

include/
	* ctf-api.h (ctf_archive_count): New.

libctf/
	* ctf-archive.c (ctf_archive_count): New.
	* libctf.ver: New public function.
2020-07-22 17:57:39 +01:00
Nick Alcock
601e455b75 libctf, archive: stop ctf_arc_bufopen triggering crazy unmaps
The archive machinery mmap()s its archives when possible: so it arranges
to do appropriately-sized unmaps by recording the unmap length in the
ctfa_magic value and unmapping that.

This brilliant (horrible) trick works less well when ctf_arc_bufopen is
called with an existing buffer (which might be a readonly mapping).
ctf_arc_bufopen always returns a ctf_archive_t wrapper, so record in
there the necessity to not unmap anything when a bufopen'ed archive is
closed again.
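
The ownership flag can be sketched as follows.  To keep the example
portable, malloc and free stand in for mmap and munmap; the sk_archive
wrapper and its field names are hypothetical, not the real
ctf_archive_internal.

```c
#include <assert.h>
#include <stdlib.h>

struct sk_archive
{
  void *data;
  size_t len;
  int release_on_close;		/* Plays the role of ctfi_unmap_on_close.  */
};

/* Wrap DATA, recording whether closing the wrapper should release it.  */
struct sk_archive *
sk_archive_new (void *data, size_t len, int release_on_close)
{
  struct sk_archive *arc;

  assert (data != NULL || len == 0);
  if ((arc = malloc (sizeof (*arc))) == NULL)
    return NULL;
  arc->data = data;
  arc->len = len;
  arc->release_on_close = release_on_close;
  return arc;
}

/* Open over a caller-supplied buffer: the caller keeps ownership.  */
struct sk_archive *
sk_archive_bufopen (void *buf, size_t len)
{
  return sk_archive_new (buf, len, 0);
}

/* Close ARC.  Returns 1 if the backing storage was released, else 0.  */
int
sk_archive_close (struct sk_archive *arc)
{
  int released = 0;

  if (arc == NULL)
    return 0;
  if (arc->release_on_close)
    {
      free (arc->data);		/* munmap in the mmapped case.  */
      released = 1;
    }
  free (arc);
  return released;
}
```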

libctf/
	* ctf-impl.h (struct ctf_archive_internal)
	<ctfi_unmap_on_close>: New.
	(ctf_new_archive_internal): Adjust.
	* ctf-archive.c (ctf_new_archive_internal): Likewise.
	Initialize ctfi_unmap_on_close.  Adjust error path.
	(ctf_arc_bufopen): Adjust ctf_new_archive_internal call
	(unmap_on_close is 0).
	(ctf_arc_close): Only unmap if ctfi_unmap_on_close.
	* ctf-open-bfd.c (ctf_fdopen): Adjust.
2020-07-22 17:57:33 +01:00
Nick Clifton
df16e041de Fix problems in CTF handling code exposed by the Coverity static analysis tool.
readelf	* readelf.c (parse_args): Silence potential warnings about a
	memory resource leak when allocating space for ctf option values.
	(dump_section_as_ctf): Fix typo checking dump_ctf_strtab_name
	variable.

libctf	* ctf-archive.c (ctf_arc_write): Avoid calling close twice on the
	same file descriptor.
2020-07-22 16:07:48 +01:00
Nick Alcock
2e428e7440 libctf: avoid nonportable __thread in CTF archive handling
This keeps archive searching threadsafe using the new bsearch_r that was
just added to libiberty.
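
To show why a context argument removes the need for a global or __thread
variable, here is a self-contained sketch.  sk_bsearch_r is a local
reimplementation written for this example (loosely modelled on the idea of
a reentrant bsearch, not copied from libiberty), and sk_modent is a
hypothetical two-field entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical archive entry: an offset into a name table plus an
   offset to the member's data.  */
struct sk_modent
{
  size_t name_offset;
  size_t data_offset;
};

typedef int (*sk_cmp_r) (const void *key, const void *elem, void *arg);

/* Binary search that hands ARG through to the comparison function.  */
void *
sk_bsearch_r (const void *key, const void *base, size_t nmemb, size_t size,
	      sk_cmp_r cmp, void *arg)
{
  size_t lo = 0, hi = nmemb;

  assert (base != NULL || nmemb == 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      const char *elem = (const char *) base + mid * size;
      int c = cmp (key, elem, arg);

      if (c == 0)
	return (void *) elem;
      if (c < 0)
	hi = mid;
      else
	lo = mid + 1;
    }
  return NULL;
}

/* Compare KEY (a name) with the name ELEM refers to.  The name table
   arrives via ARG rather than via a global.  */
int
sk_modent_cmp (const void *key, const void *elem, void *arg)
{
  const struct sk_modent *ent = elem;
  const char *nametbl = arg;

  return strcmp (key, nametbl + ent->name_offset);
}
```

Because the name table travels with each call, two threads searching
different archives cannot trample each other's state.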

	PR25120
libctf/
	* ctf-archive.c (search_nametbl): No longer global: declare...
	(ctf_arc_open_by_name_internal): ... here. Use bsearch_r.
	(search_modent_by_name): Take and use ARG for the nametbl.
2020-06-26 15:56:39 +01:00
Nick Alcock
2f6ecaed66 libctf, binutils: support CTF archives like objdump
objdump and readelf have one major CTF-related behavioural difference:
objdump can read .ctf sections that contain CTF archives and extract and
dump their members, while readelf cannot.  Since the linker often emits
CTF archives, this means that readelf intermittently and (from the
user's perspective) randomly fails to read CTF in files that ld emits,
with a confusing error message wrongly claiming that the CTF content is
corrupt.  This is purely because the archive-opening code in libctf was
needlessly tangled up with the BFD code, so readelf couldn't use it.

Here, we disentangle it, moving ctf_new_archive_internal from
ctf-open-bfd.c into ctf-archive.c and merging it with the helper
function in ctf-archive.c it was already using.  We add a new public API
function ctf_arc_bufopen, that looks very like ctf_bufopen but returns
an archive given suitable section data rather than a ctf_file_t: the
archive is a ctf_archive_t, so it can be called on raw CTF dictionaries
(with no archive present) and will return a single-member synthetic
"archive".

There is a tiny lifetime tweak here: before now, the archive code could
assume that the symbol section in the ctf_archive_internal wrapper
structure was always owned by BFD if it was present and should always be
freed: now, the caller can pass one in via ctf_arc_bufopen, which has
the usual lifetime rules for such sections (caller frees): so we add an
extra field to track whether this is an internal call from ctf-open-bfd,
in which case we still free the symbol section.

include/
	* ctf-api.h (ctf_arc_bufopen): New.
libctf/
	* ctf-impl.h (ctf_new_archive_internal): Declare.
	(ctf_arc_bufopen): Remove.
	(ctf_archive_internal) <ctfi_free_symsect>: New.
	* ctf-archive.c (ctf_arc_close): Use it.
	(ctf_arc_bufopen): Fuse into...
	(ctf_new_archive_internal): ... this, moved across from...
	* ctf-open-bfd.c: ... here.
	(ctf_bfdopen_ctfsect): Use ctf_arc_bufopen.
	* libctf.ver: Add it.
binutils/
	* readelf.c (dump_section_as_ctf): Support .ctf archives using
	ctf_arc_bufopen.  Automatically load the .ctf member of such
	archives as the parent of all other members, unless specifically
	overridden via --ctf-parent.  Split out dumping code into...
	(dump_ctf_archive_member): ... here, as in objdump, and call
	it once per archive member.
	(dump_ctf_indent_lines): Code style fix.
2020-06-26 15:56:39 +01:00
Alan Modra
b3adc24a07 Update year range in copyright notice of binutils files 2020-01-01 18:42:54 +10:30
Nick Alcock
676c3ecbad libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently, a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen.  Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
(say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with.  The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too).  The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type.  Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.

Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.

This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards.  These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*.  So all of a sudden we are changing types that already exist in
the static portion.  ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it.  You get an
assertion failure after that, if you're lucky, or a coredump.

So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in read-only dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear".  The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones.  The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
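
The centralized decision can be sketched like this.  It is an illustration
under heavy simplification: linear arrays stand in for ctf_hash_t and
ctf_dynhash_t, returning 0 for "not found" is an assumption of the sketch,
and every sk_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_MAX_NAMES 16

enum sk_kind { SK_K_STRUCT, SK_K_UNION, SK_K_ENUM, SK_K_OTHER };

struct sk_name
{
  const char *name;
  unsigned long id;
};

/* A pair of tables: only one side is consulted for any given dict.  */
struct sk_names
{
  const struct sk_name *fixed;	/* Read-only side, filled at open time.  */
  size_t nfixed;
  struct sk_name dyn[SK_MAX_NAMES]; /* Writable side, grows on add.  */
  size_t ndyn;
};

struct sk_wdict
{
  int writable;
  struct sk_names structs, unions, enums, names;
};

/* The one place that knows how a type kind maps to a name table.  */
struct sk_names *
sk_name_table (struct sk_wdict *fp, enum sk_kind kind)
{
  switch (kind)
    {
    case SK_K_STRUCT: return &fp->structs;
    case SK_K_UNION: return &fp->unions;
    case SK_K_ENUM: return &fp->enums;
    default: return &fp->names;
    }
}

/* Look NAME up in NP, using whichever side suits FP.  0 if absent.  */
unsigned long
sk_lookup_by_rawhash (struct sk_wdict *fp, struct sk_names *np,
		      const char *name)
{
  const struct sk_name *ents = fp->writable ? np->dyn : np->fixed;
  size_t n = fp->writable ? np->ndyn : np->nfixed;
  size_t i;

  for (i = 0; i < n; i++)
    if (strcmp (ents[i].name, name) == 0)
      return ents[i].id;
  return 0;
}

unsigned long
sk_lookup_by_rawname (struct sk_wdict *fp, enum sk_kind kind,
		      const char *name)
{
  return sk_lookup_by_rawhash (fp, sk_name_table (fp, kind), name);
}

/* Add NAME to the writable side.  Returns 0, or -1 if the table is full.  */
int
sk_add_name (struct sk_wdict *fp, enum sk_kind kind, const char *name,
	     unsigned long id)
{
  struct sk_names *np = sk_name_table (fp, kind);

  assert (fp->writable);
  if (np->ndyn == SK_MAX_NAMES)
    return -1;
  np->dyn[np->ndyn].name = name;
  np->dyn[np->ndyn].id = id;
  np->ndyn++;
  return 0;
}
```

Callers never test the writable flag themselves: they ask by kind and
name, and the lookup picks the right side.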

This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtbyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types).  You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.

There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types.  This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.

You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.

Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard.  It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.

With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.

v5: fix tabdamage.

libctf/
	* ctf-impl.h (ctf_names_t): New.
	(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
	(ctf_file_t) <ctf_structs>: Likewise.
	<ctf_unions>: Likewise.
	<ctf_enums>: Likewise.
	<ctf_names>: Likewise.
	<ctf_lookups>: Improve comment.
	<ctf_ptrtab_len>: New.
	<ctf_prov_strtab>: New.
	<ctf_str_prov_offset>: New.
	<ctf_dtbyname>: Remove, redundant to the names hashes.
	<ctf_dtnextid>: Remove, redundant to ctf_typemax.
	(ctf_dtdef_t) <dtd_name>: Remove.
	<dtd_data>: Note that the ctt_name is now populated.
	(ctf_str_atom_t) <csa_offset>: This is now the strtab
	offset for internal strings too.
	<csa_external_offset>: New, the external strtab offset.
	(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
	(ctf_name_table): New declaration.
	(ctf_lookup_by_rawname): Likewise.
	(ctf_lookup_by_rawhash): Likewise.
	(ctf_set_ctl_hashes): Likewise.
	(ctf_serialize): Likewise.
	(ctf_dtd_insert): Adjust.
	(ctf_simple_open_internal): Likewise.
	(ctf_bufopen_internal): Likewise.
	(ctf_list_empty_p): Likewise.
	(ctf_str_remove_ref): Likewise.
	(ctf_str_add): Returns uint32_t now.
	(ctf_str_add_ref): Likewise.
	(ctf_str_add_external): Now returns a boolean (int).
	* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
	for strings in the appropriate range.
	(ctf_str_create_atoms): Create the ctf_prov_strtab.  Detect OOM
	when adding the null string to the new strtab.
	(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
	(ctf_str_add_ref_internal): Add make_provisional argument.  If
	make_provisional, populate the offset and fill in the
	ctf_prov_strtab accordingly.
	(ctf_str_add): Return the offset, not the string.
	(ctf_str_add_ref): Likewise.
	(ctf_str_add_external): Return a success integer.
	(ctf_str_remove_ref): New, remove a single ref.
	(ctf_str_count_strtab): Do not count the initial null string's
	length or the existence or length of any unreferenced internal
	atoms.
	(ctf_str_populate_sorttab): Skip atoms with no refs.
	(ctf_str_write_strtab): Populate the nullstr earlier.  Add one
	to the cts_len for the null string, since it is no longer done
	in ctf_str_count_strtab.  Adjust for csa_external_offset rename.
	Populate the csa_offset for both internal and external cases.
	Flush the ctf_prov_strtab afterwards, and reset the
	ctf_str_prov_offset.
	* ctf-create.c (ctf_grow_ptrtab): New.
	(ctf_create): Call it.	Initialize new fields rather than old
	ones.  Tell ctf_bufopen_internal that this is a writable dictionary.
	Set the ctl hashes and data model.
	(ctf_update): Rename to...
	(ctf_serialize): ... this.  Leave a compatibility function behind.
	Tell ctf_simple_open_internal that this is a writable dictionary.
	Pass the new fields along from the old dictionary.  Drop
	ctf_dtnextid and ctf_dtbyname.	Use ctf_strraw, not dtd_name.
	Do not zero out the DTD's ctt_name.
	(ctf_prefixed_name): Rename to...
	(ctf_name_table): ... this.  No longer return a prefixed name: return
	the applicable name table instead.
	(ctf_dtd_insert): Use it, and use the right name table.	 Pass in the
	kind we're adding.  Migrate away from dtd_name.
	(ctf_dtd_delete): Adjust similarly.  Remove the ref to the
	deleted ctt_name.
	(ctf_dtd_lookup_type_by_name): Remove.
	(ctf_dynamic_type): Always return NULL on read-only dictionaries.
	No longer check ctf_dtnextid: check ctf_typemax instead.
	(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
	(ctf_rollback): Likewise.  No longer fail with ECTF_OVERROLLBACK. Use
	ctf_name_table and the right name table, and migrate away from
	dtd_name as in ctf_dtd_delete.
	(ctf_add_generic): Pass in the kind explicitly and pass it to
	ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid.  Migrate away
	from dtd_name to using ctf_str_add_ref to populate the ctt_name.
	Grow the ptrtab if needed.
	(ctf_add_encoded): Pass in the kind.
	(ctf_add_slice): Likewise.
	(ctf_add_array): Likewise.
	(ctf_add_function): Likewise.
	(ctf_add_typedef): Likewise.
	(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
	ctt_name rather than dtd_name.
	(ctf_add_struct_sized): Pass in the kind.  Use
	ctf_lookup_by_rawname, not ctf_hash_lookup_type /
	ctf_dtd_lookup_type_by_name.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_enum_encoded): Likewise.
	(ctf_add_forward): Likewise.
	(ctf_add_type): Likewise.
	(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
	being initialized until after the call.
	(ctf_write_mem): Likewise.
	(ctf_write): Likewise.
	* ctf-archive.c (arc_write_one_ctf): Likewise.
	* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookup_by_rawhash, not
	ctf_hash_lookup_type.
	(ctf_lookup_by_id): No longer check the readonly types if the
	dictionary is writable.
	* ctf-open.c (init_types): Assert that this dictionary is not
	writable.  Adjust to use the new name hashes, ctf_name_table,
	and ctf_ptrtab_len.  GNU style fix for the final ptrtab scan.
	(ctf_bufopen_internal): New 'writable' parameter.  Flip on LCTF_RDWR
	if set.	 Drop out early when dictionary is writable.  Split the
	ctf_lookups initialization into...
	(ctf_set_ctl_hashes): ... this new function.
	(ctf_simple_open_internal): Adjust.  New 'writable' parameter.
	(ctf_simple_open): Adjust accordingly.
	(ctf_bufopen): Likewise.
	(ctf_file_close): Destroy the appropriate name hashes.	No longer
	destroy ctf_dtbyname, which is gone.
	(ctf_getdatasect): Remove spurious "extern".
	* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
	specified name table, given a kind.
	(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
	(ctf_member_iter): Add support for iterating over the
	dynamic type list.
	(ctf_enum_iter): Likewise.
	(ctf_variable_iter): Likewise.
	(ctf_type_rvisit): Likewise.
	(ctf_member_info): Add support for types in the dynamic type list.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	(ctf_func_type_info): Likewise.
	(ctf_func_type_args): Likewise.
	* ctf-link.c (ctf_accumulate_archive_names): No longer call
	ctf_update.
	(ctf_link_write): Likewise.
	(ctf_link_intern_extern_string): Adjust for new
	ctf_str_add_external return value.
	(ctf_link_add_strtab): Likewise.
	* ctf-util.c (ctf_list_empty_p): New.
2019-10-03 17:04:56 +01:00
Nick Alcock
f046147d59 libctf: actually close bfds we have opened
When we do a ctf_fdopen, we open things via bfd_fdopenr and set up a
hook to close the bfd again... but then we never actually call that hook
from anywhere, so we eventually leak every bfd we open.

Fix this by calling the hook (if set) in ctf_arc_close.

New in v3.

libctf/
	* ctf-archive.c (ctf_arc_close): Call ctfi_bfd_close if set.
	* ctf-open-bfd.c (ctf_bfdclose): Fix comment.
2019-10-03 17:04:55 +01:00
Nick Alcock
5537f9b9a3 libctf: write CTF files to memory, and CTF archives to fds
Before now, we've been able to write CTF files to gzFile descriptors or
fds, and CTF archives to named files only.

Make this a bit less irregular by allowing CTF archives to be written
to fds with the new function ctf_arc_write_fd: also allow CTF
files to be written to a new memory buffer via ctf_write_mem.
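
The split between a by-name writer and a by-fd writer follows a common
pattern, sketched below with plain POSIX calls.  This is not the
ctf_arc_write_fd signature: sk_write_fd and sk_write_file are hypothetical
and write an opaque buffer rather than a set of CTF dicts.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write LEN bytes of BUF to FD, retrying after short writes and EINTR.
   Returns 0 on success, else an errno value.  */
int
sk_write_fd (int fd, const void *buf, size_t len)
{
  const char *p = buf;

  assert (buf != NULL || len == 0);
  while (len > 0)
    {
      ssize_t n = write (fd, p, len);

      if (n < 0)
	{
	  if (errno == EINTR)
	    continue;
	  return errno;
	}
      p += n;
      len -= (size_t) n;
    }
  return 0;
}

/* Write BUF to the file named PATH, implemented in terms of the fd
   variant.  Returns 0 on success, else an errno value.  */
int
sk_write_file (const char *path, const void *buf, size_t len)
{
  int fd, err;

  if ((fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0666)) < 0)
    return errno;
  err = sk_write_fd (fd, buf, len);
  /* Close exactly once, keeping the first error seen.  */
  if (close (fd) < 0 && err == 0)
    err = errno;
  return err;
}
```

Keeping the open and close in the by-name wrapper means the fd variant
never has to decide who owns the descriptor.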

(It would be nice to complete things by adding a new function to write
CTF archives to memory, but this is too difficult to do given the short
time the linker is expected to be writing them out: we will transition
to a better format in format v4, though we will always support reading
CTF archives that are stored in .ctf sections.)

include/
	* ctf-api.h (ctf_arc_write_fd): New.
	(ctf_write_mem): Likewise.
	(ctf_gzwrite): Spacing fix.

libctf/
	* ctf-archive.c (ctf_arc_write): Split off, and reimplement in terms
	of...
	(ctf_arc_write_fd): ... this new function.
	* ctf-create.c (ctf_write_mem): New.
2019-10-03 17:04:55 +01:00
Nick Alcock
6d5944fca6 libctf, bfd: fix ctf_bfdopen_ctfsect opening symbol and string sections
The code in ctf_bfdopen_ctfsect (which is the ultimate place where you
end up if you use ctf_open to open a CTF file and pull in the ELF string
and symbol tables) was written before it was possible to actually test
it, since the linker had not yet been written.  Now that it has, it turns
out that the previous code was completely nonfunctional: it assumed that you could
load the symbol table via bfd_section_from_elf_index (...,elf_onesymtab())
and the string table via bfd_section_from_elf_index on the sh_link.

Unfortunately BFD loads neither of these sections in the conventional
fashion it uses for most others: the symbol table is immediately
converted into internal form (which is useless for our purposes, since
we also have to work in the absence of BFD for readelf, etc) and the
string table is loaded specially via bfd_elf_get_str_section which is
private to bfd/elf.c.

So make this function public, export it in elf-bfd.h, and use it from
libctf, which does something similar to what bfd_elf_sym_name and
bfd_elf_string_from_elf_section do.  Similarly, load the symbol table
manually using bfd_elf_get_elf_syms and throw away the internal form
it generates for us (we never use it).

BFD allocates the strtab for us via bfd_alloc, so we can leave BFD to
deallocate it: we allocate the symbol table ourselves before calling
bfd_elf_get_elf_syms, so we still have to free it.

Also change the rules around what you are allowed to provide: it is
useful to provide a string section but no symbol table, because CTF
sections can legitimately have no function info or data object sections
while relying on the ELF strtab for some of their strings.  So allow
that combination.
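
The rule can be written down as a small predicate.  This is only a sketch of
the stated policy, not the check in ctf_bufopen: struct toy_sect and
toy_sections_ok are hypothetical names, and a null pointer is assumed to
mean "section not provided".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section descriptor: a pointer to the data and its size.  */
struct toy_sect
{
  const void *data;
  size_t size;
};

/* Return 1 if this combination of optional sections is acceptable.
   A string section alone is fine; a symbol section without a string
   section is rejected, since symbol names could not be resolved.  */
static int
toy_sections_ok (const struct toy_sect *symsect,
		 const struct toy_sect *strsect)
{
  if (symsect != NULL && strsect == NULL)
    return 0;
  return 1;
}

/* Usage example covering all four combinations.  */
static void
toy_sections_selfcheck (void)
{
  struct toy_sect sym = { "", 0 };
  struct toy_sect str = { "", 1 };

  assert (toy_sections_ok (NULL, NULL));	/* Neither: allowed.  */
  assert (toy_sections_ok (NULL, &str));	/* Strings only: allowed.  */
  assert (toy_sections_ok (&sym, &str));	/* Both: allowed.  */
  assert (!toy_sections_ok (&sym, NULL));	/* Symbols only: rejected.  */
}
```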

v4: adjust to upstream changes.  ctf_bfdopen_ctfsect's first parameter
    is potentially unused again (if BFD is not in use for this link
    due to not supporting an ELF target).
v5: fix tabdamage.

bfd/
	* elf-bfd.h (bfd_elf_get_str_section): Add.
	* elf.c (bfd_elf_get_str_section): No longer static.

libctf/
	* ctf-open-bfd.c: Add <assert.h>.
	(ctf_bfdopen_ctfsect): Open string and symbol tables using
	techniques borrowed from bfd_elf_sym_name.
	(ctf_new_archive_internal): Improve comment.
	* ctf-archive.c (ctf_arc_close): Do not free the ctfi_strsect.
	* ctf-open.c (ctf_bufopen): Allow opening with a string section but
	no symbol section, but not vice versa.
2019-10-03 17:04:55 +01:00
Nick Alcock
f5e73be11b libctf: mark various args as unused in the !HAVE_MMAP case
Tested on x86_64-pc-linux-gnu, x86_64-unknown-freebsd12.0,
sparc-sun-solaris2.11, i686-pc-cygwin, i686-w64-mingw32.

libctf/
	* ctf-archive.c (arc_mmap_header): Mark fd as potentially unused.
	* ctf-subr.c (ctf_data_protect): Mark both args as potentially unused.
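
A minimal sketch of the technique, assuming a GCC-compatible compiler:
TOY_UNUSED, TOY_HAVE_MMAP and toy_data_protect are hypothetical names chosen
for this illustration, not the macros or functions libctf uses.  When the
mmap path is compiled out, the arguments are never referenced, so they are
marked as potentially unused to avoid warnings.

```c
#include <assert.h>
#include <stddef.h>
#ifdef TOY_HAVE_MMAP
# include <sys/mman.h>
#endif

/* Expands to an attribute on GCC-compatible compilers, else to nothing.  */
#if defined (__GNUC__)
# define TOY_UNUSED __attribute__ ((__unused__))
#else
# define TOY_UNUSED
#endif

/* Make BUF read-only where mmap support is available; otherwise do
   nothing.  Both arguments go unused in the fallback case.  */
static int
toy_data_protect (void *buf TOY_UNUSED, size_t size TOY_UNUSED)
{
#ifdef TOY_HAVE_MMAP
  if (mprotect (buf, size, PROT_READ) < 0)
    return -1;
#endif
  return 0;
}

/* Usage example: without TOY_HAVE_MMAP the call is a successful no-op
   and the buffer stays writable.  */
static void
toy_data_protect_selfcheck (void)
{
#ifndef TOY_HAVE_MMAP
  char buf[8] = { 0 };

  assert (toy_data_protect (buf, sizeof (buf)) == 0);
  buf[0] = 1;
  assert (buf[0] == 1);
#endif
}
```

The attribute means "possibly unused", so it is harmless on builds where the
arguments are in fact used.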
2019-06-07 13:46:38 +01:00