Commit Graph

90 Commits

Author SHA1 Message Date
Nick Alcock
a9f8ddc5ae libctf: archive, open: when opening, always set errp to something
ctf_arc_import_parent, called by the cached-opening machinery used by
ctf_archive_next and archive-wide lookup functions like
ctf_arc_lookup_symbol, has an err-pointer parameter like all other opening
functions.  Unfortunately it unconditionally initializes it whenever
provided, even if there was no error, which can lead to its being
initialized to an uninitialized value.  This is not technically an
API-contract violation, since we don't define what happens to the error
value except when an error happens, but it is still unpleasant.

Initialize it only when there is an actual error, so we never initialize it
to an uninitialized value.

While we're at it, improve all the opening pathways: on success, set errp to
0, rather than leaving it what it was, reducing the likelihood of
uninitialized error param returns in callers too.  (This is inconsistent
with the treatment of ctf_errno(), but the err value being a parameter
passed in from outside makes the divergence acceptable: in open functions,
you're never going to be overwriting some old error value someone might want
to keep around across multiple calls, some of which are successful and some
of which are not.)

Soup up existing tests to verify all this.

Thanks to Bruce McCulloch for the original patch, and Stephen Brennan for
the report.

libctf/
	PR libctf/32903
	* ctf-archive.c (ctf_arc_open_internal): Zero errp on success.
	(ctf_dict_open_sections): Zero errp at the start.
	(ctf_arc_import_parent): Intialize err.
	* ctf-open.c (ctf_bufopen): Zero errp at the start.
	* testsuite/libctf-lookup/add-to-opened.c: Make sure one-element
	archive opens update errp.
	* testsuite/libctf-writable/ctf-compressed.c: Make sure real archive
	opens update errp.
2025-05-20 14:34:55 +01:00
Nick Alcock
176afc3c8b libctf: open: fix closing of children with imported parents
Closing a parent dict for the last time erases all its types and strings,
which makes type and string lookups in any surviving children impossible
from then on.  Since children hold a reference to their parent, this can
only happen in ctf_dict_close of the last child, after the parent has
been closed by the caller as well.  Since DTD deletion now involves
doing type and string lookups in order to clean out the name tables,
close the parent only after the child DTDs have been deleted.
2025-04-25 18:09:02 +01:00
Nick Alcock
908a7e7167 libctf: open, types: ctf_import for BTF
ctf_import needs a bunch of fixes to work with pure BTF dicts -- and, for
that matter, importing newly-created parent dicts that have never been
written out, which may have a bunch of nonprovisional types (if types were
added to it before any imports were done) or may not (if at least one
ctf_import into it was done before any types were added).

So we adjust things so that the values that are checked against are the
nonprovisional-types values: the header revisions actually changed the name
of cth_parent_typemax to cth_parent_ntypes to make this clearer, so catch up
with that.  In the parent, we have to use ctf_idmax, not ctf_typemax.

One thing we must prohibit is that you cannot add a bunch of types to a
child and then import a parent into it: the type IDs will all be wrong
and the string offsets more so.  This was partly prohibited: prohibit it
entirely (excepting only that the not-actually-written-out void type
we might add to new BTF dicts does not influence this check).

Since BTF children don't have a cth_parent_ntypes or a cth_parent_strlen, we
cannot check this stuff, but just set them and hope.
2025-04-25 18:07:44 +01:00
Nick Alcock
f782340ba5 libctf, serialize: preparatory steps
The new serializer is quite a lot more customizable than the old, because it
can write out BTF as well as CTF: you can ask to write out BTF or fail,
write out CTF if required to avoid information loss, otherwise BTF, or
always write out CTF.

Callers often need to find out whether a dict could be written out as BTF
before deciding how to write it out (because a dict can never be written out
as BTF if it is compressed, a caller might well want to ask if there is
anything else that prevents BTF writeout -- say, slices, conflicting types,
or CTF_K_BIG -- before deciding whether to compress it).  GNU ld will do
this whenever it is passed only BTF sections on the input.

Figuring out whether a dict can be written out as BTF is quite expensive: we
have to traverse all the types and check them, including every member of
every struct.  So we'd rather do that work only once.  This means making a
lot of state once private to ctf_preserialize public enough that another
function can initialize it; and since the whole API is available after
calling this function and before serializing, we should probably arrange
that if we do things we know will invalidate the results of all this
checking, we are forced to do it again.

This commit does that, moving all the existing serialization state into a
new ctf_serialize_t and adding to it.  Several functions grow force_ctf
arguments that allow the caller to force CTF emission even if the type
section looks BTFish: the writeout code and archive creation use this to
force CTF emission if we are compressing, and archive creation uses it
to force CTF emission if a CTF multi-member archive is in use, because
BTF doesn't support archives at all so there's no point maintaining
BTF compatibility in that case.  The ctf_write* functions gain support for
writing out BTF headers as well as CTF, depending on whether what was
ultimately written out was actually BTF or not.

Even more than most commits in this series, there is no way this is
going to compile right now: we're in the middle of a major transition,
completed in the next few commits.
2025-04-25 18:07:44 +01:00
Nick Alcock
3c5eb5b20a libctf: lookup, open: chase header field changes
Nothing exciting here, just header fields slightly changing name
and a couple of new comments and indentation fixes.
2025-04-25 18:07:43 +01:00
Nick Alcock
f7f72bcca6 libctf, open: new API for getting the size of CTF/BTF file sections
I wrote this for BTF type size querying programs, but it might be
of more general use and it's impossible to get this info in any
other way, so we might want to keep it.

New API:
+size_t ctf_sect_size (ctf_dict_t *, ctf_sect_names_t sect);
2025-04-25 18:07:43 +01:00
Nick Alcock
05a2970ad1 libctf: create, lookup: delete DVDs; ctf_lookup_by_kind
Variable handling in BTF and CTFv4 works quite differently from in CTFv3.
Rather than a separate section containing sorted, bsearchable variables,
they are simply named entities like types, stored in CTF_K_VARs.

As a first stage towards migrating to this, delete most references to
the ctf_varent_t and ctf_dvdef_t, including the DVD lookup code, all
the linking code, and quite a lot of the serialization code.

Note: CTF_LINK_OMIT_VARIABLES_SECTION, and the whole "delete variables that
already exist in the symtypetabs section" stuff, has yet to be
reimplemented.  We can implement CTF_LINK_OMIT_VARIABLES_SECTION by simply
excising all CTF_K_VARs at deduplication time if requested.  (Note:
symtypetabs should still point directly at the type, not at the CTF_K_VAR.)

(Symtypetabs in general need a bit more thought -- perhaps we can now store
them in a separate .ctf.symtypetab section with its own little four-entry
header for the symtypetabs and their indexes, making .ctf even more like
.BTF; the only difference would then be that .ctf could include prefix
types, CTF_K_FLOAT, and external string refs.  For later discussion.)

We also add ctf_lookup_by_kind() at this stage (because it is hopelessly
diff-entangled with ctf_lookup_variable): this looks up a type of a
particular kind, without needing a per-kind lookup function for it,
nor needing to hack around adding string prefixes (so you can do
ctf_lookup_by_kind (fp, CTF_K_STRUCT, "foo") rather than having to
do ctf_lookup_by_name (fp, "struct foo"): often this is more convenient, and
anything that reduces string buffer manipulation in C is good.)
2025-04-25 18:07:42 +01:00
Nick Alcock
ad13b7d44f libctf: CTFv4: type opening
The majority of this commit rejigs the core type table opening
code for CTFv4: there are a few ancillary bits it drags in,
indicated below.

The internal definition of a child dict (that may not have type or string
lookups performed in it until ctf_open time) used to be 'has a
cth_parent_name', but since BTF doesn't have one of those at all, we add
an additional check: a dict the first byte of whose strtab is not 0 must
be a child.  (If *either* is true, this is a child dict, which allows for
the possibility of CTF dicts with non-deduplicated strtabs -- thus with
leading \0's -- to exist in future.)

The initial sweep through the type table in init_static_types (to size
the name-table lookup hashes) also now checks for various types which
indicate that this must be a CTF dict, in addition to being adjusted
to cater for new CTFv4 representations of things like forwards.  (At
this early stage, we cannot rely on the functions in ctf-type.c to
abstract over this for us.)

We make some new hashtables for new namespace-like things: datasecs
and type and decl tags.

The main name-population loop in init_static_types_names_internal
takes prefixes into account, looking for the name on the suffix type
(where the name is always found).  LSTRUCT handling is removed (they
no longer exist); ENUM64s, enum forwards, VARs, datasecs, and type
and decl tags get their names suitably populated.  Some buggy code
which tried to populate the name tables for cvr-quals (which are
nameless) was dropped.

We add an extra pass which traverses all datasecs and keeps track of which
datasec each var is instantiated in (if any) in a new ctf_var_datasecs hash
table.  (This uses a number of type-querying functions which don't yet
exist: they'll be added in the upcoming commits.)

We handle the type 0 == void case by pointing the first element of
ctf_txlate at a type read in named "void" (making type 0 an alias to it),
or, if one doesn't exist, creating a new one (outside the type table and dtd
arrays), and pointing type 0 at that.  Since it is numbered 0 and not in the
type table or dtd arrays, it will never be written out at serialization
time, but since it is *present*, libctf consumers who expect the void type
to have an integral definition rather than being a magic number will get
what they expect.
2025-04-25 18:07:42 +01:00
Nick Alcock
f7d05ab342 libctf: CTFv4: core opening (other than the type table)
This commit modifies the core opening code to handle opening CTFv4 and BTF.
Much of the back-compatibility side is left for later and is currently
untested, as is the type table side of things.

We keep the v3 header (if any) stashed away in ctf_dict_t.ctf_v3_header, for
the sake of the CTF dumper; we "upgrade" the BTF header to CTF (so that the
rest of the code can ignore the distinction, and so that you can do CTFish
things like adding symtypetab entries even to things opened as BTF), but
keep note of the fact that it was opened as BTF in ctf_dict_t.ctf_opened_btf,
so that things like ctf_import can allow for the absence of the various
parent-length fields.

A couple of ctf_dict_t fields are renamed for consistency with the headers'
names for them (ctf_parname becomes ctf_parent_name; ctf_dynparname becomes
ctf_dyn_parent_name; ctf_cuname becomes ctf_cu_name).  Not all users are yet
adjusted.
2025-04-25 18:07:42 +01:00
Nick Alcock
99e9ab4828 libctf: adjust foreign-endian byteswapping for v4
Split into a separate commit because it's not yet really tested.

Callers not yet adjusted.
2025-04-25 18:07:41 +01:00
Nick Alcock
a80b903b45 libctf: simplify ctf_txlate
Before now, this critical internal structure was an array mapping from a
type ID to the type index of the type with that ID.  This was critical for
the old world in which ctf_update() reserialized the entire dict, so things
moved around in memory all the time: but these days, a ctf_type_t * never
moves after creation, so we can just make ctf_txlate an array of ctf_type_t *
and be done with it.

This lets us point type indexes anywhere in memory, not just to entries
in the ctf_buf, which means we can have synthetic ones for various purposes.
And we will.
2025-04-25 18:07:41 +01:00
Nick Alcock
6a4a485c7b libctf: adapt core dictops for v4 and prefix types
The heart of libctf's reading code is the ctf_dictops_t and the functions it
provides for reading various things no matter what the CTF version in use:
these are called via LCTF_*() macros that translate into calls into the
dictops.

The introduction of prefix types in v4 requires changes here: in particular,
we want the ability to get the type kind of whatever ctf_type_t we are
looking at (the 'unprefixed' kind), as well as the ability to get the type
kind taking prefixes into account: and more generally we want the ability
to both look at a given prefix and look at the type as a whole.  So several
ctf_dictops_t entries are added for this (ctfo_get_prefixed_kind,
ctfo_get_prefixed_vlen).

This means API changes (no callers yet adjusted, it'll happen as we go),
because the existing macros were mostly called with e.g. a ctt_info value
and returned a type kind, while now we need to be called with the actual
ctf_type_t itself, so we can possibly walk beyond it to find the real type
record.  ctfo_get_vbytes needs adjusting for this.

We also add names to most of the ctf_type_t parameters, because suddenly we
can have up to three of them: one relating to the first entry in the type
record (which may be a prefix, usually called 'prefix'), one relating to the
true type record (which may be a suffix, so usually called 'suffix'), and
one possibly relating to some intermediate record if we have multiple
prefixes (usually called 'tp').

There is one horrible special case in here: the vlen of the new
CTF_K_FUNC_LINKAGE kind (equivalent to BTF_KIND_FUNC) is always zero: it
reuses the vlen field to encode the linkage (!).  BTF is rife with ugly
hacks like this.
2025-04-25 18:07:41 +01:00
Nick Alcock
3ae061cfb0 libctf: split out compatibility code
The compatibility-opening code is quite voluminous, and is stuck right in
the middle of ctf-open.c, rather interfering with maintenance.  Split it
out into a new ctf-open-compat.c.  (Since it is not yet upgraded to support
v4, the new file is not added to the build system yet: indeed, even the
calls to it haven't been diked out at this stage.)
2025-04-25 18:07:41 +01:00
Nick Alcock
13ce9e17b7 libctf: don't include cv-quals or pointers in the name table
Even if these types have a name recorded against them, we should
ignore it.  They don't have names, full stop.

libctf/ChangeLog:
	* ctf-open.c (init_static_types): Drop nameless types when sizing
	the name table.
	(init_static_types_names_internal): Never pass in their name.
2025-03-16 15:25:28 +00:00
Nick Alcock
b5d3790c66 libctf: consecutive ctf_id_t assignment
This change modifies type ID assignment in CTF so that it works like BTF:
rather than flipping the high bit on for types in child dicts, types ascend
directly from IDs in the parent to IDs in the child, without interruption
(so type 0x4 in the parent is immediately followed by 0x5 in all children).

Doing this while retaining useful semantics for modification of parents is
challenging.  By definition, child type IDs are not known until the parent
is written out, but we don't want to find ourselves constrained to adding
types to the parent in one go, followed by all child types: that would make
the deduplicator a nightmare and would frankly make the entire ctf_add*()
interface next to useless: all existing clients that add types at all
add types to both parents and children without regard for ordering, and
breaking that would probably necessitate redesigning all of them.

So we have to be a litle cleverer.

We approach this the same way as we approach strings in the recent refs
rework: if a parent has children attached (or has ever had them attached
since it was created or last read in), any new types created in the parent
are assigned provisional IDs starting at the very top of the type space and
working down.  (Their indexes in the internal libctf arrays remain
unchanged, so we don't suddenly need multigigabyte indexes!).  At writeout
(preserialization) time, we traverse the type table (and all other table
containing type IDs) and assign refs to every type ID in exactly the same
way we assign refs to every string offset (just a different set of refs --
we don't want to update type IDs with string offset values!).

For a parent dict with children, these refs are real entities in memory:
pointers to the memory locations where type IDs are stored, tracked in the
DTD of each type.  As we traverse the type table, we assign real IDs to each
type (by simple incrementation), storing those IDs in a new dtd_final_type
field in the DTD for each type.  Once the type table and all other tables
containing type IDs are fully traversed, we update all the refs and
overwrite the IDs currently residing in each with the final IDs for each
type.

That fixes up IDs in the parent dict itself (including forward references in
structs and the like: that's why the ref updates only happen at the end);
but what about child dicts' references, both to parent types and to their
own?  We add armouring to enforce that parent dicts are always serialized
before their children (which ctf-link.c already does, because it's a
precondition for strtab deduplication), and then arrange that when a ref is
added to a type whose ID has been assigned (has a dtd_final_type), we just
immediately do an update rather than storing a ref for later updating.
Since the parent is already serialized, all parent type IDs have a
dtd_final_type by this point, and all parent IDs in the children are
properly updated. The child types can now be renumbered now we now the
number of types in the parent, and their refs updated identically to what
was just done with the parent.

One wrinkle: before the child refs are updated, while we are working over
the child's type section, the type IDs in the child start from 1 (or
something like that), which might seem to overlap the parent IDs.  But this
is not the case: when you serialize the parent, the IDs written out to disk
are changed, but the only change to the representation in memory is that we
remember a dtd_final_type for each type (and use it to update all the child
type refs): its ID in memory is the same as it always was, a nonoverlapping
provisional ID higher than any other valid ID.  We enforce all of this by
asserting that when you add a ref to a type, the memory location that is
modified must be in the buffer being serialized: the code will not let you
accidentally modify the actual DTDs in memory.

We track the number of types in the parent in a new CTFv4 (not BTF) header
field (the dumper is updated): we will also use this to open CTFv3 child
dicts without change by simply declaring for them that the parent dict has
2^31 types in it (or 2^15, for v2 and below): the IDs in the children then
naturally come out right with no other changes needed.  (Right now, opening
CTFv3 child dicts requires extra compatibility code that has not been
written, but that code will no longer need to worry about type ID
differences.)

Various things are newly forbidden:

 - you cannot ctf_import() a child into a parent if you already ctf_add()ed
   types to the child, because all its IDs would change (and since you
   already cannot ctf_add() types to a child that hasn't had its parent
   imported, this in practice means only that ctf_create() must be followed
   immediately by a ctf_import() if this is a new child, which all sane
   clients were doing anyway).

 - You cannot import a child into a parent which has the wrong number of
   (non-provisional) types, again because all its IDs would be wrong:
   because parents only add types in the provisional space if children are
   attached to it, this would break the not unknown case of opening an
   archive, adding types to the parent, and only then importing children
   into it, so we add a special case: archive members which are not children
   in an archive with more than one member always pretend to have at least
   one child, so type additions in them are always provisional even before
   you ctf_import anything. In practice, this does exactly what we want,
   since all archives so far are created by the linker and have one parent
   and N children of that parent.

Because this introduces huge gaps between index and type ID for provisional
types, some extra assertions are added to ensure that the internal
ctf_type_to_index() is only ever called on types in the current dict (never
a parent dict): before now, this was just taken on trust, and it was often
wrong (which at best led to wrong results, as wrong array indexes were used,
and at worst to a buffer overflow). When hash debugging is on (suggesting
that the user doesn't mind expensive checks), every ctf_type_to_index()
triggers a ctf_index_to_type() to make sure that the operations are proper
inverses.

Lots and lots of tests are added to verify that assignment works and that
updating of every type kind works fine -- existing tests suffice for
type IDs in the variable and symtypetab sections.

The ld-ctf tests get a bunch of largely display-based updates: various
tests refer to 0x8... type IDs, which no longer exist, and because the
IDs are shorter all the spacing and alignment has changed.
2025-03-16 15:25:27 +00:00
Nick Alcock
a480362d88 libctf: string: refs rework
This commit moves provisional (not-yet-serialized) string refs towards the
scheme to be used for CTF IDs in the future.  In particular

 - provisional string offsets now count downwards from just under the
   external string offset space (all bits on but the high bit).  This makes
   it possible to detect an overflowing strtab, and also makes it trivial to
   determine whether any string offset (ref) updates were missed -- where
   before we might get a slightly corrupted or incorrect string, we now get
   a huge high strtab offset corresponding to no string, and an error is
   emitted at read time.

 - refs are emitted at serialization time during the pass through the types.
   They are strictly associated with the newly-written-out buffer: the
   existing opened CTF dict is not changed, though it does still get the new
   strtab so that new refs to the same string can just refer directly to it.
   The provisional strtab hash table that contains these strings is not
   deleted after serialization (because we might serialize again): instead,
   we keep track in the parent of the lowest-yet-used ("latest") provisional
   strtab offset, and any strtab offset above that, but not external
   (high-bit-on) is considered provisional.

   This is sort-of-enforced by moving most of the ref-addition function
   declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is
   not included by ctf-create.c or ctf-open.c.

 - because we don't add refs when adding types, we don't need to handle the
   case where we add things to expanding vlens (enums, struct members) and
   have to realloc() them.  So the entire painful movable refs system can
   just be deleted, along with the ability to remove refs piecemeal at all
   (purging all of them is still possible).  Strings added during type
   addition are added via ctf_str_add(), which adds no refs: the strings are
   picked up at serialization time and refs to their final, serialized
   resting place added.  The DTDs never have any refs in them, and their
   provisional strtab offsets are never updated by the ref system.

This caused several bugs to fall out of the earlier work and get fixed.
In particular, attempts to look up a string in a child dict now search
the parent's provisional strtab too: we add some extra special casing
for the null string so we don't need to worry about deduplication
moving it somewhere other than offset zero.

Finally, the optimization that removes an unreferenced synthetic external
strtab (the record of the strings the linker has told us about, kept around
internally for lookup during late serialization) is faulty: references to a
strtab entry will only produce CTF-level refs if their value might change,
and an external string's offset won't change, so it produces no refs: worse
yet, even if we did get a ref (say, if the string was originally believed
to be internal and only later were we told that the linker knew about it
too), when we serialize a strtab, all its refs are dropped (since they've
been updated and can no longer change); so if we serialized it a second
time, its synthetic external strtab would be considered empty and dropped,
even though the same external strings as before still exist, referencing
it.  We must keep the synthetic external strtab around as long as external
strings exist that reference it, i.e. for the life of the dict.

One benefit of all this: now we're emitting provisional string offsets at
a really high value, it's out of the way of the consecutive, deduplicated
string offsets in child dicts.  So we can drop the constraint that you
cannot add strings to a dict with children, which allows us to add types
freely to parent dicts again.  What you can't do is write that dict out
again: when we serialize, we currently update the dict being serialized
with the updated strtabs: when you write a dict out, its provisional
strings become real strings, and suddenly the offsets would overlap once
more.  But opening a dict and its children, adding to it, and then
writing it out again is rare indeed, and we have a workaround: anyone
wanting to do this can just use ctf_link instead.
2025-02-28 15:13:24 +00:00
Nick Alcock
dc93d01ff2 libctf: de-macroize LCTF_TYPE_TO_INDEX / LCTF_INDEX_TO_TYPE
Making these functions is unnecessary right now, but will become much
clearer shortly.

While we're at it, we can drop the third child argument to
LCTF_INDEX_TO_TYPE: it's only used for nontrivial purposes that aren't
literally the same as getting the result from the fp in one place,
in ctf_lookup_by_name_internal, and that place is easily fixed by just
looking in the right dictionary in the first place.
2025-02-28 15:13:24 +00:00
Nick Alcock
b875301e74 libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILD
Parent/child determination is about to become rather more complex, making a
macro impractical.  Use the ctf_type_isparent/ischild function calls
everywhere and remove the macro.  Make them more const-correct too, to
make them more widely usable.

While we're about it, change several places that hand-implemented
ctf_get_dict() to call it instead, and armour several functions against
the null returns that were always possible in this case (but previously
unprotected-against).
2025-02-28 15:13:24 +00:00
Nick Alcock
9835747b21 libctf: generalize the ref system
Despite the removal of the separate movable ref list, the ref system as
a whole is more than complex enough to be worth generalizing now that
we are adding different kinds of ref.

Refs now are lists of uint32_t * which can be updated through the
pointer for all entries in the list and moved to new sites for all
pointers in a given range: they are no longer references to string
offsets in particular and can be references to other uint32_t-sized
things instead (note that ctf_id_t is a typedef to a uint32_t).

ctf-string.c has been adjusted accordingly (the adjustments are tiny,
more or less just turning a bunch of references to atom into
&atom->csa_refs).
2025-02-28 15:13:24 +00:00
Nick Alcock
6a6a3cc9c2 libctf, hash: add support for freeing functions taking an arg
There are a bunch of places in libctf where the code is complicated
by the fact that freeing a hash key or value requires access to the
dict: more generally, they want an arg pointer to *something*.

But for the sake of being able to use free() as a freeing function,
we can't do this at all times.  We also don't want to bloat up the
hash itself with an arg value unless necessary (in the same way we
already avoid storing the key or value freeing functions unless at
least one of them is specified).

So from the outside this change is simple: add a new
ctf_dynhash_create_arg which takes a new sort of freeing function
which takes an argument.  Internally, we store the arg only when
the key or owner is set, and cast from the one freeing function
to the other iff the arg is non-NULL.  This means it's impossible
to pass a value that may or may not be NULL to the freeing
function, but that's harmless for all current uses, and allows
significant simplifications elsewhere.
2025-02-28 15:13:23 +00:00
Nick Alcock
4d2d5afa60 libctf: actually deduplicate the strtab
This commit finally implements strtab deduplication, putting together all
the pieces assembled in the earlier commits.

The magic is entirely localized to ctf_link_write, which preserializes all
the dicts (parent first), and calls ctf_dedup_strings on the parent.

(The error paths get tweaked a bit too.)

Calling ctf_dedup_strings has implications elsewhere: the lifetime rules for
the inputs versus outputs change a bit now that the child output dicts
contain references to the parent dict's atoms table.  We also pre-purge
movable refs from all the deduplicated strings before freeing any of this
because movable refs contain backreferences into the dict they came from,
which means the parent contains references to all the children!  Purging
the refs first makes those references go away so we can free the children
without creating any wild pointers, even temporarily.

There's a new testcase that identifies a regression whereby offset 0 (the
null string) and index 0 (in children now often the parent dict name,
".ctf") got mixed up, leading to anonymous structs and unions getting the
not entirely C-valid name ".ctf" instead.

May other testcases get adjusted to no longer depend on the precise layout
of the strtab.

TODO: add new tests to verify that strings are actually being deduplicated.

libctf/
	* ctf-link.c (ctf_link_write): Deduplicate strings.
	* ctf-open.c (ctf_dict_close): Free refs, then the link outputs,
        then the out cu_mapping, then the inputs, in that order.
        * ctf-string.c (ctf_str_purge_refs): Not static any more.
	* ctf-impl.h: Declare it.

ld/
	* testsuite/ld-ctf/conflicting-cycle-2.A-1.d: Don't depend on
        strtab contents.
	* testsuite/ld-ctf/conflicting-cycle-2.A-2.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-2.parent.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-3.C-1.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-3.C-2.d: Likewise.
	* testsuite/ld-ctf/anonymous-conflicts*: New test.
2025-02-28 14:47:24 +00:00
Nick Alcock
ba66e0cc32 libctf: do not deduplicate strings in the header
It is unreasonable to expect users to ctf_import the parent before being
able to understand the header -- doubly so because the only string in the
header which is likely to be deduplicable is the parent name, which is the
same in every child, yet without the parent name being *available* in the
child's strtab you cannot call ctf_parent_name to figure out which parent
to import!

libctf/
	* ctf-serialize.c (ctf_preserialize): Prevent deduplication of header string
        fields.
	* ctf-open.c (ctf_set_base): Note this.
	* ctf-string.c (ctf_str_free_atom): Likewise.
2025-02-28 14:47:24 +00:00
Nick Alcock
a14fb397b2 libctf: tear opening and serialization in two
The next stage in sharing the strtab involves tearing two core parts
of libctf into two pieces.

Large parts of init_static_types, called at open time, involve traversing
the types table and initializing the hashtabs used by the type name lookup
functions and the enumerator conflicting checks.  If the string table is
partly located in the parent dict, this is obviously not going to work: so
split out that code into a new init_static_types_names function (which
also means moving the wrapper around init_static_types that was used
to simplify the enumerator code into being a wrapper around
init_static_types_names instead) and call that from init_static_types
(for parent dicts, and < v4 dicts), and from ctf_import (for v4 dicts).

At the same time as doing this we arrange to set LCTF_NO_STR (recently
introduced) iff this is a v4 child dict with a nonzero cth_parent_strlen:
this then blocks more or less everything that involves string operations
until a ctf_import has actually imported the strtab it depends on.  (No
string oeprations that actually use this have been introduced yet, but
since no string deduplication is happening yet either this is harmless.)

For v4 dicts, at import time we also validate that the cth_parent_strlen has
the same value as the parent's strlen (zero is also a valid value,
indicating a non-shared strtab, as is commonplace in older dicts, dicts
emitted by the compiler, parent dicts etc).  This makes ctf_import more
complex, so we simplify things again by dropping all the repeated code in
the obscure used-only-by-ctf_link ctf_import_unref and turning both into
wrappers around an internal function.  We prohibit repeated ctf_imports
(except of NULL or the same dict repeatedly), and set up some new fields
which will be used later to prevent people from adding strings to parent
dicts with pre-existing serialized strtabs once they have children imported
into them (which would change their string length and corrupt all those
strtabs).

Serialization also needs to be torn in two.  The problem here is that
currently serialization does too much: it emits everything including the
strtab, does things that depend on the strtab being finalized (notably
variable table sorting), and then writes it out.  Much of this emission
itself involves strtab writes, so the strtab is not actually complete until
halfway through ctf_serialize.  But when deduplicating, we want to use
machinery in ctf-link and ctf-dedup to deduplicate the strtab after it is
complete, and only then write it out.

We could do this via having ctf_serialize call some sort of horrible
callback, but it seems much simpler to just cut ctf_serialize in two,
and introduce a new ctf_preserialize which can optionally be called to do
all this "everything but the strtab" work.  (If it's not called,
ctf_serialize calls it itself.)

This means pulling some internal variables out of ctf_serialize into the
ctf_dict_t, and slightly abusing LCTF_NO_STR to mean (in addition to its
"no, you can't do much between opening a child dict and importing its
parent" semantics), "no, you can't do much between calling ctf_preserialize
and ctf_serialize". The requirements of both are not quite identical -- you
definitely can do things that involve string lookups after ctf_preserialize
-- but it serves to stop callers from accidentally adding more types after
the types table has been written out, and that's good enough.
ctf_preserialize isn't public API anyway.

libctf/
	* ctf-impl.h (struct ctf_dict) [ctf_serializing_buf]: New.
        [ctf_serializing_buf_size]: Likewise.
        [ctf_serializing_vars]: Likewise.
        [ctf_serializing_nvars]: Likewise.
        [ctf_max_children]: Likewise.
	(LCTF_PRESERIALIZED): New.
	(ctf_preserialize): New.
	(ctf_depreserialize): New.
	* ctf-open.c (init_static_types): Rename to...
	(init_static_types_names): ... this, wrapping a different
        function.
        (init_static_types_internal): Rename to...
        (init_static_types): ... this, and set LCTF_NO_STR if neecessary.
        Tear out the name-lookup guts into...
	(init_static_types_names_internal): ... this new function. Fix a few
        comment typos.
	(ctf_bufopen): Emphasise that you cannot rely on looking up strings
        at any point in ctf_bufopen any more.
	(ctf_dict_close): Free ctf_serializing_buf.
	(ctf_import): Turn into a wrapper, calling...
	(ctf_import_internal): ... this.  Prohibit repeated ctf_imports of
        different parent dicts, or "unimporting" by setting it back to NULL
        again.  Validate the parent we do import using cth_parent_strlen.
        Call init_static_types_names if the strtab is shared with the
        parent.
	(ctf_import_unref): Turn into a wrapper.
	* ctf-serialize.c (ctf_serialize): Split out everything before
        strtab serialization into...
	(ctf_preserialize): ... this new function.
	(ctf_depreserialize): New, undo preserialization on error.
2025-02-28 14:47:24 +00:00
Nick Alcock
6c77689963 include, libctf: add cth_parent_strlen CTFv4 header field
The first format difference between v3 and v4 is a cth_parent_strlen header
field.  This field (obviously not present in BTF) is populated from the
string table length of the parent at serialization time (protection against
being serialized before the parent is will be added in a later commit in
this series), and will be used at open time to prohibit opening of dicts
with a different strlen (which would corrupt the child's string table
if it was shared with the parent).

For now, just add the field, populate it at serialization time when linking
(when not linking, no deduplication is done and the correct value remains
unchanged), and dump it.

include/
	* ctf.h (ctf_header) [cth_parent_strlen]: New.

libctf/
	* ctf-dump.c (ctf_dump_header_sizefield): New.
	(ctf_dump_header): Use to dump the cth_parent_strlen.
	* ctf-open.c (upgrade_header_v2): Populate cth_parent_strlen.
	(upgrade_header_v3): Likewise.
	(ctf_flip_header): Flip it.
	(ctf_bufopen): Drop unnecessary initialization.
	* ctf-serialize.c (ctf_serialize): Write it out when linking.

ld/
	* testsuite/ld-ctf/data-func-conflicted-vars.d: Skip the nwe dump output.
	* testsuite/ld-ctf/data-func-conflicted.d: Likewise.
2025-02-28 14:47:24 +00:00
Nick Alcock
9a74ab12c8 include, libctf: start work on libctf v4
This format is a superset of BTF, but for now we just do the minimum to
declare a new file format version, without actually introducing any format
changes.

From now on, we refuse to reserialize CTFv1 dicts: these have a distinct
parent/child boundary which obviously cannot change upon reserialization
(that would change the type IDs): instead, we encoded this by stuffing in
a unique CTF version for such dicts.  We can't do that now we have one
version for all CTFv4 dicts, and testing such old dicts is very hard these
days anyway, and is not automated: so just drop support for writing them out
entirely. (You still *can* write them out, but you have to do a full-blown
ctf_link, which generates an all-new fresh dict and recomputes type IDs as
part of deduplication.)

To prevent this extremely-not-ready format escaping into the wild, add a
new mechanism whereby any format version higher than the new #define
CTF_STABLE_VERSION cannot be serialized unless I_KNOW_LIBCTF_IS_UNSTABLE is
set in the environment.

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_CTFVERS_NO_SERIALIZE]: New.
        [ECTF_UNSTABLE]: New.
         (ECTF_NERR): Update.
	* ctf.h: Small comment improvements..
        (ctf_header_v3): New, copy of ctf_header.
	(CTF_VERSION_4): New.
	(CTF_VERSION): Now CTF_VERSION_4.
	(CTF_STABLE_VERSION): Still 4, CTF_VERSION_3.

ld/
	* testsuite/ld-ctf/*.d: Update to CTF_VERSION_4.

libctf/
	* ctf-impl.h (LCTF_NO_SERIALIZE): New.
	* ctf-dump.c (ctf_dump_header): Add CTF_VERSION_4.
	* ctf-open.c (ctf_dictops): Likewise.
        (upgrade_header): Rename to...
	(upgrade_header_v2): ... this.
	(upgrade_header_v3): New.
	(upgrade_types): Support upgrading from CTF_VERSION_3.
        Turn on LCTF_NO_SERIALIZE for CTFv1.
	(init_static_types_internal): Upgrade all types tables older than
	* CTF_VERSION_4.
	(ctf_bufopen): Support CTF_VERSION_4: error out if we forget to
	update this switch in future.  Add header upgrading from v3 and
	below.  Improve comments slightly.
	* ctf-serialize.c (ctf_serialize): Block serialization of unstable
	file formats, and of file formats for which LCTF_NO_SERIALIZE is
	turned on (v1).
2025-02-28 14:47:24 +00:00
Alan Modra
e8e7cf2abe Update year range in copyright notice of binutils files 2025-01-01 18:29:57 +10:30
Nick Alcock
e34b3bde88 libctf: clean up hashtab error handling mess
The dict and archive opening code in libctf is somewhat unusual, because
unlike everything else, it cannot report errors by setting an error on the
dict, because in case of error there isn't one.  They get passed an error
integer pointer that is set on error instead.

Inside ctf_bufopen this is implemented by calling ctf_set_open_errno and
passing it a positive error value.  In turn this means that most things it
calls (including init_static_types) return zero on success and a *positive*
ECTF_* or errno value on error.

This trickles down to ctf_dynhash_insert_type, which is used by
init_static_types to add newly-detected types to the name tables.  This was
returning the error value it received from a variety of functions without
alteration.  ctf_dynhash_insert conformed to this contract by returning a
positive value on error (usually OOM), which is unfortunate for multiple
reasons:

- ctf_dynset_insert returns a *negative* value
- ctf_dynhash_insert and ctf_dynset_insert don't take an fp, so the value
  they return is turned into the errno, so it had better be right, callers
  don't just check for != 0 here
- more or less every single caller of ctf_dyn*_insert in libctf other than
  ctf_dynhash_insert_type (and there are a *lot*, mostly in the
  deduplicator) assumes that ctf_dynhash_insert returns a negative value
  on error, even though it doesn't.  In practice the only possible error is
  OOM, but if OOM does happen we end up with a nonsense error value.

The simplest fix for this seems to be to make ctf_dynhash_insert and
ctf_dynset_insert conform to the usual interface contract: negative
values are errors.  This in turn means that ctf_dynhash_insert_type
needs to change: let's make it consistent too, returning a negative
value on error, putting the error on the fp in non-negated form.

init_static_types_internal adapts to this by negating the error return from
ctf_dynhash_insert_type, so the value handed back to ctf_bufopen is still
positive: the new call site in ctf_track_enumerator does not need to change.

(The existing tests for this reliably detect when I get it wrong.
I know, because they did.)

libctf/
	* ctf-hash.c (ctf_dynhash_insert): Negate return value.
	(ctf_dynhash_insert_type): Set de-negated error on the dict:
        return negated error.
	* ctf-open.c (init_static_types_internal): Adapt to this change.
2024-07-31 21:10:00 +01:00
Nick Alcock
6da9267482 libctf, include: add ctf_dict_set_flag: less enum dup checking by default
The recent change to detect duplicate enum values and return ECTF_DUPLICATE
when found turns out to perturb a great many callers.  In particular, the
pahole-created kernel BTF has the same problem we historically did, and
gleefully emits duplicated enum constants in profusion.  Handling the
resulting duplicate errors from BTF -> CTF converters reasonably is
unreasonably difficult (it amounts to forcing them to skip some types or
reimplement the deduplicator).

So let's step back a bit.  What we care about mostly is that the
deduplicator treat enums with conflicting enumeration constants as
conflicting types: programs that want to look up enumeration constant ->
value mappings using the new APIs to do so might well want the same checks
to apply to any ctf_add_* operations they carry out (and since they're
*using* the new APIs, added at the same time as this restriction was
imposed, there is likely to be no negative consequence of this).

So we want some way to allow processes that know about duplicate detection
to opt into it, while allowing everyone else to stay clear of it: but we
want ctf_link to get this behaviour even if its caller has opted out.

So add a new concept to the API: dict-wide CTF flags, set via
ctf_dict_set_flag, obtained via ctf_dict_get_flag.  They are not bitflags
but simple arbitrary integers and an on/off value, stored in an unspecified
manner (the one current flag, we translate into an LCTF_* flag value in the
internal ctf_dict ctf_flags word). If you pass in an invalid flag or value
you get a new ECTF_BADFLAG error, so the caller can easily tell whether
flags added in future are valid with a particular libctf or not.

We check this flag in ctf_add_enumerator, and set it around the link
(including on child per-CU dicts).  The newish enumerator-iteration test is
souped up to check the semantics of the flag as well.

The fact that the flag can be set and unset at any time has curious
consequences. You can unset the flag, insert a pile of duplicates, then set
it and expect the new duplicates to be detected, not only by
ctf_add_enumerator but also by ctf_lookup_enumerator.  This means we now
have to maintain the ctf_names and conflicting_enums enum-duplication
tracking as new enums are added, not purely as the dict is opened.
Move that code out of init_static_types_internal and into a new
ctf_track_enumerator function that addition can also call.

(None of this affects the file format or serialization machinery, which has
to be able to handle duplicate enumeration constants no matter what.)

include/
	* ctf-api.h (CTF_ERRORS) [ECTF_BADFLAG]: New.
	(ECTF_NERR): Update.
	(CTF_STRICT_NO_DUP_ENUMERATORS): New flag.
	(ctf_dict_set_flag): New function.
	(ctf_dict_get_flag): Likewise.

libctf/
	* ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag.
	(ctf_track_enumerator): Declare.
	* ctf-dedup.c (ctf_dedup_emit_type): Set it.
	* ctf-link.c (ctf_create_per_cu): Likewise.
	(ctf_link_deduplicating_per_cu): Likewise.
	(ctf_link): Likewise.
	(ctf_link_write): Likewise.
	* ctf-subr.c (ctf_dict_set_flag): New function.
	(ctf_dict_get_flag): New function.
	* ctf-open.c (init_static_types_internal): Move enum tracking to...
	* ctf-create.c (ctf_track_enumerator): ... this new function.
	(ctf_add_enumerator): Call it.
	* libctf.ver: Add the new functions.
	* testsuite/libctf-lookup/enumerator-iteration.c: Test them.
2024-07-31 21:02:05 +01:00
Nick Alcock
4cd2c266cf libctf, open: Fix enum error handling path
This new error-handling path was not properly initializing the
fp's errno.

libctf/
	* ctf-open.c (init_static_types_internal): Set errno properly.
2024-07-31 21:02:05 +01:00
Nick Alcock
95861bb396 libctf: we do in fact support foreign-endian old versions
The worry that caused this to not be supported was because we don't
bother endian-flipping version-related fields before checking them.
But they're all unsigned chars anyway, and don't need any flipping at
all.

This should be supported and should already work.  Enable it.

libctf/
	* ctf-open.c (ctf_bufopen): Don't prohibit foreign-endian
        upgrades.
2024-07-31 21:02:04 +01:00
Nick Alcock
6e09d4a6e6 libctf: prohibit addition of enums with overlapping enumerator constants
libctf has long prohibited addition of enums with overlapping constants in a
single enum, but now that we are properly considering enums with overlapping
constants to be conflciting types, we can go further and prohibit addition
of enumeration constants to a dict if they already exist in any enum in that
dict: the same rules as C itself.

We do this in a fashion vaguely similar to what we just did in the
deduplicator, by considering enumeration constants as identifiers and adding
them to the core type/identifier namespace, ctf_dict_t.ctf_names.  This is a
little fiddly, because we do not want to prohibit opening of existing dicts
into which the deduplicator has stuffed enums with overlapping constants!
We just want to prohibit the addition of *new* enumerators that violate that
rule.  Even then, it's fine to add overlapping enumerator constants as long
as at least one of them is in a non-root type.  (This is essential for
proper deduplicator operation in cu-mapped mode, where multiple compilation
units can be smashed into one dict, with conflicting types marked as
hidden: these types may well contain overlapping enumerators.)

So, at open time, keep track of all enums observed, then do a third pass
through the enums alone, adding each enumerator either to the ctf_names
table as a mapping from the enumerator name to the enum it is part of (if
not already present), or to a new ctf_conflicting_enums hashtable that
tracks observed duplicates. (The latter is not used yet, but will be soon.)

(We need to do a third pass because it's quite possible to have an enum
containing an enumerator FOO followed by a type FOO: since they're processed
in order, the enumerator would be processed before the type, and at that
stage it seems nonconflicting.  The easiest fix is to run through the
enumerators after all type names are interned.)

At ctf_add_enumerator time, if the enumerator to which we are adding a type
is root-visible, check for an already-present name and error out if found,
then intern the new name in the ctf_names table as is done at open time.

(We retain the existing code which scans the enum itself for duplicates
because it is still an error to add an enumerator twice to a
non-root-visible enum type; but we only need to do this if the enum is
non-root-visible, so the cost of enum addition is reduced.)

Tested in an upcoming commit.

libctf/
	* ctf-impl.h (ctf_dict_t) <ctf_names>: Augment comment.
        <ctf_conflicting_enums>: New.
	(ctf_dynset_elements): New.
	* ctf-hash.c (ctf_dynset_elements): Implement it.
	* ctf-open.c (init_static_types): Split body into...
        (init_static_types_internal): ... here.  Count enumerators;
        keep track of observed enums in pass 2; populate ctf_names and
        ctf_conflicting_enums with enumerators in a third pass.
	(ctf_dict_close): Free ctf_conflicting_enums.
	* ctf-create.c (ctf_add_enumerator): Prohibit addition of duplicate
        enumerators in root-visible enum types.

include/
	* ctf-api.h (CTF_ADD_NONROOT): Describe what non-rootness
        means for enumeration constants.
	(ctf_add_enumerator):  The name is not a misnomer.
        We now require that enumerators have unique names.
        Document the non-rootness of enumerators.
2024-06-18 13:20:32 +01:00
Nick Alcock
37ed36fc8b libctf: fix leak of entire dict when dict opening fails
Ever since commit 1fa7a0c24e ("libctf: sort out potential refcount
loops") ctf_dict_close has only freed anything if the refcount on entry
to the function is precisely 1.  >1 obviously just decrements the
refcount, but the linker machinery can sometimes cause freeing to recurse
from a dict to another dict and then back to the first dict again, so
we interpret a refcount of 0 as an indication that this is a recursive call
and we should just return, because a caller is already freeing this dict.

Unfortunately there is one situation in which this is not true: the bad:
codepath in ctf_bufopen entered when opening fails.  Because the refcount is
bumped only at the very end of ctf_bufopen, any failure causes
ctf_dict_close to be entered with a refcount of zero, and it frees nothing
and we leak the entire dict.

The solution is to bump the refcount to 1 right before freeing... but this
codepath is clearly delicate enough that we need to properly validate it,
so we add a test that uses malloc interposition to count allocations and
frees, creates a dict, writes it out, intentionally corrupts it (by setting
a bunch of bytes after the header to a value high enough that it is
definitely not a valid CTF type kind), then tries to open it again and
counts the malloc/free pairs to make sure they're matched.  (Test run only
on *-linux-gnu, because malloc interposition is not a thing you can rely
upon working everywhere, and this test is not arch-dependent so if it
passes on one arch it can be assumed to pass on all of them.)

libctf/
	* ctf-open.c (ctf_bufopen): Bump the refcount on failure.
	* testsuite/libctf-regression/open-error-free.*: New test.
2024-05-17 12:58:18 +01:00
Nick Alcock
7bc376bb97 libctf: typos
Some functions were renamed without the comments catching up.

libctf/
	* ctf-open.c (upgrade_types_v1): Fix comment typos.
2024-05-17 12:58:17 +01:00
Nick Alcock
483546ce4f libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.

It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.

Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.

... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t.  This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly.  This simplifies most of its callers
significantly.

(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)

This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.

libctf/

	* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
	(ctf_type_sect_size): Add static type sizes too.
	(ctf_serialize): Return the new dict rather than updating the
	existing dict.  No longer fail for dicts with static types;
	copy them onto the start of the new types table.
	(ctf_gzwrite): Actually serialize before gzwriting.
	(ctf_write_mem): Improve forced (test-mode) endian-flipping:
	flip dicts even if they are too small to be compressed.
	Improve confusing variable naming.
	* ctf-archive.c (arc_write_one_ctf): Don't bother to call
	ctf_serialize: both the functions we call do so.
	* ctf-string.c (ctf_str_create_atoms): Drop serializing case
	(atoms arg).
	* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
	longer bother with syn_ext_strtab or forced atoms table,
	serialization no longer needs them.
	* ctf-create.c (ctf_create): Call ctf_bufopen directly.
	* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Likewise.
	(ctf_serialize): Adjust.
	* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
	this is supposed to work.
2024-04-19 16:14:47 +01:00
Nick Alcock
cf9da3b0b6 libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.

There are three intertwined changes here:

 - pull the contents of strtabs from newly ctf_bufopened dicts into the
   atoms table, so that future additions will reuse the existing offset etc
   rather than adding new identical strings
 - allow the internal ctf_bufopen done by serialization to contribute its
   existing atoms table, so that existing atoms can be used for the
   remainder of the open process (like name table construction): this atoms
   table currente gets thrown away in the mass reassignment done later in
   ctf_serialize in any case, but it needs to be there during the open.
 - rewrite ctf_str_write_strtab so that a) it uses iterators rather than
   ctf_*_iter, reducing pointless structures which serve no other purpose
   than to implement ordinary variable scope, but more clunkily, and b)
   retains the existing strtab on the front of the new one, with its sort
   retained, rather than resorting, so all existing already-written strtab
   offsets remain valid across the call.

This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.

(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize().  We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them.  This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)

libctf/

	* ctf-create.c (ctf_create): Add (temporary) atoms arg.
	* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
	(ctf_str_create_atoms): Adjust.
	(ctf_str_write_strtab): Likewise.
	(ctf_simple_open_internal): Likewise.
	* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
	(ctf_bufopen): Likewise.
	(ctf_bufopen_internal): Initialize just enough of an
	atoms table: pre-init from the atoms arg if supplied.
	(ctf_simple_open): Adjust.
	* ctf-serialize.c (ctf_serialize): Constify the strtab.
	Move ref list purging into ctf_str_write_strtab.
	Initialize the new dict with the old dict's atoms table.
	Accept the new strtab from ctf_str_write_strtab.
	Adjust for addition of ctf_dynstrtab.
	* ctf-string.c (ctf_strraw_explicit): Improve comments.
	(ctf_str_create_atoms): Prepopulate from an existing atoms table,
	or alternatively pull in all strings from the strtab and turn
	them into atoms.
	(ctf_str_free_atoms): Free the dynstrtab and its strtab.
	(struct ctf_strtab_write_state): Remove.
	(ctf_str_count_strtab): Fold this...
	(ctf_str_populate_sorttab): ... and this...
	(ctf_str_write_strtab): ... into this.  Prepend existing strings
	to the strtab rather than resorting them (and wrecking their
	offsets).  Keep the dynstrtab updated.  Update refs for all
	atoms with refs, whether or not they are strings newly added
	to the strtab.
2024-04-19 16:14:47 +01:00
Nick Alcock
629acbe4a3 libctf: rename ctf_dict.ctf_{symtab,strtab}
These two fields are constantly confusing because CTF dicts contain both
a symtypetab and strtab, but these fields are not that: they are the
symtab and strtab from the ELF file.  We have enough string tables now
(internal, external, synthetic external, dynamic) that we need to at
least name them better than this to avoid getting totally confused.
Rename them to ctf_ext_symtab and ctf_ext_strtab.

libctf/

	* ctf-dump.c (ctf_dump_objts): Rename ctf_symtab -> ctf_ext_symtab.
	* ctf-impl.h (struct ctf_dict.ctf_symtab): Rename to...
	(struct ctf_dict.ctf_ext_strtab): ... this.
	(struct ctf_dict.ctf_strtab): Rename to...
	(struct ctf_dict.ctf_ext_strtab): ... this.
	* ctf-lookup.c (ctf_lookup_symbol_name): Adapt.
	(ctf_lookup_symbol_idx): Adapt.
	(ctf_lookup_by_sym_or_name): Adapt.
	* ctf-open.c (ctf_bufopen_internal): Adapt.
	(ctf_dict_close): Adapt.
	(ctf_getsymsect): Adapt.
	(ctf_getstrsect): Adapt.
	(ctf_symsect_endianness): Adapt.
2024-04-19 16:14:46 +01:00
Nick Alcock
8a60c93096 libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.

But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.

So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them.  (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)

This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account.  Some of these irregularities were hard to define as
anything but bugs.

Notably:

 - The symbol handling was assuming that symbols only needed to be
   looked for in dynamic hashtabs or static linker-laid-out indexed/
   nonindexed layouts, but now we want to check both in case people
   added more symbols to a dict they opened.

 - The code that handles type additions wasn't checking to see if types
   with the same name existed *at all* (so you could do
   ctf_add_typedef (fp, "foo", bar) repeatedly without error).  This
   seems reasonable for types you just added, but we probably *do* want
   to ban addition of types with names that override names we already
   used in the ctf_open()ed portion, since that would probably corrupt
   existing type relationships.  (Doing things this way also avoids
   causing new errors for any existing code that was doing this sort of
   thing.)

 - ctf_lookup_variable entirely failed to work for variables just added
   by ctf_add_variable: you had to write the dict out and read it back
   in again before they appeared.

 - The symbol handling remembered what symbols you looked up but didn't
   remember their types, so you could look up an object symbol and then
   find it popping up when you asked for function symbols, which seems
   less than ideal.  Since we had to rejig things enough to be able to
   distinguish function and object symbols internally anyway (in order
   to give suitable errors if you try to add a symbol with a name that
   already existed in the ctf_open()ed dict), this bug suddenly became
   more visible and was easily fixed.

We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time).  This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).

There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.

libctf/

	* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
	(ctf_dict.ctf_symhash_func): ... this and...
	(ctf_dict.ctf_symhash_objt): ... this.
	(ctf_dict.ctf_stypes): New, counts static types.
	(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
	(LCTF_RDWR): Deleted.
	(LCTF_DIRTY): Renumbered.
	(LCTF_LINKING): Likewise.
	(ctf_lookup_variable_here): New.
	(ctf_lookup_by_sym_or_name): Likewise.
	(ctf_symbol_next_static): Likewise.
	(ctf_add_variable_forced): Likewise.
	(ctf_add_funcobjt_sym_forced): Likewise.
	(ctf_simple_open_internal): Adjust.
	(ctf_bufopen_internal): Likewise.
	* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
	(ctf_create): Migrate a bunch of initializations into bufopen.
	Force recreation of name tables.  Do not forcibly override the
	model, let ctf_bufopen do it.
	(ctf_static_type): New.
	(ctf_update): Drop LCTF_RDWR check.
	(ctf_dynamic_type): Likewise.
	(ctf_add_function): Likewise.
	(ctf_add_type_internal): Likewise.
	(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
	(ctf_set_array): Likewise.
	(ctf_add_struct_sized): Likewise.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_enumerator): Likewise (only on the target dict).
	(ctf_add_member_offset): Likewise.
	(ctf_add_generic): Drop LCTF_RDWR check.  Ban addition of types
	with colliding names.
	(ctf_add_forward): Note safety under the new rules.
	(ctf_add_variable): Split all but the existence check into...
	(ctf_add_variable_forced): ... this new function.
	(ctf_add_funcobjt_sym): Likewise...
	(ctf_add_funcobjt_sym_forced): ... for this new function.
	* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
	with any stypes.
	(ctf_link_add_strtab): Likewise.
	(ctf_link_shuffle_syms): Likewise.
	(ctf_link_intern_extern_string): Note pre-existing prohibition.
	* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
	(ctf_lookup_variable): Split out looking in a dict but not
	its parent into...
	(ctf_lookup_variable_here): ... this new function.
	(ctf_lookup_symbol_idx): Track whether looking up a function or
	object: cache them separately.
	(ctf_symbol_next): Split out looking in non-dynamic symtypetab
	entries to...
	(ctf_symbol_next_static): ... this new function.  Don't get confused
	by the simultaneous presence of static and dynamic symtypetab entries.
	(ctf_try_lookup_indexed):  Don't waste time looking up symbols by
	index before there can be any idea how symbols are numbered.
	(ctf_lookup_by_sym_or_name): Distinguish between function and
	data object lookups.  Drop LCTF_RDWR.
	(ctf_lookup_by_symbol): Adjust.
	(ctf_lookup_by_symbol_name): Likewise.
	* ctf-open.c (init_types): Rename to...
	(init_static_types): ... this.  Drop LCTF_RDWR.  Populate ctf_stypes.
	(ctf_simple_open): Drop writable arg.
	(ctf_simple_open_internal): Likewise.
	(ctf_bufopen): Likewise.
	(ctf_bufopen_internal): Populate fields only used for writable dicts.
	Drop LCTF_RDWR.
	(ctf_dict_close): Cater for symhash cache split.
	* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
	* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
	* testsuite/libctf-lookup/add-to-opened*: New test.
2024-04-19 16:14:46 +01:00
Nick Alcock
2ba5ec13b2 libctf: fix name lookup in dicts containing base-type bitfields
The intent of the name lookup code was for lookups to yield non-bitfield
basic types except if none existed with a given name, and only then
return bitfield types with that name.  Unfortunately, the code as
written only does this if the base type has a type ID higher than all
bitfield types, which is most unlikely (the opposite is almost always
the case).

Adjust it so that what ends up in the name table is the highest-width
zero-offset type with a given name, if any such exist, and failing that
the first type with that name we see, no matter its offset.  (We don't
define *which* bitfield type you get, after all, so we might as well
just stuff in the first we find.)

Reported by Stephen Brennan <stephen.brennan@oracle.com>.

libctf/

	* ctf-open.c (init_types): Modify to allow some lookups during open;
	detect bitfield name reuse and prefer less bitfieldy types.
	* testsuite/libctf-writable/libctf-bitfield-name-lookup.*: New test.
2024-04-19 16:14:46 +01:00
Nick Alcock
54a0219150 libctf: remove static/dynamic name lookup distinction
libctf internally maintains a set of hash tables for type name lookups,
one for each valid C type namespace (struct, union, enum, and everything
else).

Or, rather, it maintains *two* sets of hash tables: one, a ctf_hash *,
is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the
other, a ctf_dynhash *, is meant for lookups in ctf_create()d dicts.

This distinction was somewhat valuable in the far pre-binutils past when
two different hashtable implementations were used (one expanding, the
other fixed-size), but those days are long gone: the hash table
implementations are almost identical, both wrappers around the libiberty
hashtab. The ctf_dynhash has many more capabilities than the ctf_hash
(iteration, deletion, etc etc) and has no downsides other than starting
at a fixed, arbitrary small size.

That limitation is easy to lift (via a new ctf_dynhash_create_sized()),
following which we can throw away nearly all the ctf_hash
implementation, and all the code to choose between readable and writable
hashtabs; the few convenience functions that are still useful (for
insertion of name -> type mappings) can also be generalized a bit so
that the extra string verification they do is potentially available to
other string lookups as well.

(libctf still has two hashtable implementations, ctf_dynhash, above,
and ctf_dynset, which is a key-only hashtab that can avoid a great many
malloc()s, used for high-volume applications in the deduplicator.)

libctf/

	* ctf-create.c (ctf_create): Eliminate ctn_writable.
	(ctf_dtd_insert): Likewise.
	(ctf_dtd_delete): Likewise.
	(ctf_rollback): Likewise.
	(ctf_name_table): Eliminate ctf_names_t.
	* ctf-hash.c (ctf_dynhash_create): Comment update.
        Reimplement in terms of...
	(ctf_dynhash_create_sized): ... this new function.
	(ctf_hash_create): Remove.
	(ctf_hash_size): Remove.
	(ctf_hash_define_type): Remove.
	(ctf_hash_destroy): Remove.
	(ctf_hash_lookup_type): Rename to...
	(ctf_dynhash_lookup_type): ... this.
	(ctf_hash_insert_type): Rename to...
	(ctf_dynhash_insert_type): ... this, moving validation to...
	* ctf-string.c (ctf_strptr_validate): ... this new function.
	* ctf-impl.h (struct ctf_names): Extirpate.
	(struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t.
	(struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t.
	(ctf_name_table): Now returns a ctf_dynhash_t.
	(ctf_lookup_by_rawhash): Remove.
	(ctf_hash_create): Likewise.
	(ctf_hash_insert_type): Likewise.
	(ctf_hash_define_type): Likewise.
	(ctf_hash_lookup_type): Likewise.
	(ctf_hash_size): Likewise.
	(ctf_hash_destroy): Likewise.
	(ctf_dynhash_create_sized): New.
	(ctf_dynhash_insert_type): New.
	(ctf_dynhash_lookup_type): New.
	(ctf_strptr_validate): New.
	* ctf-lookup.c (ctf_lookup_by_name_internal): Adapt.
	* ctf-open.c (init_types): Adapt.
	(ctf_set_ctl_hashes): Adapt.
	(ctf_dict_close): Adapt.
	* ctf-serialize.c (ctf_serialize): Adapt.
	* ctf-types.c (ctf_lookup_by_rawhash): Remove.
2024-04-19 16:14:46 +01:00
Alan Modra
fd67aa1129 Update year range in copyright notice of binutils files
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:

1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
   author I haven't committed, 'Kalray SA.', to cover gas testsuite
   files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
2024-01-04 22:58:12 +10:30
Alan Modra
027333da75 ctf segfaults
PR 30228
	PR 30229
	* ctf-open.c (ctf_bufopen_internal): Check for NULL cts_data.
	* ctf-archive.c (ctf_arc_bufpreamble, ctf_arc_bufopen): Likewise.
2023-03-19 22:19:19 +10:30
Alan Modra
d87bef3a7b Update year range in copyright notice of binutils files
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2023-01-01 21:50:11 +10:30
Nick Alcock
faf5e6ace8 libctf: add LIBCTF_WRITE_FOREIGN_ENDIAN debugging option
libctf has always handled endianness differences by detecting
foreign-endian CTF dicts on the input and endian-flipping them: dicts
are always written in native endianness.  This makes endian-awareness
very low overhead, but it means that the foreign-endian code paths
almost never get routinely tested, since "make check" usually reads in
dicts ld has just written out: only a few corrupted-CTF tests are
actually in fixed endianness, and even they only test the foreign-
endian code paths when you run make check on a big-endian machine.
(And the fix is surely not to add more .s-based tests like that, because
they are a nightmare to maintain compared to the C-code-based ones.)

To improve on this, add a new environment variable,
LIBCTF_WRITE_FOREIGN_ENDIAN, which causes libctf to unconditionally
endian-flip at ctf_write time, so the output is always in the wrong
endianness.  This then tests the foreign-endian read paths properly
at open time.

Make this easier by restructuring the writeout code in ctf-serialize.c,
which duplicates the maybe-gzip-and-write-out code three times (once
for ctf_write_mem, with thresholding, and once each for
ctf_compress_write and ctf_write just so those can avoid thresholding
and/or compression).  Instead, have the latter two call the former
with thresholds of 0 or (size_t) -1, respectively.

The endian-flipping code itself gains a bit of complexity, because
one single endian-flipper (flip_types) was assuming the input to be
in foreign-endian form and assuming it could pull things out of the
input once they had been flipped and make sense of them. At the
cost of a few lines of duplicated initializations, teach it to
read before flipping if we're flipping to foreign-endianness instead
of away from it.

libctf/
	* ctf-impl.h (ctf_flip_header): No longer static.
	(ctf_flip): Likewise.
	* ctf-open.c (flip_header): Rename to...
	(ctf_flip_header): ... this, now it is not private to one file.
	(flip_ctf): Rename...
	(ctf_flip): ... this too.  Add FOREIGN_ENDIAN arg.
	(flip_types): Likewise.  Use it.
	(ctf_bufopen_internal): Adjust calls.
	* ctf-serialize.c (ctf_write_mem): Add flip_endian path via
	a newly-allocated bounce buffer.
	(ctf_compress_write): Move below ctf_write_mem and reimplement
	in terms of it.
	(ctf_write): Likewise.
	(ctf_gzwrite): Note that this obscure writeout function does not
	support endian-flipping.
2022-03-23 13:48:32 +00:00
Nick Alcock
84f5c557a4 libctf, ld: diagnose corrupted CTF header cth_strlen
The last section in a CTF dict is the string table, at an offset
represented by the cth_stroff header field.  Its length is recorded in
the next field, cth_strlen, and the two added together are taken as the
size of the CTF dict.  Upon opening a dict, we check that none of the
header offsets exceed this size, and we check when uncompressing a
compressed dict that the result of the uncompression is the same length:
but CTF dicts need not be compressed, and short ones are not.
Uncompressed dicts just use the ctf_size without checking it.  This
field is thankfully almost unused: it is mostly used when reserializing
a dict, which can't be done to dicts read off disk since they're
read-only.

However, when opening an uncompressed foreign-endian dict we have to
copy it out of the mmaped region it is stored in so we can endian-
swap it, and we use ctf_size when doing that.  When the cth_strlen is
corrupt, this can overrun.

Fix this by checking the ctf_size in all uncompressed cases, just as we
already do in the compressed case.  Add a new test.

This came to light because various corrupted-CTF raw-asm tests had an
incorrect cth_strlen: fix all of them so they produce the expected
error again.

libctf/
	PR libctf/28933
	* ctf-open.c (ctf_bufopen_internal): Always check uncompressed
	CTF dict sizes against the section size in case the cth_strlen is
	corrupt.

ld/
	PR libctf/28933
	* testsuite/ld-ctf/diag-strlen-invalid.*: New test,
	derived from diag-cttname-invalid.s.
	* testsuite/ld-ctf/diag-cttname-invalid.s: Fix incorrect cth_strlen.
	* testsuite/ld-ctf/diag-cttname-null.s: Likewise.
	* testsuite/ld-ctf/diag-cuname.s: Likewise.
	* testsuite/ld-ctf/diag-parlabel.s: Likewise.
	* testsuite/ld-ctf/diag-parname.s: Likewise.
2022-03-23 13:48:32 +00:00
Alan Modra
a2c5833233 Update year range in copyright notice of binutils files
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.

The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
2022-01-02 12:04:28 +10:30
Nick Alcock
b62d5edd0a libctf: fix handling of CTF symtypetab sections emitted by older GCC
Older (pre-upstreaming) GCC emits a function symtypetab section of a
format never read by any extant libctf.  We can detect such CTF dicts by
the lack of the CTF_F_NEWFUNCINFO flag in their header, and we do so
when reading in the symtypetab section -- but if the set of symbols with
types is sufficiently sparse, even an older GCC will emit a function
index section.

In NEWFUNCINFO-capable compilers, this section will always be the exact
same length as the corresponding function section (each is an array of
uint32_t, associated 1:1 with each other). But this is not true for the
older compiler, for which the sections are different lengths.  We check
to see if the function symtypetab section and its index are the same
length, but we fail to skip this check when this is not a NEWFUNCINFO
dict, and emit a spurious corruption error for a CTF dict we could
have perfectly well opened and used.

Fix trivial: check the flag (and fix the terrible grammar of the error
message at the same time).

libctf/ChangeLog
2021-09-27  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-open.c (ctf_bufopen_internal): Don't complain about corrupt
	function index symtypetab sections if this is an old-format
	function symtypetab section (which should be ignored in any case).
	Fix bad grammar.
2021-09-27 20:31:25 +01:00
Alan Modra
174fe10cb6 ubsan: libctf: applying zero offset to null pointer
* ctf-open.c (init_symtab): Avoid ubsan error.
2021-09-03 11:45:58 +09:30
Nick Alcock
49da556c65 libctf, include: support an alternative encoding for nonrepresentable types
Before now, types that could not be encoded in CTF were represented as
references to type ID 0, which does not itself appear in the
dictionary. This choice is annoying in several ways, principally that it
forces generators and consumers of CTF to grow special cases for types
that are referenced in valid dicts but don't appear.

Allow an alternative representation (which will become the only
representation in format v4) whereby nonrepresentable types are encoded
as actual types with kind CTF_K_UNKNOWN (an already-existing kind
theoretically but not in practice used for padding, with value 0).
This is backward-compatible, because CTF_K_UNKNOWN was not used anywhere
before now: it was used in old-format function symtypetabs, but these
were never emitted by any compiler and the code to handle them in libctf
likely never worked and was removed last year, in favour of new-format
symtypetabs that contain only type IDs, not type kinds.

In order to link this type, we need an API addition to let us add types
of unknown kind to the dict: we let them optionally have names so that
GCC can emit many different unknown types and those types with identical
names will be deduplicated together.  There are also small tweaks to the
deduplicator to actually dedup such types, to let opening of dicts with
unknown types with names work, to return the ECTF_NONREPRESENTABLE error
on resolution of such types (like ID 0), and to print their names as
something useful but not a valid C identifier, mostly for the sake of
the dumper.

Tests added in the next commit.

include/ChangeLog
2021-05-06  Nick Alcock  <nick.alcock@oracle.com>

	* ctf.h (CTF_K_UNKNOWN): Document that it can be used for
	nonrepresentable types, not just padding.
	* ctf-api.h (ctf_add_unknown): New.

libctf/ChangeLog
2021-05-06  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-open.c (init_types): Unknown types may have names.
	* ctf-types.c (ctf_type_resolve): CTF_K_UNKNOWN is as
	non-representable as type ID 0.
	(ctf_type_aname): Print unknown types.
	* ctf-dedup.c (ctf_dedup_hash_type): Do not early-exit for
	CTF_K_UNKNOWN types: they have real hash values now.
	(ctf_dedup_rwalk_one_output_mapping): Treat CTF_K_UNKNOWN types
	like other types with no referents: call the callback and do not
	skip them.
	(ctf_dedup_emit_type): Emit via...
	* ctf-create.c (ctf_add_unknown): ... this new function.
	* libctf.ver (LIBCTF_1.2): Add it.
2021-05-06 09:30:59 +01:00
Nick Alcock
e4c78f303d libctf: a couple of small error-handling fixes
Out-of-memory errors initializing the string atoms table were
disregarded (though they would have caused a segfault very shortly
afterwards).  Errors hashing types during deduplication were only
reported if they happened on the output dict, which is almost never the
case (most errors are going to be on the dict we're working over, which
is going to be one of the inputs).  (The error was detected in both
cases, but the errno was extracted from the wrong dict.)

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-dedup.c (ctf_dedup_rhash_type): Report errors on the input
	dict properly.
	* ctf-open.c (ctf_bufopen_internal): Report errors initializing
	the atoms table.
2021-03-18 12:40:41 +00:00
Nick Alcock
f4f60336da libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number.  But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).

This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.

This is unhelpful and pointlessly inefficient.

So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.

To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name.  This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.

The actual name->index lookup is done by ctf_lookup_symbol_idx.  We do
not anticipate this lookup to be as heavily used as ld.so symbol lookup
by many orders of magnitude, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity.  Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache.  To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache.  This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict.  ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.

(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all.  We can
fix this later by using the archive caching machinery more
aggressively.)

In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive.  We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol.  This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more so isn't worth caching.  (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)

We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.

include/ChangeLog
2021-02-17  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (ctf_arc_lookup_symbol_name): New.
	(ctf_lookup_by_symbol_name): Likewise.

libctf/ChangeLog
2021-02-17  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
	<ctf_symhash_latest>: Likewise.
	(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
	<ctfi_symnamedicts>: New.
	<ctfi_syms>: Remove.
	(ctf_lookup_symbol_name): Remove.
	* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
	parent properly.  Make static.
	(ctf_lookup_symbol_idx): New, linear search for the symbol name,
	cached in the crossdict cache's ctf_symhash (if available), or
	this dict's (otherwise).
	(ctf_try_lookup_indexed): Allow the symname to be passed in.
	(ctf_lookup_by_symbol): Turn into a wrapper around...
	(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
	using ctf_lookup_symbol_idx in non-writable dicts.  Special-case
	name lookup in dynamic dicts without reported symbols, which have
	no symtab or dynsymidx but where name lookup should still work.
	(ctf_lookup_by_symbol_name): New, another wrapper.
	* ctf-archive.c (enosym): Note that this is present in
	ctfi_symnamedicts too.
	(ctf_arc_close): Adjust for removal of ctfi_syms.  Free the
	ctfi_symnamedicts.
	(ctf_arc_flush_caches): Likewise.
	(ctf_dict_open_cached): Memoize the first cached dict in the
	crossdict cache.
	(ctf_arc_lookup_symbol): Turn into a wrapper around...
	(ctf_arc_lookup_sym_or_name): ... this.  No longer cache
	ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
	still cache the dicts those lookups succeed in).  Add
	lookup-by-name support, with dicts of successful lookups cached in
	ctfi_symnamedicts.  Refactor the caching code a bit.
	(ctf_arc_lookup_symbol_name): New, another wrapper.
	* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
	* libctf.ver (LIBCTF_1.2): New version.  Add
	ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
	* testsuite/libctf-lookup/enum-symbol.c (main): Use
	ctf_arc_lookup_symbol rather than looking up the name ourselves.
	Fish it out repeatedly, to make sure that symbol caching isn't
	broken.
	(symidx_64): Remove.
	(symidx_32): Remove.
	* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
	in an unlinked object file (indexed symtypetab sections only).
	* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
	(try_maybe_reporting): Check symbol types via
	ctf_lookup_by_symbol_name as well as ctf_symbol_next.
	* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
	lookups in a multi-dict archive.
2021-02-20 16:37:08 +00:00