Commit Graph

111 Commits

Author SHA1 Message Date
Nick Alcock
fb8917ac21 libctf, create, types: type and decl tags
These are a little more fiddly than previous kinds, because their
namespacing rules are odd: they have names (so presumably we want an API to
look them up by name), but the names are not unique (they don't need to be,
because they are not entities you can refer to from C), so many distinct
tags in the same TU can have the same name.  Type tags only refer to a type
ID: decl tags refer to a specific function parameter or structure member via
a zero-indexed "component index".

The name tables for these things are a hash of name to a set of type IDs;
rather different from all the other named entities in libctf.  As a
consequence, they can presently be looked up only using their own dedicated
functions, not using ctf_lookup_by_name et al.  (It's not clear if this
restriction could ever be lifted: ctf_lookup_by_name and friends return a
type ID, not a set of them.)

They are similar enough to each other that we can at least have one function
to look up both type and decl tags if you don't care about their
component_idx and only want a type ID: ctf_tag.  (And one to iterate over
them, ctf_tag_next).

(A caveat: because tags aren't widely used or generated yet, much of this is
more or less untested and/or supposition and will need testing later.)

New API, more or less the minimum needed because it's not entirely clear how
these things will be used:

+ctf_id_t ctf_tag (ctf_dict_t *, ctf_id_t tag);
+ctf_id_t ctf_decl_tag (ctf_dict_t *, ctf_id_t decl_tag,
+		       int64_t *component_idx);
+ctf_id_t ctf_tag_next (ctf_dict_t *, const char *tag, ctf_next_t **);
+ctf_id_t ctf_add_type_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *);
+ctf_id_t ctf_add_decl_type_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *);
+ctf_id_t ctf_add_decl_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *,
+			   int component_idx);
2025-04-25 18:07:43 +01:00
Nick Alcock
39cdb3e395 libctf: create, types: functions, linkage, arg names (API ADVICE)
Functions change in CTFv4 by growing argument names as well as argument
types; the representation changes into a two-element array of (type, string
offset) rather than a simple array of arg types.  Functions also gain an
explicit linkage in a different type kind (CTF_K_FUNC_LINKAGE, which
corresponds to BTF_KIND_FUNC).

New API:

 typedef struct ctf_funcinfo {
 /* ... */
-  uint32_t ctc_argc;		/* Number of typed arguments to function.  */
+  size_t ctc_argc;		/* Number of typed arguments to function.  */
};

int ctf_func_arg_names (ctf_dict_t *, unsigned long, uint32_t, const char **);
int ctf_func_type_arg_names (ctf_dict_t *, ctf_id_t, uint32_t,
		 	    const char **names);
+extern int ctf_type_linkage (ctf_dict_t *, ctf_id_t);
-extern ctf_id_t ctf_add_function (ctf_dict_t *, uint32_t,
-				  const ctf_funcinfo_t *, const ctf_id_t *);
+extern ctf_id_t ctf_add_function (ctf_dict_t *, uint32_t,
+				  const ctf_funcinfo_t *, const ctf_id_t *,
+				  const char **arg_names);
+extern ctf_id_t ctf_add_function_linkage (ctf_dict_t *, uint32_t,
+					  ctf_id_t, const char *, int linkage);

Adding this is fairly straightforward; the only annoying part is the way the
callers need to allocate space for the arg name and type arrays.  Maybe we
should rethink these into something like ctf_type_aname(), allocating
space for the caller so the caller doesn't need to?  It would certainly
make all the callers in libctf much less complex...

While we're at it, adjust ctf_type_reference, ctf_type_align, and
ctf_type_size for the new internal API changes (they also all have
special-case code for functions).
2025-04-25 18:07:43 +01:00
Nick Alcock
ea21a1b2ae libctf: create, types: variables and datasecs (REVIEW NEEDED)
This is an area of significant difference from CTFv3.  The API changes
significantly, with quite a few additions to allow creation and querying of
these new datasec entities:

-typedef int ctf_variable_f (const char *name, ctf_id_t type, void *arg);
+typedef int ctf_variable_f (ctf_dict_t *, const char *name, ctf_id_t type,
+			    void *arg);
+typedef int ctf_datasec_var_f (ctf_dict_t *fp, ctf_id_t type, size_t offset,
+			       size_t datasec_size, void *arg);

+/* Search a datasec for a variable covering a given offset.
+
+   Errors with ECTF_NODATASEC if not found.  */
+
+ctf_id_t ctf_datasec_var_offset (ctf_dict_t *fp, ctf_id_t datasec,
+				 uint32_t offset);
+
+/* Return the datasec that a given variable appears in, or ECTF_NODATASEC if
+   none.  */
+
+ctf_id_t ctf_variable_datasec (ctf_dict_t *fp, ctf_id_t var);

+int ctf_datasec_var_iter (ctf_dict_t *, ctf_id_t, ctf_datasec_var_f *,
+			  void *);
+ctf_id_t ctf_datasec_var_next (ctf_dict_t *, ctf_id_t, ctf_next_t **,
+			       size_t *size, size_t *offset);

-int ctf_add_variable (ctf_dict_t *, const char *, ctf_id_t);
+/* ctf_add_variable adds variables to no datasec at all;
+   ctf_add_section_variable adds them to the given datasec, or to no datasec at
+   all if the datasec is NULL.  */
+
+ctf_id_t ctf_add_variable (ctf_dict_t *, const char *, int linkage, ctf_id_t);
+ctf_id_t ctf_add_section_variable (ctf_dict_t *, uint32_t,
+				   const char *datasec, const char *name,
+				   int linkage, ctf_id_t type,
+				   size_t size, size_t offset);

We tie datasecs quite closely to variables at addition (and, as should
become clear later, dedup) time: you never create datasecs, you only create
variables *in* datasecs, and the datasec springs into existence when you do
so: datasecs are always found in the same dict as the variables they contain
(the variables are never in the parent if the datasec is in a child or
anything).  We keep track of the variable->datasec mapping in
ctf_var_datasecs (populating it at addition and open time), to allow
ctf_variable_datasec to work at reasonable speed.  (But, as yet, there are
no tests of this function at all.)

The datasecs are created unsorted (to avoid variable addition becoming
O(n^2)) and sorted at serialization time, and when ctf_datasec_var_offset is
invoked.

We reuse the natural-alignment code from struct addition to get a plausible
offset in datasecs if an alignment of -1 is specified: maybe this is
unnecessary now (it was originally added when ctf_add_variable added
variables to a "default datasec", while now it just leaves them out of
all datasecs, like externs are).

One constraint of this is that we currently prohibit the addition of
nonrepresentable-typed variables, because we can't tell what their natural
alignment is: if we dropped the whole "align" and just required everyone
adding a variable to a datasec to specify an offset, we could drop that
restriction. WDYT?

One additional caveat: right now, ctf_lookup_variable() looks up the type of
a variable (because when it was invented, variables were not entities in
themselves that you could look up).  This name is confusing as hell as a
result.  It might be less confusing to make it return the CTF_K_VAR, but
that would be awful to adapt callers to, since both are represented with
ctf_id_t's, so the compiler wouldn't warn about the needed change at all...
I've vacillated on this three or four times now.
2025-04-25 18:07:43 +01:00
Nick Alcock
2c1a0a70d1 libctf, create, types: encoding, BTF floats
This adds support for the nearly useless BTF_KIND_FLOAT, under the name
CTF_K_BTF_FLOAT.  At the same time we fix up the ctf_add_encoding and
ctf_type_encoding machinery for the new API changes.

I expect this to change a bit: Ali Bahrami reckons I've oversimplified the
CTFv4 encoding representation and need to reintroduce at least a width.

New API:

ctf_id_t ctf_add_btf_float (ctf_dict_t *, uint32_t,
                            const char *, const ctf_encoding_t *);
2025-04-25 18:07:42 +01:00
Nick Alcock
03609073b0 libctf: create, types: enums and enum64s; type encoding
This commit adapts most aspects of enum handling: querying and iteration,
enumerator querying and iteration, ctf_type_add, etc.  We have to adapt to
enum64s and to signed versus unsigned enums, to our vlen and DTD changes and
other internal API changes to handle prefix types etc, and fix the types of
things to allow for 64-bit enumerators.  We can also (finally!) get useful
info on enum size rather than being restricted to a value hardwired into
libctf.

We also adjust all the type-encoding functions for the internal API changes,
since enums are the first encodable entities we have covered.

API changes:

-typedef int ctf_enum_f (const char *name, int val, void *arg);
+typedef int ctf_enum_f (const char *name, int64_t val, void *arg);
+typedef int ctf_unsigned_enum_f (const char *name, uint64_t val, void *arg);

-extern const char *ctf_enum_name (ctf_dict_t *, ctf_id_t, int);
-extern int ctf_enum_value (ctf_dict_t *, ctf_id_t, const char *, int *);
+extern const char *ctf_enum_name (ctf_dict_t *, ctf_id_t, int64_t);
+extern int ctf_enum_value (ctf_dict_t *, ctf_id_t, const char *, int64_t *);
+extern int ctf_enum_unsigned_value (ctf_dict_t *, ctf_id_t, const char *, uint64_t *);
+
+/* Return 1 if this enum's contents are unsigned, so you can tell which of the
+   above functions to use.  */
+
+extern int ctf_enum_unsigned (ctf_dict_t *, ctf_id_t);

-/* Return all enumeration constants in a given enum type.  */
-extern int ctf_enum_iter (ctf_dict_t *, ctf_id_t, ctf_enum_f *, void *);
+/* Return all enumeration constants in a given enum type.  The return value, and
+   VAL argument, may need to be cast to uint64_t: see ctf_enum_unsigned().  */
+extern int64_t ctf_enum_iter (ctf_dict_t *, ctf_id_t, ctf_enum_f *, void *);
 extern const char *ctf_enum_next (ctf_dict_t *, ctf_id_t, ctf_next_t **,
-				  int *);
+				  int64_t *);
+
+/* enums are created signed by default.  If you want an unsigned enum,
+   use ctf_add_enum_encoded() with an encoding of 0 (CTF_INT_SIGNED and
+   everything else off).  This will not create a slice, unlike all other
+   uses of ctf_add_enum_encoded(), and the result is still representable
+   as BTF.  */
+
+extern ctf_id_t ctf_add_enum64_encoded (ctf_dict_t *, uint32_t, const char *,
+					const ctf_encoding_t *);
+extern ctf_id_t ctf_add_enum64 (ctf_dict_t *, uint32_t, const char *);

-extern int ctf_add_enumerator (ctf_dict_t *, ctf_id_t, const char *, int);
+extern int ctf_add_enumerator (ctf_dict_t *, ctf_id_t, const char *, int64_t);

The only aspects of enums that are not now handled are forwards to enums,
dumping of enums, and deduplication of enums.
2025-04-25 18:07:42 +01:00
Nick Alcock
ceb15ece5e libctf: create: ctf_add_type modifications
This adapts ctf_add_type a little, adding support for prefix types, shifting
away from hardwired things towards API functions that can adapt to the CTFv4
changes, adapting to the structure/union API changes, and adding bitfielded
structures and structure members as needed.
2025-04-25 18:07:42 +01:00
Nick Alcock
20e6f72dc7 libctf: create: structure and union member addition
There is one API addition here:

int ctf_add_member_bitfield (ctf_dict_t *, ctf_id_t souid,
                             const char *, ctf_id_t type,
                             unsigned long bit_offset,
                             int bit_width);

SoU addition handles the representational changes for bitfields and for
CTF_K_BIG structs (i.e. all structs you can add members to), errors out if
you add bitfields to structs that aren't created with the
CTF_ADD_STRUCT_BITFIELDS flag, and arranges to add padding as needed if
there is too much of a gap for the offsets to encode in one hop (that
part is still untested).
2025-04-25 18:07:42 +01:00
Nick Alcock
cd8ea31666 libctf: create: struct/union addition
There's one API addition here: the existing CTF_ADD_ROOT / CTF_ADD_NONROOT
flags can have a new flag ORed with them, CTF_ADD_STRUCT_BITFIELDS,
indicating that the newly-added struct/union is capable of having bitfields
added to it via the new ctf_add_member_bitfield function (see a later
commit).

Without this, you can only add bitfields via the deprecated slice or base
type encoding representations (the former will force CTF output).

Implementation notes: structs and unions are always added with a CTF_K_BIG
prefix: if promoting from a forward, one is added.  These are elided at
serialization time if they are not needed to encode this size of struct /
this number of members.  (This means you don't have to figure out in advance
if your struct will be too big for BTF: you can just add members to it,
and libctf will figure it out and upgrade the dict as needed, or tell you
it can't if you've forbidden such things.)

We take advantage of this to merge a couple of very similar functions,
saving a bit of code.
2025-04-25 18:07:42 +01:00
Nick Alcock
d5bb2772c6 libctf: create: DTD addition and deletion; ctf_rollback
DTD deletion changes mostly relate to the changes to the ctf_dtdef_t, but
also we no longer nede to have special handling for forwards (we can just
use ctf_type_kind_forwarded like everyone else).

Rollback no longer needs to delete things by hand (it hasn't needed to for
years): it can just call ctf_dtd_delete.

ctf_add_generic changes substantially, mostly to allow for the ctf_dtdef_t
changes.  Rather than returning a type ID it now returns the DTD it just
allocated: it can also be asked to add some prefixes, and return the first
prefix added (which may not be the first prefix in the type, because if it
is asked to add a non-root-visible type it will additionally allocate a
CTF_K_CONFLICTING prefix to encode that).

Finally, duplicate name detection is suppressed for type and decl tags.
2025-04-25 18:07:42 +01:00
Nick Alcock
05a2970ad1 libctf: create, lookup: delete DVDs; ctf_lookup_by_kind
Variable handling in BTF and CTFv4 works quite differently from in CTFv3.
Rather than a separate section containing sorted, bsearchable variables,
they are simply named entities like types, stored in CTF_K_VARs.

As a first stage towards migrating to this, delete most references to
the ctf_varent_t and ctf_dvdef_t, including the DVD lookup code, all
the linking code, and quite a lot of the serialization code.

Note: CTF_LINK_OMIT_VARIABLES_SECTION, and the whole "delete variables that
already exist in the symtypetabs section" stuff, has yet to be
reimplemented.  We can implement CTF_LINK_OMIT_VARIABLES_SECTION by simply
excising all CTF_K_VARs at deduplication time if requested.  (Note:
symtypetabs should still point directly at the type, not at the CTF_K_VAR.)

(Symtypetabs in general need a bit more thought -- perhaps we can now store
them in a separate .ctf.symtypetab section with its own little four-entry
header for the symtypetabs and their indexes, making .ctf even more like
.BTF; the only difference would then be that .ctf could include prefix
types, CTF_K_FLOAT, and external string refs.  For later discussion.)

We also add ctf_lookup_by_kind() at this stage (because it is hopelessly
diff-entangled with ctf_lookup_variable): this looks up a type of a
particular kind, without needing a per-kind lookup function for it,
nor needing to hack around adding string prefixes (so you can do
ctf_lookup_by_kind (fp, CTF_K_STRUCT, "foo") rather than having to
do ctf_lookup_by_name (fp, "struct foo"): often this is more convenient, and
anything that reduces string buffer manipulation in C is good.)
2025-04-25 18:07:42 +01:00
Nick Alcock
a93ad066f7 libctf: create: vlen growth and prefix addition (NEEDS REVIEW)
This commit modifies ctf_grow_vlen to account for the recent changes to
ctf_dtdef_t, and adds a new ctf_add_prefix function to add a prefix to an
existing type, moving the dtd_data and dtd_vlen up accordinly.

It deserves close review, since this is probably the single greatest bug
cluster in libctf: the number of times I added to a variable of type
ctf_type_t and assumed it would move it in bytes rather than ctf_type_t
units is hard to believe.
2025-04-25 18:07:42 +01:00
Nick Alcock
ad13b7d44f libctf: CTFv4: type opening
The majority of this commit rejigs the core type table opening
code for CTFv4: there are a few ancillary bits it drags in,
indicated below.

The internal definition of a child dict (that may not have type or string
lookups performed in it until ctf_open time) used to be 'has a
cth_parent_name', but since BTF doesn't have one of those at all, we add
an additional check: a dict the first byte of whose strtab is not 0 must
be a child.  (If *either* is true, this is a child dict, which allows for
the possibility of CTF dicts with non-deduplicated strtabs -- thus with
leading \0's -- to exist in future.)

The initial sweep through the type table in init_static_types (to size
the name-table lookup hashes) also now checks for various types which
indicate that this must be a CTF dict, in addition to being adjusted
to cater for new CTFv4 representations of things like forwards.  (At
this early stage, we cannot rely on the functions in ctf-type.c to
abstract over this for us.)

We make some new hashtables for new namespace-like things: datasecs
and type and decl tags.

The main name-population loop in init_static_types_names_internal
takes prefixes into account, looking for the name on the suffix type
(where the name is always found).  LSTRUCT handling is removed (they
no longer exist); ENUM64s, enum forwards, VARs, datasecs, and type
and decl tags get their names suitably populated.  Some buggy code
which tried to populate the name tables for cvr-quals (which are
nameless) was dropped.

We add an extra pass which traverses all datasecs and keeps track of which
datasec each var is instantiated in (if any) in a new ctf_var_datasecs hash
table.  (This uses a number of type-querying functions which don't yet
exist: they'll be added in the upcoming commits.)

We handle the type 0 == void case by pointing the first element of
ctf_txlate at a type read in named "void" (making type 0 an alias to it),
or, if one doesn't exist, creating a new one (outside the type table and dtd
arrays), and pointing type 0 at that.  Since it is numbered 0 and not in the
type table or dtd arrays, it will never be written out at serialization
time, but since it is *present*, libctf consumers who expect the void type
to have an integral definition rather than being a magic number will get
what they expect.
2025-04-25 18:07:42 +01:00
Nick Alcock
456d5bedcc types: add some more error checking
A few places with inadequate error checking have fallen out of the
ctf_id_t work:

 - ctf_add_slice doesn't make sure that the type it is slicing
   actually exists
 - ctf_add_member_offset doesn't check that the type of the member
   exists (though it will often fail if it doesn't, it doesn't
   explicitly check, so if you're unlucky it can sometimes succeed,
   giving you a corrupted dict)
 - ctf_type_encoding doesn't check whether its slied type exists:
   it should verify it so it can return a decent error, rather than
   a thoroughly misleading one
 - ctf_type_compat has the same problem with respect to both of its
   arguments. It would definitely be nicer if we could call
   ctf_type_compat and just get a boolean answer, but it's not
   clear to me whether a type can be said to be compatible *or*
   incompatible with a nonexistent one, and we should probably alert
   the users to a likely bug regardless.  C error checking, sigh...
2025-03-16 15:25:28 +00:00
Nick Alcock
648f857144 Tiny stylistic spacing and comment tweaks 2025-03-16 15:25:28 +00:00
Nick Alcock
b5d3790c66 libctf: consecutive ctf_id_t assignment
This change modifies type ID assignment in CTF so that it works like BTF:
rather than flipping the high bit on for types in child dicts, types ascend
directly from IDs in the parent to IDs in the child, without interruption
(so type 0x4 in the parent is immediately followed by 0x5 in all children).

Doing this while retaining useful semantics for modification of parents is
challenging.  By definition, child type IDs are not known until the parent
is written out, but we don't want to find ourselves constrained to adding
types to the parent in one go, followed by all child types: that would make
the deduplicator a nightmare and would frankly make the entire ctf_add*()
interface next to useless: all existing clients that add types at all
add types to both parents and children without regard for ordering, and
breaking that would probably necessitate redesigning all of them.

So we have to be a litle cleverer.

We approach this the same way as we approach strings in the recent refs
rework: if a parent has children attached (or has ever had them attached
since it was created or last read in), any new types created in the parent
are assigned provisional IDs starting at the very top of the type space and
working down.  (Their indexes in the internal libctf arrays remain
unchanged, so we don't suddenly need multigigabyte indexes!).  At writeout
(preserialization) time, we traverse the type table (and all other table
containing type IDs) and assign refs to every type ID in exactly the same
way we assign refs to every string offset (just a different set of refs --
we don't want to update type IDs with string offset values!).

For a parent dict with children, these refs are real entities in memory:
pointers to the memory locations where type IDs are stored, tracked in the
DTD of each type.  As we traverse the type table, we assign real IDs to each
type (by simple incrementation), storing those IDs in a new dtd_final_type
field in the DTD for each type.  Once the type table and all other tables
containing type IDs are fully traversed, we update all the refs and
overwrite the IDs currently residing in each with the final IDs for each
type.

That fixes up IDs in the parent dict itself (including forward references in
structs and the like: that's why the ref updates only happen at the end);
but what about child dicts' references, both to parent types and to their
own?  We add armouring to enforce that parent dicts are always serialized
before their children (which ctf-link.c already does, because it's a
precondition for strtab deduplication), and then arrange that when a ref is
added to a type whose ID has been assigned (has a dtd_final_type), we just
immediately do an update rather than storing a ref for later updating.
Since the parent is already serialized, all parent type IDs have a
dtd_final_type by this point, and all parent IDs in the children are
properly updated. The child types can now be renumbered now we now the
number of types in the parent, and their refs updated identically to what
was just done with the parent.

One wrinkle: before the child refs are updated, while we are working over
the child's type section, the type IDs in the child start from 1 (or
something like that), which might seem to overlap the parent IDs.  But this
is not the case: when you serialize the parent, the IDs written out to disk
are changed, but the only change to the representation in memory is that we
remember a dtd_final_type for each type (and use it to update all the child
type refs): its ID in memory is the same as it always was, a nonoverlapping
provisional ID higher than any other valid ID.  We enforce all of this by
asserting that when you add a ref to a type, the memory location that is
modified must be in the buffer being serialized: the code will not let you
accidentally modify the actual DTDs in memory.

We track the number of types in the parent in a new CTFv4 (not BTF) header
field (the dumper is updated): we will also use this to open CTFv3 child
dicts without change by simply declaring for them that the parent dict has
2^31 types in it (or 2^15, for v2 and below): the IDs in the children then
naturally come out right with no other changes needed.  (Right now, opening
CTFv3 child dicts requires extra compatibility code that has not been
written, but that code will no longer need to worry about type ID
differences.)

Various things are newly forbidden:

 - you cannot ctf_import() a child into a parent if you already ctf_add()ed
   types to the child, because all its IDs would change (and since you
   already cannot ctf_add() types to a child that hasn't had its parent
   imported, this in practice means only that ctf_create() must be followed
   immediately by a ctf_import() if this is a new child, which all sane
   clients were doing anyway).

 - You cannot import a child into a parent which has the wrong number of
   (non-provisional) types, again because all its IDs would be wrong:
   because parents only add types in the provisional space if children are
   attached to it, this would break the not unknown case of opening an
   archive, adding types to the parent, and only then importing children
   into it, so we add a special case: archive members which are not children
   in an archive with more than one member always pretend to have at least
   one child, so type additions in them are always provisional even before
   you ctf_import anything. In practice, this does exactly what we want,
   since all archives so far are created by the linker and have one parent
   and N children of that parent.

Because this introduces huge gaps between index and type ID for provisional
types, some extra assertions are added to ensure that the internal
ctf_type_to_index() is only ever called on types in the current dict (never
a parent dict): before now, this was just taken on trust, and it was often
wrong (which at best led to wrong results, as wrong array indexes were used,
and at worst to a buffer overflow). When hash debugging is on (suggesting
that the user doesn't mind expensive checks), every ctf_type_to_index()
triggers a ctf_index_to_type() to make sure that the operations are proper
inverses.

Lots and lots of tests are added to verify that assignment works and that
updating of every type kind works fine -- existing tests suffice for
type IDs in the variable and symtypetab sections.

The ld-ctf tests get a bunch of largely display-based updates: various
tests refer to 0x8... type IDs, which no longer exist, and because the
IDs are shorter all the spacing and alignment has changed.
2025-03-16 15:25:27 +00:00
Nick Alcock
5a1d8eca5c libctf: fix slices of slices and of enums
Slices had a bunch of horrible usability problems.  In particular, while
towers of cv-quals are resolved away by functions that need to do it, towers
of cv-quals with slices in the middle are not resolved away by functions
like ctf_enum_value that can see through slices: resolving volatile -> slice
-> const -> enum will leave it with a 'const', which will error pointlessly,
annoying callers, who reasonably expect slices to be more invisible than
this.  (The user-callable ctf_type_resolve still does not resolve away
slices, because this is the only way users can see that the slices are there
at all.)

This is induced by a fix for another wart: ctf_add_enumerator does not
resolve anything away at all, so you can't even add enumerators to const or
volatile enums -- and more problematically, you can't add enumerators to
enums with an explicit encoding without resolving away the types by hand,
since ctf_add_enum_encoded works by returning a slice!  ctf_add_enumerator
now resolves away all of those, so any cvr-or-typedef-or-slice-qual
terminating in an enum can be added to, exactly as callers likely expect.

(New tests added.)

libctf/
	* ctf-create.c (ctf_add_enumerator): Resolve away cvr-qualness.
	* ctf-types.c (ctf_type_resolve_unsliced): Don't terminate at
	the first slice.
	* testsuite/libctf-writable/slice-of-slice.*: New test.
2025-02-28 15:13:24 +00:00
Nick Alcock
a480362d88 libctf: string: refs rework
This commit moves provisional (not-yet-serialized) string refs towards the
scheme to be used for CTF IDs in the future.  In particular

 - provisional string offsets now count downwards from just under the
   external string offset space (all bits on but the high bit).  This makes
   it possible to detect an overflowing strtab, and also makes it trivial to
   determine whether any string offset (ref) updates were missed -- where
   before we might get a slightly corrupted or incorrect string, we now get
   a huge high strtab offset corresponding to no string, and an error is
   emitted at read time.

 - refs are emitted at serialization time during the pass through the types.
   They are strictly associated with the newly-written-out buffer: the
   existing opened CTF dict is not changed, though it does still get the new
   strtab so that new refs to the same string can just refer directly to it.
   The provisional strtab hash table that contains these strings is not
   deleted after serialization (because we might serialize again): instead,
   we keep track in the parent of the lowest-yet-used ("latest") provisional
   strtab offset, and any strtab offset above that, but not external
   (high-bit-on) is considered provisional.

   This is sort-of-enforced by moving most of the ref-addition function
   declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is
   not included by ctf-create.c or ctf-open.c.

 - because we don't add refs when adding types, we don't need to handle the
   case where we add things to expanding vlens (enums, struct members) and
   have to realloc() them.  So the entire painful movable refs system can
   just be deleted, along with the ability to remove refs piecemeal at all
   (purging all of them is still possible).  Strings added during type
   addition are added via ctf_str_add(), which adds no refs: the strings are
   picked up at serialization time and refs to their final, serialized
   resting place added.  The DTDs never have any refs in them, and their
   provisional strtab offsets are never updated by the ref system.

This caused several bugs to fall out of the earlier work and get fixed.
In particular, attempts to look up a string in a child dict now search
the parent's provisional strtab too: we add some extra special casing
for the null string so we don't need to worry about deduplication
moving it somewhere other than offset zero.

Finally, the optimization that removes an unreferenced synthetic external
strtab (the record of the strings the linker has told us about, kept around
internally for lookup during late serialization) is faulty: references to a
strtab entry will only produce CTF-level refs if their value might change,
and an external string's offset won't change, so it produces no refs: worse
yet, even if we did get a ref (say, if the string was originally believed
to be internal and only later were we told that the linker knew about it
too), when we serialize a strtab, all its refs are dropped (since they've
been updated and can no longer change); so if we serialized it a second
time, its synthetic external strtab would be considered empty and dropped,
even though the same external strings as before still exist, referencing
it.  We must keep the synthetic external strtab around as long as external
strings exist that reference it, i.e. for the life of the dict.

One benefit of all this: now we're emitting provisional string offsets at
a really high value, it's out of the way of the consecutive, deduplicated
string offsets in child dicts.  So we can drop the constraint that you
cannot add strings to a dict with children, which allows us to add types
freely to parent dicts again.  What you can't do is write that dict out
again: when we serialize, we currently update the dict being serialized
with the updated strtabs: when you write a dict out, its provisional
strings become real strings, and suddenly the offsets would overlap once
more.  But opening a dict and its children, adding to it, and then
writing it out again is rare indeed, and we have a workaround: anyone
wanting to do this can just use ctf_link instead.
2025-02-28 15:13:24 +00:00
Nick Alcock
97a72b2a35 libctf: create: fix vlen / vbytes confusion
The initial_vlen parameter to ctf_add_generic is misnamed: it's not the
initial vlen (the initial number of members of a struct, etc), but rather
the initial size of the vlen region.  We have a term for that, vbytes: use
it.

Amazingly this doesn't seem to have caused any bugs to creep in.
2025-02-28 15:13:24 +00:00
Nick Alcock
dc93d01ff2 libctf: de-macroize LCTF_TYPE_TO_INDEX / LCTF_INDEX_TO_TYPE
Making these functions is unnecessary right now, but will become much
clearer shortly.

While we're at it, we can drop the third child argument to
LCTF_INDEX_TO_TYPE: it's only used for nontrivial purposes that aren't
literally the same as getting the result from the fp in one place,
in ctf_lookup_by_name_internal, and that place is easily fixed by just
looking in the right dictionary in the first place.
2025-02-28 15:13:24 +00:00
Nick Alcock
003f19bfa7 libctf: make ctf_dynamic_type() the inverse of ctf_static_type()
They're meant to be inverses, which makes it unfortunate that
they check different bounds.  No visible effect yet, since
ctf_typemax and ctf_stypes currently cover the entire type ID
space, but will have an effect shortly.
2025-02-28 15:13:24 +00:00
Nick Alcock
b875301e74 libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILD
Parent/child determination is about to become rather more complex, making a
macro impractical.  Use the ctf_type_isparent/ischild function calls
everywhere and remove the macro.  Make them more const-correct too, to
make them more widely usable.

While we're about it, change several places that hand-implemented
ctf_get_dict() to call it instead, and armour several functions against
the null returns that were always possible in this case (but previously
unprotected-against).
2025-02-28 15:13:24 +00:00
Nick Alcock
9835747b21 libctf: generalize the ref system
Despite the removal of the separate movable ref list, the ref system as
a whole is more than complex enough to be worth generalizing now that
we are adding different kinds of ref.

Refs now are lists of uint32_t * which can be updated through the
pointer for all entries in the list and moved to new sites for all
pointers in a given range: they are no longer references to string
offsets in particular and can be references to other uint32_t-sized
things instead (note that ctf_id_t is a typedef to a uint32_t).

ctf-string.c has been adjusted accordingly (the adjustments are tiny,
more or less just turning a bunch of references to atom into
&atom->csa_refs).
2025-02-28 15:13:24 +00:00
Nick Alcock
06cefeee6b libctf: fix obsolete comment
Pending refs haven't been a thing for a while now.

libctf/
	* ctf-create.c (ctf_add_member_offset): Fix comment.
2025-02-28 14:47:24 +00:00
Nick Alcock
70d05ab0b2 libctf: add mechanism to prohibit most operations without a strtab
We are about to add machinery that deduplicates a child dict's strtab
against its parent.  Obviously if you open such a dict but do not import its
parent, all strtab lookups must fail: so add an LCTF_NO_STR flag that is set
in that window and make most operations fail if it's not set.  (Two more
that will be set in future commits are serialization and string lookup
itself.)

Notably, not all symbol lookup is impossible in this window: you can still
look up by symbol index, as long as this dict is not using an indexed
strtypetab (which obviously requires string lookups to get the symbol name).

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_HASPARENT]: New.
        [ECTF_WRONGPARENT]: Likewise.
	(ECTF_NERR): Update.
        Update comments to note the new limitations on ctf_import et al.

libctf/
	* ctf-impl.h (LCTF_NO_STR): New.
	* ctf-create.c (ctf_rollback): Error out when LCTF_NO_STR.
	(ctf_add_generic): Likewise.
	(ctf_add_struct_sized): Likewise.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_forward): Likewise.
	(ctf_add_unknown): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable): Likewise.
	(ctf_add_funcobjt_sym_forced): Likewise.
	(ctf_add_type): Likewise (on either dict).
	* ctf-dump.c (ctf_dump): Likewise.
	* ctf-lookup.c (ctf_lookup_by_name): Likewise.
	(ctf_lookup_variable): Likewise. Likewise.
	(ctf_lookup_enumerator): Likewise.
	(ctf_lookup_enumerator_next): Likewise.
	(ctf_symbol_next): Likewise.
	(ctf_lookup_by_sym_or_name): Likewise, if doing indexed lookups.
	* ctf-types.c (ctf_member_next): Likewise.
	(ctf_enum_next): Likewise.
	(ctf_type_aname): Likewise.
	(ctf_type_name_raw): Likewise.
	(ctf_type_compat): Likewise, for either dict.
	(ctf_member_info): Likewise.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	(ctf_type_rvisit): Likewise.
	(ctf_variable_next): Note that we don't need to test LCTF_NO_STR.
2025-02-28 14:47:24 +00:00
Alan Modra
e8e7cf2abe Update year range in copyright notice of binutils files 2025-01-01 18:29:57 +10:30
Nick Alcock
21397b78f9 libctf: fix ref leak of names of newly-inserted non-root-visible types
A bug in ctf_dtd_delete led to refs in the string table to the
names of non-root-visible types not being removed when the DTD
was.  This seems harmless, but actually it would lead to a write
down a pointer into freed memory if such a type was ctf_rollback()ed
over and then the dict was serialized (updating all the refs as the
strtab was serialized in turn).

Bug introduced in commit fe4c2d5563
("libctf: create: non-root-visible types should not appear in name tables")
which is included in binutils 2.35.

libctf/
	* ctf-create.c (ctf_dtd_delete): Remove refs for all types
	with names, not just root-visible ones.
2024-07-31 21:10:06 +01:00
Nick Alcock
6da9267482 libctf, include: add ctf_dict_set_flag: less enum dup checking by default
The recent change to detect duplicate enum values and return ECTF_DUPLICATE
when found turns out to perturb a great many callers.  In particular, the
pahole-created kernel BTF has the same problem we historically did, and
gleefully emits duplicated enum constants in profusion.  Handling the
resulting duplicate errors from BTF -> CTF converters reasonably is
unreasonably difficult (it amounts to forcing them to skip some types or
reimplement the deduplicator).

So let's step back a bit.  What we care about mostly is that the
deduplicator treat enums with conflicting enumeration constants as
conflicting types: programs that want to look up enumeration constant ->
value mappings using the new APIs to do so might well want the same checks
to apply to any ctf_add_* operations they carry out (and since they're
*using* the new APIs, added at the same time as this restriction was
imposed, there is likely to be no negative consequence of this).

So we want some way to allow processes that know about duplicate detection
to opt into it, while allowing everyone else to stay clear of it: but we
want ctf_link to get this behaviour even if its caller has opted out.

So add a new concept to the API: dict-wide CTF flags, set via
ctf_dict_set_flag, obtained via ctf_dict_get_flag.  They are not bitflags
but simple arbitrary integers and an on/off value, stored in an unspecified
manner (the one current flag, we translate into an LCTF_* flag value in the
internal ctf_dict ctf_flags word). If you pass in an invalid flag or value
you get a new ECTF_BADFLAG error, so the caller can easily tell whether
flags added in future are valid with a particular libctf or not.

We check this flag in ctf_add_enumerator, and set it around the link
(including on child per-CU dicts).  The newish enumerator-iteration test is
souped up to check the semantics of the flag as well.

The fact that the flag can be set and unset at any time has curious
consequences. You can unset the flag, insert a pile of duplicates, then set
it and expect the new duplicates to be detected, not only by
ctf_add_enumerator but also by ctf_lookup_enumerator.  This means we now
have to maintain the ctf_names and conflicting_enums enum-duplication
tracking as new enums are added, not purely as the dict is opened.
Move that code out of init_static_types_internal and into a new
ctf_track_enumerator function that addition can also call.

(None of this affects the file format or serialization machinery, which has
to be able to handle duplicate enumeration constants no matter what.)

include/
	* ctf-api.h (CTF_ERRORS) [ECTF_BADFLAG]: New.
	(ECTF_NERR): Update.
	(CTF_STRICT_NO_DUP_ENUMERATORS): New flag.
	(ctf_dict_set_flag): New function.
	(ctf_dict_get_flag): Likewise.

libctf/
	* ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag.
	(ctf_track_enumerator): Declare.
	* ctf-dedup.c (ctf_dedup_emit_type): Set it.
	* ctf-link.c (ctf_create_per_cu): Likewise.
	(ctf_link_deduplicating_per_cu): Likewise.
	(ctf_link): Likewise.
	(ctf_link_write): Likewise.
	* ctf-subr.c (ctf_dict_set_flag): New function.
	(ctf_dict_get_flag): New function.
	* ctf-open.c (init_static_types_internal): Move enum tracking to...
	* ctf-create.c (ctf_track_enumerator): ... this new function.
	(ctf_add_enumerator): Call it.
	* libctf.ver: Add the new functions.
	* testsuite/libctf-lookup/enumerator-iteration.c: Test them.
2024-07-31 21:02:05 +01:00
Nick Alcock
6e09d4a6e6 libctf: prohibit addition of enums with overlapping enumerator constants
libctf has long prohibited addition of enums with overlapping constants in a
single enum, but now that we are properly considering enums with overlapping
constants to be conflciting types, we can go further and prohibit addition
of enumeration constants to a dict if they already exist in any enum in that
dict: the same rules as C itself.

We do this in a fashion vaguely similar to what we just did in the
deduplicator, by considering enumeration constants as identifiers and adding
them to the core type/identifier namespace, ctf_dict_t.ctf_names.  This is a
little fiddly, because we do not want to prohibit opening of existing dicts
into which the deduplicator has stuffed enums with overlapping constants!
We just want to prohibit the addition of *new* enumerators that violate that
rule.  Even then, it's fine to add overlapping enumerator constants as long
as at least one of them is in a non-root type.  (This is essential for
proper deduplicator operation in cu-mapped mode, where multiple compilation
units can be smashed into one dict, with conflicting types marked as
hidden: these types may well contain overlapping enumerators.)

So, at open time, keep track of all enums observed, then do a third pass
through the enums alone, adding each enumerator either to the ctf_names
table as a mapping from the enumerator name to the enum it is part of (if
not already present), or to a new ctf_conflicting_enums hashtable that
tracks observed duplicates. (The latter is not used yet, but will be soon.)

(We need to do a third pass because it's quite possible to have an enum
containing an enumerator FOO followed by a type FOO: since they're processed
in order, the enumerator would be processed before the type, and at that
stage it seems nonconflicting.  The easiest fix is to run through the
enumerators after all type names are interned.)

At ctf_add_enumerator time, if the enumerator to which we are adding a type
is root-visible, check for an already-present name and error out if found,
then intern the new name in the ctf_names table as is done at open time.

(We retain the existing code which scans the enum itself for duplicates
because it is still an error to add an enumerator twice to a
non-root-visible enum type; but we only need to do this if the enum is
non-root-visible, so the cost of enum addition is reduced.)

Tested in an upcoming commit.

libctf/
	* ctf-impl.h (ctf_dict_t) <ctf_names>: Augment comment.
        <ctf_conflicting_enums>: New.
	(ctf_dynset_elements): New.
	* ctf-hash.c (ctf_dynset_elements): Implement it.
	* ctf-open.c (init_static_types): Split body into...
        (init_static_types_internal): ... here.  Count enumerators;
        keep track of observed enums in pass 2; populate ctf_names and
        ctf_conflicting_enums with enumerators in a third pass.
	(ctf_dict_close): Free ctf_conflicting_enums.
	* ctf-create.c (ctf_add_enumerator): Prohibit addition of duplicate
        enumerators in root-visible enum types.

include/
	* ctf-api.h (CTF_ADD_NONROOT): Describe what non-rootness
        means for enumeration constants.
	(ctf_add_enumerator):  The name is not a misnomer.
        We now require that enumerators have unique names.
        Document the non-rootness of enumerators.
2024-06-18 13:20:32 +01:00
Nick Alcock
327356780a libctf: don't leak enums if ctf_add_type fails
If ctf_add_type failed in the middle of enumerator addition, the
destination would end up containing the source enum type and some
but not all of its enumerator constants.

Use snapshots to roll back the enum addition as a whole if this happens.
Before now, it's been pretty unlikely, but in an upcoming commit we will ban
addition of enumerators that already exist in a given dict, making failure
of ctf_add_enumerator and thus of this part of ctf_add_type much more
likely.

libctf/
	* ctf-create.c (ctf_add_type_internal): Roll back if enum or
          enumerator addition fails.
2024-06-18 13:20:31 +01:00
Nick Alcock
483546ce4f libctf: make ctf_serialize() actually serialize
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.

It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.

Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.

... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t.  This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly.  This simplifies most of its callers
significantly.

(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)

This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.

libctf/

	* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
	(ctf_type_sect_size): Add static type sizes too.
	(ctf_serialize): Return the new dict rather than updating the
	existing dict.  No longer fail for dicts with static types;
	copy them onto the start of the new types table.
	(ctf_gzwrite): Actually serialize before gzwriting.
	(ctf_write_mem): Improve forced (test-mode) endian-flipping:
	flip dicts even if they are too small to be compressed.
	Improve confusing variable naming.
	* ctf-archive.c (arc_write_one_ctf): Don't bother to call
	ctf_serialize: both the functions we call do so.
	* ctf-string.c (ctf_str_create_atoms): Drop serializing case
	(atoms arg).
	* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
	longer bother with syn_ext_strtab or forced atoms table,
	serialization no longer needs them.
	* ctf-create.c (ctf_create): Call ctf_bufopen directly.
	* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
	(ctf_simple_open_internal): Delete.
	(ctf_bufopen_internal): Likewise.
	(ctf_serialize): Adjust.
	* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
	this is supposed to work.
2024-04-19 16:14:47 +01:00
Nick Alcock
cf9da3b0b6 libctf: rethink strtab writeout
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.

There are three intertwined changes here:

 - pull the contents of strtabs from newly ctf_bufopened dicts into the
   atoms table, so that future additions will reuse the existing offset etc
   rather than adding new identical strings
 - allow the internal ctf_bufopen done by serialization to contribute its
   existing atoms table, so that existing atoms can be used for the
   remainder of the open process (like name table construction): this atoms
   table currente gets thrown away in the mass reassignment done later in
   ctf_serialize in any case, but it needs to be there during the open.
 - rewrite ctf_str_write_strtab so that a) it uses iterators rather than
   ctf_*_iter, reducing pointless structures which serve no other purpose
   than to implement ordinary variable scope, but more clunkily, and b)
   retains the existing strtab on the front of the new one, with its sort
   retained, rather than resorting, so all existing already-written strtab
   offsets remain valid across the call.

This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.

(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize().  We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them.  This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)

libctf/

	* ctf-create.c (ctf_create): Add (temporary) atoms arg.
	* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
	(ctf_str_create_atoms): Adjust.
	(ctf_str_write_strtab): Likewise.
	(ctf_simple_open_internal): Likewise.
	* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
	(ctf_bufopen): Likewise.
	(ctf_bufopen_internal): Initialize just enough of an
	atoms table: pre-init from the atoms arg if supplied.
	(ctf_simple_open): Adjust.
	* ctf-serialize.c (ctf_serialize): Constify the strtab.
	Move ref list purging into ctf_str_write_strtab.
	Initialize the new dict with the old dict's atoms table.
	Accept the new strtab from ctf_str_write_strtab.
	Adjust for addition of ctf_dynstrtab.
	* ctf-string.c (ctf_strraw_explicit): Improve comments.
	(ctf_str_create_atoms): Prepopulate from an existing atoms table,
	or alternatively pull in all strings from the strtab and turn
	them into atoms.
	(ctf_str_free_atoms): Free the dynstrtab and its strtab.
	(struct ctf_strtab_write_state): Remove.
	(ctf_str_count_strtab): Fold this...
	(ctf_str_populate_sorttab): ... and this...
	(ctf_str_write_strtab): ... into this.  Prepend existing strings
	to the strtab rather than resorting them (and wrecking their
	offsets).  Keep the dynstrtab updated.  Update refs for all
	atoms with refs, whether or not they are strings newly added
	to the strtab.
2024-04-19 16:14:47 +01:00
Nick Alcock
149ce5c263 libctf: replace 'pending refs' abstraction
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.

This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again.  We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.

Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).

libctf/

	* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
        a movable ref.
	(ctf_add_member_offset): Likewise.
	* ctf-util.c (ctf_realloc): Delete.
	* ctf-serialize.c (ctf_serialize): No longer use it.  Adjust to
	new fields.
	* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
	(ctf_str_free_atom): Free freeable atoms' strings.
	(ctf_str_create_atoms): Create the movable refs dynhash if needed.
	(ctf_str_free_atoms): Destroy it.
	(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
	reversion).  Add new flag.
	(aref_create):  New, populate movable refs if need be.
	(ctf_str_add_ref_internal): Switch back to flags, update refs
	directly for nonprovisional strings (with already-known fixed offsets);
	create refs via aref_create.  Allocate strings only if not within an
	mmapped strtab.
	(ctf_str_add_movable_ref): New.
	(ctf_str_add): Adjust to CTF_STR_* reintroduction.
	(ctf_str_add_external): LIkewise.
	(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
	backpointer.
	(ctf_str_purge_refs): Drop ctf_str_num_refs.
	(ctf_str_update_refs): Fix indentation.
	* ctf-impl.h (struct ctf_str_atom_movable): New.
	(struct ctf_dict.ctf_str_num_refs): Drop.
	(struct ctf_dict.ctf_str_movable_refs): New.
	(ctf_str_add_movable_ref): Declare.
	(ctf_str_move_refs): Likewise.
	(ctf_realloc): Drop.
2024-04-19 16:14:46 +01:00
Nick Alcock
3301ddba1b Revert "libctf: do not corrupt strings across ctf_serialize"
This reverts commit 986e9e3aa0.

(We do not revert the testcase -- it remains valid -- but we are
taking a different, less complex and more robust approach.)

This also deletes the pending refs abstraction without (yet)
replacing it, so some tests will fail for a commit or two.
2024-04-19 16:14:46 +01:00
Nick Alcock
4fa4e3d92a libctf: delete LCTF_DIRTY
This flag was meant as an optimization to avoid reserializing dicts
unnecessarily.  It was critically necessary back when serialization was
done by ctf_update() and you had to call that every time you wanted any
new modifications to the type table to be usable by other types, but
that has been unnecessary for years now, and serialization is only done
once when writing out, which one would naturally assume would always
serialize the dict.  Worse, it never really worked: it only tracked
newly-added types, not things like added symbols which might equally
well require reserialization, and it gets in the way of an upcoming
change.  Delete entirely.

libctf/

	* ctf-create.c (ctf_create): Drop LCTF_DIRTY.
	(ctf_discard): Likewise.
	(ctf_rollback): Likewise.
	(ctf_add_generic): Likewise.
	(ctf_set_array): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable_forced): Likewise.
	* ctf-link.c (ctf_link_intern_extern_string): Likewise.
	(ctf_link_add_strtab): Likewise.
	* ctf-serialize.c (ctf_serialize): Likewise.
	* ctf-impl.h (LCTF_DIRTY): Likewise.
	(LCTF_LINKING): Renumber.
2024-04-19 16:14:46 +01:00
Nick Alcock
8a60c93096 libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.

But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.

So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them.  (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)

This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account.  Some of these irregularities were hard to define as
anything but bugs.

Notably:

 - The symbol handling was assuming that symbols only needed to be
   looked for in dynamic hashtabs or static linker-laid-out indexed/
   nonindexed layouts, but now we want to check both in case people
   added more symbols to a dict they opened.

 - The code that handles type additions wasn't checking to see if types
   with the same name existed *at all* (so you could do
   ctf_add_typedef (fp, "foo", bar) repeatedly without error).  This
   seems reasonable for types you just added, but we probably *do* want
   to ban addition of types with names that override names we already
   used in the ctf_open()ed portion, since that would probably corrupt
   existing type relationships.  (Doing things this way also avoids
   causing new errors for any existing code that was doing this sort of
   thing.)

 - ctf_lookup_variable entirely failed to work for variables just added
   by ctf_add_variable: you had to write the dict out and read it back
   in again before they appeared.

 - The symbol handling remembered what symbols you looked up but didn't
   remember their types, so you could look up an object symbol and then
   find it popping up when you asked for function symbols, which seems
   less than ideal.  Since we had to rejig things enough to be able to
   distinguish function and object symbols internally anyway (in order
   to give suitable errors if you try to add a symbol with a name that
   already existed in the ctf_open()ed dict), this bug suddenly became
   more visible and was easily fixed.

We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time).  This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).

There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.

libctf/

	* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
	(ctf_dict.ctf_symhash_func): ... this and...
	(ctf_dict.ctf_symhash_objt): ... this.
	(ctf_dict.ctf_stypes): New, counts static types.
	(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
	(LCTF_RDWR): Deleted.
	(LCTF_DIRTY): Renumbered.
	(LCTF_LINKING): Likewise.
	(ctf_lookup_variable_here): New.
	(ctf_lookup_by_sym_or_name): Likewise.
	(ctf_symbol_next_static): Likewise.
	(ctf_add_variable_forced): Likewise.
	(ctf_add_funcobjt_sym_forced): Likewise.
	(ctf_simple_open_internal): Adjust.
	(ctf_bufopen_internal): Likewise.
	* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
	(ctf_create): Migrate a bunch of initializations into bufopen.
	Force recreation of name tables.  Do not forcibly override the
	model, let ctf_bufopen do it.
	(ctf_static_type): New.
	(ctf_update): Drop LCTF_RDWR check.
	(ctf_dynamic_type): Likewise.
	(ctf_add_function): Likewise.
	(ctf_add_type_internal): Likewise.
	(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
	(ctf_set_array): Likewise.
	(ctf_add_struct_sized): Likewise.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_enumerator): Likewise (only on the target dict).
	(ctf_add_member_offset): Likewise.
	(ctf_add_generic): Drop LCTF_RDWR check.  Ban addition of types
	with colliding names.
	(ctf_add_forward): Note safety under the new rules.
	(ctf_add_variable): Split all but the existence check into...
	(ctf_add_variable_forced): ... this new function.
	(ctf_add_funcobjt_sym): Likewise...
	(ctf_add_funcobjt_sym_forced): ... for this new function.
	* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
	with any stypes.
	(ctf_link_add_strtab): Likewise.
	(ctf_link_shuffle_syms): Likewise.
	(ctf_link_intern_extern_string): Note pre-existing prohibition.
	* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
	(ctf_lookup_variable): Split out looking in a dict but not
	its parent into...
	(ctf_lookup_variable_here): ... this new function.
	(ctf_lookup_symbol_idx): Track whether looking up a function or
	object: cache them separately.
	(ctf_symbol_next): Split out looking in non-dynamic symtypetab
	entries to...
	(ctf_symbol_next_static): ... this new function.  Don't get confused
	by the simultaneous presence of static and dynamic symtypetab entries.
	(ctf_try_lookup_indexed):  Don't waste time looking up symbols by
	index before there can be any idea how symbols are numbered.
	(ctf_lookup_by_sym_or_name): Distinguish between function and
	data object lookups.  Drop LCTF_RDWR.
	(ctf_lookup_by_symbol): Adjust.
	(ctf_lookup_by_symbol_name): Likewise.
	* ctf-open.c (init_types): Rename to...
	(init_static_types): ... this.  Drop LCTF_RDWR.  Populate ctf_stypes.
	(ctf_simple_open): Drop writable arg.
	(ctf_simple_open_internal): Likewise.
	(ctf_bufopen): Likewise.
	(ctf_bufopen_internal): Populate fields only used for writable dicts.
	Drop LCTF_RDWR.
	(ctf_dict_close): Cater for symhash cache split.
	* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
	* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
	* testsuite/libctf-lookup/add-to-opened*: New test.
2024-04-19 16:14:46 +01:00
Nick Alcock
54a0219150 libctf: remove static/dynamic name lookup distinction
libctf internally maintains a set of hash tables for type name lookups,
one for each valid C type namespace (struct, union, enum, and everything
else).

Or, rather, it maintains *two* sets of hash tables: one, a ctf_hash *,
is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the
other, a ctf_dynhash *, is meant for lookups in ctf_create()d dicts.

This distinction was somewhat valuable in the far pre-binutils past when
two different hashtable implementations were used (one expanding, the
other fixed-size), but those days are long gone: the hash table
implementations are almost identical, both wrappers around the libiberty
hashtab. The ctf_dynhash has many more capabilities than the ctf_hash
(iteration, deletion, etc etc) and has no downsides other than starting
at a fixed, arbitrary small size.

That limitation is easy to lift (via a new ctf_dynhash_create_sized()),
following which we can throw away nearly all the ctf_hash
implementation, and all the code to choose between readable and writable
hashtabs; the few convenience functions that are still useful (for
insertion of name -> type mappings) can also be generalized a bit so
that the extra string verification they do is potentially available to
other string lookups as well.

(libctf still has two hashtable implementations, ctf_dynhash, above,
and ctf_dynset, which is a key-only hashtab that can avoid a great many
malloc()s, used for high-volume applications in the deduplicator.)

libctf/

	* ctf-create.c (ctf_create): Eliminate ctn_writable.
	(ctf_dtd_insert): Likewise.
	(ctf_dtd_delete): Likewise.
	(ctf_rollback): Likewise.
	(ctf_name_table): Eliminate ctf_names_t.
	* ctf-hash.c (ctf_dynhash_create): Comment update.
        Reimplement in terms of...
	(ctf_dynhash_create_sized): ... this new function.
	(ctf_hash_create): Remove.
	(ctf_hash_size): Remove.
	(ctf_hash_define_type): Remove.
	(ctf_hash_destroy): Remove.
	(ctf_hash_lookup_type): Rename to...
	(ctf_dynhash_lookup_type): ... this.
	(ctf_hash_insert_type): Rename to...
	(ctf_dynhash_insert_type): ... this, moving validation to...
	* ctf-string.c (ctf_strptr_validate): ... this new function.
	* ctf-impl.h (struct ctf_names): Extirpate.
	(struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t.
	(struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t.
	(ctf_name_table): Now returns a ctf_dynhash_t.
	(ctf_lookup_by_rawhash): Remove.
	(ctf_hash_create): Likewise.
	(ctf_hash_insert_type): Likewise.
	(ctf_hash_define_type): Likewise.
	(ctf_hash_lookup_type): Likewise.
	(ctf_hash_size): Likewise.
	(ctf_hash_destroy): Likewise.
	(ctf_dynhash_create_sized): New.
	(ctf_dynhash_insert_type): New.
	(ctf_dynhash_lookup_type): New.
	(ctf_strptr_validate): New.
	* ctf-lookup.c (ctf_lookup_by_name_internal): Adapt.
	* ctf-open.c (init_types): Adapt.
	(ctf_set_ctl_hashes): Adapt.
	(ctf_dict_close): Adapt.
	* ctf-serialize.c (ctf_serialize): Adapt.
	* ctf-types.c (ctf_lookup_by_rawhash): Remove.
2024-04-19 16:14:46 +01:00
Alan Modra
59497587af libctf warnings
Seen with every compiler I have if using -fno-inline:
home/alan/src/binutils-gdb/libctf/ctf-create.c: In function ‘ctf_add_encoded’:
/home/alan/src/binutils-gdb/libctf/ctf-create.c:555:3: warning: ‘encoding’ may be used uninitialized [-Wmaybe-uninitialized]
  555 |   memcpy (dtd->dtd_vlen, &encoding, sizeof (encoding));

Seen with gcc-4.9 and probably others at lower optimisation levels:
home/alan/src/binutils-gdb/libctf/ctf-serialize.c: In function 'symtypetab_density':
/home/alan/src/binutils-gdb/libctf/ctf-serialize.c:211:18: warning: 'sym' may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (*max < sym->st_symidx)

Seen with gcc-4.5 and probably others at lower optimisation levels:
/home/alan/src/binutils-gdb/libctf/ctf-types.c:1649:21: warning: 'tp' may be used uninitialized in this function
/home/alan/src/binutils-gdb/libctf/ctf-link.c:765:16: warning: 'parent_i' may be used uninitialized in this function

Also with gcc-4.5:
In file included from /home/alan/src/binutils-gdb/libctf/ctf-endian.h:25:0,
                 from /home/alan/src/binutils-gdb/libctf/ctf-archive.c:24:
/home/alan/src/binutils-gdb/libctf/swap.h:70:0: warning: "_Static_assert" redefined
/usr/include/sys/cdefs.h:568:0: note: this is the location of the previous definition

	* swap.h (_Static_assert): Don't define if already defined.
	* ctf-serialize.c (symtypetab_density): Merge two
	CTF_SYMTYPETAB_FORCE_INDEXED blocks.
	* ctf-create.c (ctf_add_encoded): Avoid "encoding" may be used
	uninitialized warning.
	* ctf-link.c (ctf_link_deduplicating_open_inputs): Avoid
	"parent_i" may be used uninitialized warning.
	* ctf-types.c (ctf_type_rvisit): Avoid "tp" may be used
	uninitialized warning.
2024-04-17 09:24:36 +09:30
Alan Modra
fd67aa1129 Update year range in copyright notice of binutils files
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:

1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
   author I haven't committed, 'Kalray SA.', to cover gas testsuite
   files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
2024-01-04 22:58:12 +10:30
Nick Alcock
cd4a8fc4f0 libctf: fix creation-time parent/child dict confusions
The fixes applied a few years ago to resolve confusions between parent and
child dicts at lookup time also apply in various forms to creation.  In
general, if you have a type in a parent dict ctf_imported into a child and
you do something to it, and the parent dict is writable (created via
ctf_create, not opened via ctf_open*) it should work just the same to make
changes to that type via a child dict as it does to make the change
to the parent dict directly -- and nothing you're prohibited from doing
to the parent dict when done directly should be allowed just because
you're doing it via a child.

Specifically, the following don't work when doing things from the child, but
should:

 - adding a member of a type in the parent to a struct or union in the
   parent via ctf_add_member or ctf_add_member_offset: this yields
   ECTF_BADID

 - adding a member of a type in the parent to a struct or union in the
   parent via ctf_add_member_encoded: this dumps core (!).

 - adding an enumerand to an enumerator in the parent: this yields
   ECTF_BADID

 - setting the properties of an array in the parent via ctf_set_array;
   this yields ECTF_BADID

Relatedly, some things work when doing things via a child that should fail,
yielding a CTF dictionary with invalid content (readable, but meaningless):
in particular, you can add a child type to a struct in the parent via
any of the ctf_add_member* family and nothing complains at all, even though
you should never be able to add references to children to parents (since any
given parent can be associated with many different children).

A family of tests is added to check each of these cases independently, since
some can result in coredumps and it would be nice to test the other cases
even if some dump core.  They use a common library to do all the actual
work.  The set of affected API calls was determined by code inspection
(auditing all calls to ctf_dtd_lookup): it's possible that I missed a few,
but I doubt it, since other cases use ctf_lookup* functions, which already
climb to the parent where appropriate.

libctf/ChangeLog:

	PR libctf/30985
	* ctf-create.c (ctf_dtd_lookup): Traverse to parents if necessary.
	(ctf_set_array): Likewise.  Report errors on the child; require
	both parent and child to be writable.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.  Prohibit addition of child types
	to structs in the parent.
	(ctf_add_member_encoded): Do not dereference a NULL dtd: report
	ECTF_BADID instead.
	* ctf-string.c (ctf_str_add_ref_internal): Report ENOMEM on the
	dict if addition of a string ref fails.
	* testsuite/libctf-writable/parent-child-dtd-crash-lib.c: New library.
	* testsuite/libctf-writable/parent-child-dtd-enum.*: New test.
	* testsuite/libctf-writable/parent-child-dtd-enumerator.*: New test.
	* testsuite/libctf-writable/parent-child-dtd-member-encoded.*: New test.
	* testsuite/libctf-writable/parent-child-dtd-member-offset.*: New test.
	* testsuite/libctf-writable/parent-child-dtd-set-array.*: New test.
	* testsuite/libctf-writable/parent-child-dtd-struct.*: New test.
	* testsuite/libctf-writable/parent-child-dtd-union.*: New test.
2023-10-20 18:09:54 +01:00
Torbjörn SVENSSON
998a4f589d libctf: Sanitize error types for PR 30836
Made sure there is no implicit conversion between signed and unsigned
return value for functions setting the ctf_errno value.
An example of the problem is that in ctf_member_next, the "offset" value
is either 0L or (ctf_id_t)-1L, but it should have been 0L or -1L.
The issue was discovered while building a 64 bit ld binary to be
executed on the Windows platform.
Example object file that demonstrates the issue is attached in the PR.

libctf/
	Affected functions adjusted.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
2023-10-17 17:31:20 +02:00
Nick Alcock
d7474051e8 libctf: propagate errors from parents correctly
CTF dicts have per-dict errno values: as with other errno values these
are set on error and left unchanged on success.  This means that all
errors *must* set the CTF errno: if a call leaves it unchanged, the
caller is apt to find a previous, lingering error and misinterpret
it as the real error.

There are many places in libctf where we carry out operations on parent
dicts as a result of carrying out other user-requested operations on
child dicts (e.g. looking up information on a pointer to a type will
look up the type as well: the pointer might well be in a child and the
type it's a pointer to in the parent).  Those operations on the parent
might fail; if they do, the error must be correctly reflected on the
child that the user-visible operation was carried out on.  In many
places this was not happening.

So, audit and fix all those places.  Add tests for as many of those
cases as possible so they don't regress.

libctf/
	* ctf-create.c (ctf_add_slice): Use the original dict.
	* ctf-lookup.c (ctf_lookup_variable): Propagate errors.
	(ctf_lookup_symbol_idx): Likewise.
	* ctf-types.c (ctf_member_next): Likewise.
	(ctf_type_resolve_unsliced): Likewise.
	(ctf_type_aname): Likewise.
	(ctf_member_info): Likewise.
	(ctf_type_rvisit): Likewise.
	(ctf_func_type_info): Set the error on the right dict.
	(ctf_type_encoding): Use the original dict.
	* testsuite/libctf-writable/error-propagation.*: New test.
2023-04-08 16:07:17 +01:00
Alan Modra
d87bef3a7b Update year range in copyright notice of binutils files
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2023-01-01 21:50:11 +10:30
Alan Modra
a2c5833233 Update year range in copyright notice of binutils files
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.

The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
2022-01-02 12:04:28 +10:30
Nick Alcock
49da556c65 libctf, include: support an alternative encoding for nonrepresentable types
Before now, types that could not be encoded in CTF were represented as
references to type ID 0, which does not itself appear in the
dictionary. This choice is annoying in several ways, principally that it
forces generators and consumers of CTF to grow special cases for types
that are referenced in valid dicts but don't appear.

Allow an alternative representation (which will become the only
representation in format v4) whereby nonrepresentable types are encoded
as actual types with kind CTF_K_UNKNOWN (an already-existing kind
theoretically but not in practice used for padding, with value 0).
This is backward-compatible, because CTF_K_UNKNOWN was not used anywhere
before now: it was used in old-format function symtypetabs, but these
were never emitted by any compiler and the code to handle them in libctf
likely never worked and was removed last year, in favour of new-format
symtypetabs that contain only type IDs, not type kinds.

In order to link this type, we need an API addition to let us add types
of unknown kind to the dict: we let them optionally have names so that
GCC can emit many different unknown types and those types with identical
names will be deduplicated together.  There are also small tweaks to the
deduplicator to actually dedup such types, to let opening of dicts with
unknown types with names work, to return the ECTF_NONREPRESENTABLE error
on resolution of such types (like ID 0), and to print their names as
something useful but not a valid C identifier, mostly for the sake of
the dumper.

Tests added in the next commit.

include/ChangeLog
2021-05-06  Nick Alcock  <nick.alcock@oracle.com>

	* ctf.h (CTF_K_UNKNOWN): Document that it can be used for
	nonrepresentable types, not just padding.
	* ctf-api.h (ctf_add_unknown): New.

libctf/ChangeLog
2021-05-06  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-open.c (init_types): Unknown types may have names.
	* ctf-types.c (ctf_type_resolve): CTF_K_UNKNOWN is as
	non-representable as type ID 0.
	(ctf_type_aname): Print unknown types.
	* ctf-dedup.c (ctf_dedup_hash_type): Do not early-exit for
	CTF_K_UNKNOWN types: they have real hash values now.
	(ctf_dedup_rwalk_one_output_mapping): Treat CTF_K_UNKNOWN types
	like other types with no referents: call the callback and do not
	skip them.
	(ctf_dedup_emit_type): Emit via...
	* ctf-create.c (ctf_add_unknown): ... this new function.
	* libctf.ver (LIBCTF_1.2): Add it.
2021-05-06 09:30:59 +01:00
Nick Alcock
08c428aff4 libctf: eliminate dtd_u, part 5: structs / unions
Eliminate the dynamic member storage for structs and unions as we have
for other dynamic types.  This is much like the previous enum
elimination, except that structs and unions are the only types for which
a full-sized ctf_type_t might be needed.  Up to now, this decision has
been made in the individual ctf_add_{struct,union}_sized functions and
duplicated in ctf_add_member_offset.  The vlen machinery lets us
simplify this, always allocating a ctf_lmember_t and setting the
dtd_data's ctt_size to CTF_LSIZE_SENT: we figure out whether this is
really justified and (almost always) repack things down into a
ctf_stype_t at ctf_serialize time.

This allows us to eliminate the dynamic member paths from the iterators and
query functions in ctf-types.c in favour of always using the large-structure
vlen stuff for dynamic types (the diff is ugly but that's just because of the
volume of reindentation this calls for).  This also means the large-structure
vlen stuff gets more heavily tested, which is nice because it was an almost
totally unused code path before now (it only kicked in for structures of size
>4GiB, and how often do you see those?)

The only extra complexity here is ctf_add_type.  Back in the days of the
nondeduplicating linker this was called a ridiculous number of times for
countless identical copies of structures: eschewing the repeated lookups of the
dtd in ctf_add_member_offset and adding the members directly saved an amazing
amount of time.  Now the nondeduplicating linker is gone, this is extreme
overoptimization: we can rip out the direct addition and use ctf_member_next and
ctf_add_member_offset, just like ctf_dedup_emit does.

We augment a ctf_add_type test to try adding a self-referential struct, the only
thing the ctf_add_type part of this change really perturbs.

This completes the elimination of dtd_u.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtu_members>: Remove.
	<dtd_u>: Likewise.
	(ctf_dmdef_t): Remove.
	(struct ctf_next) <u.ctn_dmd>: Remove.
	* ctf-create.c (INITIAL_VLEN): New, more-or-less arbitrary initial
	vlen size.
	(ctf_add_enum): Use it.
	(ctf_dtd_delete): Do not free the (removed) dmd; remove string
	refs from the vlen on struct deletion.
	(ctf_add_struct_sized): Populate the vlen: do it by hand if
	promoting forwards.  Always populate the full-size
	lsizehi/lsizelo members.
	(ctf_add_union_sized): Likewise.
	(ctf_add_member_offset): Set up the vlen rather than the dmd.
	Expand it as needed, repointing string refs via
	ctf_str_move_pending. Add the member names as pending strings.
	Always populate the full-size lsizehi/lsizelo members.
	(membadd): Remove, folding back into...
	(ctf_add_type_internal): ... here, adding via an ordinary
	ctf_add_struct_sized and _next iteration rather than doing
	everything by hand.
	* ctf-serialize.c (ctf_copy_smembers): Remove this...
	(ctf_copy_lmembers): ... and this...
	(ctf_emit_type_sect): ... folding into here. Figure out if a
	ctf_stype_t is needed here, not in ctf_add_*_sized.
	(ctf_type_sect_size): Figure out the ctf_stype_t stuff the same
	way here.
	* ctf-types.c (ctf_member_next): Remove the dmd path and always
	use the vlen.  Force large-structure usage for dynamic types.
	(ctf_type_align): Likewise.
	(ctf_member_info): Likewise.
	(ctf_type_rvisit): Likewise.
	* testsuite/libctf-regression/type-add-unnamed-struct-ctf.c: Add a
	self-referential type to this test.
	* testsuite/libctf-regression/type-add-unnamed-struct.c: Adjusted
	accordingly.
	* testsuite/libctf-regression/type-add-unnamed-struct.lk: Likewise.
2021-03-18 12:40:40 +00:00
Nick Alcock
77d724a7ec libctf: eliminate dtd_u, part 4: enums
This is the first tricky one, the first complex multi-entry vlen
containing strings.  To handle this in vlen form, we have to handle
pending refs moving around on realloc.

We grow vlen regions using a new ctf_grow_vlen function, and iterate
through the existing enums every time a grow happens, telling the string
machinery the distance between the old and new vlen region and letting
it adjust the pending refs accordingly.  (This avoids traversing all
outstanding refs to find the refs that need adjusting, at the cost of
having to traverse one enum: an obvious major performance win.)

Addition of enums themselves (and also structs/unions later) is a bit
trickier than earlier forms, because the type might be being promoted
from a forward, and forwards have no vlen: so we have to spot that and
create it if needed.

Serialization of enums simplifies down to just telling the string
machinery about the string refs; all the enum type-lookup code loses all
its dynamic member lookup complexity entirely.

A new test is added that iterates over (and gets values of) an enum with
enough members to force a round of vlen growth.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_vlen_alloc>: New.
	(ctf_str_move_pending): Declare.
	* ctf-string.c (ctf_str_add_ref_internal): Fix error return.
	(ctf_str_move_pending): New.
	* ctf-create.c (ctf_grow_vlen): New.
	(ctf_dtd_delete): Zero out the vlen_alloc after free.  Free the
	vlen later: iterate over it and free enum name refs first.
	(ctf_add_generic): Populate dtd_vlen_alloc from vlen.
	(ctf_add_enum): populate the vlen; do it by hand if promoting
	forwards.
	(ctf_add_enumerator): Set up the vlen rather than the dmd.  Expand
	it as needed, repointing string refs via ctf_str_move_pending. Add
	the enumerand names as pending strings.
	* ctf-serialize.c (ctf_copy_emembers): Remove.
	(ctf_emit_type_sect): Copy the vlen into place and ref the
	strings.
	* ctf-types.c (ctf_enum_next): The dynamic portion now uses
	the same code as the non-dynamic.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	* testsuite/libctf-lookup/enum-many-ctf.c: New test.
	* testsuite/libctf-lookup/enum-many.lk: New test.
2021-03-18 12:40:40 +00:00
Nick Alcock
986e9e3aa0 libctf: do not corrupt strings across ctf_serialize
The preceding change revealed a new bug: the string table is sorted for
better compression, so repeated serialization with type (or member)
additions in the middle can move strings around.  But every
serialization flushes the set of refs (the memory locations that are
automatically updated with a final string offset when the strtab is
updated), so if we are not to have string offsets go stale, we must do
all ref additions within the serialization code (which walks the
complete set of types and symbols anyway). Unfortunately, we were adding
one ref in another place: the type name in the dynamic type definitions,
which has a ref added to it by ctf_add_generic.

So adding a type, serializing (via, say, one of the ctf_write
functions), adding another type with a name that sorts earlier, and
serializing again will corrupt the name of the first type because it no
longer had a ref pointing to its dtd entry's name when its string offset
was shifted later in the strtab to mae way for the other type.

To ensure that we don't miss strings, we also maintain a set of *pending
refs* that will be added later (during serialization), and remove
entries from that set when the ref is finally added.  We always use
ctf_str_add_pending outside ctf-serialize.c, ensure that ctf_serialize
adds all strtab offsets as refs (even those in the dtds) on every
serialization, and mandate that no refs are live on entry to
ctf_serialize and that all pending refs are gone before strtab
finalization.  (Of necessity ctf_serialize has to traverse all strtab
offsets in the dtds in order to serialize them, so adding them as refs
at the same time is easy.)

(Note that we still can't erase unused atoms when we roll back, though
we can erase unused refs: members and enums are still not removed by
rollbacks and might reference strings added after the snapshot.)

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-hash.c (ctf_dynset_elements): New.
	* ctf-impl.h (ctf_dynset_elements): Declare it.
	(ctf_str_add_pending): Likewise.
	(ctf_dict_t) <ctf_str_pending_ref>: New, set of refs that must be
	added during serialization.
	* ctf-string.c (ctf_str_create_atoms): Initialize it.
	(CTF_STR_ADD_REF): New flag.
	(CTF_STR_MAKE_PROVISIONAL): Likewise.
	(CTF_STR_PENDING_REF): Likewise.
	(ctf_str_add_ref_internal): Take a flags word rather than int
	params.  Populate, and clear out, ctf_str_pending_ref.
	(ctf_str_add): Adjust accordingly.
	(ctf_str_add_external): Likewise.
	(ctf_str_add_pending): New.
	(ctf_str_remove_ref): Also remove the potential ref if it is a
	pending ref.
	* ctf-serialize.c (ctf_serialize): Prohibit addition of strings
	with ctf_str_add_ref before serialization.  Ensure that the
	ctf_str_pending_ref set is empty before strtab finalization.
	(ctf_emit_type_sect): Add a ref to the ctt_name.
	* ctf-create.c (ctf_add_generic): Add the ctt_name as a pending
	ref.
	* testsuite/libctf-writable/reserialize-strtab-corruption.*: New test.
2021-03-18 12:40:40 +00:00
Nick Alcock
81982d20fa libctf: eliminate dtd_u, part 3: functions
One more member vanishes from the dtd_u, leaving only the member for
struct/union/enum members.

There's not much to do here, since as of commit afd78bd6f0 we use
the same representation (type sizes, etc) in the dtu_argv as we will
use in the final vlen, with one exception: the vlen has alignment
padding, and the dtu_argv did not.  Simplify things by adding suitable
padding in both cases.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_argv>: Remove.
	* ctf-create.c (ctf_dtd_delete): No longer free it.
	(ctf_add_function): Use the dtd_vlen, not dtu_argv.  Properly align.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-types.c (ctf_func_type_info): Just use the vlen.
	(ctf_func_type_args): Likewise.
2021-03-18 12:40:40 +00:00
Nick Alcock
534444b1ee libctf: eliminate dtd_u, part 2: arrays
This is even simpler than ints, floats and slices, with the only extra
complication being the need to manually transfer the array parameter in
the rarely-used function ctf_set_array.  (Arrays are unique in libctf in
that they can be modified post facto, not just created and appended to.
I'm not sure why they got this exemption, but it's easy to maintain.)

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_arr>: Remove.
	* ctf-create.c (ctf_add_array): Use the dtd_vlen, not dtu_arr.
	(ctf_set_array): Likewise.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-types.c (ctf_array_info): Just use the vlen.
2021-03-18 12:40:40 +00:00
Nick Alcock
7879dd88ef libctf: eliminate dtd_u, part 1: int/float/slice
This series eliminates a lot of special-case code to handle dynamic
types (types added to writable dicts and not yet serialized).

Historically, when such types have variable-length data in their final
CTF representations, libctf has always worked by adding such types to a
special union (ctf_dtdef_t.dtd_u) in the dynamic type definition
structure, then picking the members out of this structure at
serialization time and packing them into their final form.

This has the advantage that the ctf_add_* code doesn't need to know
anything about the final CTF representation, but the significant
disadvantage that all code that looks up types in any way needs two code
paths, one for dynamic types, one for all others.  Historically libctf
"handled" this by not supporting most type lookups on dynamic types at
all until ctf_update was called to do a complete reserialization of the
entire dict (it didn't emit an error, it just emitted wrong results).
Since commit 676c3ecbad, which eliminated ctf_update in favour of
the internal-only ctf_serialize function, all the type-lookup paths
grew an extra branch to handle dynamic types.

We can eliminate this branch again by dropping the dtd_u stuff and
simply writing out the vlen in (close to) its final form at ctf_add_*
time: type lookup for types using this approach is then identical for
types in writable dicts and types that are in read-only ones, and
serialization is also simplified (we just need to write out the vlen
we already created).

The only complexity lies in type kinds for which multiple
vlen representations are valid depending on properties of the type,
e.g. structures.  But we can start simple, adjusting ints, floats,
and slices to work this way, and leaving everything else as is.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_enc>: Remove.
	<dtd_u.dtu_slice>: Likewise.
	<dtd_vlen>: New.
	* ctf-create.c (ctf_add_generic): Perhaps allocate it.  All
	callers adjusted.
	(ctf_dtd_delete): Free it.
	(ctf_add_slice): Use the dtd_vlen, not dtu_enc.
	(ctf_add_encoded): Likewise.  Assert that this must be an int or
	float.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-dedup.c (ctf_dedup_rhash_type): Use the dtd_vlen, not
	dtu_slice.
	* ctf-types.c (ctf_type_reference): Likewise.
	(ctf_type_encoding): Remove most dynamic-type-specific code: just
	get the vlen from the right place.  Report failure to look up the
	underlying type's encoding.
2021-03-18 12:40:36 +00:00