Commit Graph

75 Commits

Author SHA1 Message Date
Nick Alcock
4837852527 libctf: types: access to raw type data
This new API lets users ask for the raw type data associated with a type
(either the whole lot including prefixes, or just the suffix if this is not
a CTF_K_BIG type), and then they can manipulate it using ctf.h functions
or whatever else they like.  Doing this does not preclude using libctf
querying functions at the same time (just don't change the type!  It's
const for a reason).

New API:

+const ctf_type_t *ctf_type_data (ctf_dict_t *, ctf_id_t, int prefix);

This function was unimplementable before the DTD changes, because the
ctf_type_t and vlen were separated in memory: but now that they're always
stored in a single buffer, it's reliable and simple, indeed trivial.
2025-04-25 18:07:43 +01:00
Nick Alcock
33326f571f libctf: types: recursive type visiting
ctf_type_visit and ctf_type_rvisit have to adapt to the internal
API changes, but also to the change in the representation of
structures.  The new code is quite a lot simpler than the old,
because we don't need to roll our own iterator but can just use
ctf_member_next.

API changes, the usual for the *_f typedefs and anything to do with
structures:

-typedef int ctf_visit_f (const char *name, ctf_id_t type, unsigned long offset,
-			 int depth, void *arg);
+typedef int ctf_visit_f (ctf_dict_t *, const char *name, ctf_id_t type,
+			 size_t offset, int bit_width, int depth,
 			 void *arg);
2025-04-25 18:07:43 +01:00
Nick Alcock
bd0c033b29 libctf, create, types: slices
Nothing difficult for this CTF-specific type kind, just the usual adjustment
to internal API changes.
2025-04-25 18:07:43 +01:00
Nick Alcock
0cd5118024 libctf: create, types: arrays
The same internal API changes for arrays.  There is one ABI change here,
to ctf_arinfo_t:

-  uint32_t ctr_nelems;		/* Number of elements.  */
+  size_t ctr_nelems;		/* Number of elements.  */
2025-04-25 18:07:43 +01:00
Nick Alcock
d65d03bec4 libctf: create, types: reftypes and pointers
This is pure adjustment for internal API changes, and a change to the
type-compatibility of pointers to type 0 now that it can be void as well as
"unrepresentable".

By now this dance should be quite familiar.
2025-04-25 18:07:43 +01:00
Nick Alcock
d5dd8997b3 libctf: create, types: conflicting types
The conflicting type kind is a CTF-specific prefix kind consisting purely of
an optional translation unit name.  It takes the place of the old hidden
bit: we have already seen it used to prefix types added with a
CTF_ADD_NONROOT flag.  The deduplicator will also use them to label
conflicting types from different TUs smushed into the same dict by the
CU-mapping mechanism: unlike the hidden bit, with this scheme users can tell
which CUs the conflicting types came from.

New API:

+int ctf_type_conflicting (ctf_dict_t *, ctf_id_t, const char **cuname);
+int ctf_set_conflicting (ctf_dict_t *, ctf_id_t, const char *);

(Frankly I expect ctf_set_conflicting to be used only by deduplicators and
things like that, but if we provide an option to query something we should
also provide an option to produce it...)
2025-04-25 18:07:43 +01:00
Nick Alcock
fb8917ac21 libctf, create, types: type and decl tags
These are a little more fiddly than previous kinds, because their
namespacing rules are odd: they have names (so presumably we want an API to
look them up by name), but the names are not unique (they don't need to be,
because they are not entities you can refer to from C), so many distinct
tags in the same TU can have the same name.  Type tags only refer to a type
ID: decl tags refer to a specific function parameter or structure member via
a zero-indexed "component index".

The name tables for these things are a hash of name to a set of type IDs;
rather different from all the other named entities in libctf.  As a
consequence, they can presently be looked up only using their own dedicated
functions, not using ctf_lookup_by_name et al.  (It's not clear if this
restriction could ever be lifted: ctf_lookup_by_name and friends return a
type ID, not a set of them.)

They are similar enough to each other that we can at least have one function
to look up both type and decl tags if you don't care about their
component_idx and only want a type ID: ctf_tag.  (And one to iterate over
them, ctf_tag_next).
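
A minimal sketch of why tag lookup has to hand back IDs one at a time rather
than a single ID: several tags can share a name.  This is a toy linear table,
not the hash of name to set-of-IDs described above, and toy_tag_t and
toy_tag_next are hypothetical names that are not part of libctf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One tag: a (non-unique) name and the ID of the tag type itself.  */
typedef struct toy_tag
{
  const char *name;
  uint32_t id;
} toy_tag_t;

/* Return the ID of the next tag called NAME at or after *POS, advancing *POS
   past it.  Returns 0 once no more tags with that name remain.  */
static uint32_t
toy_tag_next (const toy_tag_t *tags, size_t ntags, const char *name,
	      size_t *pos)
{
  while (*pos < ntags)
    {
      const toy_tag_t *t = &tags[(*pos)++];

      if (strcmp (t->name, name) == 0)
	return t->id;
    }
  return 0;
}
```

Starting with a position of zero and calling this repeatedly with the same
name yields every tag of that name in turn, which is the shape of interface a
name-to-single-ID lookup cannot provide.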

(A caveat: because tags aren't widely used or generated yet, much of this is
more or less untested and/or supposition and will need testing later.)

New API, more or less the minimum needed because it's not entirely clear how
these things will be used:

+ctf_id_t ctf_tag (ctf_dict_t *, ctf_id_t tag);
+ctf_id_t ctf_decl_tag (ctf_dict_t *, ctf_id_t decl_tag,
+		       int64_t *component_idx);
+ctf_id_t ctf_tag_next (ctf_dict_t *, const char *tag, ctf_next_t **);
+ctf_id_t ctf_add_type_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *);
+ctf_id_t ctf_add_decl_type_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *);
+ctf_id_t ctf_add_decl_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *,
+			   int component_idx);
2025-04-25 18:07:43 +01:00
Nick Alcock
39cdb3e395 libctf: create, types: functions, linkage, arg names (API ADVICE)
Functions change in CTFv4 by growing argument names as well as argument
types; the representation changes into a two-element array of (type, string
offset) rather than a simple array of arg types.  Functions also gain an
explicit linkage in a different type kind (CTF_K_FUNC_LINKAGE, which
corresponds to BTF_KIND_FUNC).

New API:

 typedef struct ctf_funcinfo {
 /* ... */
-  uint32_t ctc_argc;		/* Number of typed arguments to function.  */
+  size_t ctc_argc;		/* Number of typed arguments to function.  */
 };

int ctf_func_arg_names (ctf_dict_t *, unsigned long, uint32_t, const char **);
int ctf_func_type_arg_names (ctf_dict_t *, ctf_id_t, uint32_t,
		 	    const char **names);
+extern int ctf_type_linkage (ctf_dict_t *, ctf_id_t);
-extern ctf_id_t ctf_add_function (ctf_dict_t *, uint32_t,
-				  const ctf_funcinfo_t *, const ctf_id_t *);
+extern ctf_id_t ctf_add_function (ctf_dict_t *, uint32_t,
+				  const ctf_funcinfo_t *, const ctf_id_t *,
+				  const char **arg_names);
+extern ctf_id_t ctf_add_function_linkage (ctf_dict_t *, uint32_t,
+					  ctf_id_t, const char *, int linkage);

Adding this is fairly straightforward; the only annoying part is the way the
callers need to allocate space for the arg name and type arrays.  Maybe we
should rethink these into something like ctf_type_aname(), allocating
space for the caller so the caller doesn't need to?  It would certainly
make all the callers in libctf much less complex...

While we're at it, adjust ctf_type_reference, ctf_type_align, and
ctf_type_size for the new internal API changes (they also all have
special-case code for functions).
2025-04-25 18:07:43 +01:00
Nick Alcock
a632f3ed33 libctf, types: ctf_type_kind_{iter,next} et al
These new functions let you iterate over types by kind, letting you get all
variables, all enums, all datasecs, etc.  (This is amenable to future
optimization, and some is expected shortly.)

We also add new internal functions ctf_type_kind_{forwarded_,unsliced_,}tp
which are like the corresponding non-_tp functions except that they
take a ctf_type_t rather than a type ID: doing this allows the deduplicator
to use these nearly-public functions more.  The public ctf_type_kind*
functions are reimplemented in terms of these.

This machinery is the principal place where the magic encoding of forwards
is handled.
2025-04-25 18:07:43 +01:00
Nick Alcock
ea21a1b2ae libctf: create, types: variables and datasecs (REVIEW NEEDED)
This is an area of significant difference from CTFv3.  The API changes
significantly, with quite a few additions to allow creation and querying of
these new datasec entities:

-typedef int ctf_variable_f (const char *name, ctf_id_t type, void *arg);
+typedef int ctf_variable_f (ctf_dict_t *, const char *name, ctf_id_t type,
+			    void *arg);
+typedef int ctf_datasec_var_f (ctf_dict_t *fp, ctf_id_t type, size_t offset,
+			       size_t datasec_size, void *arg);

+/* Search a datasec for a variable covering a given offset.
+
+   Errors with ECTF_NODATASEC if not found.  */
+
+ctf_id_t ctf_datasec_var_offset (ctf_dict_t *fp, ctf_id_t datasec,
+				 uint32_t offset);
+
+/* Return the datasec that a given variable appears in, or ECTF_NODATASEC if
+   none.  */
+
+ctf_id_t ctf_variable_datasec (ctf_dict_t *fp, ctf_id_t var);

+int ctf_datasec_var_iter (ctf_dict_t *, ctf_id_t, ctf_datasec_var_f *,
+			  void *);
+ctf_id_t ctf_datasec_var_next (ctf_dict_t *, ctf_id_t, ctf_next_t **,
+			       size_t *size, size_t *offset);

-int ctf_add_variable (ctf_dict_t *, const char *, ctf_id_t);
+/* ctf_add_variable adds variables to no datasec at all;
+   ctf_add_section_variable adds them to the given datasec, or to no datasec at
+   all if the datasec is NULL.  */
+
+ctf_id_t ctf_add_variable (ctf_dict_t *, const char *, int linkage, ctf_id_t);
+ctf_id_t ctf_add_section_variable (ctf_dict_t *, uint32_t,
+				   const char *datasec, const char *name,
+				   int linkage, ctf_id_t type,
+				   size_t size, size_t offset);

We tie datasecs quite closely to variables at addition (and, as should
become clear later, dedup) time: you never create datasecs, you only create
variables *in* datasecs, and the datasec springs into existence when you do
so: datasecs are always found in the same dict as the variables they contain
(the variables are never in the parent if the datasec is in a child or
anything).  We keep track of the variable->datasec mapping in
ctf_var_datasecs (populating it at addition and open time), to allow
ctf_variable_datasec to work at reasonable speed.  (But, as yet, there are
no tests of this function at all.)

The datasecs are created unsorted (to avoid variable addition becoming
O(n^2)) and sorted at serialization time, and when ctf_datasec_var_offset is
invoked.
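
The append-unsorted, sort-on-demand approach could look roughly like the
sketch below.  Everything in it (toy_secvar_t, toy_datasec_t, the function
names, and the assumption that variables in a datasec do not overlap) is
hypothetical and only illustrates the idea; it is not the libctf
implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct toy_secvar
{
  uint32_t type;		/* ID of the variable.  */
  size_t offset;		/* Offset within the datasec.  */
  size_t size;			/* Size of the variable.  */
} toy_secvar_t;

typedef struct toy_datasec
{
  toy_secvar_t *vars;
  size_t nvars;
  size_t cap;
  int sorted;			/* Nonzero if vars is sorted by offset.  */
} toy_datasec_t;

static int
toy_secvar_cmp (const void *a_, const void *b_)
{
  const toy_secvar_t *a = a_, *b = b_;

  return (a->offset > b->offset) - (a->offset < b->offset);
}

/* Append a variable without sorting, so addition stays cheap.  Returns 0 on
   success, -1 on allocation failure.  */
static int
toy_datasec_add (toy_datasec_t *ds, uint32_t type, size_t offset, size_t size)
{
  if (ds->nvars == ds->cap)
    {
      size_t cap = ds->cap ? ds->cap * 2 : 8;
      toy_secvar_t *v = realloc (ds->vars, cap * sizeof (*v));

      if (v == NULL)
	return -1;
      ds->vars = v;
      ds->cap = cap;
    }
  ds->vars[ds->nvars++] = (toy_secvar_t) { type, offset, size };
  ds->sorted = 0;
  return 0;
}

/* Return the ID of the variable covering OFFSET, or 0 if there is none,
   sorting first if anything was added since the last sort.  */
static uint32_t
toy_datasec_var_offset (toy_datasec_t *ds, size_t offset)
{
  size_t lo = 0, hi = ds->nvars;

  if (!ds->sorted && ds->nvars > 0)
    qsort (ds->vars, ds->nvars, sizeof (*ds->vars), toy_secvar_cmp);
  ds->sorted = 1;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      const toy_secvar_t *v = &ds->vars[mid];

      if (offset < v->offset)
	hi = mid;
      else if (offset >= v->offset + v->size)
	lo = mid + 1;
      else
	return v->type;
    }
  return 0;
}
```

Sorting once at lookup (or serialization) time costs O(n log n) in total,
where keeping the array sorted on every insertion would cost O(n) per
addition.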

We reuse the natural-alignment code from struct addition to get a plausible
offset in datasecs if an alignment of -1 is specified: maybe this is
unnecessary now (it was originally added when ctf_add_variable added
variables to a "default datasec", while now it just leaves them out of
all datasecs, like externs are).

One constraint of this is that we currently prohibit the addition of
nonrepresentable-typed variables, because we can't tell what their natural
alignment is: if we dropped the whole "align" and just required everyone
adding a variable to a datasec to specify an offset, we could drop that
restriction. WDYT?

One additional caveat: right now, ctf_lookup_variable() looks up the type of
a variable (because when it was invented, variables were not entities in
themselves that you could look up).  This name is confusing as hell as a
result.  It might be less confusing to make it return the CTF_K_VAR, but
that would be awful to adapt callers to, since both are represented with
ctf_id_t's, so the compiler wouldn't warn about the needed change at all...
I've vacillated on this three or four times now.
2025-04-25 18:07:43 +01:00
Nick Alcock
097ff012e4 libctf: decl, types: revise ctf_decl*, ctf_type_*name
These all need fairly trivial revisions for prefix types.  While
we're at it, we can add explicit handling of nonrepresentable types,
returning CTF_K_UNKNOWN for such types rather than throwing an error,
so that type printing prints (nonrepresentable type) for such types
as it was always intended to.
2025-04-25 18:07:42 +01:00
Nick Alcock
2c1a0a70d1 libctf, create, types: encoding, BTF floats
This adds support for the nearly useless BTF_KIND_FLOAT, under the name
CTF_K_BTF_FLOAT.  At the same time we fix up the ctf_add_encoding and
ctf_type_encoding machinery for the new API changes.

I expect this to change a bit: Ali Bahrami reckons I've oversimplified the
CTFv4 encoding representation and need to reintroduce at least a width.

New API:

ctf_id_t ctf_add_btf_float (ctf_dict_t *, uint32_t,
                            const char *, const ctf_encoding_t *);
2025-04-25 18:07:42 +01:00
Nick Alcock
03609073b0 libctf: create, types: enums and enum64s; type encoding
This commit adapts most aspects of enum handling: querying and iteration,
enumerator querying and iteration, ctf_type_add, etc.  We have to adapt to
enum64s and to signed versus unsigned enums, to our vlen and DTD changes and
other internal API changes to handle prefix types etc, and fix the types of
things to allow for 64-bit enumerators.  We can also (finally!) get useful
info on enum size rather than being restricted to a value hardwired into
libctf.

We also adjust all the type-encoding functions for the internal API changes,
since enums are the first encodable entities we have covered.

API changes:

-typedef int ctf_enum_f (const char *name, int val, void *arg);
+typedef int ctf_enum_f (const char *name, int64_t val, void *arg);
+typedef int ctf_unsigned_enum_f (const char *name, uint64_t val, void *arg);

-extern const char *ctf_enum_name (ctf_dict_t *, ctf_id_t, int);
-extern int ctf_enum_value (ctf_dict_t *, ctf_id_t, const char *, int *);
+extern const char *ctf_enum_name (ctf_dict_t *, ctf_id_t, int64_t);
+extern int ctf_enum_value (ctf_dict_t *, ctf_id_t, const char *, int64_t *);
+extern int ctf_enum_unsigned_value (ctf_dict_t *, ctf_id_t, const char *, uint64_t *);
+
+/* Return 1 if this enum's contents are unsigned, so you can tell which of the
+   above functions to use.  */
+
+extern int ctf_enum_unsigned (ctf_dict_t *, ctf_id_t);

-/* Return all enumeration constants in a given enum type.  */
-extern int ctf_enum_iter (ctf_dict_t *, ctf_id_t, ctf_enum_f *, void *);
+/* Return all enumeration constants in a given enum type.  The return value, and
+   VAL argument, may need to be cast to uint64_t: see ctf_enum_unsigned().  */
+extern int64_t ctf_enum_iter (ctf_dict_t *, ctf_id_t, ctf_enum_f *, void *);
 extern const char *ctf_enum_next (ctf_dict_t *, ctf_id_t, ctf_next_t **,
-				  int *);
+				  int64_t *);
+
+/* enums are created signed by default.  If you want an unsigned enum,
+   use ctf_add_enum_encoded() with an encoding of 0 (CTF_INT_SIGNED and
+   everything else off).  This will not create a slice, unlike all other
+   uses of ctf_add_enum_encoded(), and the result is still representable
+   as BTF.  */
+
+extern ctf_id_t ctf_add_enum64_encoded (ctf_dict_t *, uint32_t, const char *,
+					const ctf_encoding_t *);
+extern ctf_id_t ctf_add_enum64 (ctf_dict_t *, uint32_t, const char *);

-extern int ctf_add_enumerator (ctf_dict_t *, ctf_id_t, const char *, int);
+extern int ctf_add_enumerator (ctf_dict_t *, ctf_id_t, const char *, int64_t);

The only aspects of enums that are not now handled are forwards to enums,
dumping of enums, and deduplication of enums.
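
As a rough illustration of the cast mentioned in the comments above, a caller
that gets an enumerator value back as an int64_t might print it like this.
The helper is hypothetical; its is_unsigned parameter stands in for whatever
ctf_enum_unsigned() reported for the enum in question.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format an enumerator value that was handed back as an int64_t.  When
   IS_UNSIGNED is nonzero, the same bits are reinterpreted as a uint64_t
   before printing.  Returns whatever snprintf returns.  */
static int
toy_format_enum_value (char *buf, size_t len, int64_t val, int is_unsigned)
{
  if (is_unsigned)
    return snprintf (buf, len, "%" PRIu64, (uint64_t) val);
  return snprintf (buf, len, "%" PRId64, val);
}
```

The same 64 bits therefore print as -1 for a signed enum and as 2^64 - 1 for
an unsigned one.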
2025-04-25 18:07:42 +01:00
Nick Alcock
0a3ee49dd0 libctf: types: add ctf_struct_bitfield (NEEDS REVIEW)
This new public API function allows you to find out if a struct has the
bitfield flag set or not.  (There are no other properties specific to a
struct, so we needed a new function for it.  I am open to a
ctf_struct_info() function handing back a struct if people prefer.)

New API:

int ctf_struct_bitfield (ctf_dict_t *, ctf_id_t);
2025-04-25 18:07:42 +01:00
Nick Alcock
4a4312b684 libctf: types: ctf_type_resolve_nonrepresentable
This new internal function allows us to say "resolve a type to its base
type, but treat type 0 like BTF, returning 0 if it is found rather than
erroring with ECTF_NONREPRESENTABLE".  Used in the next commit.
2025-04-25 18:07:42 +01:00
Nick Alcock
20e6f72dc7 libctf: create: structure and union member addition
There is one API addition here:

int ctf_add_member_bitfield (ctf_dict_t *, ctf_id_t souid,
                             const char *, ctf_id_t type,
                             unsigned long bit_offset,
                             int bit_width);

SoU addition handles the representational changes for bitfields and for
CTF_K_BIG structs (i.e. all structs you can add members to), errors out if
you add bitfields to structs that aren't created with the
CTF_ADD_STRUCT_BITFIELDS flag, and arranges to add padding as needed if
there is too much of a gap for the offsets to encode in one hop (that
part is still untested).
2025-04-25 18:07:42 +01:00
Nick Alcock
64b65a0a34 libctf: types: struct/union member querying and iteration
This commit revises ctf_member_next, ctf_member_iter, ctf_member_count, and
ctf_member_info for the new CTFv4 world.  This also pulls in a bunch of
infrastructure used by most of the type querying functions, and fundamental
changes to the way DTD records are represented in libctf (ctf-create not yet
adjusted).  Other type querying functions affected by changes in struct
representation are also changed.

There are some API changes here: new bit-width fields in ctf_member_f,
ctf_membinfo_t and ctf_member_next, and a fix to the type of the offset in
ctf_member_f, ctf_membinfo_t and ctf_member_count.  (ctf_member_next got
the offset type right already.)

ctf_member_f also gets a new ctf_dict_t arg so that you can actually use
the member type it passes in without having to package up and pass in the
dict type yourself (a frequent need).  This change is later echoed in most
of the rest of the *_f typedefs.

 typedef struct ctf_membinfo
 {
   ctf_id_t ctm_type;		/* Type of struct or union member.  */
-  unsigned long ctm_offset;	/* Offset of member in bits.  */
+  size_t ctm_offset;		/* Offset of member in bits.  */
+  int ctm_bit_width;		/* Width of member in bits: -1: not bitfield */
 } ctf_membinfo_t;

-typedef int ctf_member_f (const char *name, ctf_id_t membtype,
-			  unsigned long offset, void *arg);
+typedef int ctf_member_f (ctf_dict_t *, const char *name, ctf_id_t membtype,
+			  size_t offset, int bit_width, void *arg);

 extern ssize_t ctf_member_next (ctf_dict_t *, ctf_id_t, ctf_next_t **,
 				const char **name, ctf_id_t *membtype,
-				int flags);
+				int *bit_width, int flags);

-int ctf_member_count (ctf_dict_t *, ctf_id_t);
+ssize_t ctf_member_count (ctf_dict_t *, ctf_id_t);

The DTD changes are that where before the ctf_dtdef_t had a dtd_data which
was the ctf_type_t type node for a type, and a separate dtd_vlen which was
the vlen buffer which (in the final serialized representation) would
directly follow that type, now it has one single buffer, dtd_buf, which
consists of a stream of one or more ctf_type_t nodes, followed by a vlen,
as it will appear in the final serialized form.  This buffer has internal
pointers into it: dtd_data is a pointer to the last ctf_type_t in the stream
(the true type node, after all prefixes), and dtd_vlen is a pointer to the
vlen (precisely one ctf_type_t after the dtd_data).  This representation is
nice because it means there is even less distinction between a dynamic type
added by ctf_add_*() and a static one read directly out of a dict: you can
traverse the entire type without caring where it came from, simplifying
most of the type querying functions.
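
A rough sketch of the single-buffer layout, using a toy fixed-size node in
place of ctf_type_t.  The field names dtd_buf, dtd_data and dtd_vlen follow
the description above; everything else (toy_type_t, toy_dtd_t, the function
names, the node layout) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for ctf_type_t: not the real layout.  */
typedef struct toy_type
{
  uint32_t name;
  uint32_t info;
  uint32_t size;
} toy_type_t;

/* Toy stand-in for ctf_dtdef_t.  */
typedef struct toy_dtd
{
  unsigned char *dtd_buf;	/* Prefixes, true type node, then vlen.  */
  size_t dtd_buf_size;
  toy_type_t *dtd_data;		/* Last type node in dtd_buf.  */
  unsigned char *dtd_vlen;	/* Exactly one node after dtd_data.  */
} toy_dtd_t;

/* Allocate one buffer holding NPREFIXES prefix nodes, one true type node and
   VLEN_SIZE bytes of vlen, and point dtd_data and dtd_vlen into it.  Returns
   0 on success, -1 on allocation failure.  */
static int
toy_dtd_new (toy_dtd_t *dtd, size_t nprefixes, size_t vlen_size)
{
  assert (dtd != NULL);

  dtd->dtd_buf_size = (nprefixes + 1) * sizeof (toy_type_t) + vlen_size;
  if ((dtd->dtd_buf = calloc (1, dtd->dtd_buf_size)) == NULL)
    return -1;

  dtd->dtd_data = (toy_type_t *) dtd->dtd_buf + nprefixes;
  dtd->dtd_vlen = (unsigned char *) (dtd->dtd_data + 1);
  return 0;
}

static void
toy_dtd_free (toy_dtd_t *dtd)
{
  free (dtd->dtd_buf);
  memset (dtd, 0, sizeof (*dtd));
}
```

Because the buffer has the same shape as the serialized form, code that walks
prefixes, type node and vlen does not need to know whether it is looking at a
dynamic type or one read from a dict.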

(There are a few more things in there which will be useful mostly when
adding new types: their uses will be seen later.)

Two new nontrivial functions exist (one of which is annoyingly tangled up in
the diff, sorry about that): ctf_find_prefix, which hunts down a given
prefix (if it exists) among the possibly many that may exist on a type (so
you can ask it to find the CTF_K_BIG prefix for a type if it exists, and
it'll return you a pointer to its ctf_type_t record), and ctf_vlen, which
you hand a type ID and its ctf_type_t *, and it gives you back a pointer to
its vlen and tells you how long it is.  (This is one of only two places left
in ctf-types.c which care whether a type is dynamic or not.  The other has
yet to be added).  Almost every function in ctf-types.c will end up calling
ctf_lookup_by_id and ctf_vlen in turn.

ctf_next_t has changed significantly: the ctn_type member is split in two so
that we can tell whether a given iterator works using types or indexes, and
we gain the ability to iterate over enum64s, DTDs themselves, and datasecs
(most of this will only be used in later commits).

The old internal function ctf_struct_member, which handled the distinction
between ctf_member_t and ctf_lmember_t, is gone.  Instead we have new code
that handles the different representation of bitfield versus non-bitfield
structs and unions, and more code to handle the different representation of
CTF_K_BIG structs and unions (their offsets are the distance from the last
offset, rather than the distance from the start of the structure).
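
To make the delta encoding concrete, here is a small sketch that turns
offsets stored as "distance from the last offset" back into absolute offsets.
The helper name, the 32-bit delta width, and the assumption that the first
delta is measured from offset zero are all illustrative guesses, not taken
from the CTFv4 format.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert N delta-encoded member offsets (each the distance in bits from the
   previous member's offset) into absolute bit offsets from the start of the
   structure.  */
static void
toy_deltas_to_offsets (const uint32_t *deltas, size_t n, uint64_t *offsets)
{
  uint64_t running = 0;

  for (size_t i = 0; i < n; i++)
    {
      running += deltas[i];
      offsets[i] = running;
    }
}
```

Deltas of 0, 32, 32, 64 therefore describe members at bit offsets 0, 32, 64
and 128.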
2025-04-25 18:07:42 +01:00
Nick Alcock
456d5bedcc types: add some more error checking
A few places with inadequate error checking have fallen out of the
ctf_id_t work:

 - ctf_add_slice doesn't make sure that the type it is slicing
   actually exists
 - ctf_add_member_offset doesn't check that the type of the member
   exists (though it will often fail if it doesn't, it doesn't
   explicitly check, so if you're unlucky it can sometimes succeed,
   giving you a corrupted dict)
 - ctf_type_encoding doesn't check whether its sliced type exists:
   it should verify it so it can return a decent error, rather than
   a thoroughly misleading one
 - ctf_type_compat has the same problem with respect to both of its
   arguments. It would definitely be nicer if we could call
   ctf_type_compat and just get a boolean answer, but it's not
   clear to me whether a type can be said to be compatible *or*
   incompatible with a nonexistent one, and we should probably alert
   the users to a likely bug regardless.  C error checking, sigh...
2025-03-16 15:25:28 +00:00
Nick Alcock
b5d3790c66 libctf: consecutive ctf_id_t assignment
This change modifies type ID assignment in CTF so that it works like BTF:
rather than flipping the high bit on for types in child dicts, types ascend
directly from IDs in the parent to IDs in the child, without interruption
(so type 0x4 in the parent is immediately followed by 0x5 in all children).

Doing this while retaining useful semantics for modification of parents is
challenging.  By definition, child type IDs are not known until the parent
is written out, but we don't want to find ourselves constrained to adding
types to the parent in one go, followed by all child types: that would make
the deduplicator a nightmare and would frankly make the entire ctf_add*()
interface next to useless: all existing clients that add types at all
add types to both parents and children without regard for ordering, and
breaking that would probably necessitate redesigning all of them.

So we have to be a little cleverer.

We approach this the same way as we approach strings in the recent refs
rework: if a parent has children attached (or has ever had them attached
since it was created or last read in), any new types created in the parent
are assigned provisional IDs starting at the very top of the type space and
working down.  (Their indexes in the internal libctf arrays remain
unchanged, so we don't suddenly need multigigabyte indexes!).  At writeout
(preserialization) time, we traverse the type table (and all other tables
containing type IDs) and assign refs to every type ID in exactly the same
way we assign refs to every string offset (just a different set of refs --
we don't want to update type IDs with string offset values!).

For a parent dict with children, these refs are real entities in memory:
pointers to the memory locations where type IDs are stored, tracked in the
DTD of each type.  As we traverse the type table, we assign real IDs to each
type (by simple incrementation), storing those IDs in a new dtd_final_type
field in the DTD for each type.  Once the type table and all other tables
containing type IDs are fully traversed, we update all the refs and
overwrite the IDs currently residing in each with the final IDs for each
type.

That fixes up IDs in the parent dict itself (including forward references in
structs and the like: that's why the ref updates only happen at the end);
but what about child dicts' references, both to parent types and to their
own?  We add armouring to enforce that parent dicts are always serialized
before their children (which ctf-link.c already does, because it's a
precondition for strtab deduplication), and then arrange that when a ref is
added to a type whose ID has been assigned (has a dtd_final_type), we just
immediately do an update rather than storing a ref for later updating.
Since the parent is already serialized, all parent type IDs have a
dtd_final_type by this point, and all parent IDs in the children are
properly updated. The child types can now be renumbered now that we know the
number of types in the parent, and their refs updated identically to what
was just done with the parent.

One wrinkle: before the child refs are updated, while we are working over
the child's type section, the type IDs in the child start from 1 (or
something like that), which might seem to overlap the parent IDs.  But this
is not the case: when you serialize the parent, the IDs written out to disk
are changed, but the only change to the representation in memory is that we
remember a dtd_final_type for each type (and use it to update all the child
type refs): its ID in memory is the same as it always was, a nonoverlapping
provisional ID higher than any other valid ID.  We enforce all of this by
asserting that when you add a ref to a type, the memory location that is
modified must be in the buffer being serialized: the code will not let you
accidentally modify the actual DTDs in memory.
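
The provisional-ID scheme for a single parent dict can be modelled in a few
lines.  This is a toy under stated assumptions: fixed-size arrays, one flat
list of refs rather than refs tracked per DTD, and a linear search at update
time.  All names (toy_dict_t, toy_add_type, toy_add_ref, toy_serialize) are
hypothetical; final_ids plays the role of dtd_final_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_TYPES 16
#define TOY_MAX_REFS 32

typedef struct toy_dict
{
  uint32_t ids[TOY_MAX_TYPES];		/* In-memory (provisional) IDs.  */
  uint32_t final_ids[TOY_MAX_TYPES];	/* Assigned at serialization.  */
  size_t ntypes;
  uint32_t *refs[TOY_MAX_REFS];		/* Locations holding type IDs.  */
  size_t nrefs;
} toy_dict_t;

/* Add a type, handing out provisional IDs from the top of the ID space
   downwards.  The array index stays small however large the ID is.  */
static uint32_t
toy_add_type (toy_dict_t *d)
{
  assert (d->ntypes < TOY_MAX_TYPES);
  d->ids[d->ntypes] = UINT32_MAX - (uint32_t) d->ntypes;
  d->final_ids[d->ntypes] = 0;
  return d->ids[d->ntypes++];
}

/* Record that *LOC, in the buffer being serialized, holds a type ID.  */
static void
toy_add_ref (toy_dict_t *d, uint32_t *loc)
{
  assert (d->nrefs < TOY_MAX_REFS);
  d->refs[d->nrefs++] = loc;
}

/* Assign final IDs by simple incrementation, then rewrite every ref.  Refs
   are only updated once all final IDs are known, so forward references
   work.  The in-memory IDs in d->ids are left untouched.  */
static void
toy_serialize (toy_dict_t *d)
{
  for (size_t i = 0; i < d->ntypes; i++)
    d->final_ids[i] = (uint32_t) i + 1;

  for (size_t r = 0; r < d->nrefs; r++)
    for (size_t i = 0; i < d->ntypes; i++)
      if (*d->refs[r] == d->ids[i])
	{
	  *d->refs[r] = d->final_ids[i];
	  break;
	}
}
```

Only the locations registered as refs change; the dict's own view of its IDs
stays provisional, which is the property the assertion described above
protects.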

We track the number of types in the parent in a new CTFv4 (not BTF) header
field (the dumper is updated): we will also use this to open CTFv3 child
dicts without change by simply declaring for them that the parent dict has
2^31 types in it (or 2^15, for v2 and below): the IDs in the children then
naturally come out right with no other changes needed.  (Right now, opening
CTFv3 child dicts requires extra compatibility code that has not been
written, but that code will no longer need to worry about type ID
differences.)
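
The arithmetic behind the "pretend the parent has 2^31 types" trick is small
enough to sketch.  The two helpers below are hypothetical and assume 1-based
indexes within the child; they are not the internal ctf_index_to_type() or
ctf_type_to_index().

```c
#include <assert.h>
#include <stdint.h>

/* ID of the INDEXth type (1-based) in a child dict whose parent has
   NPARENT_TYPES types, under consecutive ID assignment.  */
static uint32_t
toy_child_index_to_type (uint32_t nparent_types, uint32_t index)
{
  return nparent_types + index;
}

/* Inverse of the above.  Only valid for IDs that belong to the child.  */
static uint32_t
toy_child_type_to_index (uint32_t nparent_types, uint32_t type)
{
  assert (type > nparent_types);
  return type - nparent_types;
}
```

With four types in the parent the first child type gets ID 0x5, as in the
example at the top of this message; with a declared parent size of 2^31 it
gets 0x80000001, and with 2^15 it gets 0x8001, so older child IDs with the
high bit set fall out of the same formula.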

Various things are newly forbidden:

 - you cannot ctf_import() a child into a parent if you already ctf_add()ed
   types to the child, because all its IDs would change (and since you
   already cannot ctf_add() types to a child that hasn't had its parent
   imported, this in practice means only that ctf_create() must be followed
   immediately by a ctf_import() if this is a new child, which all sane
   clients were doing anyway).

 - You cannot import a child into a parent which has the wrong number of
   (non-provisional) types, again because all its IDs would be wrong:
   because parents only add types in the provisional space if children are
   attached to it, this would break the not unknown case of opening an
   archive, adding types to the parent, and only then importing children
   into it, so we add a special case: archive members which are not children
   in an archive with more than one member always pretend to have at least
   one child, so type additions in them are always provisional even before
   you ctf_import anything. In practice, this does exactly what we want,
   since all archives so far are created by the linker and have one parent
   and N children of that parent.

Because this introduces huge gaps between index and type ID for provisional
types, some extra assertions are added to ensure that the internal
ctf_type_to_index() is only ever called on types in the current dict (never
a parent dict): before now, this was just taken on trust, and it was often
wrong (which at best led to wrong results, as wrong array indexes were used,
and at worst to a buffer overflow). When hash debugging is on (suggesting
that the user doesn't mind expensive checks), every ctf_type_to_index()
triggers a ctf_index_to_type() to make sure that the operations are proper
inverses.

Lots and lots of tests are added to verify that assignment works and that
updating of every type kind works fine -- existing tests suffice for
type IDs in the variable and symtypetab sections.

The ld-ctf tests get a bunch of largely display-based updates: various
tests refer to 0x8... type IDs, which no longer exist, and because the
IDs are shorter all the spacing and alignment has changed.
2025-03-16 15:25:27 +00:00
Nick Alcock
274cc1f13d libctf: fix ctf_type_pointer on parent dicts, etc
Before now, ctf_type_pointer was crippled: it returned some type (if any)
that was a pointer to the type passed in, but only if both types were in the
current dict: if either (or both) was in the parent dict, it said there was
no pointer though there was.  This breaks real users: it's past time to lift
the restriction.

WIP (complete, but not yet tested).
2025-02-28 15:13:24 +00:00
Nick Alcock
5a1d8eca5c libctf: fix slices of slices and of enums
Slices had a bunch of horrible usability problems.  In particular, while
towers of cv-quals are resolved away by functions that need to do it, towers
of cv-quals with slices in the middle are not resolved away by functions
like ctf_enum_value that can see through slices: resolving volatile -> slice
-> const -> enum will leave it with a 'const', which will error pointlessly,
annoying callers, who reasonably expect slices to be more invisible than
this.  (The user-callable ctf_type_resolve still does not resolve away
slices, because this is the only way users can see that the slices are there
at all.)

This is induced by a fix for another wart: ctf_add_enumerator does not
resolve anything away at all, so you can't even add enumerators to const or
volatile enums -- and more problematically, you can't add enumerators to
enums with an explicit encoding without resolving away the types by hand,
since ctf_add_enum_encoded works by returning a slice!  ctf_add_enumerator
now resolves away all of those, so any cvr-or-typedef-or-slice-qual
terminating in an enum can be added to, exactly as callers likely expect.
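
The difference between stopping at a slice and seeing through it can be shown
on a toy type table.  The kinds, toy_node_t and toy_resolve are hypothetical
and much simpler than the real ctf_type_resolve machinery; the through_slices
flag selects between the two behaviours described above.

```c
#include <stddef.h>

enum toy_kind
{
  TOY_ENUM, TOY_CONST, TOY_VOLATILE, TOY_RESTRICT, TOY_TYPEDEF, TOY_SLICE
};

typedef struct toy_node
{
  enum toy_kind kind;
  int ref;			/* Index of the referenced type, if any.  */
} toy_node_t;

/* Follow cvr-quals and typedefs (and slices too, if THROUGH_SLICES) starting
   at ID.  Returns the index of the first type that is not one of those, or
   -1 on a bad index or a reference cycle.  */
static int
toy_resolve (const toy_node_t *types, size_t ntypes, int id,
	     int through_slices)
{
  for (size_t hops = 0; hops <= ntypes; hops++)
    {
      if (id < 0 || (size_t) id >= ntypes)
	return -1;

      switch (types[id].kind)
	{
	case TOY_SLICE:
	  if (!through_slices)
	    return id;
	  /* Fall through.  */
	case TOY_CONST:
	case TOY_VOLATILE:
	case TOY_RESTRICT:
	case TOY_TYPEDEF:
	  id = types[id].ref;
	  break;
	default:
	  return id;
	}
    }
  return -1;
}
```

For the chain volatile -> slice -> const -> enum, resolving with
through_slices set lands on the enum, while resolving without it stops at the
slice so that callers can still see it is there.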

(New tests added.)

libctf/
	* ctf-create.c (ctf_add_enumerator): Resolve away cvr-qualness.
	* ctf-types.c (ctf_type_resolve_unsliced): Don't terminate at
	the first slice.
	* testsuite/libctf-writable/slice-of-slice.*: New test.
2025-02-28 15:13:24 +00:00
Nick Alcock
dc93d01ff2 libctf: de-macroize LCTF_TYPE_TO_INDEX / LCTF_INDEX_TO_TYPE
Turning these into functions is unnecessary right now, but the need for it
will become much clearer shortly.

While we're at it, we can drop the third child argument to
LCTF_INDEX_TO_TYPE: the only place that uses it for anything other than
what can already be derived from the fp is ctf_lookup_by_name_internal,
and that place is easily fixed by just looking in the right dictionary in
the first place.
2025-02-28 15:13:24 +00:00
Nick Alcock
b875301e74 libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILD
Parent/child determination is about to become rather more complex, making a
macro impractical.  Use the ctf_type_isparent/ischild function calls
everywhere and remove the macro.  Make them more const-correct too, to
make them more widely usable.

While we're about it, change several places that hand-implemented
ctf_get_dict() to call it instead, and armour several functions against
the null returns that were always possible in this case (but previously
unprotected-against).
2025-02-28 15:13:24 +00:00
Nick Alcock
70d05ab0b2 libctf: add mechanism to prohibit most operations without a strtab
We are about to add machinery that deduplicates a child dict's strtab
against its parent.  Obviously if you open such a dict but do not import its
parent, all strtab lookups must fail: so add an LCTF_NO_STR flag that is set
in that window and make most operations fail if it is set.  (Two more
that will be set in future commits are serialization and string lookup
itself.)

Notably, not all symbol lookup is impossible in this window: you can still
look up by symbol index, as long as this dict is not using an indexed
symtypetab (which obviously requires string lookups to get the symbol name).

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_HASPARENT]: New.
	[ECTF_WRONGPARENT]: Likewise.
	(ECTF_NERR): Update.
	Update comments to note the new limitations on ctf_import et al.

libctf/
	* ctf-impl.h (LCTF_NO_STR): New.
	* ctf-create.c (ctf_rollback): Error out when LCTF_NO_STR.
	(ctf_add_generic): Likewise.
	(ctf_add_struct_sized): Likewise.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_forward): Likewise.
	(ctf_add_unknown): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable): Likewise.
	(ctf_add_funcobjt_sym_forced): Likewise.
	(ctf_add_type): Likewise (on either dict).
	* ctf-dump.c (ctf_dump): Likewise.
	* ctf-lookup.c (ctf_lookup_by_name): Likewise.
	(ctf_lookup_variable): Likewise.
	(ctf_lookup_enumerator): Likewise.
	(ctf_lookup_enumerator_next): Likewise.
	(ctf_symbol_next): Likewise.
	(ctf_lookup_by_sym_or_name): Likewise, if doing indexed lookups.
	* ctf-types.c (ctf_member_next): Likewise.
	(ctf_enum_next): Likewise.
	(ctf_type_aname): Likewise.
	(ctf_type_name_raw): Likewise.
	(ctf_type_compat): Likewise, for either dict.
	(ctf_member_info): Likewise.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	(ctf_type_rvisit): Likewise.
	(ctf_variable_next): Note that we don't need to test LCTF_NO_STR.
2025-02-28 14:47:24 +00:00
Alan Modra
e8e7cf2abe Update year range in copyright notice of binutils files
2025-01-01 18:29:57 +10:30
Nick Alcock
8a60c93096 libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.

But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.

So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them.  (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
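
A minimal sketch of the "read-only range" idea, assuming type indexes
start at 1 and that a dict simply remembers how many types it had when it
was opened.  The sk_* names and the error value are hypothetical and do
not mirror the real ctf_dict_t:

```c
#include <assert.h>
#include <stddef.h>

#define SK_ERDONLY 2            /* Hypothetical stand-in for ECTF_RDONLY.  */

struct sk_dict
{
  size_t stypes;                /* Number of types present when opened.  */
  size_t typemax;               /* Highest type index currently in use.  */
  int err;
};

/* Types at or below the static count came from the opened dict.  */
int
sk_type_is_static (const struct sk_dict *fp, size_t idx)
{
  return idx <= fp->stypes;
}

/* Adding a new type is always allowed: it lands above the static range.  */
size_t
sk_add_type (struct sk_dict *fp)
{
  return ++fp->typemax;
}

/* Modifying a type is only allowed outside the static range.  */
int
sk_modify_type (struct sk_dict *fp, size_t idx)
{
  assert (idx >= 1 && idx <= fp->typemax);
  if (sk_type_is_static (fp, idx))
    {
      fp->err = SK_ERDONLY;
      return -1;
    }
  return 0;
}
```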

This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account.  Some of these irregularities were hard to define as
anything but bugs.

Notably:

 - The symbol handling was assuming that symbols only needed to be
   looked for in dynamic hashtabs or static linker-laid-out indexed/
   nonindexed layouts, but now we want to check both in case people
   added more symbols to a dict they opened.

 - The code that handles type additions wasn't checking to see if types
   with the same name existed *at all* (so you could do
   ctf_add_typedef (fp, "foo", bar) repeatedly without error).  This
   seems reasonable for types you just added, but we probably *do* want
   to ban addition of types with names that override names we already
   used in the ctf_open()ed portion, since that would probably corrupt
   existing type relationships.  (Doing things this way also avoids
   causing new errors for any existing code that was doing this sort of
   thing.)

 - ctf_lookup_variable entirely failed to work for variables just added
   by ctf_add_variable: you had to write the dict out and read it back
   in again before they appeared.

 - The symbol handling remembered what symbols you looked up but didn't
   remember their types, so you could look up an object symbol and then
   find it popping up when you asked for function symbols, which seems
   less than ideal.  Since we had to rejig things enough to be able to
   distinguish function and object symbols internally anyway (in order
   to give suitable errors if you try to add a symbol with a name that
   already existed in the ctf_open()ed dict), this bug suddenly became
   more visible and was easily fixed.

We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time).  This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).

There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.

libctf/

	* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
	(ctf_dict.ctf_symhash_func): ... this and...
	(ctf_dict.ctf_symhash_objt): ... this.
	(ctf_dict.ctf_stypes): New, counts static types.
	(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
	(LCTF_RDWR): Deleted.
	(LCTF_DIRTY): Renumbered.
	(LCTF_LINKING): Likewise.
	(ctf_lookup_variable_here): New.
	(ctf_lookup_by_sym_or_name): Likewise.
	(ctf_symbol_next_static): Likewise.
	(ctf_add_variable_forced): Likewise.
	(ctf_add_funcobjt_sym_forced): Likewise.
	(ctf_simple_open_internal): Adjust.
	(ctf_bufopen_internal): Likewise.
	* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
	(ctf_create): Migrate a bunch of initializations into bufopen.
	Force recreation of name tables.  Do not forcibly override the
	model, let ctf_bufopen do it.
	(ctf_static_type): New.
	(ctf_update): Drop LCTF_RDWR check.
	(ctf_dynamic_type): Likewise.
	(ctf_add_function): Likewise.
	(ctf_add_type_internal): Likewise.
	(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
	(ctf_set_array): Likewise.
	(ctf_add_struct_sized): Likewise.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_enumerator): Likewise (only on the target dict).
	(ctf_add_member_offset): Likewise.
	(ctf_add_generic): Drop LCTF_RDWR check.  Ban addition of types
	with colliding names.
	(ctf_add_forward): Note safety under the new rules.
	(ctf_add_variable): Split all but the existence check into...
	(ctf_add_variable_forced): ... this new function.
	(ctf_add_funcobjt_sym): Likewise...
	(ctf_add_funcobjt_sym_forced): ... for this new function.
	* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
	with any stypes.
	(ctf_link_add_strtab): Likewise.
	(ctf_link_shuffle_syms): Likewise.
	(ctf_link_intern_extern_string): Note pre-existing prohibition.
	* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
	(ctf_lookup_variable): Split out looking in a dict but not
	its parent into...
	(ctf_lookup_variable_here): ... this new function.
	(ctf_lookup_symbol_idx): Track whether looking up a function or
	object: cache them separately.
	(ctf_symbol_next): Split out looking in non-dynamic symtypetab
	entries to...
	(ctf_symbol_next_static): ... this new function.  Don't get confused
	by the simultaneous presence of static and dynamic symtypetab entries.
	(ctf_try_lookup_indexed): Don't waste time looking up symbols by
	index before there can be any idea how symbols are numbered.
	(ctf_lookup_by_sym_or_name): Distinguish between function and
	data object lookups.  Drop LCTF_RDWR.
	(ctf_lookup_by_symbol): Adjust.
	(ctf_lookup_by_symbol_name): Likewise.
	* ctf-open.c (init_types): Rename to...
	(init_static_types): ... this.  Drop LCTF_RDWR.  Populate ctf_stypes.
	(ctf_simple_open): Drop writable arg.
	(ctf_simple_open_internal): Likewise.
	(ctf_bufopen): Likewise.
	(ctf_bufopen_internal): Populate fields only used for writable dicts.
	Drop LCTF_RDWR.
	(ctf_dict_close): Cater for symhash cache split.
	* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
	* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
	* testsuite/libctf-lookup/add-to-opened*: New test.
2024-04-19 16:14:46 +01:00
Nick Alcock
54a0219150 libctf: remove static/dynamic name lookup distinction
libctf internally maintains a set of hash tables for type name lookups,
one for each valid C type namespace (struct, union, enum, and everything
else).

Or, rather, it maintains *two* sets of hash tables: one, a ctf_hash *,
is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the
other, a ctf_dynhash *, is meant for lookups in ctf_create()d dicts.

This distinction was somewhat valuable in the far pre-binutils past when
two different hashtable implementations were used (one expanding, the
other fixed-size), but those days are long gone: the hash table
implementations are almost identical, both wrappers around the libiberty
hashtab. The ctf_dynhash has many more capabilities than the ctf_hash
(iteration, deletion, etc etc) and has no downsides other than starting
at a fixed, arbitrary small size.

That limitation is easy to lift (via a new ctf_dynhash_create_sized()),
following which we can throw away nearly all the ctf_hash
implementation, and all the code to choose between readable and writable
hashtabs; the few convenience functions that are still useful (for
insertion of name -> type mappings) can also be generalized a bit so
that the extra string verification they do is potentially available to
other string lookups as well.

(libctf still has two hashtable implementations, ctf_dynhash, above,
and ctf_dynset, which is a key-only hashtab that can avoid a great many
malloc()s, used for high-volume applications in the deduplicator.)

libctf/

	* ctf-create.c (ctf_create): Eliminate ctn_writable.
	(ctf_dtd_insert): Likewise.
	(ctf_dtd_delete): Likewise.
	(ctf_rollback): Likewise.
	(ctf_name_table): Eliminate ctf_names_t.
	* ctf-hash.c (ctf_dynhash_create): Comment update.
	Reimplement in terms of...
	(ctf_dynhash_create_sized): ... this new function.
	(ctf_hash_create): Remove.
	(ctf_hash_size): Remove.
	(ctf_hash_define_type): Remove.
	(ctf_hash_destroy): Remove.
	(ctf_hash_lookup_type): Rename to...
	(ctf_dynhash_lookup_type): ... this.
	(ctf_hash_insert_type): Rename to...
	(ctf_dynhash_insert_type): ... this, moving validation to...
	* ctf-string.c (ctf_strptr_validate): ... this new function.
	* ctf-impl.h (struct ctf_names): Extirpate.
	(struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t.
	(struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t.
	(ctf_name_table): Now returns a ctf_dynhash_t.
	(ctf_lookup_by_rawhash): Remove.
	(ctf_hash_create): Likewise.
	(ctf_hash_insert_type): Likewise.
	(ctf_hash_define_type): Likewise.
	(ctf_hash_lookup_type): Likewise.
	(ctf_hash_size): Likewise.
	(ctf_hash_destroy): Likewise.
	(ctf_dynhash_create_sized): New.
	(ctf_dynhash_insert_type): New.
	(ctf_dynhash_lookup_type): New.
	(ctf_strptr_validate): New.
	* ctf-lookup.c (ctf_lookup_by_name_internal): Adapt.
	* ctf-open.c (init_types): Adapt.
	(ctf_set_ctl_hashes): Adapt.
	(ctf_dict_close): Adapt.
	* ctf-serialize.c (ctf_serialize): Adapt.
	* ctf-types.c (ctf_lookup_by_rawhash): Remove.
2024-04-19 16:14:46 +01:00
Alan Modra
59497587af libctf warnings
Seen with every compiler I have if using -fno-inline:
/home/alan/src/binutils-gdb/libctf/ctf-create.c: In function ‘ctf_add_encoded’:
/home/alan/src/binutils-gdb/libctf/ctf-create.c:555:3: warning: ‘encoding’ may be used uninitialized [-Wmaybe-uninitialized]
  555 |   memcpy (dtd->dtd_vlen, &encoding, sizeof (encoding));

Seen with gcc-4.9 and probably others at lower optimisation levels:
/home/alan/src/binutils-gdb/libctf/ctf-serialize.c: In function 'symtypetab_density':
/home/alan/src/binutils-gdb/libctf/ctf-serialize.c:211:18: warning: 'sym' may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (*max < sym->st_symidx)

Seen with gcc-4.5 and probably others at lower optimisation levels:
/home/alan/src/binutils-gdb/libctf/ctf-types.c:1649:21: warning: 'tp' may be used uninitialized in this function
/home/alan/src/binutils-gdb/libctf/ctf-link.c:765:16: warning: 'parent_i' may be used uninitialized in this function

Also with gcc-4.5:
In file included from /home/alan/src/binutils-gdb/libctf/ctf-endian.h:25:0,
                 from /home/alan/src/binutils-gdb/libctf/ctf-archive.c:24:
/home/alan/src/binutils-gdb/libctf/swap.h:70:0: warning: "_Static_assert" redefined
/usr/include/sys/cdefs.h:568:0: note: this is the location of the previous definition

	* swap.h (_Static_assert): Don't define if already defined.
	* ctf-serialize.c (symtypetab_density): Merge two
	CTF_SYMTYPETAB_FORCE_INDEXED blocks.
	* ctf-create.c (ctf_add_encoded): Avoid "encoding" may be used
	uninitialized warning.
	* ctf-link.c (ctf_link_deduplicating_open_inputs): Avoid
	"parent_i" may be used uninitialized warning.
	* ctf-types.c (ctf_type_rvisit): Avoid "tp" may be used
	uninitialized warning.
2024-04-17 09:24:36 +09:30
Alan Modra
fd67aa1129 Update year range in copyright notice of binutils files
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:

1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
   author I haven't committed, 'Kalray SA.', to cover gas testsuite
   files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
2024-01-04 22:58:12 +10:30
Torbjörn SVENSSON
0f79aa900f libctf: Return CTF_ERR in ctf_type_resolve_unsliced PR 30836
In commit 998a4f589d, all but one return statement were updated to
return the proper error value. This commit rectifies that missed
return statement.

libctf/
	* ctf-types.c (ctf_type_resolve_unsliced): Return CTF_ERR on error.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2023-10-18 09:39:59 +02:00
Torbjörn SVENSSON
998a4f589d libctf: Sanitize error types for PR 30836
Made sure there is no implicit conversion between signed and unsigned
return values for functions setting the ctf_errno value.
An example of the problem is that in ctf_member_next, the "offset" value
is either 0L or (ctf_id_t)-1L, but it should have been 0L or -1L.
The issue was discovered while building a 64 bit ld binary to be
executed on the Windows platform.
An example object file that demonstrates the issue is attached to the PR.
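
The failure mode can be sketched in isolation.  This models a target
where the unsigned type used for the error value is 32 bits wide while
the function's signed return type is 64 bits wide (as the description
above suggests for 64-bit Windows); the typedefs and function names are
hypothetical stand-ins, not libctf code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sk_ulong32;    /* Stand-in for a 32-bit unsigned long.  */
typedef int64_t sk_ssize64;     /* Stand-in for a 64-bit signed return type.  */

/* Buggy pattern: the error value passes through the narrower unsigned
   type first, so it is zero-extended rather than sign-extended.  */
sk_ssize64
sk_error_via_unsigned (void)
{
  return (sk_ulong32) -1;
}

/* Fixed pattern: return the signed error value directly.  */
sk_ssize64
sk_error_direct (void)
{
  return -1;
}
```

A caller comparing the first function's result against -1 never sees the
error, which is why the return statements have to match the declared
return type.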

libctf/
	Affected functions adjusted.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
2023-10-17 17:31:20 +02:00
Nick Alcock
d7474051e8 libctf: propagate errors from parents correctly
CTF dicts have per-dict errno values: as with other errno values these
are set on error and left unchanged on success.  This means that all
errors *must* set the CTF errno: if a call leaves it unchanged, the
caller is apt to find a previous, lingering error and misinterpret
it as the real error.

There are many places in libctf where we carry out operations on parent
dicts as a result of carrying out other user-requested operations on
child dicts (e.g. looking up information on a pointer to a type will
look up the type as well: the pointer might well be in a child and the
type it's a pointer to in the parent).  Those operations on the parent
might fail; if they do, the error must be correctly reflected on the
child that the user-visible operation was carried out on.  In many
places this was not happening.

So, audit and fix all those places.  Add tests for as many of those
cases as possible so they don't regress.
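
The propagation pattern looks roughly like this.  It is a sketch under
simplifying assumptions (a child's type IDs start where its parent's
end, and each dict holds a contiguous ID range); every name here is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define SK_EBADID 3             /* Hypothetical error code.  */
#define SK_ERR (-1)

struct sk_dict
{
  struct sk_dict *parent;
  int first_id;                 /* First type ID held by this dict.  */
  int ntypes;                   /* Number of types held by this dict.  */
  int err;                      /* Per-dict errno, set only on failure.  */
};

/* Look up ID in FP only.  Return 0 on success, or SK_ERR with FP's
   error set.  */
int
sk_lookup_here (struct sk_dict *fp, int id)
{
  assert (fp != NULL);
  if (id < fp->first_id || id >= fp->first_id + fp->ntypes)
    {
      fp->err = SK_EBADID;
      return SK_ERR;
    }
  return 0;
}

/* Look up ID in FP or its parent.  An error raised on the parent is
   copied onto FP, the dict the caller actually passed in.  */
int
sk_lookup (struct sk_dict *fp, int id)
{
  if (fp->parent != NULL && id < fp->first_id)
    {
      if (sk_lookup_here (fp->parent, id) == SK_ERR)
        {
          fp->err = fp->parent->err;
          return SK_ERR;
        }
      return 0;
    }
  return sk_lookup_here (fp, id);
}
```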

libctf/
	* ctf-create.c (ctf_add_slice): Use the original dict.
	* ctf-lookup.c (ctf_lookup_variable): Propagate errors.
	(ctf_lookup_symbol_idx): Likewise.
	* ctf-types.c (ctf_member_next): Likewise.
	(ctf_type_resolve_unsliced): Likewise.
	(ctf_type_aname): Likewise.
	(ctf_member_info): Likewise.
	(ctf_type_rvisit): Likewise.
	(ctf_func_type_info): Set the error on the right dict.
	(ctf_type_encoding): Use the original dict.
	* testsuite/libctf-writable/error-propagation.*: New test.
2023-04-08 16:07:17 +01:00
Nick Alcock
3672e32622 libctf: get the offsets of fields of unnamed structs/unions right
We were failing to add the offsets of the containing struct/union
in this case, leading to all offsets being relative to the unnamed
struct/union itself.
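
In outline, the fix amounts to the addition marked below.  This is an
illustrative sketch with hypothetical sk_* structures, not the real
ctf_member_info:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sk_struct;

struct sk_member
{
  const char *name;             /* "" for an unnamed member.  */
  size_t offset;                /* Bit offset within the containing struct.  */
  const struct sk_struct *type; /* Non-NULL if itself a struct/union.  */
};

struct sk_struct
{
  size_t nmembers;
  const struct sk_member *members;
};

/* Find NAME in ST, descending into unnamed members.  On success, store
   the offset relative to the outermost struct in *OFFSET and return 0;
   otherwise return -1.  */
int
sk_member_offset (const struct sk_struct *st, const char *name,
                  size_t *offset)
{
  size_t i;

  assert (st != NULL && name != NULL && offset != NULL);
  for (i = 0; i < st->nmembers; i++)
    {
      const struct sk_member *m = &st->members[i];
      size_t inner;

      if (strcmp (m->name, name) == 0)
        {
          *offset = m->offset;
          return 0;
        }
      if (m->name[0] == '\0' && m->type != NULL
          && sk_member_offset (m->type, name, &inner) == 0)
        {
          /* The step that was missing: add the unnamed member's own
             offset within the containing struct.  */
          *offset = m->offset + inner;
          return 0;
        }
    }
  return -1;
}
```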

libctf/
	PR libctf/30264
	* ctf-types.c (ctf_member_info): Add the offset of the unnamed
	member of the current struct as necessary.
	* testsuite/libctf-lookup/unnamed-field-info*: New test.
2023-03-24 13:37:32 +00:00
Alan Modra
d87bef3a7b Update year range in copyright notice of binutils files
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2023-01-01 21:50:11 +10:30
Alan Modra
a2c5833233 Update year range in copyright notice of binutils files
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.

The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bpf opcodes files.
2022-01-02 12:04:28 +10:30
Nick Alcock
eb5323fdf8 libctf, ld: handle nonrepresentable types better
ctf_type_visit (used, among other things, by the type dumping code) was
aborting when it saw a nonrepresentable type anywhere: even a single
structure member with a nonrepresentable type caused an abort with
ECTF_NONREPRESENTABLE.  This is not useful behaviour, given that the
abort comes from a type-resolution we are only doing in order to
determine whether the type is a structure or union.  We know
nonrepresentable types can't be either, so handle that case and
pass the nonrepresentable type down.

(The added test verifies that the dumper now handles this case and
prints nonrepresentable structure members as it already does
nonrepresentable top-level types, rather than skipping the whole
structure -- or, without the previous commit, skipping the whole types
section.)

ld/ChangeLog
2021-10-25  Nick Alcock  <nick.alcock@oracle.com>

	* testsuite/ld-ctf/nonrepresentable-member.*: New test.

libctf/ChangeLog
2021-10-25  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-types.c (ctf_type_rvisit): Handle nonrepresentable types.
2021-10-25 11:17:05 +01:00
Nick Alcock
49da556c65 libctf, include: support an alternative encoding for nonrepresentable types
Before now, types that could not be encoded in CTF were represented as
references to type ID 0, which does not itself appear in the
dictionary. This choice is annoying in several ways, principally that it
forces generators and consumers of CTF to grow special cases for types
that are referenced in valid dicts but don't appear.

Allow an alternative representation (which will become the only
representation in format v4) whereby nonrepresentable types are encoded
as actual types with kind CTF_K_UNKNOWN (an already-existing kind
theoretically but not in practice used for padding, with value 0).
This is backward-compatible, because CTF_K_UNKNOWN was not used anywhere
before now: it was used in old-format function symtypetabs, but these
were never emitted by any compiler and the code to handle them in libctf
likely never worked and was removed last year, in favour of new-format
symtypetabs that contain only type IDs, not type kinds.

In order to link this type, we need an API addition to let us add types
of unknown kind to the dict: we let them optionally have names so that
GCC can emit many different unknown types and those types with identical
names will be deduplicated together.  There are also small tweaks to the
deduplicator to actually dedup such types, to let opening of dicts with
unknown types with names work, to return the ECTF_NONREPRESENTABLE error
on resolution of such types (like ID 0), and to print their names as
something useful but not a valid C identifier, mostly for the sake of
the dumper.

Tests added in the next commit.

include/ChangeLog
2021-05-06  Nick Alcock  <nick.alcock@oracle.com>

	* ctf.h (CTF_K_UNKNOWN): Document that it can be used for
	nonrepresentable types, not just padding.
	* ctf-api.h (ctf_add_unknown): New.

libctf/ChangeLog
2021-05-06  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-open.c (init_types): Unknown types may have names.
	* ctf-types.c (ctf_type_resolve): CTF_K_UNKNOWN is as
	non-representable as type ID 0.
	(ctf_type_aname): Print unknown types.
	* ctf-dedup.c (ctf_dedup_hash_type): Do not early-exit for
	CTF_K_UNKNOWN types: they have real hash values now.
	(ctf_dedup_rwalk_one_output_mapping): Treat CTF_K_UNKNOWN types
	like other types with no referents: call the callback and do not
	skip them.
	(ctf_dedup_emit_type): Emit via...
	* ctf-create.c (ctf_add_unknown): ... this new function.
	* libctf.ver (LIBCTF_1.2): Add it.
2021-05-06 09:30:59 +01:00
Nick Alcock
69a284867c libctf: support encodings for enums
The previous commit started to error-check the lookup of
ctf_type_encoding for the underlying type that is internally done when
carrying out a ctf_type_encoding on a slice.

Unfortunately, enums have no encoding, so this has historically been
returning an error (which is ignored) and then populating the cte_format
with uninitialized data.  Now that the error is not ignored, the call
fails outright, which breaks linking of CTF containing bitfields of
enumerated type.

CTF format v3 does not record the actual underlying type of an enum, but
we can mock up something that is not *too* wrong, and that is at any
rate better than uninitialized data.

ld/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* testsuite/ld-ctf/slice.c: Check slices of enums too.
	* testsuite/ld-ctf/slice.d: Results adjusted.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-types.c (ctf_type_encoding): Support, after a fashion, for enums.
	* ctf-dump.c (ctf_dump_format_type): Do not report enums' degenerate
	encoding.
2021-03-18 12:40:41 +00:00
Nick Alcock
d7b1416ef2 libctf: types: unify code dealing with small-vs-large struct members
This completes the job of unifying what was once three separate code
paths full of duplication for every function dealing with querying the
properties of struct and union members.  The dynamic code path was
already removed: this change removes the distinction between small and
large members, by adding a helper that copies out members from the vlen,
expanding small members into large ones as it does so.

This makes it possible to have *more* representations of things like
structure members without needing to change the querying functions at
all.  It also lets us check for buffer overruns more effectively,
verifying that we don't accidentally overrun the end of the vlen in
either the dynamic or static type case.
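
A helper of this general shape might look as follows.  This is a sketch:
the two member layouts are hypothetical, simplified stand-ins for the
small and large on-disk forms, and the function name is not the real one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical small and large member layouts.  */
struct sk_member
{
  uint32_t name;
  uint32_t offset;
  uint32_t type;
};

struct sk_lmember
{
  uint32_t name;
  uint32_t offsethi;
  uint32_t type;
  uint32_t offsetlo;
};

/* Copy member I out of VLEN (VLEN_LEN bytes long) into DST, always in
   large form.  Return 0 on success, or -1 if member I would lie beyond
   the end of the vlen.  */
int
sk_struct_member (const void *vlen, size_t vlen_len, int is_large,
                  size_t i, struct sk_lmember *dst)
{
  size_t elem = is_large ? sizeof (struct sk_lmember)
                         : sizeof (struct sk_member);
  const unsigned char *p = vlen;

  assert (vlen != NULL && dst != NULL);
  if (i >= vlen_len / elem)     /* Overrun check.  */
    return -1;

  if (is_large)
    memcpy (dst, p + i * elem, sizeof (*dst));
  else
    {
      struct sk_member m;

      memcpy (&m, p + i * elem, sizeof (m));
      dst->name = m.name;
      dst->type = m.type;
      dst->offsethi = 0;        /* Small offsets always fit in the low word.  */
      dst->offsetlo = m.offset;
    }
  return 0;
}
```

Callers then only ever deal with the large form, so adding a further
on-disk representation means touching only this one function.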

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_next_t) <ctn_tp>: New.
	<u.ctn_mp>: Remove.
	<u.ctn_lmp>: Remove.
	<u.ctn_vlen>: New.
	* ctf-types.c (ctf_struct_member): New.
	(ctf_member_next): Use it, dropping separate large/small code paths.
	(ctf_type_align): Likewise.
	(ctf_member_info): Likewise.
	(ctf_type_rvisit): Likewise.
2021-03-18 12:40:41 +00:00
Nick Alcock
08c428aff4 libctf: eliminate dtd_u, part 5: structs / unions
Eliminate the dynamic member storage for structs and unions as we have
for other dynamic types.  This is much like the previous enum
elimination, except that structs and unions are the only types for which
a full-sized ctf_type_t might be needed.  Up to now, this decision has
been made in the individual ctf_add_{struct,union}_sized functions and
duplicated in ctf_add_member_offset.  The vlen machinery lets us
simplify this, always allocating a ctf_lmember_t and setting the
dtd_data's ctt_size to CTF_LSIZE_SENT: we figure out whether this is
really justified and (almost always) repack things down into a
ctf_stype_t at ctf_serialize time.
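
The "always store large, repack later" decision can be sketched like
this.  The constants and field names are hypothetical stand-ins modelled
loosely on the sentinel scheme described above, not copied from ctf.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_SIZE 0xfffffffeu   /* Largest size the small form can hold.  */
#define SK_LSIZE_SENT 0xffffffffu /* Sentinel: real size is in hi/lo.  */

struct sk_ltype
{
  uint32_t size;
  uint32_t lsizehi;
  uint32_t lsizelo;
};

/* At add time, always record the size in the full-size form.  */
void
sk_set_size (struct sk_ltype *tp, uint64_t size)
{
  assert (tp != NULL);
  tp->size = SK_LSIZE_SENT;
  tp->lsizehi = (uint32_t) (size >> 32);
  tp->lsizelo = (uint32_t) size;
}

/* Recover the size from whichever form is in use.  */
uint64_t
sk_get_size (const struct sk_ltype *tp)
{
  assert (tp != NULL);
  if (tp->size == SK_LSIZE_SENT)
    return ((uint64_t) tp->lsizehi << 32) | tp->lsizelo;
  return tp->size;
}

/* At serialization time, decide whether the small form suffices.  */
int
sk_fits_small (const struct sk_ltype *tp)
{
  return sk_get_size (tp) <= SK_MAX_SIZE;
}
```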

This allows us to eliminate the dynamic member paths from the iterators and
query functions in ctf-types.c in favour of always using the large-structure
vlen stuff for dynamic types (the diff is ugly but that's just because of the
volume of reindentation this calls for).  This also means the large-structure
vlen stuff gets more heavily tested, which is nice because it was an almost
totally unused code path before now (it only kicked in for structures of size
>4GiB, and how often do you see those?)

The only extra complexity here is ctf_add_type.  Back in the days of the
nondeduplicating linker this was called a ridiculous number of times for
countless identical copies of structures: eschewing the repeated lookups of the
dtd in ctf_add_member_offset and adding the members directly saved an amazing
amount of time.  Now the nondeduplicating linker is gone, this is extreme
overoptimization: we can rip out the direct addition and use ctf_member_next and
ctf_add_member_offset, just like ctf_dedup_emit does.

We augment a ctf_add_type test to try adding a self-referential struct, the only
thing the ctf_add_type part of this change really perturbs.

This completes the elimination of dtd_u.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtu_members>: Remove.
	<dtd_u>: Likewise.
	(ctf_dmdef_t): Remove.
	(struct ctf_next) <u.ctn_dmd>: Remove.
	* ctf-create.c (INITIAL_VLEN): New, more-or-less arbitrary initial
	vlen size.
	(ctf_add_enum): Use it.
	(ctf_dtd_delete): Do not free the (removed) dmd; remove string
	refs from the vlen on struct deletion.
	(ctf_add_struct_sized): Populate the vlen: do it by hand if
	promoting forwards.  Always populate the full-size
	lsizehi/lsizelo members.
	(ctf_add_union_sized): Likewise.
	(ctf_add_member_offset): Set up the vlen rather than the dmd.
	Expand it as needed, repointing string refs via
	ctf_str_move_pending. Add the member names as pending strings.
	Always populate the full-size lsizehi/lsizelo members.
	(membadd): Remove, folding back into...
	(ctf_add_type_internal): ... here, adding via an ordinary
	ctf_add_struct_sized and _next iteration rather than doing
	everything by hand.
	* ctf-serialize.c (ctf_copy_smembers): Remove this...
	(ctf_copy_lmembers): ... and this...
	(ctf_emit_type_sect): ... folding into here. Figure out if a
	ctf_stype_t is needed here, not in ctf_add_*_sized.
	(ctf_type_sect_size): Figure out the ctf_stype_t stuff the same
	way here.
	* ctf-types.c (ctf_member_next): Remove the dmd path and always
	use the vlen.  Force large-structure usage for dynamic types.
	(ctf_type_align): Likewise.
	(ctf_member_info): Likewise.
	(ctf_type_rvisit): Likewise.
	* testsuite/libctf-regression/type-add-unnamed-struct-ctf.c: Add a
	self-referential type to this test.
	* testsuite/libctf-regression/type-add-unnamed-struct.c: Adjusted
	accordingly.
	* testsuite/libctf-regression/type-add-unnamed-struct.lk: Likewise.
2021-03-18 12:40:40 +00:00
Nick Alcock
77d724a7ec libctf: eliminate dtd_u, part 4: enums
This is the first tricky one, the first complex multi-entry vlen
containing strings.  To handle this in vlen form, we have to handle
pending refs moving around on realloc.

We grow vlen regions using a new ctf_grow_vlen function, and iterate
through the existing enums every time a grow happens, telling the string
machinery the distance between the old and new vlen region and letting
it adjust the pending refs accordingly.  (This avoids traversing all
outstanding refs to find the refs that need adjusting, at the cost of
having to traverse one enum: an obvious major performance win.)
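
The repointing can be sketched as below.  Note the assumptions: this
version allocates a fresh buffer and recomputes each ref from its byte
offset in the old one, rather than applying a distance after a realloc
as described above, and the sk_* structure is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A pending ref records where in the vlen a string offset will later
   be written.  */
struct sk_vlen
{
  unsigned char *buf;
  size_t alloc;
  uint32_t **refs;              /* Pointers into BUF.  */
  size_t nrefs;
};

/* Grow V to NEW_ALLOC bytes, repointing every pending ref into the new
   region.  Return 0 on success, -1 on allocation failure (V is then
   left untouched).  */
int
sk_grow_vlen (struct sk_vlen *v, size_t new_alloc)
{
  unsigned char *new_buf;
  size_t i;

  assert (v != NULL && new_alloc >= v->alloc);
  if ((new_buf = malloc (new_alloc)) == NULL)
    return -1;
  if (v->alloc > 0)
    memcpy (new_buf, v->buf, v->alloc);

  for (i = 0; i < v->nrefs; i++)
    {
      /* Computed while the old buffer is still live.  */
      size_t off = (size_t) ((unsigned char *) v->refs[i] - v->buf);

      v->refs[i] = (uint32_t *) (new_buf + off);
    }

  free (v->buf);
  v->buf = new_buf;
  v->alloc = new_alloc;
  return 0;
}
```

Only the refs belonging to the type being grown are walked, which is the
property the paragraph above is after.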

Addition of enums themselves (and also structs/unions later) is a bit
trickier than earlier forms, because the type might be being promoted
from a forward, and forwards have no vlen: so we have to spot that and
create it if needed.

Serialization of enums simplifies down to just telling the string
machinery about the string refs; all the enum type-lookup code loses all
its dynamic member lookup complexity entirely.

A new test is added that iterates over (and gets values of) an enum with
enough members to force a round of vlen growth.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_vlen_alloc>: New.
	(ctf_str_move_pending): Declare.
	* ctf-string.c (ctf_str_add_ref_internal): Fix error return.
	(ctf_str_move_pending): New.
	* ctf-create.c (ctf_grow_vlen): New.
	(ctf_dtd_delete): Zero out the vlen_alloc after free.  Free the
	vlen later: iterate over it and free enum name refs first.
	(ctf_add_generic): Populate dtd_vlen_alloc from vlen.
	(ctf_add_enum): Populate the vlen; do it by hand if promoting
	forwards.
	(ctf_add_enumerator): Set up the vlen rather than the dmd.  Expand
	it as needed, repointing string refs via ctf_str_move_pending. Add
	the enumerand names as pending strings.
	* ctf-serialize.c (ctf_copy_emembers): Remove.
	(ctf_emit_type_sect): Copy the vlen into place and ref the
	strings.
	* ctf-types.c (ctf_enum_next): The dynamic portion now uses
	the same code as the non-dynamic.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	* testsuite/libctf-lookup/enum-many-ctf.c: New test.
	* testsuite/libctf-lookup/enum-many.lk: New test.
2021-03-18 12:40:40 +00:00
Nick Alcock
81982d20fa libctf: eliminate dtd_u, part 3: functions
One more member vanishes from the dtd_u, leaving only the member for
struct/union/enum members.

There's not much to do here, since as of commit afd78bd6f0 we use
the same representation (type sizes, etc) in the dtu_argv as we will
use in the final vlen, with one exception: the vlen has alignment
padding, and the dtu_argv did not.  Simplify things by adding suitable
padding in both cases.

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_argv>: Remove.
	* ctf-create.c (ctf_dtd_delete): No longer free it.
	(ctf_add_function): Use the dtd_vlen, not dtu_argv.  Properly align.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-types.c (ctf_func_type_info): Just use the vlen.
	(ctf_func_type_args): Likewise.
2021-03-18 12:40:40 +00:00
Nick Alcock
534444b1ee libctf: eliminate dtd_u, part 2: arrays
This is even simpler than ints, floats and slices, with the only extra
complication being the need to manually transfer the array parameter in
the rarely-used function ctf_set_array.  (Arrays are unique in libctf in
that they can be modified post facto, not just created and appended to.
I'm not sure why they got this exemption, but it's easy to maintain.)

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_arr>: Remove.
	* ctf-create.c (ctf_add_array): Use the dtd_vlen, not dtu_arr.
	(ctf_set_array): Likewise.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-types.c (ctf_array_info): Just use the vlen.
2021-03-18 12:40:40 +00:00
Nick Alcock
7879dd88ef libctf: eliminate dtd_u, part 1: int/float/slice
This series eliminates a lot of special-case code to handle dynamic
types (types added to writable dicts and not yet serialized).

Historically, when such types have variable-length data in their final
CTF representations, libctf has always worked by adding such types to a
special union (ctf_dtdef_t.dtd_u) in the dynamic type definition
structure, then picking the members out of this structure at
serialization time and packing them into their final form.

This has the advantage that the ctf_add_* code doesn't need to know
anything about the final CTF representation, but the significant
disadvantage that all code that looks up types in any way needs two code
paths, one for dynamic types, one for all others.  Historically libctf
"handled" this by not supporting most type lookups on dynamic types at
all until ctf_update was called to do a complete reserialization of the
entire dict (it didn't emit an error, it just emitted wrong results).
Since commit 676c3ecbad, which eliminated ctf_update in favour of
the internal-only ctf_serialize function, all the type-lookup paths
grew an extra branch to handle dynamic types.

We can eliminate this branch again by dropping the dtd_u stuff and
simply writing out the vlen in (close to) its final form at ctf_add_*
time: type lookup for types using this approach is then identical for
types in writable dicts and types that are in read-only ones, and
serialization is also simplified (we just need to write out the vlen
we already created).

The only complexity lies in type kinds for which multiple
vlen representations are valid depending on properties of the type,
e.g. structures.  But we can start simple, adjusting ints, floats,
and slices to work this way, and leaving everything else as is.
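
The idea can be sketched in miniature as follows.  This is a toy model
under stated assumptions, not the libctf implementation: toy_dtd_t and
all the toy_* functions are hypothetical, and the "encoding" is just an
opaque 32-bit word.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic type definition: the variable-length data is
   kept in its final on-disk form from the moment the type is added.  */
typedef struct toy_dtd
{
  unsigned char *vlen;
  size_t vlen_size;
} toy_dtd_t;

/* Add time: write the vlen out in its final form.  Returns 0 on
   success, -1 on allocation failure.  */
static int
toy_add_encoded (toy_dtd_t *dtd, uint32_t encoding)
{
  dtd->vlen = malloc (sizeof (encoding));
  if (dtd->vlen == NULL)
    return -1;
  memcpy (dtd->vlen, &encoding, sizeof (encoding));
  dtd->vlen_size = sizeof (encoding);
  return 0;
}

/* Lookup: one code path, whether the bytes come from a writable dict
   or from a serialized buffer.  */
static uint32_t
toy_lookup_encoding (const unsigned char *vlen)
{
  uint32_t encoding;
  memcpy (&encoding, vlen, sizeof (encoding));
  return encoding;
}

/* Serialization: just copy the vlen we already created.  */
static size_t
toy_serialize (const toy_dtd_t *dtd, unsigned char *out)
{
  memcpy (out, dtd->vlen, dtd->vlen_size);
  return dtd->vlen_size;
}
```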

libctf/ChangeLog
2021-03-18  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_enc>: Remove.
	<dtd_u.dtu_slice>: Likewise.
	<dtd_vlen>: New.
	* ctf-create.c (ctf_add_generic): Perhaps allocate it.  All
	callers adjusted.
	(ctf_dtd_delete): Free it.
	(ctf_add_slice): Use the dtd_vlen, not dtu_enc.
	(ctf_add_encoded): Likewise.  Assert that this must be an int or
	float.
	* ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen.
	* ctf-dedup.c (ctf_dedup_rhash_type): Use the dtd_vlen, not
	dtu_slice.
	* ctf-types.c (ctf_type_reference): Likewise.
	(ctf_type_encoding): Remove most dynamic-type-specific code: just
	get the vlen from the right place.  Report failure to look up the
	underlying type's encoding.
2021-03-18 12:40:36 +00:00
Nick Alcock
ac36e134d9 libctf: reimplement many _iter iterators in terms of _next
Ever since the generator-style _next iterators were introduced, there
have been separate implementations of the functional-style _iter
iterators that do the same thing as _next.

This is annoying and adds more dependencies on the internal guts of the
file format.  Rip them all out and replace them with the corresponding
_next iterators.  Only ctf_archive_raw_iter and ctf_label_iter survive,
the former because there is no access to the raw binary data of archives
via any _next iterator, and the latter because ctf_label_next hasn't
been implemented (because labels are currently not used for anything).

Tested by reverting the change (already applied) that reimplemented
ctf_member_iter in terms of ctf_member_next, then verifying that the
_iter and _next iterators produced the same results for every iterable
entity within a large type archive.
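
The general shape of such a reimplementation, sketched over a plain
array rather than any real CTF entity (every toy_* name here is
hypothetical, and the real iterators carry more state and error
handling than this):

```c
#include <stddef.h>

/* Generator-style iterator state.  */
typedef struct toy_next
{
  size_t pos;
} toy_next_t;

/* Callback type for the functional-style iterator.  */
typedef int toy_visit_f (int value, void *arg);

/* Generator-style: store the next item in *VALUE and return 1, or
   return 0 at the end of iteration.  */
static int
toy_next (const int *items, size_t n, toy_next_t *it, int *value)
{
  if (it->pos >= n)
    return 0;
  *value = items[it->pos++];
  return 1;
}

/* Functional-style: no traversal logic of its own, just a loop over
   toy_next.  A nonzero return from FUNC stops iteration early.  */
static int
toy_iter (const int *items, size_t n, toy_visit_f *func, void *arg)
{
  toy_next_t it = { 0 };
  int value;
  int rc;

  while (toy_next (items, n, &it, &value))
    if ((rc = func (value, arg)) != 0)
      return rc;
  return 0;
}

/* Example callback: accumulate a sum into *ARG.  */
static int
toy_sum (int value, void *arg)
{
  *(int *) arg += value;
  return 0;
}
```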

libctf/ChangeLog
2021-03-02  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-types.c (ctf_member_iter): Move 'rc' to an inner scope.
	(ctf_enum_iter): Reimplement in terms of ctf_enum_next.
	(ctf_type_iter): Reimplement in terms of ctf_type_next.
	(ctf_type_iter_all): Likewise.
	(ctf_variable_iter): Reimplement in terms of ctf_variable_next.
	* ctf-archive.c (ctf_archive_iter_internal): Remove.
	(ctf_archive_iter): Reimplement in terms of ctf_archive_next.
2021-03-02 15:09:18 +00:00
Nick Alcock
ee87f50b8d libctf: always name nameless types "", never NULL
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts.  This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".

The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "".  So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.

Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
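
A minimal sketch of the normalization described above (the function
names are hypothetical and this is not the actual ctf-string.c code):

```c
#include <string.h>

/* Hypothetical helper: treat a NULL name exactly like "".  */
static const char *
toy_normalize_name (const char *name)
{
  return name != NULL ? name : "";
}

/* With names normalized, callers need only one nameless check.  */
static int
toy_is_nameless (const char *name)
{
  return toy_normalize_name (name)[0] == '\0';
}
```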

This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld): if the input dict was writable, we failed
to realise when types were nameless and could end up creating deeply
unhelpful synthetic forwards with no name.  Such forwards were banned
a few commits ago, so the link failed.

libctf/ChangeLog
2021-01-27  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
	(ctf_str_add_ref): Likewise.
	(ctf_str_add_external): Likewise.
	* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
	* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
	against NULL name.
	(ctf_dedup_maybe_synthesize_forward): Likewise.
2021-02-04 16:01:53 +00:00
Nick Alcock
b4b6ea4680 libctf, ld: fix formatting of forwards to unions and enums
The type printer was unconditionally printing these as if they were
forwards to structs, even if they were forwards to unions or enums.

ld/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* testsuite/ld-ctf/enum-forward.c: New test.
	* testsuite/ld-ctf/enum-forward.d: New results.

libctf/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-types.c (ctf_type_aname): Print forwards to unions and enums
	properly.
2021-01-05 14:53:40 +00:00
Nick Alcock
6c3a38777b libctf, include: support unnamed structure members better
libctf has no intrinsic support for the GCC unnamed structure member
extension.  This principally means that you can't look up named members
inside unnamed struct or union members via ctf_member_info: you have to
tiresomely find out the type ID of the unnamed members via iteration,
then look in each of these.

This is ridiculous.  Fix it by extending ctf_member_info so that it
recurses into unnamed members for you: this is still unambiguous because
GCC won't let you create ambiguously-named members even in the presence
of this extension.

For consistency, and because the release hasn't happened and we can
still do this, break the ctf_member_next API and add flags: we specify
one flag, CTF_MN_RECURSE, which if set causes ctf_member_next to
automatically recurse into unnamed members for you, returning not only
the members themselves but all their contained members, so that you can
use ctf_member_next to identify every member that it would be valid to
call ctf_member_info with.
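
The recursive lookup can be pictured with a toy model like the one
below.  It is a sketch only: the toy_* types and function are
hypothetical, unnamed members are modelled as members whose name is "",
and the assumption that the reported offset is relative to the
outermost structure is mine, not something this message states.

```c
#include <stddef.h>
#include <string.h>

struct toy_struct;

typedef struct toy_member
{
  const char *name;              /* "" for an unnamed member.  */
  size_t offset;                 /* Offset within the containing struct.  */
  const struct toy_struct *sub;  /* Non-NULL for struct/union members.  */
} toy_member_t;

typedef struct toy_struct
{
  const toy_member_t *members;
  size_t n;
} toy_struct_t;

/* Look NAME up in S, descending into unnamed struct/union members.
   Returns 0 and stores the accumulated offset, or -1 if not found.  */
static int
toy_member_info (const toy_struct_t *s, const char *name, size_t *offset)
{
  for (size_t i = 0; i < s->n; i++)
    {
      const toy_member_t *m = &s->members[i];

      if (strcmp (m->name, name) == 0)
        {
          *offset = m->offset;
          return 0;
        }
      if (m->name[0] == '\0' && m->sub != NULL)
        {
          size_t inner;

          if (toy_member_info (m->sub, name, &inner) == 0)
            {
              *offset = m->offset + inner;
              return 0;
            }
        }
    }
  return -1;
}
```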

New lookup tests are added for all of this.

include/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (CTF_MN_RECURSE): New.
	(ctf_member_next): Add flags argument.

libctf/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.h (struct ctf_next) <u.ctn_next>: Move to...
	<ctn_next>: ... here.
	* ctf-util.c (ctf_next_destroy): Unconditionally destroy it.
	* ctf-lookup.c (ctf_symbol_next): Adjust accordingly.
	* ctf-types.c (ctf_member_iter): Reimplement in terms of...
	(ctf_member_next): ... this.  Support recursive unnamed member
	iteration (off by default).
	(ctf_member_info): Look up members in unnamed sub-structs.
	* ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_member_next call.
	(ctf_dedup_emit_struct_members): Likewise.
	* testsuite/libctf-lookup/struct-iteration-ctf.c: Test empty unnamed
	members, and a normal member after the end.
	* testsuite/libctf-lookup/struct-iteration.c: Verify that
	ctf_member_count is consistent with the number of successful returns
	from a non-recursive ctf_member_next.
	* testsuite/libctf-lookup/struct-iteration-*: New, test iteration
	over struct members.
	* testsuite/libctf-lookup/struct-lookup.c: New test.
	* testsuite/libctf-lookup/struct-lookup.lk: New test.
2021-01-05 14:53:40 +00:00
Nick Alcock
37002871ac libctf, ld: dump enums: generally improve dump formatting
This commit adds dumping of enumerands in this general form:

    0x3: (kind 8) enum eleven_els (size 0x4) (aligned at 0x4)
         ELEVEN_ONE: 10
         ELEVEN_TWO: 11
         ELEVEN_THREE: -256
         ELEVEN_FOUR: -255
         ELEVEN_FIVE: -254
         ...
         ELEVEN_SEVEN: -252
         ELEVEN_EIGHT: -251
         ELEVEN_NINE: -250
         ELEVEN_TEN: -249
         ELEVEN_ELEVEN: -248

The first and last enumerands in the enumerated type are printed so that
you can tell if they've been cut off at one end or the other.  (For now,
there is no way to control how many enumerands are printed.)
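
One way to express this elision rule, as a sketch only: the names are
hypothetical, and the edge count of 5 is inferred from the example
output above rather than taken from the dumper source.

```c
#include <stddef.h>

/* Hypothetical: how many enumerands to show at each end.  */
#define TOY_ENUM_EDGE 5

/* Nonzero if enumerand I of N should be printed rather than elided
   behind a "..." line.  */
static int
toy_enumerand_shown (size_t i, size_t n)
{
  return i < TOY_ENUM_EDGE || i + TOY_ENUM_EDGE >= n;
}
```

With eleven enumerands this shows indexes 0 to 4 and 6 to 10, eliding
only the sixth, which matches the eleven_els example.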

The dump output in general is improved, from this sort of thing a few
days ago:

     4c: char [0x0:0x8] (size 0x1)
        [0x0] (ID 0x4c) (kind 1) char:8 (aligned at 0x1, format 0x3, offset:bits 0x0:0x8)
     4d: char * (size 0x8) -> 4c: char [0x0:0x8] (size 0x1)
        [0x0] (ID 0x4d) (kind 3) char * (aligned at 0x8)
[...]
     5a: struct _IO_FILE (size 0xd8)
        [0x0] (ID 0x5a) (kind 6) struct _IO_FILE (aligned at 0x4)
            [0x0] (ID 0x3) (kind 1) int _flags:32 (aligned at 0x4, format 0x1, offset:bits 0x0:0x20)
            [0x40] (ID 0x4d) (kind 3) char * _IO_read_ptr (aligned at 0x8)
            [0x80] (ID 0x4d) (kind 3) char * _IO_read_end (aligned at 0x8)
            [0xc0] (ID 0x4d) (kind 3) char * _IO_read_base (aligned at 0x8)
     5b: __FILE (size 0xd8) -> 5a: struct _IO_FILE (size 0xd8)
        [0x0] (ID 0x5b) (kind 10) __FILE (aligned at 0x4)
            [0x0] (ID 0x3) (kind 1) int _flags:32 (aligned at 0x4, format 0x1, offset:bits 0x0:0x20)
            [0x40] (ID 0x4d) (kind 3) char * _IO_read_ptr (aligned at 0x8)
            [0x80] (ID 0x4d) (kind 3) char * _IO_read_end (aligned at 0x8)
            [0xc0] (ID 0x4d) (kind 3) char * _IO_read_base (aligned at 0x8)
[...]
     406: struct coff_link_hash_entry (size 0x60)
        [0x0] (ID 0x406) (kind 6) struct coff_link_hash_entry (aligned at 0x8)
            [0x0] (ID 0x2b3) (kind 6) struct bfd_link_hash_entry root (aligned at 0x8)
                [0x0] (ID 0x1d6) (kind 6) struct bfd_hash_entry root (aligned at 0x8)
                    [0x0] (ID 0x1d7) (kind 3) struct bfd_hash_entry * next (aligned at 0x8)
                    [0x40] (ID 0x61) (kind 3) const char * string (aligned at 0x8)
                    [0x80] (ID 0x1) (kind 1) long unsigned int hash:64 (aligned at 0x8, format 0x0, offset:bits 0x0:0x40)
                [0xc0] (ID 0x397) (kind 8) enum bfd_link_hash_type  type:8 (aligned at 0x1, format 0x0, offset:bits 0x0:0x8)
                [0xc8] (ID 0x1c7) (kind 1) unsigned int  non_ir_ref_regular:1 (aligned at 0x1, format 0x0, offset:bits 0x8:0x1)
                [0xc9] (ID 0x1c8) (kind 1) unsigned int  non_ir_ref_dynamic:1 (aligned at 0x1, format 0x0, offset:bits 0x9:0x1)
                [0xca] (ID 0x1c9) (kind 1) unsigned int  linker_def:1 (aligned at 0x1, format 0x0, offset:bits 0xa:0x1)
                [0xcb] (ID 0x1ca) (kind 1) unsigned int  ldscript_def:1 (aligned at 0x1, format 0x0, offset:bits 0xb:0x1)
                [0xcc] (ID 0x1cb) (kind 1) unsigned int  rel_from_abs:1 (aligned at 0x1, format 0x0, offset:bits 0xc:0x1)

... to this:

    0x4c: (kind 1) char (format 0x3) (size 0x1) (aligned at 0x1)
    0x4d: (kind 3) char * (size 0x8) (aligned at 0x8) -> 0x4c: (kind 1) char (format 0x3) (size 0x1) (aligned at 0x1)
    0x5a: (kind 6) struct _IO_FILE (size 0xd8) (aligned at 0x4)
          [0x0] _flags: ID 0x3: (kind 1) int (format 0x1) (size 0x4) (aligned at 0x4)
          [0x40] _IO_read_ptr: ID 0x4d: (kind 3) char * (size 0x8) (aligned at 0x8)
          [0x80] _IO_read_end: ID 0x4d: (kind 3) char * (size 0x8) (aligned at 0x8)
          [0xc0] _IO_read_base: ID 0x4d: (kind 3) char * (size 0x8) (aligned at 0x8)
          [0x100] _IO_write_base: ID 0x4d: (kind 3) char * (size 0x8) (aligned at 0x8)
    0x5b: (kind 10) __FILE (size 0xd8) (aligned at 0x4) -> 0x5a: (kind 6) struct _IO_FILE (size 0xd8) (aligned at 0x4)
[...]
    0x406: (kind 6) struct coff_link_hash_entry (size 0x60) (aligned at 0x8)
           [0x0] root: ID 0x2b3: (kind 6) struct bfd_link_hash_entry (size 0x38) (aligned at 0x8)
               [0x0] root: ID 0x1d6: (kind 6) struct bfd_hash_entry (size 0x18) (aligned at 0x8)
                   [0x0] next: ID 0x1d7: (kind 3) struct bfd_hash_entry * (size 0x8) (aligned at 0x8)
                   [0x40] string: ID 0x61: (kind 3) const char * (size 0x8) (aligned at 0x8)
                   [0x80] hash: ID 0x1: (kind 1) long unsigned int (format 0x0) (size 0x8) (aligned at 0x8)
               [0xc0] type: ID 0x397: (kind 8) enum bfd_link_hash_type (format 0x7f2e) (size 0x1) (aligned at 0x1)
               [0xc8] non_ir_ref_regular: ID 0x1c7: (kind 1) unsigned int:1 [slice 0x8:0x1] (format 0x0) (size 0x1) (aligned at 0x1)
               [0xc9] non_ir_ref_dynamic: ID 0x1c8: (kind 1) unsigned int:1 [slice 0x9:0x1] (format 0x0) (size 0x1) (aligned at 0x1)
               [0xca] linker_def: ID 0x1c9: (kind 1) unsigned int:1 [slice 0xa:0x1] (format 0x0) (size 0x1) (aligned at 0x1)
               [0xcb] ldscript_def: ID 0x1ca: (kind 1) unsigned int:1 [slice 0xb:0x1] (format 0x0) (size 0x1) (aligned at 0x1)
               [0xcc] rel_from_abs: ID 0x1cb: (kind 1) unsigned int:1 [slice 0xc:0x1] (format 0x0) (size 0x1) (aligned at 0x1)
[...]

In particular, indented subsections are only present for actual structs
and unions, not forwards to them, and the structure itself doesn't add a
spurious level of indentation; structure field names are easier to spot
(at the cost of not making them look so much like C field declarations
any more, but they weren't always shown in valid decl syntax even before
this change); the size, type kind, and alignment are shown for all types
for which they are meaningful; bitfield info is only shown for actual
bitfields within structures and not ordinary integral fields; and type
IDs are never omitted.  Type printing is in general much more consistent
and there is much less duplicated code in the type dumper.

There is one user-visible effect outside the dumper: ctf_type_(a)name
was erroneously emitting a trailing space on the name of slice types,
even though a slice of an int and an int with the corresponding encoding
represent the same type and should have the same print form.  This
trailing space is now gone.

ld/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* testsuite/ld-ctf/array.d: Adjust for dumper changes.
	* testsuite/ld-ctf/conflicting-cycle-1.B-1.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-1.B-2.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-1.parent.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-2.A-1.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-2.A-2.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-2.parent.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-3.C-1.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-3.C-2.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-3.parent.d: Likewise.
	* testsuite/ld-ctf/conflicting-enums.d: Likewise.
	* testsuite/ld-ctf/conflicting-typedefs.d: Likewise.
	* testsuite/ld-ctf/cross-tu-cyclic-conflicting.d: Likewise.
	* testsuite/ld-ctf/cross-tu-cyclic-nonconflicting.d: Likewise.
	* testsuite/ld-ctf/cross-tu-into-cycle.d: Likewise.
	* testsuite/ld-ctf/cross-tu-noncyclic.d: Likewise.
	* testsuite/ld-ctf/cycle-1.d: Likewise.
	* testsuite/ld-ctf/cycle-2.A.d: Likewise.
	* testsuite/ld-ctf/cycle-2.B.d: Likewise.
	* testsuite/ld-ctf/cycle-2.C.d: Likewise.
	* testsuite/ld-ctf/data-func-conflicted.d: Likewise.
	* testsuite/ld-ctf/diag-cttname-null.d: Likewise.
	* testsuite/ld-ctf/diag-cuname.d: Likewise.
	* testsuite/ld-ctf/diag-parlabel.d: Likewise.
	* testsuite/ld-ctf/diag-wrong-magic-number-mixed.d: Likewise.
	* testsuite/ld-ctf/forward.d: Likewise.
	* testsuite/ld-ctf/function.d: Likewise.
	* testsuite/ld-ctf/slice.d: Likewise.
	* testsuite/ld-ctf/super-sub-cycles.d: Likewise.
	* testsuite/ld-ctf/enums.c: New test.
	* testsuite/ld-ctf/enums.d: New test.

libctf/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-decl.c (ctf_decl_push): Exclude slices from the decl stack.
	* ctf-types.c (ctf_type_aname): No longer deal with slices here.
	* ctf-dump.c (ctf_dump_membstate_t) <cdm_toplevel_indent>: Constify.
	(CTF_FT_REFS): New.
	(CTF_FT_BITFIELD): Likewise.
	(CTF_FT_ID): Likewise.
	(ctf_dump_member): Do not do indentation here.  Migrate the
	type-printing parts of this into...
	(ctf_dump_format_type): ... here, to be shared by all type printers.
	Get the errno value for non-representable types right.  Do not print
	bitfield info for non-bitfields.  Improve the format and indentation
	of other type output.  Shuffle spacing around to make all indentation
	either 'width of column' or 4 chars.
	(ctf_dump_label): Pass CTF_FT_REFS to ctf_dump_format_type.
	(ctf_dump_objts): Likewise.  Spacing shuffle.
	(ctf_dump_var): Likewise.
	(type_hex_digits): Migrate down in the file, to above its new user.
	(ctf_dump_type): Indent here instead.  Pass CTF_FT_REFS to
	ctf_dump_format_type.  Don't trim off excess linefeeds now we no
	longer generate them.  Dump enumerated types.
2021-01-05 14:53:39 +00:00
Nick Alcock
ffeece6ac2 libctf, ld: prohibit getting the size or alignment of forwards
C allows you to do only a very few things with entities of incomplete
type (as opposed to pointers to them): make pointers to them and give
them cv-quals, roughly.  In particular, you can't apply sizeof to them
and you can't get their alignment.

We cannot impose all the requirements the standard imposes on CTF users,
because the deduplicator can transform any structure type into a forward
for the purposes of breaking cycles: so CTF type graphs can easily
contain things like arrays of forward type (if you want to figure out
their size or alignment, you need to chase down the types this forward
might be a forward to in child TU dicts: we will soon add API functions
to make doing this much easier).

Nonetheless, it is still meaningless to ask for the size or alignment of
forwards: but libctf didn't prohibit this and returned nonsense from
internal implementation details when you asked (it returned the kind of
the pointed-to type as both the size and alignment, because forwards
reuse ctt_type as a type kind, and ctt_type and ctt_size overlap).  So
introduce a new error, ECTF_INCOMPLETE, which is returned when you try
to get the size or alignment of forwards: we also return it when you try
to do things that require libctf itself to get the size or alignment of
a forward, notably using a forward as an array index type (which C
should never do in any case) or adding forwards to structures without
specifying their offset explicitly.

The dumper will not emit size or alignment info for forwards any more.

(This should not be an API break since ctf_type_size and ctf_type_align
could both return errors before now: any code that isn't expecting error
returns is already potentially broken.)
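
The behaviour can be sketched with a toy type model.  All names below
are hypothetical stand-ins (TOY_E_INCOMPLETE plays the role of
ECTF_INCOMPLETE), and the union only imitates the way a forward's
stored kind overlaps the size field:

```c
#include <stddef.h>

enum toy_kind { TOY_K_INTEGER, TOY_K_STRUCT, TOY_K_FORWARD };
enum toy_err { TOY_OK = 0, TOY_E_INCOMPLETE = 1 };

typedef struct toy_type
{
  enum toy_kind kind;
  union
  {
    size_t size;             /* Valid for complete types.  */
    enum toy_kind fwd_kind;  /* For forwards: the kind forwarded to.  */
  } u;
} toy_type_t;

/* Return the size of T, or -1 with *ERR set if T is a forward.
   Without the kind check, a forward would leak its fwd_kind out as
   if it were a size.  */
static long
toy_type_size (const toy_type_t *t, enum toy_err *err)
{
  if (t->kind == TOY_K_FORWARD)
    {
      *err = TOY_E_INCOMPLETE;
      return -1;
    }
  *err = TOY_OK;
  return (long) t->u.size;
}
```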

include/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (ECTF_INCOMPLETE): New.
	(ECTF_NERR): Adjust.

ld/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* testsuite/ld-ctf/conflicting-cycle-1.parent.d: Adjust for dumper
	changes.
	* testsuite/ld-ctf/cross-tu-cyclic-conflicting.d: Likewise.
	* testsuite/ld-ctf/forward.c: New test...
	* testsuite/ld-ctf/forward.d: ... and results.

libctf/ChangeLog
2021-01-05  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-types.c (ctf_type_resolve): Improve comment.
	(ctf_type_size): Yield ECTF_INCOMPLETE when applied to forwards.
	Emit errors into the right dict.
	(ctf_type_align): Likewise.
	* ctf-create.c (ctf_add_member_offset): Yield ECTF_INCOMPLETE
	when adding a member without explicit offset when this member, or
	the previous member, is incomplete.
	* ctf-dump.c (ctf_dump_format_type): Do not try to print the size of
	forwards.
	(ctf_dump_member): Do not try to print their alignment.
2021-01-05 14:53:39 +00:00