483 Commits

Author SHA1 Message Date
Nick Alcock
13c3c2087a Revert "libctf: fix linking of non-root-visible types"
This reverts commit 87b2f67310.

It is based on a misconception, that hidden types in the deduplicator
input should always be hidden in the output.  For cu-mapped links,
and final links following cu-mapped links, this is not true: we want
to hide inputs only if they were conflicting on the output.

We will reintroduce the testcase once a better fix is found.

libctf/
	PR libctf/33047
	* ctf-dedup.c (ctf_dedup_emit_type): Don't respect the nonroot flag.
	* testsuite/libctf-writable/ctf-nonroot-linking.c: Removed.
	* testsuite/libctf-writable/ctf-nonroot-linking.lk: Removed.
2025-06-25 12:27:06 +01:00
Nick Alcock
4bdc7aed03 libctf: testsuite fixes for datasec size changes 2025-05-20 14:35:37 +01:00
Nick Alcock
a9f8ddc5ae libctf: archive, open: when opening, always set errp to something
ctf_arc_import_parent, called by the cached-opening machinery used by
ctf_archive_next and archive-wide lookup functions like
ctf_arc_lookup_symbol, has an err-pointer parameter like all other opening
functions.  Unfortunately it unconditionally initializes it whenever
provided, even if there was no error, which can lead to its being
initialized to an uninitialized value.  This is not technically an
API-contract violation, since we don't define what happens to the error
value except when an error happens, but it is still unpleasant.

Initialize it only when there is an actual error, so we never initialize it
to an uninitialized value.

While we're at it, improve all the opening pathways: on success, set errp to
0, rather than leaving it as it was, reducing the likelihood of
uninitialized error param returns in callers too.  (This is inconsistent
with the treatment of ctf_errno(), but the err value being a parameter
passed in from outside makes the divergence acceptable: in open functions,
you're never going to be overwriting some old error value someone might want
to keep around across multiple calls, some of which are successful and some
of which are not.)
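
The convention described above can be sketched as follows.  This is a minimal
illustration only: toy_open and struct toy_dict are hypothetical stand-ins,
not libctf functions or types, and the error code is chosen arbitrarily.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an opened dict; not a libctf type.  */
struct toy_dict { int id; };

static struct toy_dict the_dict = { 1 };

/* Toy opener following the convention above: *errp is written on every
   return path, 0 on success and a nonzero code on failure.  */
struct toy_dict *
toy_open (const char *name, int *errp)
{
  if (name == NULL || name[0] == '\0')
    {
      if (errp != NULL)
        *errp = EINVAL;
      return NULL;
    }

  if (errp != NULL)
    *errp = 0;
  return &the_dict;
}

/* Usage: err is deliberately left uninitialized before the first call.  */
void
toy_open_demo (void)
{
  int err;

  assert (toy_open ("some.ctf", &err) != NULL && err == 0);
  assert (toy_open (NULL, &err) == NULL && err == EINVAL);
}
```

Because the opener writes errp on both paths, a caller that forgot to
initialize its error variable still reads a defined value afterwards.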

Soup up existing tests to verify all this.

Thanks to Bruce McCulloch for the original patch, and Stephen Brennan for
the report.

libctf/
	PR libctf/32903
	* ctf-archive.c (ctf_arc_open_internal): Zero errp on success.
	(ctf_dict_open_sections): Zero errp at the start.
	(ctf_arc_import_parent): Initialize err.
	* ctf-open.c (ctf_bufopen): Zero errp at the start.
	* testsuite/libctf-lookup/add-to-opened.c: Make sure one-element
	archive opens update errp.
	* testsuite/libctf-writable/ctf-compressed.c: Make sure real archive
	opens update errp.
2025-05-20 14:34:55 +01:00
Nick Alcock
93ae1ab31e libctf: spec: be more specific about Solaris CTF versions
Solaris has a CTFv3 now, modelled on FreeBSD's: be explicit that we
are derived from Solaris CTFv2, not v3.

(The spec is not updated for CTFv4/BTF at all yet.)
2025-04-25 21:55:35 +01:00
Nick Alcock
71e1cc6fba libctf: API change documentation (NOT FOR UPSTREAMING)
These probably need to be turned into libctf/NEWS content once we decide (if
we decide) that these changes are good.  (I do hope we don't make too many
changes because it'll be horribly disruptive, but I wouldn't be surprised to
see a few...)
2025-04-25 21:54:28 +01:00
Nick Alcock
4a4fbfd42e libctf: by-kind tests
These tiny testcases test opening-and-dumping of single type kinds,
and also linking and then opening-and-dumping.
2025-04-25 21:50:14 +01:00
Nick Alcock
66bc737718 libctf: run_lookup_test: force BTF emission (NOT FOR UPSTREAMING)
Pro tem as a hack until GCC supports -gctf for v4, or v3 upgrading
is supported, or direct CTF-then-BTF tests are written, just emit
BTF for test purposes.

This breaks most of the tests: DO NOT UPSTREAM.
2025-04-25 21:49:22 +01:00
Nick Alcock
0ad8ddc8b4 libctf: run_lookup_test: support per-test options
This lets you say e.g.

    run_lookup_test [file rootname $ctf_test ] {link: on}

to turn on the {link: on} option for all tests, as if specified in every
test file.
2025-04-25 21:48:22 +01:00
Nick Alcock
1ec572d423 libctf: dump: dump conflicting CUs, when declared 2025-04-25 21:23:08 +01:00
Nick Alcock
8586d4d1fd libctf: dump: dump struct-based bitfields 2025-04-25 21:23:08 +01:00
Nick Alcock
fa2ed703f7 libctf: dump: dump variables and datasecs 2025-04-25 21:23:08 +01:00
Nick Alcock
a72c896298 libctf: dump: dump the header; dump enum64s; adapt to API changes
A bunch of dumper changes.  Most importantly, adapt to the changes in the _f
iteration function prototypes by no longer carrying around our own cds_fp
dict pointer everywhere but just using the one we are given by the iteration
function.

But also, dump the v3 and v4/BTF headers separately, using the stored
original v3-pre-upgrade header copy if present.  The v3 dumper is not tested
yet, of course, but is more or less unchanged from the old code, so probably
nearly works.  The v4 dumper is tested.

Add enum64 support (basically just a bit of extra code to print the
signedness of enums).
2025-04-25 21:23:08 +01:00
Nick Alcock
918e356b18 libctf: archive: allow opening BTF dicts in archives (not for upstreaming)
BTF dicts are normally suppressed in archives, but it is possible
to create them with enough cunning.  If such an archive is
encountered, the BTF dicts in it have no parent name, which
means that ctf_arc_import_parent (used by ctf_dict_open_cached,
ctf_archive_next, and all the ctf_arc_lookup functions) fails
to figure out what parent to import, and fails.

Kludge around it by relying on our secret knowledge that ctf_link_write
always emits the parent dict into the archive first.  If no name is set,
import the parent dict for now.  (Before upstreaming, a new archive format
with a dedicated parent dict field will turn up, obviating this kludge.)
2025-04-25 21:23:08 +01:00
Nick Alcock
88f2c13d1c libctf: archive: fix ctf_dict_open_cached error handling
We were misreporting a failure to ctf_dict_open the dict as
an out-of-memory error.
2025-04-25 21:23:08 +01:00
Nick Alcock
02bfc04f73 libctf: link: improve BTF child dict naming
BTF dicts don't have a cuname, which means that when the deduplicator runs
over them any child dicts that result from conflicted types found in those
CUs end up with no name either.  Detect such unnamed dicts and propagate
in the name the linker gave them at input time instead.  (There is always
*some* such name, even if it's something totally useless like "#1"; usually
it's much more useful.)
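
The fallback amounts to something like the sketch below.  The function name
and parameters are hypothetical: the real code works on dicts and link
inputs, not bare strings.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a name for a child dict: prefer the CU name, and fall back to the
   name the linker gave the input when the CU name is missing or empty.
   Hypothetical helper, not part of libctf.  */
const char *
toy_child_name (const char *cuname, const char *input_name)
{
  /* The message above says some input name always exists.  */
  assert (input_name != NULL);

  if (cuname != NULL && cuname[0] != '\0')
    return cuname;
  return input_name;
}
```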
2025-04-25 21:23:08 +01:00
Nick Alcock
3aacd0f9c0 libctf: ctf-link: minor comment improvements 2025-04-25 21:23:07 +01:00
Nick Alcock
7bea1097ec libctf: dedup: conflicting CU names and merging into the parent
The last two dedup changes are, firstly, to use ctf_add_conflicting() to
arrange that conflicting types that are hidden because they are added to the
same dict as the types they conflict with (e.g. conflicting types in
modules) are properly marked with the CU name that the type comes from.
This could of course not be done with the old non-root flag, but now that we
have proper prefix types, we can record it, and consumers can find out what
CU any type comes from via ctf_type_conflicting (or, for non-kernel CTF
generated by GNU ld, via the ctf_cuname of the per-cu dict).

Secondly, we add a new kind of CU mapping for cu-mapped (two-stage) links
(as a reminder, these carry out a second stage of dedupping in which they
squash specific CUs down to a named set of child dicts, fusing named inputs
into particular named outputs: the kernel linker uses this to make child
dicts that represent modules rather than translation units). You can now map
any CU name to "" (the null string).  This indicates that types that would
land in the CU in question should not be emitted into any sort of per-module
dict but should instead just be emitted into the shared dict, possibly being
marked conflicting as they do so.  The usual popcount mechanism will be used
to pick the type which is left unhidden.  The usual forwarding stubs you
would expect to find for conflicting structs and unions will not be emitted:
instead, real structs and unions will take their place.  Consumers must take
care when chasing parent types that point to tagged structs to make sure
that there isn't a correspondingly-named struct in the child they're looking
at (but this is generally a problem with type chasing in children anyway,
which I have a TODO open to find some sort of solution to: this should be
being done automatically, and isn't).
2025-04-25 21:23:07 +01:00
Nick Alcock
f38832b398 libctf: dedup: decl tag support.
Decl tags to types and to functions and function arguments are relatively
straightforward, as are decl tags to structures as a whole or to members of
untagged structures; but decl tags to specific members of tagged structs and
unions have two separate nasty problems, entirely down to the use of tagged
structures to break cycles in the type graph.

The first is that we have to mark decl tags conflicting if their associated
struct is conflicting, but traversal from types to their parents halts at
tagged structs and unions, because the type graph is sharded via stubs at
those points and conflictedness ceases.  But we don't want to do that here:
a decl_tag to member 10 of some struct is only valid if that struct *has*
ten members, and if the struct is conflicted, some may have only one.  The
decl tag is only valid for the specific struct-with-ten-members it was
originally pointing at, anyway: other structs-with-ten-members may have
entirely different members there, which are not tagged or which are tagged
with something else.

So we track this by keeping track of the only thing that is knowable about
struct/union stubs: their decorated name.  The citers graph gains mappings
from decorated SoU names to decl tags (where the decl tag has a
component_idx), and conflictedness marking chases that and marks
accordingly, via the new ctf_dedup_mark_conflicting_hash_citers.

The second problem is that we have to emit decl tags to struct members of
all kinds after the members are emitted, but the members are emitted later
than core type deduplication because they might refer to any types in the
dict, including types added after the struct was added.  So we need to
accumulate decl tags to struct members in a new hashtab
(cd_emission_struct_decl_tags) and add yet *another* pass that traverses
that and emits all the decl tags in it.  (If it turns out that decl tags to
other things can similarly appear before the type they refer to, we'll
either have to sort them earlier or emit them at the end as well -- but this
seems unlikely.)

None of this complexity is properly tested, because we're not yet emitting
decl tags (as far as I know).  But at least it doesn't break anything else,
and it's somewhere to start.
2025-04-25 21:23:07 +01:00
Nick Alcock
bf735030ac libctf: dedup: type tags
Another trivial case: they're just like pointers except that they have a
name (and we don't need to care about that, because names are hashed in, if
present, anyway).
2025-04-25 21:23:07 +01:00
Nick Alcock
4db605353c libctf: dedup: datasecs and vars
These are a bit trickier than previous things.  Datasecs are unusual: the
content they contain for a given variable is conceptually part of that
variable, in that a variable can only appear in one datasec: so if two TUs
have different datasec values for a variable, you'll want to emit two
conflicting variables with different datasec entries.  Equally, if they
have entries in different datasecs, they're conflicting.  But the *index*
of a variable in a datasec has nothing to do with the variable: it's just
a property of how many other variables are in the datasec.

So we turn the type graph upside down for them.  We track the variable ->
datasec mappings for every variable we are dedupping, and use this to hash
variables with datasec entries *twice*: firstly, as purely variable type,
name, and promoted-to-non-extern linkage, and secondly with all of that plus
the datasec name, offset and size: we indicate that the non-extern hash
*replaces* the extern one, and use this later on.  The datasec itself is not
hashed at all!  We skip it at both hashing and emission time (without
breaking anything else, because nothing points at datasecs, so nothing will
ever recurse down into one).

The popcount code (used to find the "most popular" type, the one to put in
the shared dict) changes so that the popcounts of replaced types (extern
vars) are added to the counts of the types that replace them (the
corresponding non-extern vars).

At emission time, replaced variables (extern variables) are skipped,
ensuring that extern vars with non-conflicting non-extern counterparts are
skipped in favour of the non-extern ones.  ctf_add_section_variable then
takes care of emitting both the var and its corresponding datasec for us.
2025-04-25 21:23:07 +01:00
Nick Alcock
6b8885cfc9 libctf: dedup: structs with bitfields, BTF floats
The last two trivial cases.  Hash in the bitfieldness of structs and the
bit-width of members (their bit-offset is already being hashed in), and emit
them accordingly.

BTF floats hardly have any state: emitting them is even easier.
2025-04-25 21:23:07 +01:00
Nick Alcock
95eb77bddb libctf: dedup: enums, enum64s, functions, func linkage
These are all fairly simple and are handled together because some of the
diffs are annoyingly entwined.

enum and enum64 are trivial: it's just like enums used to be, except that we
hash in the unsignedness value, and emit signed or unsigned enums or enum64s
appropriately.  (The signedness stuff on the emission side is fairly
invisible: it's automatically handled for us by ctf_type_encoding and
ctf_add_enum*_encoded, via the CTF_INT_SIGNED encoding.)

Functions are also fairly simple: we hash in all the parameter names as well
as the args, and emit them accordingly.

Linkage is more difficult.  We want to deduplicate extern and non-extern
declarations together, while leaving static ones separate.  We do this by
promoting extern linkage to global at hashing time, and maintaining a
cd_linkages hashmap which maps from type hash values of func linkages (and
vars) to the best linkage known so far, then updating it if a better one
("less extern") comes along (relying on the fact that we are already
unifying the hashes of otherwise-identical extern and non-extern types).  At
emission time, we use this hashtab to figure out what linkage to emit.
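
A rough sketch of the linkage bookkeeping, under loud assumptions: the enum
values, the fixed-size table, and every toy_ name are invented for
illustration, and the real cd_linkages is a hashtab keyed by type hash
rather than an array indexed by slot.

```c
#include <assert.h>

/* Hypothetical linkage values; not the real encoding.  */
enum toy_linkage { TOY_STATIC, TOY_GLOBAL, TOY_EXTERN };

/* Linkage as seen by the hash: extern is promoted to global so extern and
   non-extern declarations hash alike, while static stays distinct.  */
enum toy_linkage
toy_hash_linkage (enum toy_linkage l)
{
  return l == TOY_EXTERN ? TOY_GLOBAL : l;
}

#define TOY_SLOTS 8

/* Stand-in for the hash -> best-linkage map.  */
struct toy_linkages
{
  int used[TOY_SLOTS];
  enum toy_linkage best[TOY_SLOTS];
};

/* Record a sighting: a "less extern" linkage replaces an extern one.  */
void
toy_record (struct toy_linkages *t, unsigned int slot, enum toy_linkage seen)
{
  assert (slot < TOY_SLOTS);

  if (!t->used[slot])
    {
      t->used[slot] = 1;
      t->best[slot] = seen;
    }
  else if (t->best[slot] == TOY_EXTERN && seen != TOY_EXTERN)
    t->best[slot] = seen;
}
```

At emission time the stored value for a slot is the linkage to emit, so an
extern declaration seen after its definition does not downgrade the result.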
2025-04-25 21:23:07 +01:00
Nick Alcock
81b9312ac4 libctf: dedup: comment fixes, debug indentation changes, and a tiny leak
Getting these out of the way to avoid them wrecking the diffs for the next
commits.
2025-04-25 21:23:07 +01:00
Nick Alcock
adc6ca003a libctf: dedup: fix a broken error path in string dedup
If we run out of memory updating the string counts, set the right errno:
ctf_dynhash_insert returns a *negative* error value, and we want a positive
one in the ctf_errno.
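
The shape of the fix, as a minimal sketch: toy_insert and struct toy_dict
are hypothetical stand-ins for the insertion routine and the dict's error
slot, not the real functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical insert routine that reports failure as a negative
   errno-style value, as the message above describes.  */
static int
toy_insert (int simulate_oom)
{
  return simulate_oom ? -ENOMEM : 0;
}

/* Hypothetical per-dict error slot, holding positive codes only.  */
struct toy_dict { int err; };

/* Returns 0 on success, -1 on failure with a positive code in fp->err.  */
int
toy_count_string (struct toy_dict *fp, int simulate_oom)
{
  int rc = toy_insert (simulate_oom);

  if (rc < 0)
    {
      fp->err = -rc;		/* Flip the sign before storing.  */
      assert (fp->err > 0);
      return -1;
    }
  return 0;
}
```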
2025-04-25 21:23:07 +01:00
Nick Alcock
3a6e1f87e7 libctf: dedup: chase API changes: use the public API more
To get ready for the deduplicator changes, we chase the API changes to
things like ctf_member_next, and add support for prefix types (using the
suffix where appropriate, etc).  We use the ctf-types API for things like
forward lookup, using the private _tp functions to reduce overhead while
centralizing knowledge of things like the encoding of enum forwards outside
the deduplicator.

No functional changes yet.
2025-04-25 21:23:07 +01:00
Nick Alcock
f170154176 libctf: drop unnecessary macro
Every use of this macro has been deleted.
2025-04-25 21:23:07 +01:00
Nick Alcock
27d5d0ccc7 libctf: open-bfd: open BTF dicts
Teaching ctf_open and ctf_fdopen to open BTF dicts if passed is quite
simple: we just need to check the magic number and allow BTF dicts
into the lower-level ctf_simple_open machinery (which ultimately
calls ctf_bufopen).
2025-04-25 21:23:07 +01:00
Nick Alcock
0a283f3d7a libctf: link: drop unnecessary back-compatibility code
We no longer need to ensure that inputs have a new-format func info
section: no such sections exist in CTFv4 (and the v3 compatibility
code will throw away old-format sections).
2025-04-25 21:23:07 +01:00
Nick Alcock
9ea8bea7f0 libctf: link: BTF support
This is in two parts, one new API function and one change.

New API:
+int ctf_link_output_is_btf (ctf_dict_t *);

Changed API:
unsigned char *ctf_link_write (ctf_dict_t *, size_t *size,
-			      size_t threshold);
+			      size_t threshold, int *is_btf);

The idea here is that callers can call ctf_link_output_is_btf on a
ctf_link()ed (deduplicated) dict to tell whether a link will yield
BTF-compatible output before actually generating that output, so
they can e.g. decide whether to avoid trying to compress the dict
if they know it would be BTF otherwise (since compressing a dict
renders it non-BTF-compatible).

ctf_link_write() gains an optional is_btf output parameter that
reports whether the dict that was finally generated is actually BTF
after all, perhaps because the caller didn't call
ctf_link_output_is_btf or wants to be robust against possible future
changes that may add other reasons why a written-out dict can't be BTF
at the last minute.

These are simple wrappers around already-existing machinery earlier in
this series.
2025-04-25 21:23:07 +01:00
Nick Alcock
343de78445 libctf: strings: don't check for non-deduplicable atoms in the parent
Callers of ctf_str_add_no_dedup_ref are indicating that they would like the
string they have added a reference to to appear in the current dict and not
be deduplicated into the parent.  This is true even if the string already
exists in the parent, so we should not check for strings in the parent and
reuse them in this case.
2025-04-25 18:17:33 +01:00
Nick Alcock
3520fb4568 libctf: serialize: finish off the serializer
The only remaining part of serialization that needs fixing up is
ctf_preserialize, which despite its name does nearly all the work of
serialization: the only bit it doesn't do is write the string tables
(since that has to happen across dicts after all the dicts have otherwise
been laid out, in order to deduplicate the strtabs).

As usual in this series, there's adjustment for various field name changes
(maxtypes -> ntypes, the move into ctf_serialize, etc), and extra work to
figure out whether we're emitting BTF or not and to handle the distinction
between CTF and BTF headers, and not try to emit CTF-only stuff like the
symtypetabs into BTF dicts; we can also throw out a bunch of old code that
sets compatibility flags, everything to do with forcing variables into the
dynamic state in case they changed (we're going to handle that more
generally for everything in the types table at a later date, outside
serialization), and everything to do with special handling of variables in
general.

But much of that is only a couple of lines each, and most of the changes are
mechanical: this is probably the simplest serialization commit in this
series.
2025-04-25 18:12:47 +01:00
Nick Alcock
176afc3c8b libctf: open: fix closing of children with imported parents
Closing a parent dict for the last time erases all its types and strings,
which makes type and string lookups in any surviving children impossible
from then on.  Since children hold a reference to their parent, this can
only happen in ctf_dict_close of the last child, after the parent has
been closed by the caller as well.  Since DTD deletion now involves
doing type and string lookups in order to clean out the name tables,
close the parent only after the child DTDs have been deleted.
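
The ordering can be modelled with a toy refcounted structure.  Everything
here is hypothetical (struct toy_dict, its fields, toy_close): it shows only
the order of operations, not the real ctf_dict_close.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dict: a refcount, a liveness flag, an optional parent, and
   a record of whether cleanup could still see the parent's contents.  */
struct toy_dict
{
  int refcnt;
  int alive;
  struct toy_dict *parent;
  int saw_live_parent;
};

void
toy_close (struct toy_dict *d)
{
  if (d == NULL)
    return;

  assert (d->refcnt > 0);
  if (--d->refcnt > 0)
    return;

  /* Step 1: tear down this dict's own definitions.  In the scenario
     above this needs lookups in the parent, so the parent must still
     be alive at this point.  */
  d->saw_live_parent = (d->parent == NULL || d->parent->alive);
  d->alive = 0;

  /* Step 2: only now drop the reference held on the parent.  */
  toy_close (d->parent);
}
```

Swapping steps 1 and 2 would let the parent be torn down first whenever the
child held the last reference to it, which is the failure described above.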
2025-04-25 18:09:02 +01:00
Nick Alcock
908a7e7167 libctf: open, types: ctf_import for BTF
ctf_import needs a bunch of fixes to work with pure BTF dicts -- and, for
that matter, importing newly-created parent dicts that have never been
written out, which may have a bunch of nonprovisional types (if types were
added to it before any imports were done) or may not (if at least one
ctf_import into it was done before any types were added).

So we adjust things so that the values that are checked against are the
nonprovisional-types values: the header revisions actually changed the name
of cth_parent_typemax to cth_parent_ntypes to make this clearer, so catch up
with that.  In the parent, we have to use ctf_idmax, not ctf_typemax.

One thing we must prohibit is adding a bunch of types to a
child and then importing a parent into it: the type IDs will all be wrong
and the string offsets more so.  This was partly prohibited: prohibit it
entirely (excepting only that the not-actually-written-out void type
we might add to new BTF dicts does not influence this check).

Since BTF children don't have a cth_parent_ntypes or a cth_parent_strlen, we
cannot check this stuff, but just set them and hope.
2025-04-25 18:07:44 +01:00
Nick Alcock
d5012389a4 libctf: serialize: handle CTF-versus-BTF output format checks
The internal function ctf_serialize_output_format centralizes all the checks
for BTF-versus-CTF, checking to see if the type section, active
suppressions, and BTF-emission mode permit BTF emission, setting
ctf_serialize.cs_is_btf if we are actually BTF, and raising ECTF_NOTBTF if
we are requiring BTF emission but the type section is such that we can't
emit it.

(There is a forcing parameter in place, as with most of these serialization
functions, to allow for the caller to force CTF emission if it knows the
output will be compressed or will be part of multi-member archives or
something else external to the type section that BTF does not support.)
2025-04-25 18:07:44 +01:00
Nick Alcock
585f569a2d libctf: serialize: size and emit the type section
As with sizing, this needs to support type suppression and CTF_K_BIG
elision, and adapt to the DTD representation changes.  Those changes cause a
general complexity reduction because we no longer have to memcpy the vlen
into place separately for every type kind, but can do it all at once using
shared code above the per-kind switch statement.  That statement's only job
now is generating refs out of type IDs and string offsets, and translating
the struct offset from gap- into non-gap representation for non-big structs.

We do three distinct things:

 - check whether all the types in a section are BTF-compatible, after
   suppression of unwanted type kinds (including types with unwanted
   prefixes), and elision of unneeded struct/union CTF_K_BIGs

 - size the type section, taking suppression and CTF_K_BIG elision into
   account

 - actually emit it, again taking all the above into account

These all have to come to the same conclusions for every type: if the first
one gets things wrong we might try to emit something as BTF when we can't;
if the latter two are inconsistent, we might have a buffer overrun.

So the type emission code double-checks BTF-compatibility and raises
ECTF_NOTBTF if necessary; we also aggressively check for potential overruns
before every memcpy() into the buffer and raise an ECTF_INTERNAL assertion
failure if need be.  Thankfully there are a lot fewer memcpy()s than there
used to be: there are only four places we need to check, all close to each
other, which is pretty maintainable.
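
The need for sizing and emission to agree, and the overrun check before each
copy, can be illustrated with a toy two-pass writer.  The record type and
function names are hypothetical, and the single "suppressed" flag stands in
for all the suppression and elision rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a payload plus a flag saying it is suppressed.  */
struct toy_rec
{
  const void *data;
  size_t len;
  int suppressed;
};

/* The one shared decision both passes must use.  */
static int
toy_is_emitted (const struct toy_rec *r)
{
  return !r->suppressed;
}

/* Pass 1: size the output.  */
size_t
toy_size (const struct toy_rec *recs, size_t n)
{
  size_t total = 0;

  for (size_t i = 0; i < n; i++)
    if (toy_is_emitted (&recs[i]))
      total += recs[i].len;
  return total;
}

/* Pass 2: emit, checking for overrun before every memcpy.  Returns the
   number of bytes written, or -1 if the buffer would be overrun.  */
long
toy_emit (const struct toy_rec *recs, size_t n,
          unsigned char *buf, size_t buflen)
{
  size_t off = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (!toy_is_emitted (&recs[i]))
        continue;
      if (recs[i].len > buflen - off)
        return -1;
      memcpy (buf + off, recs[i].data, recs[i].len);
      off += recs[i].len;
    }
  assert (off <= buflen);
  return (long) off;
}
```

Routing both passes through the same predicate is one way to keep them from
drifting apart; the check before memcpy catches the case where they do.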

We add a bit of debugging when --enable-libctf-hash-debugging is on,
printing the translation from provisional to final type ID so that you can
use it to map back to the provisional ID again when trying to track down
deduplicator problems, since the IDs the deduplicator will report at its
emission time are only provisional (the final parent-relative IDs are not
assigned until now).
2025-04-25 18:07:44 +01:00
Nick Alcock
67cd167767 libctf: serialize: type section sizing
This is made much simpler by the fact that the DTD representation
now tracks the size of each vlen, so we don't need per-type-kind
code to track it ourselves any more.  There's extra code to handle
type suppression, CTF_K_BIG elision, and prefixes.
2025-04-25 18:07:44 +01:00
Nick Alcock
db98972145 libctf: serialize: check the type section for BTF-incompatible types
We add a new ctf_type_sect_is_btf function (internal to ctf-serialize.c) to
check the type section against the write prohibitions list and (after
write-suppression) against the set of types allowed in BTF, and determine
whether this type section contains any types BTF does not allow.

CTF-specific type kinds like CTF_K_FLOAT are obviously prohibited in BTF, as
are CTF-specific prefixes, except that CTF_K_BIG is allowed if and only if
both its ctt_size and vlen are still zero: in that case it will be elided by
type section writeout and will never appear in the BTF at all.

Structs are checked to make sure they don't use any nameless padding members
and that (if they are bitfields) all their offsets will still fit after
conversion from CTF_K_BIG gap-between-struct-members representation (if they
are not bitfields, we know they will fit, but for bitfields, they might be
too big).
2025-04-25 18:07:44 +01:00
Nick Alcock
5ec23dfb74 libctf: strings: no external strings in BTF
One of the things BTF doesn't have is the concept of external strings which
can be shared with the ELF strtab.  Therefore, even if the linker has
reported strings which the dict is reusing, when we generate the strtab for
a BTF dict we should emit those strings into it (and we should certainly
not cause the presence of external strings to prevent BTF emission!)

Note that since already-written strtab entries are never erased, writing a
dict as BTF and then CTF will cause external strings to be emitted even for
the CTF.  This sort of repeated writing in different formats seems to be
very rare: in any case, the problem can be avoided by simply doing the CTF
writeout first (the following BTF writeout will spot the missing external-
in-CTF strings and add them).

We also throw away the internal-only function ctf_strraw_explicit(), which
was used to add strings with a hardwired strtab: it was only ever used to
write out the variable section, which is gone in v4.
2025-04-25 18:07:44 +01:00
Nick Alcock
c14bdfc7a4 libctf: serialize: kind suppression and prohibition
The CTF serialization machinery decides whether to write out a dict as BTF
or CTF (or, in LIBCTF_BTM_BTF mode, whether to write out a dict or fail with
ECTF_NOTBTF) in part by looking at the type kinds in the dictionary.

It is possible that you'd like to extend this check and ban specific type
kinds from the dictionary (possibly even if it's CTF); it's also possible
that you'd like to *not* fail even if a CTF-only kind is found, but rather
replace it with a still-valid stub (CTF_K_UNKNOWN / BTF_KIND_UNKNOWN) and
keep going.  (The kernel's btfarchive machinery does this to ensure that
the compiler and previous link stages have emitted only valid BTF type
kinds.)

ctf_write_suppress_kind supports both these use cases:

+int ctf_write_suppress_kind (ctf_dict_t *fp, int kind, int prohibited);

This commit adds only the core population code: the actual suppression is
spread across the serializer and will be added in the next commits.
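
A toy model of the two use cases.  This assumes that a nonzero "prohibited"
means "fail if this kind is seen" and zero means "replace it with the
unknown stub"; the message above does not spell that mapping out.  The kind
numbering and all toy_ names are invented.

```c
#include <assert.h>

#define TOY_NKINDS 32
#define TOY_K_UNKNOWN 0		/* Hypothetical numbering.  */

enum toy_rule { TOY_KEEP = 0, TOY_SUPPRESS, TOY_PROHIBIT };

/* Hypothetical per-dict table of rules, one per type kind.  */
struct toy_writer { enum toy_rule rules[TOY_NKINDS]; };

/* Populate the table.  Returns 0 on success, -1 for a bad kind.  */
int
toy_suppress_kind (struct toy_writer *w, int kind, int prohibited)
{
  if (kind < 0 || kind >= TOY_NKINDS)
    return -1;
  w->rules[kind] = prohibited ? TOY_PROHIBIT : TOY_SUPPRESS;
  return 0;
}

/* Consulted at writeout: returns the kind to write, or -1 if the kind
   is banned and the write must fail.  */
int
toy_kind_to_write (const struct toy_writer *w, int kind)
{
  assert (kind >= 0 && kind < TOY_NKINDS);

  switch (w->rules[kind])
    {
    case TOY_PROHIBIT:
      return -1;
    case TOY_SUPPRESS:
      return TOY_K_UNKNOWN;
    default:
      return kind;
    }
}
```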
2025-04-25 18:07:44 +01:00
Nick Alcock
2c5f74300a libctf: serialize: user control over BTF-versus-CTF writeout
We need some way for users to declare that they want BTF or CTF in
particular to be written out when they ask for it, or that they don't mind
which.  Adding this to all the ctf_write functions (like the compression
threshold already is) would be a bit of a nightmare: there are a great many
of them and this doesn't seem like something people would want to change
on a per-dict basis (even if we did, we'd need to think about archives and
linking, which work on a higher level than single dicts).

So we repurpose an unused, vestigial existing function, ctf_version(), which
was originally intended to do some sort of rather unclear API switching at
runtime, to allow switching between different CTF file format versions (not
yet supported, you have to pass CTF_VERSION) and BTF writeout modes:

/* BTF/CTF writeout version info.

   ctf_btf_mode has three levels:

   - LIBCTF_BTM_ALWAYS writes out full-blown CTFv4 at all times
   - LIBCTF_BTM_POSSIBLE writes out CTFv4 if needed to avoid information loss,
     BTF otherwise.  If compressing, the same as LIBCTF_BTM_ALWAYS.
   - LIBCTF_BTM_BTF writes out BTF always, and errors otherwise.

   Note that no attempt is made to downgrade existing CTF dicts to BTF: if you
   read in a CTF dict and turn on LIBCTF_BTM_POSSIBLE, you'll get a CTF dict; if
   you turn on LIBCTF_BTM_BTF, you'll get an unconditional error.  Thus, this is
   really useful only when reading in BTF dicts or when creating new dicts.  */

typedef enum ctf_btf_mode
{
  LIBCTF_BTM_BTF = 0,
  LIBCTF_BTM_POSSIBLE = 1,
  LIBCTF_BTM_ALWAYS = 2
} ctf_btf_mode_t;

/* Set the CTF library client version to the specified version: this is the
   version of dicts written out by the ctf_write* functions.  If version is
   zero, we just return the default library version number.  The BTF version
   (for CTFv4 and above) is indicated via btf_hdr_len, also zero for "no
   change".

    You can influence what type kinds are written out to a CTFv4 dict via the
    ctf_write_suppress_kind() function.  */

extern int ctf_version (int ctf_version_, size_t btf_hdr_len,
			ctf_btf_mode_t btf_mode);

(We retain the ctf_version_ stuff to leave space in the API to let the
library possibly do file format downgrades in future, since we've already
had requests for such things from users.)
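
The three modes reduce to a small decision table, sketched here.  The enum
is copied from the declaration above so the sketch compiles standalone;
toy_choose_format and enum toy_format are hypothetical, and treating
compression as ruling out BTF in LIBCTF_BTM_BTF mode too is an assumption
drawn from the note that compressed dicts are not BTF-compatible.

```c
#include <assert.h>

/* Copied from the declaration quoted above.  */
typedef enum ctf_btf_mode
{
  LIBCTF_BTM_BTF = 0,
  LIBCTF_BTM_POSSIBLE = 1,
  LIBCTF_BTM_ALWAYS = 2
} ctf_btf_mode_t;

/* Hypothetical result type for the sketch.  */
enum toy_format { TOY_FMT_ERROR, TOY_FMT_BTF, TOY_FMT_CTF };

/* types_are_btf: nonzero if nothing in the dict needs CTF.
   compressing: nonzero if the output will be compressed.  */
enum toy_format
toy_choose_format (ctf_btf_mode_t mode, int types_are_btf, int compressing)
{
  int can_be_btf = types_are_btf && !compressing;

  switch (mode)
    {
    case LIBCTF_BTM_ALWAYS:
      return TOY_FMT_CTF;
    case LIBCTF_BTM_POSSIBLE:
      return can_be_btf ? TOY_FMT_BTF : TOY_FMT_CTF;
    case LIBCTF_BTM_BTF:
      return can_be_btf ? TOY_FMT_BTF : TOY_FMT_ERROR;
    }
  return TOY_FMT_ERROR;
}

/* Usage: POSSIBLE degrades to CTF rather than failing.  */
void
toy_choose_format_demo (void)
{
  assert (toy_choose_format (LIBCTF_BTM_POSSIBLE, 0, 0) == TOY_FMT_CTF);
  assert (toy_choose_format (LIBCTF_BTM_BTF, 0, 0) == TOY_FMT_ERROR);
}
```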
2025-04-25 18:07:44 +01:00
Nick Alcock
f782340ba5 libctf, serialize: preparatory steps
The new serializer is quite a lot more customizable than the old, because it
can write out BTF as well as CTF: you can ask to write out BTF or fail,
write out CTF if required to avoid information loss, otherwise BTF, or
always write out CTF.

Callers often need to find out whether a dict could be written out as BTF
before deciding how to write it out (because a dict can never be written out
as BTF if it is compressed, a caller might well want to ask if there is
anything else that prevents BTF writeout -- say, slices, conflicting types,
or CTF_K_BIG -- before deciding whether to compress it).  GNU ld will do
this whenever it is passed only BTF sections on the input.

Figuring out whether a dict can be written out as BTF is quite expensive: we
have to traverse all the types and check them, including every member of
every struct.  So we'd rather do that work only once.  This means making a
lot of state once private to ctf_preserialize public enough that another
function can initialize it; and since the whole API is available after
calling this function and before serializing, we should probably arrange
that if we do things we know will invalidate the results of all this
checking, we are forced to do it again.

This commit does that, moving all the existing serialization state into a
new ctf_serialize_t and adding to it.  Several functions grow force_ctf
arguments that allow the caller to force CTF emission even if the type
section looks BTFish: the writeout code and archive creation use this to
force CTF emission if we are compressing, and archive creation uses it
to force CTF emission if a CTF multi-member archive is in use, because
BTF doesn't support archives at all so there's no point maintaining
BTF compatibility in that case.  The ctf_write* functions gain support for
writing out BTF headers as well as CTF, depending on whether what was
ultimately written out was actually BTF or not.

Even more than most commits in this series, there is no way this is
going to compile right now: we're in the middle of a major transition,
completed in the next few commits.
2025-04-25 18:07:44 +01:00
Nick Alcock
3c5eb5b20a libctf: lookup, open: chase header field changes
Nothing exciting here, just header fields slightly changing name
and a couple of new comments and indentation fixes.
2025-04-25 18:07:43 +01:00
Nick Alcock
f7f72bcca6 libctf, open: new API for getting the size of CTF/BTF file sections
I wrote this for BTF type size querying programs, but it might be
of more general use and it's impossible to get this info in any
other way, so we might want to keep it.

New API:
+size_t ctf_sect_size (ctf_dict_t *, ctf_sect_names_t sect);
2025-04-25 18:07:43 +01:00
Nick Alcock
4837852527 libctf: types: access to raw type data
This new API lets users ask for the raw type data associated with a type
(either the whole lot including prefixes, or just the suffix if this is not
a CTF_K_BIG type), and then they can manipulate it using ctf.h functions
or whatever else they like.  Doing this does not preclude using libctf
querying functions at the same time (just don't change the type!  It's
const for a reason).

New API:

+const ctf_type_t *ctf_type_data (ctf_dict_t *, ctf_id_t, int prefix);

This function was unimplementable before the DTD changes, because the
ctf_type_t and vlen were separated in memory: but now they're always stored
in a single buffer, it's reliable and simple, indeed trivial.
2025-04-25 18:07:43 +01:00
Nick Alcock
33326f571f libctf: types: recursive type visiting
ctf_type_visit and ctf_type_rvisit have to adapt to the internal
API changes, but also to the change in the representation of
structures.  The new code is quite a lot simpler than the old,
because we don't need to roll our own iterator but can just use
ctf_member_next.

API changes, the usual for the *_f typedefs and anything to do with
structures:

-typedef int ctf_visit_f (const char *name, ctf_id_t type, unsigned long offset,
-			 int depth, void *arg);
+typedef int ctf_visit_f (ctf_dict_t *, const char *name, ctf_id_t type,
+			 size_t offset, int bit_width, int depth,
 			 void *arg);
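
A callback matching the new prototype might look like the sketch below.  The
two typedefs at the top are local stand-ins so that it compiles without the
libctf headers; struct toy_stats and toy_visit are hypothetical, and nothing
here says what units offset is in or what bit_width holds for non-bitfield
members.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins so the sketch compiles without the libctf headers.  */
typedef struct ctf_dict ctf_dict_t;
typedef unsigned long ctf_id_t;

/* New-style prototype, as given in the diff above.  */
typedef int ctf_visit_f (ctf_dict_t *, const char *name, ctf_id_t type,
                         size_t offset, int bit_width, int depth,
                         void *arg);

/* Hypothetical accumulator passed through the arg pointer.  */
struct toy_stats
{
  int members;
  int max_depth;
};

/* Count visited entries and track the deepest nesting level seen.
   Returning 0 asks the traversal to continue.  */
int
toy_visit (ctf_dict_t *fp, const char *name, ctf_id_t type,
           size_t offset, int bit_width, int depth, void *arg)
{
  struct toy_stats *s = arg;

  (void) fp; (void) name; (void) type; (void) offset; (void) bit_width;
  assert (s != NULL);

  s->members++;
  if (depth > s->max_depth)
    s->max_depth = depth;
  return 0;
}

/* Compile-time check that toy_visit matches the typedef.  */
ctf_visit_f *toy_callback = toy_visit;
```

The dict now arrives as the first argument, so a callback no longer has to
smuggle its own dict pointer in through arg.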
2025-04-25 18:07:43 +01:00
Nick Alcock
1ece8c93c0 libctf, create: the unknown type
Just as for typedefs, this is just catching up with API changes on the
type-addition side.
2025-04-25 18:07:43 +01:00
Nick Alcock
83e9ca77b2 libctf, create: typedefs
Nothing here but adjustment to internal API changes.  Typedefs have no
special properties that need querying, so there are no changes to
ctf-types.c at all.
2025-04-25 18:07:43 +01:00
Nick Alcock
bd0c033b29 libctf, create, types: slices
Nothing difficult for this CTF-specific type kind, just the usual adjustment
to internal API changes.
2025-04-25 18:07:43 +01:00
Nick Alcock
0cd5118024 libctf: create, types: arrays
The same internal API changes for arrays.  There is one ABI change here,
to ctf_arinfo_t:

-  uint32_t ctr_nelems;		/* Number of elements.  */
+  size_t ctr_nelems;		/* Number of elements.  */
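
Why this is an ABI change, as a sketch.  The two structs below are
hypothetical copies of the before and after layouts (toy_id_t stands in for
the ID type); they are not the real ctf_arinfo_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned long toy_id_t;	/* Stand-in for the type ID type.  */

/* Layout before the change.  */
struct toy_arinfo_old
{
  toy_id_t ctr_contents;
  toy_id_t ctr_index;
  uint32_t ctr_nelems;
};

/* Layout after the change.  */
struct toy_arinfo_new
{
  toy_id_t ctr_contents;
  toy_id_t ctr_index;
  size_t ctr_nelems;
};

/* Nonzero if an element count would still fit the old 32-bit field.  */
int
toy_fits_old_nelems (size_t nelems)
{
  return nelems <= UINT32_MAX;
}

/* The field's width now follows size_t, so on platforms where size_t is
   wider than 32 bits, callers built against the old layout must be
   recompiled.  */
void
toy_arinfo_demo (void)
{
  assert (sizeof (((struct toy_arinfo_old *) 0)->ctr_nelems) == 4);
  assert (sizeof (((struct toy_arinfo_new *) 0)->ctr_nelems)
          == sizeof (size_t));
}
```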
2025-04-25 18:07:43 +01:00
Nick Alcock
d65d03bec4 libctf: create, types: reftypes and pointers
This is pure adjustment for internal API changes, and a change to the
type-compatibility of pointers to type 0 now that it can be void as well as
"unrepresentable".

By now this dance should be quite familiar.
2025-04-25 18:07:43 +01:00