Commit Graph

142 Commits

Author SHA1 Message Date
Nick Alcock
0d366df443 libctf: use __attribute__((__gnu_printf__)) where appropriate
We don't use any GNU-specific printf args, but this prevents warnings about
%z, observed on MinGW even though every libc anyone is likely to use there
supports %z perfectly well, and we're not stopping using it just because
MinGW complains.  Doing this means we stand more chance of seeing *actual*
problems on such platforms without them being drowned in noise.

We turn this off on clang, which doesn't support __gnu_printf__.
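
A minimal sketch of what such a guard might look like.  The macro name
my_printflike and the helper format_msg are hypothetical, and falling back to
plain __printf__ on clang is an assumption: the real _libctf_printflike_
definition in ctf-impl.h may differ.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for _libctf_printflike_.  GCC understands the
   __gnu_printf__ archetype; clang does not, so this sketch assumes a
   fallback to plain __printf__ there, and to nothing elsewhere.
   clang also defines __GNUC__, so it must be tested for first.  */
#if defined (__clang__)
# define my_printflike(fmt, first) \
    __attribute__ ((__format__ (__printf__, fmt, first)))
#elif defined (__GNUC__)
# define my_printflike(fmt, first) \
    __attribute__ ((__format__ (__gnu_printf__, fmt, first)))
#else
# define my_printflike(fmt, first)
#endif

/* Format into BUF; the compiler checks FMT against the arguments.  */
my_printflike (3, 4)
static int
format_msg (char *buf, size_t len, const char *fmt, ...)
{
  va_list ap;
  int ret;

  va_start (ap, fmt);
  ret = vsnprintf (buf, len, fmt, ap);
  va_end (ap);
  return ret;
}
```

With this in place, a call such as format_msg (buf, sizeof (buf), "%zu", n)
is checked against GNU printf rules on GCC, so %z should not trigger the
MinGW warning.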

Suggested by Eli Zaretskii.

libctf/
	PR libctf/31863
	* ctf-impl.h (_libctf_printflike_): Use __gnu_printf__.
2025-06-26 15:50:48 +01:00
Nick Alcock
00f6dcc33c libctf: dedup: preserve non-root flag across normal links
The previous commits dropped preservation of the non-root flag in ctf_link
and arranged to use it somewhat differently to track conflicting types in
cu-mapped CUs when doing cu-mapped links.  This was necessary to prevent
entirely spuriously hidden types from appearing on the output of such links.

Bring it (and the test for it) back.  The problem with the previous design
was that it implicitly assumed that the non-root flag it saw on the input
was always meant to be preserved (when in the final phase of cu-mapped links
it merely means that conflicting types were found in intermediate links),
and also that it could figure out what the non-root flag on the input was by
sucking in the non-root flag of the input type corresponding to an output in
the output mapping (which maps type hashes to a corresponding type on some
input).

This method of getting properties of the input type *does* work *if* that
property was one of those hashed by the ctf_dedup_hash_type process.  In
that case, every type with a given hash will have the same value for all
hashed-in properties, so it doesn't matter which one is consulted (the
output mapping points at an arbitrary one of those input types).  But the
non-root flag is explicitly *not* hashed in: as a comment in
ctf_dedup_rhash_type notes, being non-root is not a property of a type, and
two types (one non-root, one not) can perfectly well be the same type even
though one is visible and one isn't.  So just copying the non-root flag from
the output mapping's idea of the input type will copy in a value that is not
stabilized by the hash, so is more-or-less random!

So we cannot do that.  We have to do something else, which means we have to
decide what to do if two identical types with different nonroot flag values
pop up.  The most sensible thing to do is probably to say that if all
instances of a type are non-root-visible, the linked output should also be
non-root-visible; if any instance in that set is root-visible, the output
type is root-visible again.

We implement this with a new cd_nonroot_consistency dynhash, which maps type
hashes to the value 0 ("all instances root-visible"), 1 ("all instances
non-root-visible") or 2 ("inconsistent").  After hashing is over, we save a
bit of memory by deleting everything from this hashtab that doesn't have a
value of 1 ("non-root-visible"), then use this to decide whether to emit any
given type as non-root-visible or not.
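
The state tracking described above can be sketched roughly as follows.  This
is a standalone model, not the libctf code: the enum and both function names
are hypothetical, and the real implementation stores these values in a
dynhash keyed by type hash rather than passing them around directly.

```c
/* Hypothetical names for the three values described above.  */
enum nonroot_state
{
  ALL_ROOT_VISIBLE = 0,
  ALL_NONROOT = 1,
  INCONSISTENT = 2
};

/* Fold one more observed instance of a type into its recorded state.
   SEEN_BEFORE is zero the first time a given type hash is encountered,
   in which case PREV is ignored.  */
static enum nonroot_state
fold_nonroot (int seen_before, enum nonroot_state prev, int is_nonroot)
{
  enum nonroot_state cur = is_nonroot ? ALL_NONROOT : ALL_ROOT_VISIBLE;

  if (!seen_before)
    return cur;
  return (prev == cur) ? prev : INCONSISTENT;
}

/* Only a type whose every instance was non-root is emitted as non-root:
   this is why entries with any other value can be dropped after hashing.  */
static int
emit_as_nonroot (enum nonroot_state final)
{
  return final == ALL_NONROOT;
}
```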

However... that's not quite enough.  In cu-mapped links, we want to
disregard this whole thing because we just hide everything -- but in phase
2, when we take the smushed-together CUs resulting from phase 1 and
deduplicate them against each other, we want to do what the previous commits
implemented and ignore the non-root flag entirely, instead falling back to
preventing clashes by hiding anything that would be considered conflicting.
We extend the existing cu_mapped parameter to various bits of ctf_dedup so
that it is now tristate: 0 means a normal link, 1 means the smush-it-
together phase of cu-mapped links, and 2 means the final phase of cu-mapped
links.  We do the hide-conflicting stuff only in phase 2, meaning that
normal links by GNU ld can always respect the value of the nonroot flag put
on types in the input.

(One extra thing added as part of this: you can now efficiently delete the
last value returned by ctf_dynhash_next() by calling
ctf_dynhash_next_remove.)

We bring back the ctf-nonroot-linking test with one tweak: linking now works
on mingw as long as you're using the ucrt libc, so re-enable it for better
test coverage on that platform.

libctf/
	PR libctf/33047
	* ctf-hash.c (ctf_dynhash_next_remove): New.
	* ctf-impl.h (struct ctf_dedup) [cd_nonroot_consistency]: New.
	* ctf-link.c (ctf_link_deduplicating):  Differentiate between
	cu-mapped and non-cu-mapped links, even in the final phase.
	* ctf-dedup.c (ctf_dedup_hash_type): Callback prototype addition.
	Get the non-root flag and pass it down.
	(ctf_dedup_rhash_type): Callback prototype addition. Document
	restrictions on use of the nonroot flag.
	(ctf_dedup_populate_mappings): Populate cd_nonroot_consistency.
	(ctf_dedup_hash_type_fini): New function: delete now-unnecessary
	values from cd_nonroot_consistency.
	(ctf_dedup_init): Initialize it.
	(ctf_dedup_fini): Destroy it.
	(ctf_dedup): cu_mapping is now cu_mapping_phase.  Call
	ctf_dedup_hash_type_fini.
	(ctf_dedup_emit_type): Use cu_mapping_phase and
	cd_nonroot_consistency to propagate the non-root flag into outputs
	for normal links, and to do name-based conflict checking only for
	phase 2 of cu-mapped links.
	(ctf_dedup_emit): cu_mapping is now cu_mapping_phase.  Adjust
	assertion accordingly.
	* testsuite/libctf-writable/ctf-nonroot-linking.c: Bring back.
	* testsuite/libctf-writable/ctf-nonroot-linking.lk: Likewise.
2025-06-26 15:50:48 +01:00
Nick Alcock
0e7d3016f2 libctf: tiny comment typo fix
ctf_next_t's internal unions don't just cover dicts, but all sorts of other
things too.
2025-05-28 16:06:26 +01:00
Nick Alcock
16e0dd9aab libctf: archive: format v2
This commit does a bunch of things, all tangled together tightly enough that
disentangling them seemed not to be worth doing.

The biggest is a new archive format, v2, identified by a magic number which
is one higher than the v1 format's magic number.  As usual with libctf we
can only write out the new format, but can still read the old one.

The new format has multiple improvements over the old:

 - It is written native-endian and aggressively endian-swapped at open time,
   just like CTF and BTF dicts; format v1 was little-endian, necessitating
   byteswapping all over the place at read and write time rather than
   localized in one pair of functions at read time.

 - The modent array of name-offset -> archive-offset mappings for the CTF
   archives is explicitly pointed at via a new ctfa_modents header member
   rather than just starting after the end of the header.

 - The length that prepends each archive member actually indicates its
   length rather than always being sizeof (uint64_t) bytes too high (this
   was an outright bug).

 - There is a new shared properties table which in future we may be able to
   use to unify common values from the constituent CTF headers, reducing the
   size overhead of these (repeated, uncompressed) entities.  Right now it
   only contains one value, parent_name, which is the parent dict name if
   one is common across all dicts in the archive (always true for any
   archives derived from ctf_link()).  This is used to let
   ctf_archive_next() et al reliably open dicts in the archive even if they
   are child BTF dicts (which do not contain a header name).

   The properties table shares its property names with the CTF members,
   and uses the same format (and shared code) for the property values as for
   CTF archive members: length-prepended.  The archive members and
   name->value table ("modents") use distinct tables for properties and CTF
   dicts, to ensure they are spatially separated in the file, to maximize
   compressibility if we end up with a lot of properties and people compress
   the whole thing.
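
As a rough illustration of the first and third points, here is a sketch of
detecting a foreign-endian v2 archive from its magic number and of reading a
member's length prefix.  Everything named ex_* or EX_* is hypothetical: the
magic values in particular are made up, since the text above says only that
v2's magic is one higher than v1's.

```c
#include <stdint.h>
#include <string.h>

/* Made-up magic numbers, for illustration only.  */
#define EX_MAGIC_V1 UINT64_C (0x1122334455667700)
#define EX_MAGIC_V2 (EX_MAGIC_V1 + 1)

/* Portable 64-bit byte swap.  */
static uint64_t
ex_bswap64 (uint64_t v)
{
  v = ((v & UINT64_C (0x00ff00ff00ff00ff)) << 8)
      | ((v >> 8) & UINT64_C (0x00ff00ff00ff00ff));
  v = ((v & UINT64_C (0x0000ffff0000ffff)) << 16)
      | ((v >> 16) & UINT64_C (0x0000ffff0000ffff));
  return (v << 32) | (v >> 32);
}

/* Classify a header magic read in host byte order: 0 means native-endian
   v2, 1 means foreign-endian v2 (swap everything at open time), and -1
   means this is not a v2 archive.  */
static int
ex_v2_needs_swap (uint64_t magic)
{
  if (magic == EX_MAGIC_V2)
    return 0;
  if (ex_bswap64 (magic) == EX_MAGIC_V2)
    return 1;
  return -1;
}

/* Read the length that prepends the member at BUF.  In v1 the stored
   value was sizeof (uint64_t) too high; v2 stores the true length.  */
static uint64_t
ex_member_len (const unsigned char *buf, int version, int swap)
{
  uint64_t len;

  memcpy (&len, buf, sizeof (len));
  if (swap)
    len = ex_bswap64 (len);
  if (version == 1)
    len -= sizeof (uint64_t);
  return len;
}
```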

We can also restrict various old bug-workaround kludges that only apply to
dicts found in v1 archives: in particular, we needed to dig out the preamble
of some CTF dicts without opening them to figure out whether they used the
.dynstr or .strtab sections: this whole bug workaround is now unnecessary
for v2 and above.

There are other changes for readability and consistency:

 - The archive wrapper data structure, known outside ctf-archive.c as
   ctf_archive_t, is now consistently referred to inside ctf-archive.c as
   'struct ctf_archive_internal' and given the parameter name 'arci' rather
   than sometimes using ctf_archive_t and sometimes using 'wrapper' or 'arc'
   as parameter names.  The archive itself is always called 'struct
   ctf_archive' to emphasise that it is *not* a ctf_archive_t.
   ctf_archive_t remains the public typedef: the fact that it's not actually
   the same thing as the archive file format is an internal implementation
   detail.

 - We keep the archive header around in a new ctfi_hdr member, distinct
   from the actual archive itself, to make upgrading from v1 and cross-
   endianness support easier.  The archive itself is now kept as a char *
   and used only to root pointer arithmetic.
2025-05-28 15:11:37 +01:00
Nick Alcock
3aacd0f9c0 libctf: ctf-link: minor comment improvements 2025-04-25 21:23:07 +01:00
Nick Alcock
f38832b398 libctf: dedup: decl tag support.
Decl tags to types and to functions and function arguments are relatively
straightforward, as are decl tags to structures as a whole or to members of
untagged structures; but decl tags to specific members of tagged structs and
unions have two separate nasty problems, entirely down to the use of tagged
structures to break cycles in the type graph.

The first is that we have to mark decl tags conflicting if their associated
struct is conflicting, but traversal from types to their parents halts at
tagged structs and unions, because the type graph is sharded via stubs at
those points and conflictedness ceases.  But we don't want to do that here:
a decl_tag to member 10 of some struct is only valid if that struct *has*
ten members, and if the struct is conflicted, some may have only one.  The
decl tag is only valid for the specific struct-with-ten-members it was
originally pointing at, anyway: other structs-with-ten-members may have
entirely different members there, which are not tagged or which are tagged
with something else.

So we track this by keeping track of the only thing that is knowable about
struct/union stubs: their decorated name.  The citers graph gains mappings
from decorated SoU names to decl tags (where the decl tag has a
component_idx), and conflictedness marking chases that and marks
accordingly, via the new ctf_dedup_mark_conflicting_hash_citers.

The second problem is that we have to emit decl tags to struct members of
all kinds after the members are emitted, but the members are emitted later
than core type deduplication because they might refer to any types in the
dict, including types added after the struct was added.  So we need to
accumulate decl tags to struct members in a new hashtab
(cd_emission_struct_decl_tags) and add yet *another* pass that traverses
that and emits all the decl tags in it.  (If it turns out that decl tags to
other things can similarly appear before the type they refer to, we'll
either have to sort them earlier or emit them at the end as well -- but this
seems unlikely.)

None of this complexity is properly tested, because we're not yet emitting
decl tags (as far as I know).  But at least it doesn't break anything else,
and it's somewhere to start.
2025-04-25 21:23:07 +01:00
Nick Alcock
4db605353c libctf: dedup: datasecs and vars
These are a bit trickier than previous things.  Datasecs are unusual: the
content they contain for a given variable is conceptually part of that
variable, in that a variable can only appear in one datasec: so if two TUs
have different datasec values for a variable, you'll want to emit two
conflicting variables with different datasec entries.  Equally, if they
have entries in different datasecs, they're conflicting.  But the *index*
of a variable in a datasec has nothing to do with the variable: it's just
a property of how many other variables are in the datasec.

So we turn the type graph upside down for them.  We track the variable ->
datasec mappings for every variable we are dedupping, and use this to hash
variables with datasec entries *twice*: firstly, as purely variable type,
name, and promoted-to-non-extern linkage, and secondly with all of that plus
the datasec name, offset and size: we indicate that the non-extern hash
*replaces* the extern one, and use this later on.  The datasec itself is not
hashed at all!  We skip it at both hashing and emission time (without
breaking anything else, because nothing points at datasecs, so nothing will
ever recurse down into one).

The popcount code (used to find the "most popular" type, the one to put in
the shared dict) changes so that the popcounts of replaced types (extern
vars) are added to the counts of the types that replace them (the
corresponding non-extern vars).

At emission time, replaced variables (extern variables) are skipped,
ensuring that extern vars with non-conflicting non-extern counterparts are
skipped in favour of the non-extern ones.  ctf_add_section_variable then
takes care of emitting both the var and its corresponding datasec for us.
2025-04-25 21:23:07 +01:00
Nick Alcock
95eb77bddb libctf: dedup: enums, enum64s, functions, func linkage
These are all fairly simple and are handled together because some of the
diffs are annoyingly entwined.

enum and enum64 are trivial: it's just like enums used to be, except that we
hash in the unsignedness value, and emit signed or unsigned enums or enum64s
appropriately.  (The signedness stuff on the emission side is fairly
invisible: it's automatically handled for us by ctf_type_encoding and
ctf_add_enum*_encoded, via the CTF_INT_SIGNED encoding.)

Functions are also fairly simple: we hash in all the parameter names as well
as the args, and emit them accordingly.

Linkage is more difficult.  We want to deduplicate extern and non-extern
declarations together, while leaving static ones separate.  We do this by
promoting extern linkage to global at hashing time, and maintaining a
cd_linkages hashmap which maps from type hash values of func linkages (and
vars) to the best linkage known so far, then updating it if a better one
("less extern") comes along (relying on the fact that we are already
unifying the hashes of otherwise-identical extern and non-extern types).  At
emission time, we use this hashtab to figure out what linkage to emit.
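
The linkage handling above might be modelled like this.  The enum, its
ordering and both helpers are hypothetical; the sketch only assumes that
"less extern" can be expressed as a numeric comparison.

```c
/* Hypothetical linkage values, ordered so that a lower value is
   "less extern".  */
enum ex_linkage
{
  EX_STATIC = 0,
  EX_GLOBAL = 1,
  EX_EXTERN = 2
};

/* Linkage as seen by the hash: extern is promoted to global so that
   extern and non-extern declarations hash alike, while static ones
   stay distinct.  */
static enum ex_linkage
ex_hash_linkage (enum ex_linkage l)
{
  return l == EX_EXTERN ? EX_GLOBAL : l;
}

/* Update the best linkage recorded for one type hash when another
   declaration with the same hash comes along.  */
static enum ex_linkage
ex_better_linkage (enum ex_linkage best, enum ex_linkage seen)
{
  return seen < best ? seen : best;
}
```

Because static declarations hash differently, a single hash only ever sees
global and extern candidates in this model, and global wins as soon as it
appears.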
2025-04-25 21:23:07 +01:00
Nick Alcock
f170154176 libctf: drop unnecessary macro
Every use of this macro has been deleted.
2025-04-25 21:23:07 +01:00
Nick Alcock
9ea8bea7f0 libctf: link: BTF support
This is in two parts, one new API function and one change.

New API:
+int ctf_link_output_is_btf (ctf_dict_t *);

Changed API:
unsigned char *ctf_link_write (ctf_dict_t *, size_t *size,
-			      size_t threshold);
+			      size_t threshold, int *is_btf);

The idea here is that callers can call ctf_link_output_is_btf on a
ctf_link()ed (deduplicated) dict to tell whether a link will yield
BTF-compatible output before actually generating that output, so
they can e.g. decide whether to avoid trying to compress the dict
if they know it would be BTF otherwise (since compressing a dict
renders it non-BTF-compatible).

ctf_link_write() gains an optional is_btf output parameter that
reports whether the dict that was finally generated is actually BTF
after all, perhaps because the caller didn't call
ctf_link_output_is_btf or wants to be robust against possible future
changes that may add other reasons why a written-out dict can't be BTF
at the last minute.

These are simple wrappers around already-existing machinery earlier in
this series.
2025-04-25 21:23:07 +01:00
Nick Alcock
5ec23dfb74 libctf: strings: no external strings in BTF
One of the things BTF doesn't have is the concept of external strings which
can be shared with the ELF strtab.  Therefore, even if the linker has
reported strings which the dict is reusing, when we generate the strtab for
a BTF dict we should emit those strings into it (and we should certainly
not cause the presence of external strings to prevent BTF emission!)

Note that since already-written strtab entries are never erased, writing a
dict as BTF and then CTF will cause external strings to be emitted even for
the CTF.  This sort of repeated writing in different formats seems to be
very rare: in any case, the problem can be avoided by simply doing the CTF
writeout first (the following BTF writeout will spot the missing external-
in-CTF strings and add them).

We also throw away the internal-only function ctf_strraw_explicit(), which
was used to add strings with a hardwired strtab: it was only ever used to
write out the variable section, which is gone in v4.
2025-04-25 18:07:44 +01:00
Nick Alcock
c14bdfc7a4 libctf: serialize: kind suppression and prohibition
The CTF serialization machinery decides whether to write out a dict as BTF
or CTF (or, in LIBCTF_BTM_BTF mode, whether to write out a dict or fail with
ECTF_NOTBTF) in part by looking at the type kinds in the dictionary.

It is possible that you'd like to extend this check and ban specific type
kinds from the dictionary (possibly even if it's CTF); it's also possible
that you'd like to *not* fail even if a CTF-only kind is found, but rather
replace it with a still-valid stub (CTF_K_UNKNOWN / BTF_KIND_UNKNOWN) and
keep going.  (The kernel's btfarchive machinery does this to ensure that
the compiler and previous link stages have emitted only valid BTF type
kinds.)

ctf_write_suppress_kind supports both these use cases:

+int ctf_write_suppress_kind (ctf_dict_t *fp, int kind, int prohibited);

This commit adds only the core population code: the actual suppression is
spread across the serializer and will be added in the next commits.
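
To make the two use cases concrete, here is a standalone model of a per-kind
policy table.  It does not use libctf: every ex_* and EX_* name, including
the table size, is hypothetical, and how the real serializer consumes the
populated state is left to the later commits.

```c
#define EX_NKINDS 32	/* Hypothetical upper bound on kind numbers.  */

enum ex_kind_policy { EX_ALLOW = 0, EX_SUPPRESS, EX_PROHIBIT };
enum ex_action { EX_WRITE, EX_WRITE_STUB, EX_FAIL };

struct ex_policy
{
  enum ex_kind_policy kinds[EX_NKINDS];
};

/* Record that KIND is to be replaced by a stub (PROHIBITED zero) or
   to cause a failure (PROHIBITED nonzero).  Mirrors the shape of the
   ctf_write_suppress_kind prototype above.  */
static int
ex_suppress_kind (struct ex_policy *p, int kind, int prohibited)
{
  if (kind < 0 || kind >= EX_NKINDS)
    return -1;
  p->kinds[kind] = prohibited ? EX_PROHIBIT : EX_SUPPRESS;
  return 0;
}

/* What a writer would do on meeting a type of KIND.  */
static enum ex_action
ex_action_for (const struct ex_policy *p, int kind)
{
  if (kind < 0 || kind >= EX_NKINDS)
    return EX_FAIL;
  switch (p->kinds[kind])
    {
    case EX_SUPPRESS:
      return EX_WRITE_STUB;
    case EX_PROHIBIT:
      return EX_FAIL;
    default:
      return EX_WRITE;
    }
}
```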
2025-04-25 18:07:44 +01:00
Nick Alcock
2c5f74300a libctf: serialize: user control over BTF-versus-CTF writeout
We need some way for users to declare that they want BTF or CTF in
particular to be written out when they ask for it, or that they don't mind
which.  Adding this to all the ctf_write functions (like the compression
threshold already is) would be a bit of a nightmare: there are a great many
of them and this doesn't seem like something people would want to change
on a per-dict basis (even if we did, we'd need to think about archives and
linking, which work on a higher level than single dicts).

So we repurpose an unused, vestigial existing function, ctf_version(), which
was originally intended to do some sort of rather unclear API switching at
runtime, to allow switching between different CTF file format versions (not
yet supported, you have to pass CTF_VERSION) and BTF writeout modes:

/* BTF/CTF writeout version info.

   ctf_btf_mode has three levels:

   - LIBCTF_BTM_ALWAYS writes out full-blown CTFv4 at all times
   - LIBCTF_BTM_POSSIBLE writes out CTFv4 if needed to avoid information loss,
     BTF otherwise.  If compressing, the same as LIBCTF_BTM_ALWAYS.
   - LIBCTF_BTM_BTF writes out BTF always, and errors otherwise.

   Note that no attempt is made to downgrade existing CTF dicts to BTF: if you
   read in a CTF dict and turn on LIBCTF_BTM_POSSIBLE, you'll get a CTF dict; if
   you turn on LIBCTF_BTM_BTF, you'll get an unconditional error.  Thus, this is
   really useful only when reading in BTF dicts or when creating new dicts.  */

typedef enum ctf_btf_mode
{
  LIBCTF_BTM_BTF = 0,
  LIBCTF_BTM_POSSIBLE = 1,
  LIBCTF_BTM_ALWAYS = 2
} ctf_btf_mode_t;

/* Set the CTF library client version to the specified version: this is the
   version of dicts written out by the ctf_write* functions.  If version is
   zero, we just return the default library version number.  The BTF version
   (for CTFv4 and above) is indicated via btf_hdr_len, also zero for "no
   change".

    You can influence what type kinds are written out to a CTFv4 dict via the
    ctf_write_suppress_kind() function.  */

extern int ctf_version (int ctf_version_, size_t btf_hdr_len,
			ctf_btf_mode_t btf_mode);

(We retain the ctf_version_ stuff to leave space in the API to let the
library possibly do file format downgrades in future, since we've already
had requests for such things from users.)
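
The three modes in the comment above can be read as a small decision table.
The sketch below copies the enum from that comment so that it compiles on its
own (it would clash with the real header if both were included); the
ex_format enum and ex_choose_format are hypothetical, and treating
"LIBCTF_BTM_BTF while compressing" as an error is an assumption based on
compressed dicts never being BTF.

```c
/* Copied from the comment quoted above.  */
typedef enum ctf_btf_mode
{
  LIBCTF_BTM_BTF = 0,
  LIBCTF_BTM_POSSIBLE = 1,
  LIBCTF_BTM_ALWAYS = 2
} ctf_btf_mode_t;

/* Hypothetical result type for this sketch.  */
enum ex_format { EX_OUT_BTF, EX_OUT_CTF, EX_OUT_ERROR };

/* NEEDS_CTF is nonzero if writing BTF would lose information;
   COMPRESSING is nonzero if the output is to be compressed.  */
static enum ex_format
ex_choose_format (ctf_btf_mode_t mode, int needs_ctf, int compressing)
{
  switch (mode)
    {
    case LIBCTF_BTM_ALWAYS:
      return EX_OUT_CTF;
    case LIBCTF_BTM_POSSIBLE:
      return (needs_ctf || compressing) ? EX_OUT_CTF : EX_OUT_BTF;
    case LIBCTF_BTM_BTF:
      /* Assumption: compression rules out BTF, so this errors too.  */
      return (needs_ctf || compressing) ? EX_OUT_ERROR : EX_OUT_BTF;
    }
  return EX_OUT_ERROR;
}
```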
2025-04-25 18:07:44 +01:00
Nick Alcock
f782340ba5 libctf, serialize: preparatory steps
The new serializer is quite a lot more customizable than the old, because it
can write out BTF as well as CTF: you can ask to write out BTF or fail,
write out CTF if required to avoid information loss, otherwise BTF, or
always write out CTF.

Callers often need to find out whether a dict could be written out as BTF
before deciding how to write it out (because a dict can never be written out
as BTF if it is compressed, a caller might well want to ask if there is
anything else that prevents BTF writeout -- say, slices, conflicting types,
or CTF_K_BIG -- before deciding whether to compress it).  GNU ld will do
this whenever it is passed only BTF sections on the input.

Figuring out whether a dict can be written out as BTF is quite expensive: we
have to traverse all the types and check them, including every member of
every struct.  So we'd rather do that work only once.  This means making a
lot of state once private to ctf_preserialize public enough that another
function can initialize it; and since the whole API is available after
calling this function and before serializing, we should probably arrange
that if we do things we know will invalidate the results of all this
checking, we are forced to do it again.
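
The "check once, redo only if invalidated" idea can be sketched in
isolation like this.  The structure and all ex_* names are hypothetical and
far simpler than ctf_serialize_t; in particular, the rule that kinds below
100 are BTF-compatible is invented purely for the example.

```c
#include <stddef.h>

#define EX_MAX_TYPES 8

struct ex_dict
{
  int kinds[EX_MAX_TYPES];
  size_t ntypes;
  int checked;		/* Nonzero if btf_ok is up to date.  */
  int btf_ok;		/* Cached result of the last full check.  */
  unsigned scans;	/* Number of full traversals, for illustration.  */
};

/* Invented rule standing in for the real per-type checks.  */
static int
ex_kind_is_btf (int kind)
{
  return kind < 100;
}

/* Adding a type invalidates the cached result.  */
static int
ex_add_type (struct ex_dict *d, int kind)
{
  if (d->ntypes >= EX_MAX_TYPES)
    return -1;
  d->kinds[d->ntypes++] = kind;
  d->checked = 0;
  return 0;
}

/* Traverse every type only if the cache is stale.  */
static int
ex_can_write_btf (struct ex_dict *d)
{
  if (!d->checked)
    {
      size_t i;

      d->btf_ok = 1;
      for (i = 0; i < d->ntypes; i++)
	if (!ex_kind_is_btf (d->kinds[i]))
	  d->btf_ok = 0;
      d->scans++;
      d->checked = 1;
    }
  return d->btf_ok;
}
```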

This commit does that, moving all the existing serialization state into a
new ctf_serialize_t and adding to it.  Several functions grow force_ctf
arguments that allow the caller to force CTF emission even if the type
section looks BTFish: the writeout code and archive creation use this to
force CTF emission if we are compressing, and archive creation uses it
to force CTF emission if a CTF multi-member archive is in use, because
BTF doesn't support archives at all so there's no point maintaining
BTF compatibility in that case.  The ctf_write* functions gain support for
writing out BTF headers as well as CTF, depending on whether what was
ultimately written out was actually BTF or not.

Even more than most commits in this series, there is no way this is
going to compile right now: we're in the middle of a major transition,
completed in the next few commits.
2025-04-25 18:07:44 +01:00
Nick Alcock
fb8917ac21 libctf, create, types: type and decl tags
These are a little more fiddly than previous kinds, because their
namespacing rules are odd: they have names (so presumably we want an API to
look them up by name), but the names are not unique (they don't need to be,
because they are not entities you can refer to from C), so many distinct
tags in the same TU can have the same name.  Type tags only refer to a type
ID: decl tags refer to a specific function parameter or structure member via
a zero-indexed "component index".

The name tables for these things are a hash of name to a set of type IDs;
rather different from all the other named entities in libctf.  As a
consequence, they can presently be looked up only using their own dedicated
functions, not using ctf_lookup_by_name et al.  (It's not clear if this
restriction could ever be lifted: ctf_lookup_by_name and friends return a
type ID, not a set of them.)
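
A toy model of a name table that maps one name to a set of type IDs, to show
why a lookup cannot simply return a single ID.  The fixed-size arrays and all
ex_* names are hypothetical; libctf uses its own hash tables for this.

```c
#include <stddef.h>
#include <string.h>

#define EX_MAX_TAGS 16
#define EX_MAX_IDS 8

struct ex_tag_entry
{
  const char *name;
  unsigned long ids[EX_MAX_IDS];
  size_t nids;
};

struct ex_tag_table
{
  struct ex_tag_entry entries[EX_MAX_TAGS];
  size_t nentries;
};

/* Return the entry for NAME, or NULL if there is none.  */
static struct ex_tag_entry *
ex_tag_lookup (struct ex_tag_table *t, const char *name)
{
  size_t i;

  for (i = 0; i < t->nentries; i++)
    if (strcmp (t->entries[i].name, name) == 0)
      return &t->entries[i];
  return NULL;
}

/* Add ID to the set for NAME: names need not be unique, so several
   IDs may accumulate under one name.  */
static int
ex_tag_add (struct ex_tag_table *t, const char *name, unsigned long id)
{
  struct ex_tag_entry *e = ex_tag_lookup (t, name);

  if (e == NULL)
    {
      if (t->nentries >= EX_MAX_TAGS)
	return -1;
      e = &t->entries[t->nentries++];
      e->name = name;
      e->nids = 0;
    }
  if (e->nids >= EX_MAX_IDS)
    return -1;
  e->ids[e->nids++] = id;
  return 0;
}
```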

They are similar enough to each other that we can at least have one function
to look up both type and decl tags if you don't care about their
component_idx and only want a type ID: ctf_tag.  (And one to iterate over
them, ctf_tag_next).

(A caveat: because tags aren't widely used or generated yet, much of this is
more or less untested and/or supposition and will need testing later.)

New API, more or less the minimum needed because it's not entirely clear how
these things will be used:

+ctf_id_t ctf_tag (ctf_dict_t *, ctf_id_t tag);
+ctf_id_t ctf_decl_tag (ctf_dict_t *, ctf_id_t decl_tag,
+		       int64_t *component_idx);
+ctf_id_t ctf_tag_next (ctf_dict_t *, const char *tag, ctf_next_t **);
+ctf_id_t ctf_add_type_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *);
+ctf_id_t ctf_add_decl_type_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *);
+ctf_id_t ctf_add_decl_tag (ctf_dict_t *, uint32_t, ctf_id_t, const char *,
+			   int component_idx);
2025-04-25 18:07:43 +01:00
Nick Alcock
a632f3ed33 libctf, types: ctf_type_kind_{iter,next} et al
These new functions let you iterate over types by kind, letting you get all
variables, all enums, all datasecs, etc.  (This is amenable to future
optimization, and some is expected shortly.)

We also add new internal functions ctf_type_kind_{forwarded_,unsliced_,}tp
which are like the corresponding non-_tp functions except that they
take a ctf_type_t rather than a type ID: doing this allows the deduplicator
to use these nearly-public functions more.  The public ctf_type_kind*
functions are reimplemented in terms of these.

This machinery is the principal place where the magic encoding of forwards
is handled.
2025-04-25 18:07:43 +01:00
Nick Alcock
ea21a1b2ae libctf: create, types: variables and datasecs (REVIEW NEEDED)
This is an area of significant difference from CTFv3.  The API changes
significantly, with quite a few additions to allow creation and querying of
these new datasec entities:

-typedef int ctf_variable_f (const char *name, ctf_id_t type, void *arg);
+typedef int ctf_variable_f (ctf_dict_t *, const char *name, ctf_id_t type,
+			    void *arg);
+typedef int ctf_datasec_var_f (ctf_dict_t *fp, ctf_id_t type, size_t offset,
+			       size_t datasec_size, void *arg);

+/* Search a datasec for a variable covering a given offset.
+
+   Errors with ECTF_NODATASEC if not found.  */
+
+ctf_id_t ctf_datasec_var_offset (ctf_dict_t *fp, ctf_id_t datasec,
+				 uint32_t offset);
+
+/* Return the datasec that a given variable appears in, or ECTF_NODATASEC if
+   none.  */
+
+ctf_id_t ctf_variable_datasec (ctf_dict_t *fp, ctf_id_t var);

+int ctf_datasec_var_iter (ctf_dict_t *, ctf_id_t, ctf_datasec_var_f *,
+			  void *);
+ctf_id_t ctf_datasec_var_next (ctf_dict_t *, ctf_id_t, ctf_next_t **,
+			       size_t *size, size_t *offset);

-int ctf_add_variable (ctf_dict_t *, const char *, ctf_id_t);
+/* ctf_add_variable adds variables to no datasec at all;
+   ctf_add_section_variable adds them to the given datasec, or to no datasec at
+   all if the datasec is NULL.  */
+
+ctf_id_t ctf_add_variable (ctf_dict_t *, const char *, int linkage, ctf_id_t);
+ctf_id_t ctf_add_section_variable (ctf_dict_t *, uint32_t,
+				   const char *datasec, const char *name,
+				   int linkage, ctf_id_t type,
+				   size_t size, size_t offset);

We tie datasecs quite closely to variables at addition (and, as should
become clear later, dedup) time: you never create datasecs, you only create
variables *in* datasecs, and the datasec springs into existence when you do
so: datasecs are always found in the same dict as the variables they contain
(the variables are never in the parent if the datasec is in a child or
anything).  We keep track of the variable->datasec mapping in
ctf_var_datasecs (populating it at addition and open time), to allow
ctf_variable_datasec to work at reasonable speed.  (But, as yet, there are
no tests of this function at all.)

The datasecs are created unsorted (to avoid variable addition becoming
O(n^2)) and sorted at serialization time, and when ctf_datasec_var_offset is
invoked.
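
The append-unsorted, sort-on-demand pattern might look roughly like this in
isolation.  All ex_* names and the fixed-size array are hypothetical, and the
sketch assumes variables in a datasec do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_VARS 16

struct ex_var
{
  uint32_t offset;
  uint32_t size;
  int id;
};

struct ex_datasec
{
  struct ex_var vars[EX_MAX_VARS];
  size_t nvars;
  int sorted;
};

static int
ex_cmp_var (const void *a, const void *b)
{
  const struct ex_var *va = a;
  const struct ex_var *vb = b;

  return (va->offset > vb->offset) - (va->offset < vb->offset);
}

/* Constant-time append: no sorting here, only a dirty flag.  */
static int
ex_datasec_add (struct ex_datasec *ds, int id, uint32_t offset,
		uint32_t size)
{
  if (ds->nvars >= EX_MAX_VARS)
    return -1;
  ds->vars[ds->nvars++] = (struct ex_var) { offset, size, id };
  ds->sorted = 0;
  return 0;
}

/* Sort once, when something actually needs the order.  */
static void
ex_datasec_sort (struct ex_datasec *ds)
{
  if (!ds->sorted)
    {
      qsort (ds->vars, ds->nvars, sizeof (ds->vars[0]), ex_cmp_var);
      ds->sorted = 1;
    }
}

/* Return the ID of the variable covering OFFSET, or -1 if none.  */
static int
ex_datasec_var_at (struct ex_datasec *ds, uint32_t offset)
{
  size_t lo = 0, hi;

  ex_datasec_sort (ds);
  hi = ds->nvars;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      const struct ex_var *v = &ds->vars[mid];

      if (offset < v->offset)
	hi = mid;
      else if (offset - v->offset >= v->size)
	lo = mid + 1;
      else
	return v->id;
    }
  return -1;
}
```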

We reuse the natural-alignment code from struct addition to get a plausible
offset in datasecs if an alignment of -1 is specified: maybe this is
unnecessary now (it was originally added when ctf_add_variable added
variables to a "default datasec", while now it just leaves them out of
all datasecs, like externs are).

One constraint of this is that we currently prohibit the addition of
nonrepresentable-typed variables, because we can't tell what their natural
alignment is: if we dropped the whole "align" and just required everyone
adding a variable to a datasec to specify an offset, we could drop that
restriction. WDYT?

One additional caveat: right now, ctf_lookup_variable() looks up the type of
a variable (because when it was invented, variables were not entities in
themselves that you could look up).  This name is confusing as hell as a
result.  It might be less confusing to make it return the CTF_K_VAR, but
that would be awful to adapt callers to, since both are represented with
ctf_id_t's, so the compiler wouldn't warn about the needed change at all...
I've vacillated on this three or four times now.
2025-04-25 18:07:43 +01:00
Nick Alcock
4a4312b684 libctf: types: ctf_type_resolve_nonrepresentable
This new internal function allows us to say "resolve a type to its base
type, but treat type 0 like BTF, returning 0 if it is found rather than
erroring with ECTF_NONREPRESENTABLE".  Used in the next commit.
2025-04-25 18:07:42 +01:00
Nick Alcock
20e6f72dc7 libctf: create: structure and union member addition
There is one API addition here:

int ctf_add_member_bitfield (ctf_dict_t *, ctf_id_t souid,
                             const char *, ctf_id_t type,
                             unsigned long bit_offset,
                             int bit_width);

SoU addition handles the representational changes for bitfields and for
CTF_K_BIG structs (i.e. all structs you can add members to), errors out if
you add bitfields to structs that aren't created with the
CTF_ADD_STRUCT_BITFIELDS flag, and arranges to add padding as needed if
there is too much of a gap for the offsets to encode in one hop (that
part is still untested).
2025-04-25 18:07:42 +01:00
Nick Alcock
05a2970ad1 libctf: create, lookup: delete DVDs; ctf_lookup_by_kind
Variable handling in BTF and CTFv4 works quite differently from in CTFv3.
Rather than a separate section containing sorted, bsearchable variables,
they are simply named entities like types, stored in CTF_K_VARs.

As a first stage towards migrating to this, delete most references to
the ctf_varent_t and ctf_dvdef_t, including the DVD lookup code, all
the linking code, and quite a lot of the serialization code.

Note: CTF_LINK_OMIT_VARIABLES_SECTION, and the whole "delete variables that
already exist in the symtypetabs section" stuff, has yet to be
reimplemented.  We can implement CTF_LINK_OMIT_VARIABLES_SECTION by simply
excising all CTF_K_VARs at deduplication time if requested.  (Note:
symtypetabs should still point directly at the type, not at the CTF_K_VAR.)

(Symtypetabs in general need a bit more thought -- perhaps we can now store
them in a separate .ctf.symtypetab section with its own little four-entry
header for the symtypetabs and their indexes, making .ctf even more like
.BTF; the only difference would then be that .ctf could include prefix
types, CTF_K_FLOAT, and external string refs.  For later discussion.)

We also add ctf_lookup_by_kind() at this stage (because it is hopelessly
diff-entangled with ctf_lookup_variable): this looks up a type of a
particular kind, without needing a per-kind lookup function for it,
nor needing to hack around adding string prefixes (so you can do
ctf_lookup_by_kind (fp, CTF_K_STRUCT, "foo") rather than having to
do ctf_lookup_by_name (fp, "struct foo"): often this is more convenient, and
anything that reduces string buffer manipulation in C is good.)
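
For comparison, this is the kind of buffer handling a caller needs before a
name-based lookup, and which a kind-based lookup avoids.  The helper
ex_struct_lookup_name is hypothetical and is not part of libctf.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the prefixed name "struct NAME" in BUF.  Returns 0 on success,
   or -1 if the result was truncated or formatting failed.  */
static int
ex_struct_lookup_name (char *buf, size_t len, const char *name)
{
  int n = snprintf (buf, len, "struct %s", name);

  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

Every caller of the name-based form has to size the buffer and handle the
truncation case; passing the kind and the bare name separately removes both
concerns.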
2025-04-25 18:07:42 +01:00
Nick Alcock
64b65a0a34 libctf: types: struct/union member querying and iteration
This commit revises ctf_member_next, ctf_member_iter, ctf_member_count, and
ctf_member_info for the new CTFv4 world.  This also pulls in a bunch of
infrastructure used by most of the type querying functions, and fundamental
changes to the way DTD records are represented in libctf (ctf-create not yet
adjusted).  Other type querying functions affected by changes in struct
representation are also changed.

There are some API changes here: new bit-width fields in ctf_member_f,
ctf_membinfo_t and ctf_member_next, and a fix to the type of the offset in
ctf_member_f, ctf_membinfo_t and ctf_member_count.  (ctf_member_next got
the offset type right already.)

ctf_member_f also gets a new ctf_dict_t arg so that you can actually use
the member type it passes in without having to package up and pass in the
dict type yourself (a frequent need).  This change is later echoed in most
of the rest of the *_f typedefs.

 typedef struct ctf_membinfo
 {
   ctf_id_t ctm_type;		/* Type of struct or union member.  */
-  unsigned long ctm_offset;	/* Offset of member in bits.  */
+  size_t ctm_offset;		/* Offset of member in bits.  */
+  int ctm_bit_width;		/* Width of member in bits: -1: not bitfield */
 } ctf_membinfo_t;

-typedef int ctf_member_f (const char *name, ctf_id_t membtype,
-			  unsigned long offset, void *arg);
+typedef int ctf_member_f (ctf_dict_t *, const char *name, ctf_id_t membtype,
+			  size_t offset, int bit_width, void *arg);

 extern ssize_t ctf_member_next (ctf_dict_t *, ctf_id_t, ctf_next_t **,
 				const char **name, ctf_id_t *membtype,
-				int flags);
+				int *bit_width, int flags);

-int ctf_member_count (ctf_dict_t *, ctf_id_t);
+ssize_t ctf_member_count (ctf_dict_t *, ctf_id_t);
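
A sketch of a callback written against the new ctf_member_f shape shown
above.  The typedefs here are local stand-ins so that the example is
self-contained (real code would include ctf-api.h instead), and the
bit_totals structure and count_bitfields function are hypothetical.  It
relies only on the convention noted in the diff that a bit_width of -1
means "not a bitfield".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the libctf types, for illustration only.  */
typedef struct ctf_dict ctf_dict_t;
typedef uint32_t ctf_id_t;

typedef int ctf_member_f (ctf_dict_t *, const char *name, ctf_id_t membtype,
			  size_t offset, int bit_width, void *arg);

/* Hypothetical accumulator passed in via ARG.  */
struct bit_totals
{
  size_t bitfields;		/* Number of bitfield members seen.  */
  size_t bitfield_bits;		/* Sum of their widths, in bits.  */
};

/* Count bitfield members.  Returns 0 in all cases in this sketch.  */
static int
count_bitfields (ctf_dict_t *fp, const char *name, ctf_id_t membtype,
		 size_t offset, int bit_width, void *arg)
{
  struct bit_totals *totals = arg;

  (void) fp;
  (void) name;
  (void) membtype;
  (void) offset;

  if (bit_width >= 0)
    {
      totals->bitfields++;
      totals->bitfield_bits += (size_t) bit_width;
    }
  return 0;
}
```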

The DTD changes are that where before the ctf_dtdef_t had a dtd_data which
was the ctf_type_t type node for a type, and a separate dtd_vlen which was
the vlen buffer which (in the final serialized representation) would
directly follow that type, now it has one single buffer, dtd_buf, which
consists of a stream of one or more ctf_type_t nodes, followed by a vlen,
as it will appear in the final serialized form.  This buffer has internal
pointers into it: dtd_data is a pointer to the last ctf_type_t in the stream
(the true type node, after all prefixes), and dtd_vlen is a pointer to the
vlen (precisely one ctf_type_t after the dtd_data).  This representation is
nice because it means there is even less distinction between a dynamic type
added by ctf_add_*() and a static one read directly out of a dict: you can
traverse the entire type without caring where it came from, simplifying
most of the type querying functions.
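
As a rough illustration of that layout (not the real ctf_dtdef_t: the
toy_* types and field names are hypothetical, and the three-word type node
is an arbitrary stand-in for ctf_type_t), one buffer holds zero or more
prefix nodes, then the true type node, then the vlen, with two internal
pointers into it:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Arbitrary stand-in for a type node.  */
typedef struct toy_type
{
  uint32_t name;
  uint32_t info;
  uint32_t size;
} toy_type_t;

typedef struct toy_dtd
{
  unsigned char *buf;		/* Prefixes, then type node, then vlen.  */
  toy_type_t *data;		/* Last node in the stream: the true type.  */
  unsigned char *vlen;		/* Exactly one node after DATA.  */
} toy_dtd_t;

/* Allocate a buffer for N_PREFIXES prefix nodes, one true type node and
   VBYTES bytes of vlen, and set up the internal pointers.  Returns 0 on
   success, -1 on allocation failure.  */
static int
toy_dtd_init (toy_dtd_t *dtd, size_t n_prefixes, size_t vbytes)
{
  size_t n_nodes = n_prefixes + 1;

  assert (dtd != NULL);
  dtd->buf = calloc (1, n_nodes * sizeof (toy_type_t) + vbytes);
  if (dtd->buf == NULL)
    return -1;

  dtd->data = (toy_type_t *) dtd->buf + n_prefixes;
  dtd->vlen = (unsigned char *) (dtd->data + 1);
  return 0;
}
```

Because the buffer already has the serialized shape, code walking it does
not need to know whether it came from ctf_add_*() or from a dict on disk.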

(There are a few more things in there which will be useful mostly when
adding new types: their uses will be seen later.)

Two new nontrivial functions exist (one of which is annoyingly tangled up in
the diff, sorry about that): ctf_find_prefix, which hunts down a given
prefix (if it exists) among the possibly many that may exist on a type (so
you can ask it to find the CTF_K_BIG prefix for a type if it exists, and
it'll return you a pointer to its ctf_type_t record), and ctf_vlen, which
you hand a type ID and its ctf_type_t *, and it gives you back a pointer to
its vlen and tells you how long it is.  (This is one of only two places left
in ctf-types.c which cares whether a type is dynamic or not.  The other has
yet to be added).  Almost every function in ctf-types.c will end up calling
ctf_lookup_by_id and ctf_vlen in turn.
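
The prefix hunt can be pictured roughly as follows.  This is a sketch
under assumptions, not the real ctf_find_prefix: the toy_* names and kind
values are invented, and a node is reduced to just its kind.  It walks the
leading prefix nodes and stops at the first non-prefix node (the true type).

```c
#include <assert.h>
#include <stddef.h>

/* Invented kind values: two ordinary kinds, two prefix kinds.  */
enum
{
  TOY_K_INTEGER = 1,
  TOY_K_STRUCT = 2,
  TOY_K_BIG = 100,
  TOY_K_OTHER_PREFIX = 101
};

typedef struct toy_type { int kind; } toy_type_t;

static int
toy_is_prefix (int kind)
{
  return kind == TOY_K_BIG || kind == TOY_K_OTHER_PREFIX;
}

/* Return the prefix node of the given KIND on the type starting at TP,
   or NULL if the type has no such prefix.  The stream is assumed to end
   with a non-prefix node.  */
static const toy_type_t *
toy_find_prefix (const toy_type_t *tp, int kind)
{
  assert (tp != NULL);
  for (; toy_is_prefix (tp->kind); tp++)
    if (tp->kind == kind)
      return tp;
  return NULL;
}
```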

ctf_next_t has changed significantly: the ctn_type member is split in two so
that we can tell whether a given iterator works using types or indexes, and
we gain the ability to iterate over enum64s, DTDs themselves, and datasecs
(most of this will only be used in later commits).

The old internal function ctf_struct_member, which handled the distinction
between ctf_member_t and ctf_lmember_t, is gone.  Instead we have new code
that handles the different representation of bitfield versus non-bitfield
structs and unions, and more code to handle the different representation of
CTF_K_BIG structs and unions (their offsets are the distance from the last
offset, rather than the distance from the start of the structure).
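
To make the CTF_K_BIG offset representation concrete, here is a small
sketch (the function name and types are hypothetical, not libctf's) that
turns a run of stored per-member deltas back into absolute bit offsets by
keeping a running sum:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DELTAS[i] is the distance in bits from the previous member's offset
   (or from zero, for the first member).  Write the absolute offsets
   into ABS_OUT, which must have room for N entries.  */
static void
toy_absolute_offsets (const uint32_t *deltas, size_t n, uint64_t *abs_out)
{
  uint64_t running = 0;
  size_t i;

  assert (n == 0 || (deltas != NULL && abs_out != NULL));
  for (i = 0; i < n; i++)
    {
      running += deltas[i];
      abs_out[i] = running;
    }
}
```

The running sum is wider than any single delta, which is what lets the
absolute offsets exceed what one stored offset field could hold.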
2025-04-25 18:07:42 +01:00
Nick Alcock
ad13b7d44f libctf: CTFv4: type opening
The majority of this commit rejigs the core type table opening
code for CTFv4: there are a few ancillary bits it drags in,
indicated below.

The internal definition of a child dict (that may not have type or string
lookups performed in it until ctf_open time) used to be 'has a
cth_parent_name', but since BTF doesn't have one of those at all, we add
an additional check: a dict the first byte of whose strtab is not 0 must
be a child.  (If *either* is true, this is a child dict, which allows for
the possibility of CTF dicts with non-deduplicated strtabs -- thus with
leading \0's -- to exist in future.)
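
The either-or check can be sketched like this.  The function is
hypothetical and assumes that "has a cth_parent_name" means a nonzero
string offset in the header, and that an empty strtab says nothing either
way:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if this looks like a child dict: either it names a
   parent, or its strtab does not start with the null string.  */
static int
toy_is_child (uint32_t parent_name_off, const char *strtab,
	      size_t strtab_len)
{
  assert (strtab != NULL || strtab_len == 0);

  if (parent_name_off != 0)
    return 1;
  return strtab_len > 0 && strtab[0] != '\0';
}
```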

The initial sweep through the type table in init_static_types (to size
the name-table lookup hashes) also now checks for various types which
indicate that this must be a CTF dict, in addition to being adjusted
to cater for new CTFv4 representations of things like forwards.  (At
this early stage, we cannot rely on the functions in ctf-type.c to
abstract over this for us.)

We make some new hashtables for new namespace-like things: datasecs
and type and decl tags.

The main name-population loop in init_static_types_names_internal
takes prefixes into account, looking for the name on the suffix type
(where the name is always found).  LSTRUCT handling is removed (they
no longer exist); ENUM64s, enum forwards, VARs, datasecs, and type
and decl tags get their names suitably populated.  Some buggy code
which tried to populate the name tables for cvr-quals (which are
nameless) was dropped.

We add an extra pass which traverses all datasecs and keeps track of which
datasec each var is instantiated in (if any) in a new ctf_var_datasecs hash
table.  (This uses a number of type-querying functions which don't yet
exist: they'll be added in the upcoming commits.)

We handle the type 0 == void case by pointing the first element of
ctf_txlate at a type read in named "void" (making type 0 an alias to it),
or, if one doesn't exist, creating a new one (outside the type table and dtd
arrays), and pointing type 0 at that.  Since it is numbered 0 and not in the
type table or dtd arrays, it will never be written out at serialization
time, but since it is *present*, libctf consumers who expect the void type
to have an integral definition rather than being a magic number will get
what they expect.
2025-04-25 18:07:42 +01:00
Nick Alcock
f7d05ab342 libctf: CTFv4: core opening (other than the type table)
This commit modifies the core opening code to handle opening CTFv4 and BTF.
Much of the back-compatibility side is left for later and is currently
untested, as is the type table side of things.

We keep the v3 header (if any) stashed away in ctf_dict_t.ctf_v3_header, for
the sake of the CTF dumper; we "upgrade" the BTF header to CTF (so that the
rest of the code can ignore the distinction, and so that you can do CTFish
things like adding symtypetab entries even to things opened as BTF), but
keep note of the fact that it was opened as BTF in ctf_dict_t.ctf_opened_btf,
so that things like ctf_import can allow for the absence of the various
parent-length fields.

A couple of ctf_dict_t fields are renamed for consistency with the headers'
names for them (ctf_parname becomes ctf_parent_name; ctf_dynparname becomes
ctf_dyn_parent_name; ctf_cuname becomes ctf_cu_name).  Not all users are yet
adjusted.
2025-04-25 18:07:42 +01:00
Nick Alcock
99e9ab4828 libctf: adjust foreign-endian byteswapping for v4
Split into a separate commit because it's not yet really tested.

Callers not yet adjusted.
2025-04-25 18:07:41 +01:00
Nick Alcock
2ef9554023 libctf: ctf-lookup: support prefixes in ctf_lookup_by_id
ctf_lookup_by_id now has a new optional suffix argument, which,
if set, returns the suffix of a prefixed type: the ctf_type_t it
returns remains (as ever) the first one in the type (i.e. it
may be a prefix type).  This is most convenient because the prefix
is the ctf_type_t that LCTF_KIND and other LCTF functions taking
ctf_type_t's expect.

Callers not yet adjusted.
2025-04-25 18:07:41 +01:00
Nick Alcock
a80b903b45 libctf: simplify ctf_txlate
Before now, this critical internal structure was an array mapping from a
type ID to the type index of the type with that ID.  This was critical for
the old world in which ctf_update() reserialized the entire dict, so things
moved around in memory all the time: but these days, a ctf_type_t * never
moves after creation, so we can just make ctf_txlate an array of ctf_type_t *
and be done with it.

This lets us point type indexes anywhere in memory, not just to entries
in the ctf_buf, which means we can have synthetic ones for various purposes.
And we will.
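
A sketch of what the pointer-array form buys us, with hypothetical toy_*
names and a two-word stand-in for ctf_type_t: most entries point into the
type buffer, but nothing stops an entry (index 0 here, chosen arbitrarily)
from pointing at a synthetic node that lives somewhere else entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct toy_type { uint32_t name; uint32_t info; } toy_type_t;

typedef struct toy_dict
{
  toy_type_t *buf;		/* The type table proper.  */
  size_t ntypes;		/* Number of nodes in BUF.  */
  toy_type_t synthetic;		/* A node that is not in BUF.  */
  toy_type_t **txlate;		/* Index -> node pointer.  */
} toy_dict_t;

/* Build the translation array.  Returns 0 on success, -1 on allocation
   failure.  */
static int
toy_build_txlate (toy_dict_t *d)
{
  size_t i;

  assert (d != NULL);
  d->txlate = malloc ((d->ntypes + 1) * sizeof (*d->txlate));
  if (d->txlate == NULL)
    return -1;

  d->txlate[0] = &d->synthetic;	/* Points outside the buffer.  */
  for (i = 0; i < d->ntypes; i++)
    d->txlate[i + 1] = &d->buf[i];
  return 0;
}
```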
2025-04-25 18:07:41 +01:00
Nick Alcock
1d70873382 libctf: dynhash/dynset: a bit of const-correctness
A pile of dynhash and dynset functions were requiring non-const hashes/sets
unnecessarily.  Fix them.
2025-04-25 18:07:41 +01:00
Nick Alcock
40aea6c596 libctf: ctf_next_t.ctn_size: make a size_t
Literally every single user would rather this is a size_t, rather
than an ssize_t.  Change it.
2025-04-25 18:07:41 +01:00
Nick Alcock
6a4a485c7b libctf: adapt core dictops for v4 and prefix types
The heart of libctf's reading code is the ctf_dictops_t and the functions it
provides for reading various things no matter what the CTF version in use:
these are called via LCTF_*() macros that translate into calls into the
dictops.

The introduction of prefix types in v4 requires changes here: in particular,
we want the ability to get the type kind of whatever ctf_type_t we are
looking at (the 'unprefixed' kind), as well as the ability to get the type
kind taking prefixes into account: and more generally we want the ability
to both look at a given prefix and look at the type as a whole.  So several
ctf_dictops_t entries are added for this (ctfo_get_prefixed_kind,
ctfo_get_prefixed_vlen).

This means API changes (no callers yet adjusted, it'll happen as we go),
because the existing macros were mostly called with e.g. a ctt_info value
and returned a type kind, while now we need to be called with the actual
ctf_type_t itself, so we can possibly walk beyond it to find the real type
record.  ctfo_get_vbytes needs adjusting for this.

We also add names to most of the ctf_type_t parameters, because suddenly we
can have up to three of them: one relating to the first entry in the type
record (which may be a prefix, usually called 'prefix'), one relating to the
true type record (which may be a suffix, so usually called 'suffix'), and
one possibly relating to some intermediate record if we have multiple
prefixes (usually called 'tp').

There is one horrible special case in here: the vlen of the new
CTF_K_FUNC_LINKAGE kind (equivalent to BTF_KIND_FUNC) is always zero: it
reuses the vlen field to encode the linkage (!).  BTF is rife with ugly
hacks like this.
2025-04-25 18:07:41 +01:00
Nick Alcock
ab3ad58be9 libctf: don't warn about unused fp in ctf_assert
When hash debugging is enabled and NDEBUG is not set, ctf_assert()
translates into a true assert().  Don't leave the fp parameter
unused in this case (which can cause compiler errors when -Werror
is also on).
2025-04-25 18:07:41 +01:00
Nick Alcock
b5d3790c66 libctf: consecutive ctf_id_t assignment
This change modifies type ID assignment in CTF so that it works like BTF:
rather than flipping the high bit on for types in child dicts, types ascend
directly from IDs in the parent to IDs in the child, without interruption
(so type 0x4 in the parent is immediately followed by 0x5 in all children).

Doing this while retaining useful semantics for modification of parents is
challenging.  By definition, child type IDs are not known until the parent
is written out, but we don't want to find ourselves constrained to adding
types to the parent in one go, followed by all child types: that would make
the deduplicator a nightmare and would frankly make the entire ctf_add*()
interface next to useless: all existing clients that add types at all
add types to both parents and children without regard for ordering, and
breaking that would probably necessitate redesigning all of them.

So we have to be a little cleverer.

We approach this the same way as we approach strings in the recent refs
rework: if a parent has children attached (or has ever had them attached
since it was created or last read in), any new types created in the parent
are assigned provisional IDs starting at the very top of the type space and
working down.  (Their indexes in the internal libctf arrays remain
unchanged, so we don't suddenly need multigigabyte indexes!).  At writeout
(preserialization) time, we traverse the type table (and all other tables
containing type IDs) and assign refs to every type ID in exactly the same
way we assign refs to every string offset (just a different set of refs --
we don't want to update type IDs with string offset values!).

For a parent dict with children, these refs are real entities in memory:
pointers to the memory locations where type IDs are stored, tracked in the
DTD of each type.  As we traverse the type table, we assign real IDs to each
type (by simple incrementation), storing those IDs in a new dtd_final_type
field in the DTD for each type.  Once the type table and all other tables
containing type IDs are fully traversed, we update all the refs and
overwrite the IDs currently residing in each with the final IDs for each
type.
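
The scheme described so far can be sketched in miniature.  Everything here
is hypothetical (the toy_* names, the fixed-size table, the choice of
UINT32_MAX as "the very top of the type space"); the real code tracks refs
per DTD rather than in one flat array.  Provisional IDs count down from
the top, final IDs are handed out by simple incrementation, and only then
are the ref sites in the output buffer rewritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TOP_ID UINT32_MAX	/* Assumed top of the ID space.  */
#define TOY_MAX_TYPES 8

typedef struct toy_dtd
{
  uint32_t provisional;		/* In-memory ID: never changes.  */
  uint32_t final;		/* Assigned at serialization time.  */
} toy_dtd_t;

typedef struct toy_dict
{
  toy_dtd_t dtd[TOY_MAX_TYPES];
  size_t n;
} toy_dict_t;

/* Add a type, returning its provisional ID.  */
static uint32_t
toy_add_type (toy_dict_t *d)
{
  uint32_t id;

  assert (d->n < TOY_MAX_TYPES);
  id = TOY_TOP_ID - (uint32_t) d->n;
  d->dtd[d->n].provisional = id;
  d->dtd[d->n].final = 0;
  d->n++;
  return id;
}

/* Assign final IDs starting at FIRST_FINAL, then rewrite each ref site
   (a location in the output buffer holding a provisional ID).  */
static void
toy_serialize_ids (toy_dict_t *d, uint32_t first_final,
		   uint32_t **refs, size_t nrefs)
{
  size_t i, r;

  for (i = 0; i < d->n; i++)
    d->dtd[i].final = first_final + (uint32_t) i;

  for (r = 0; r < nrefs; r++)
    for (i = 0; i < d->n; i++)
      if (*refs[r] == d->dtd[i].provisional)
	{
	  *refs[r] = d->dtd[i].final;
	  break;
	}
}
```

Doing the rewrite in a second pass is what makes forward references work:
by then every type has its final ID, whatever order the refs were found in.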

That fixes up IDs in the parent dict itself (including forward references in
structs and the like: that's why the ref updates only happen at the end);
but what about child dicts' references, both to parent types and to their
own?  We add armouring to enforce that parent dicts are always serialized
before their children (which ctf-link.c already does, because it's a
precondition for strtab deduplication), and then arrange that when a ref is
added to a type whose ID has been assigned (has a dtd_final_type), we just
immediately do an update rather than storing a ref for later updating.
Since the parent is already serialized, all parent type IDs have a
dtd_final_type by this point, and all parent IDs in the children are
properly updated. The child types can now be renumbered now we know the
number of types in the parent, and their refs updated identically to what
was just done with the parent.

One wrinkle: before the child refs are updated, while we are working over
the child's type section, the type IDs in the child start from 1 (or
something like that), which might seem to overlap the parent IDs.  But this
is not the case: when you serialize the parent, the IDs written out to disk
are changed, but the only change to the representation in memory is that we
remember a dtd_final_type for each type (and use it to update all the child
type refs): its ID in memory is the same as it always was, a nonoverlapping
provisional ID higher than any other valid ID.  We enforce all of this by
asserting that when you add a ref to a type, the memory location that is
modified must be in the buffer being serialized: the code will not let you
accidentally modify the actual DTDs in memory.

We track the number of types in the parent in a new CTFv4 (not BTF) header
field (the dumper is updated): we will also use this to open CTFv3 child
dicts without change by simply declaring for them that the parent dict has
2^31 types in it (or 2^15, for v2 and below): the IDs in the children then
naturally come out right with no other changes needed.  (Right now, opening
CTFv3 child dicts requires extra compatibility code that has not been
written, but that code will no longer need to worry about type ID
differences.)
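
A sketch of the arithmetic this implies, assuming 1-based type indexes and
that child IDs start at the parent's type count plus one (both functions
are hypothetical, not the real ctf_type_to_index).  With the parent
declared to have 2^31 types, an old-style high-bit child ID maps to the
same index it always did.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if ID belongs to the child, given the parent's type count.  */
static int
toy_id_is_child (uint32_t id, uint32_t parent_ntypes)
{
  return id > parent_ntypes;
}

/* Index of a child type within the child's own type table.  */
static uint32_t
toy_child_index (uint32_t id, uint32_t parent_ntypes)
{
  assert (toy_id_is_child (id, parent_ntypes));
  return id - parent_ntypes;
}
```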

Various things are newly forbidden:

 - you cannot ctf_import() a child into a parent if you already ctf_add()ed
   types to the child, because all its IDs would change (and since you
   already cannot ctf_add() types to a child that hasn't had its parent
   imported, this in practice means only that ctf_create() must be followed
   immediately by a ctf_import() if this is a new child, which all sane
   clients were doing anyway).

 - You cannot import a child into a parent which has the wrong number of
   (non-provisional) types, again because all its IDs would be wrong:
   because parents only add types in the provisional space if children are
   attached to it, this would break the not unknown case of opening an
   archive, adding types to the parent, and only then importing children
   into it, so we add a special case: archive members which are not children
   in an archive with more than one member always pretend to have at least
   one child, so type additions in them are always provisional even before
   you ctf_import anything. In practice, this does exactly what we want,
   since all archives so far are created by the linker and have one parent
   and N children of that parent.

Because this introduces huge gaps between index and type ID for provisional
types, some extra assertions are added to ensure that the internal
ctf_type_to_index() is only ever called on types in the current dict (never
a parent dict): before now, this was just taken on trust, and it was often
wrong (which at best led to wrong results, as wrong array indexes were used,
and at worst to a buffer overflow). When hash debugging is on (suggesting
that the user doesn't mind expensive checks), every ctf_type_to_index()
triggers a ctf_index_to_type() to make sure that the operations are proper
inverses.

Lots and lots of tests are added to verify that assignment works and that
updating of every type kind works fine -- existing tests suffice for
type IDs in the variable and symtypetab sections.

The ld-ctf tests get a bunch of largely display-based updates: various
tests refer to 0x8... type IDs, which no longer exist, and because the
IDs are shorter all the spacing and alignment has changed.
2025-03-16 15:25:27 +00:00
Nick Alcock
beccf36b88 libctf: move string deduplication into ctf-archive
This means that any archive containing dicts can get its strings dedupped
together, rather than only those that are ctf_linked.

(For now, we are still constrained to ctf_linked archives, since fixing that
requires further changes to ctf_dedup_strings: but this gives us the first
half of what is necessary.)

libctf/
	* ctf-link.c (ctf_link_write): Move string dedup into...
	* ctf-archive.c (ctf_arc_preserialize): ... this new function.
	(ctf_arc_write_fd): Call it.
2025-02-28 15:13:24 +00:00
Nick Alcock
a480362d88 libctf: string: refs rework
This commit moves provisional (not-yet-serialized) string refs towards the
scheme to be used for CTF IDs in the future.  In particular

 - provisional string offsets now count downwards from just under the
   external string offset space (all bits on but the high bit).  This makes
   it possible to detect an overflowing strtab, and also makes it trivial to
   determine whether any string offset (ref) updates were missed -- where
   before we might get a slightly corrupted or incorrect string, we now get
   a huge high strtab offset corresponding to no string, and an error is
   emitted at read time.

 - refs are emitted at serialization time during the pass through the types.
   They are strictly associated with the newly-written-out buffer: the
   existing opened CTF dict is not changed, though it does still get the new
   strtab so that new refs to the same string can just refer directly to it.
   The provisional strtab hash table that contains these strings is not
   deleted after serialization (because we might serialize again): instead,
   we keep track in the parent of the lowest-yet-used ("latest") provisional
   strtab offset, and any strtab offset above that, but not external
   (high-bit-on) is considered provisional.

   This is sort-of-enforced by moving most of the ref-addition function
   declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is
   not included by ctf-create.c or ctf-open.c.

 - because we don't add refs when adding types, we don't need to handle the
   case where we add things to expanding vlens (enums, struct members) and
   have to realloc() them.  So the entire painful movable refs system can
   just be deleted, along with the ability to remove refs piecemeal at all
   (purging all of them is still possible).  Strings added during type
   addition are added via ctf_str_add(), which adds no refs: the strings are
   picked up at serialization time and refs to their final, serialized
   resting place added.  The DTDs never have any refs in them, and their
   provisional strtab offsets are never updated by the ref system.
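
The offset-space split in the first two points above can be sketched as
follows.  The names are hypothetical, and handing out provisional offsets
by subtracting the string's length is an assumption made for the sake of
the example; what matters is the three-way classification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EXTERNAL_BIT 0x80000000u	/* High bit: external strtab.  */

typedef enum
{
  TOY_STR_REAL,
  TOY_STR_PROVISIONAL,
  TOY_STR_EXTERNAL
} toy_str_class_t;

typedef struct toy_strtab
{
  /* Lowest provisional offset handed out so far.  Starts out equal to
     TOY_EXTERNAL_BIT, meaning "none yet".  */
  uint32_t latest_provisional;
} toy_strtab_t;

/* Hand out a provisional offset for a string occupying LEN_WITH_NUL
   bytes, counting down from just under the external space.  */
static uint32_t
toy_new_provisional (toy_strtab_t *s, size_t len_with_nul)
{
  assert (len_with_nul > 0 && len_with_nul < s->latest_provisional);
  s->latest_provisional -= (uint32_t) len_with_nul;
  return s->latest_provisional;
}

static toy_str_class_t
toy_classify (const toy_strtab_t *s, uint32_t off)
{
  if (off & TOY_EXTERNAL_BIT)
    return TOY_STR_EXTERNAL;
  if (off >= s->latest_provisional)
    return TOY_STR_PROVISIONAL;
  return TOY_STR_REAL;
}
```

An offset that was never updated by a ref stays in the provisional range,
far above any plausible real strtab length, so it is easy to spot.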

This caused several bugs to fall out of the earlier work and get fixed.
In particular, attempts to look up a string in a child dict now search
the parent's provisional strtab too: we add some extra special casing
for the null string so we don't need to worry about deduplication
moving it somewhere other than offset zero.

Finally, the optimization that removes an unreferenced synthetic external
strtab (the record of the strings the linker has told us about, kept around
internally for lookup during late serialization) is faulty: references to a
strtab entry will only produce CTF-level refs if their value might change,
and an external string's offset won't change, so it produces no refs: worse
yet, even if we did get a ref (say, if the string was originally believed
to be internal and only later were we told that the linker knew about it
too), when we serialize a strtab, all its refs are dropped (since they've
been updated and can no longer change); so if we serialized it a second
time, its synthetic external strtab would be considered empty and dropped,
even though the same external strings as before still exist, referencing
it.  We must keep the synthetic external strtab around as long as external
strings exist that reference it, i.e. for the life of the dict.

One benefit of all this: now we're emitting provisional string offsets at
a really high value, it's out of the way of the consecutive, deduplicated
string offsets in child dicts.  So we can drop the constraint that you
cannot add strings to a dict with children, which allows us to add types
freely to parent dicts again.  What you can't do is write that dict out
again: when we serialize, we currently update the dict being serialized
with the updated strtabs: when you write a dict out, its provisional
strings become real strings, and suddenly the offsets would overlap once
more.  But opening a dict and its children, adding to it, and then
writing it out again is rare indeed, and we have a workaround: anyone
wanting to do this can just use ctf_link instead.
2025-02-28 15:13:24 +00:00
Nick Alcock
97a72b2a35 libctf: create: fix vlen / vbytes confusion
The initial_vlen parameter to ctf_add_generic is misnamed: it's not the
initial vlen (the initial number of members of a struct, etc), but rather
the initial size of the vlen region.  We have a term for that, vbytes: use
it.

Amazingly this doesn't seem to have caused any bugs to creep in.
2025-02-28 15:13:24 +00:00
Nick Alcock
dc93d01ff2 libctf: de-macroize LCTF_TYPE_TO_INDEX / LCTF_INDEX_TO_TYPE
Making these functions is unnecessary right now, but will become much
clearer shortly.

While we're at it, we can drop the third child argument to
LCTF_INDEX_TO_TYPE: it's only used for nontrivial purposes that aren't
literally the same as getting the result from the fp in one place,
in ctf_lookup_by_name_internal, and that place is easily fixed by just
looking in the right dictionary in the first place.
2025-02-28 15:13:24 +00:00
Nick Alcock
b875301e74 libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILD
Parent/child determination is about to become rather more complex, making a
macro impractical.  Use the ctf_type_isparent/ischild function calls
everywhere and remove the macro.  Make them more const-correct too, to
make them more widely usable.

While we're about it, change several places that hand-implemented
ctf_get_dict() to call it instead, and armour several functions against
the null returns that were always possible in this case (but previously
unprotected-against).
2025-02-28 15:13:24 +00:00
Nick Alcock
9835747b21 libctf: generalize the ref system
Despite the removal of the separate movable ref list, the ref system as
a whole is more than complex enough to be worth generalizing now that
we are adding different kinds of ref.

Refs now are lists of uint32_t * which can be updated through the
pointer for all entries in the list and moved to new sites for all
pointers in a given range: they are no longer references to string
offsets in particular and can be references to other uint32_t-sized
things instead (note that ctf_id_t is a typedef to a uint32_t).
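
In outline, and with hypothetical names and a fixed-size array standing in
for the real list, a generalized ref list supports the two operations
described above: update every site through its pointer, and retarget
every site that falls within a given range of memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REFS 16

typedef struct toy_refs
{
  uint32_t *site[TOY_MAX_REFS];	/* Locations to update later.  */
  size_t n;
} toy_refs_t;

/* Record SITE as a ref.  Returns 0 on success, -1 if the list is full.  */
static int
toy_ref_add (toy_refs_t *r, uint32_t *site)
{
  if (r->n >= TOY_MAX_REFS)
    return -1;
  r->site[r->n++] = site;
  return 0;
}

/* Write VALUE through every recorded pointer.  */
static void
toy_refs_update (toy_refs_t *r, uint32_t value)
{
  size_t i;

  for (i = 0; i < r->n; i++)
    *r->site[i] = value;
}

/* Retarget every ref pointing into [SRC, SRC + LEN) to the same
   position relative to DEST.  */
static void
toy_refs_move (toy_refs_t *r, void *src, size_t len, void *dest)
{
  uintptr_t lo = (uintptr_t) src;
  size_t i;

  for (i = 0; i < r->n; i++)
    {
      uintptr_t p = (uintptr_t) r->site[i];
      if (p >= lo && p < lo + len)
	r->site[i] = (uint32_t *) ((unsigned char *) dest + (p - lo));
    }
}
```

Nothing in this cares whether the uint32_t being updated is a string
offset or a type ID.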

ctf-string.c has been adjusted accordingly (the adjustments are tiny,
more or less just turning a bunch of references to atom into
&atom->csa_refs).
2025-02-28 15:13:24 +00:00
Nick Alcock
21f748e1e3 libctf, string: delete separate movable ref storage again
This was added last year to let us maintain a backpointer to the movable
refs dynhash in movable ref atoms without spending space for the
backpointer on the majority of (non-movable) refs and also without
causing an atom which had some refs movable and some refs not movable to
dereference unallocated storage when freed.

The backpointer's only purpose was to let us locate the
ctf_str_movable_refs dynhash during item freeing, when we had nothing
but a pointer to the atom being freed.  Now we have a proper freeing
arg, we don't need the backpointer at all: we can just pass a pointer to
the dict in to the atoms dynhash as a freeing arg for the atom freeing
functions, and throw the whole backpointer and separate movable ref list
complexity away.
2025-02-28 15:13:23 +00:00
Nick Alcock
6a6a3cc9c2 libctf, hash: add support for freeing functions taking an arg
There are a bunch of places in libctf where the code is complicated
by the fact that freeing a hash key or value requires access to the
dict: more generally, they want an arg pointer to *something*.

But for the sake of being able to use free() as a freeing function,
we can't do this at all times.  We also don't want to bloat up the
hash itself with an arg value unless necessary (in the same way we
already avoid storing the key or value freeing functions unless at
least one of them is specified).

So from the outside this change is simple: add a new
ctf_dynhash_create_arg which takes a new sort of freeing function
which takes an argument.  Internally, we store the arg only when
the key or owner is set, and cast from the one freeing function
to the other iff the arg is non-NULL.  This means it's impossible
to pass a value that may or may not be NULL to the freeing
function, but that's harmless for all current uses, and allows
significant simplifications elsewhere.
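
A sketch of the dispatch, with hypothetical names.  Note one deliberate
difference from the description above: rather than casting between the two
function types, this version keeps them in a union and picks one based on
whether the arg is non-NULL, which shows the same "NULL arg means plain
freeing function" rule without calling through a converted pointer.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_free_f) (void *item);
typedef void (*toy_free_arg_f) (void *item, void *arg);

typedef struct toy_freer
{
  union
  {
    toy_free_f plain;		/* Used when ARG is NULL.  */
    toy_free_arg_f with_arg;	/* Used when ARG is non-NULL.  */
  } fn;
  void *arg;
} toy_freer_t;

/* Free ITEM with whichever flavour of freeing function is in use.  */
static void
toy_free_item (const toy_freer_t *f, void *item)
{
  assert (f != NULL);
  if (f->arg != NULL)
    f->fn.with_arg (item, f->arg);
  else if (f->fn.plain != NULL)
    f->fn.plain (item);
}

/* Example arg-taking freeing function: ARG is an int counter.  */
static void
toy_counting_free (void *item, void *arg)
{
  free (item);
  ++*(int *) arg;
}
```

Plain free() still works as a freeing function, and the price is the one
already noted: an arg that is legitimately NULL cannot be passed through.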
2025-02-28 15:13:23 +00:00
Nick Alcock
4d2d5afa60 libctf: actually deduplicate the strtab
This commit finally implements strtab deduplication, putting together all
the pieces assembled in the earlier commits.

The magic is entirely localized to ctf_link_write, which preserializes all
the dicts (parent first), and calls ctf_dedup_strings on the parent.

(The error paths get tweaked a bit too.)

Calling ctf_dedup_strings has implications elsewhere: the lifetime rules for
the inputs versus outputs change a bit now that the child output dicts
contain references to the parent dict's atoms table.  We also pre-purge
movable refs from all the deduplicated strings before freeing any of this
because movable refs contain backreferences into the dict they came from,
which means the parent contains references to all the children!  Purging
the refs first makes those references go away so we can free the children
without creating any wild pointers, even temporarily.

There's a new testcase that identifies a regression whereby offset 0 (the
null string) and index 0 (in children now often the parent dict name,
".ctf") got mixed up, leading to anonymous structs and unions getting the
not entirely C-valid name ".ctf" instead.

Many other testcases get adjusted to no longer depend on the precise layout
of the strtab.

TODO: add new tests to verify that strings are actually being deduplicated.

libctf/
	* ctf-link.c (ctf_link_write): Deduplicate strings.
	* ctf-open.c (ctf_dict_close): Free refs, then the link outputs,
	then the out cu_mapping, then the inputs, in that order.
	* ctf-string.c (ctf_str_purge_refs): Not static any more.
	* ctf-impl.h: Declare it.

ld/
	* testsuite/ld-ctf/conflicting-cycle-2.A-1.d: Don't depend on
        strtab contents.
	* testsuite/ld-ctf/conflicting-cycle-2.A-2.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-2.parent.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-3.C-1.d: Likewise.
	* testsuite/ld-ctf/conflicting-cycle-3.C-2.d: Likewise.
	* testsuite/ld-ctf/anonymous-conflicts*: New test.
2025-02-28 14:47:24 +00:00
Nick Alcock
9daceda796 libctf: dedup: add strtab deduplicator
This is a pretty simple two-phase process (count duplicates that are
actually going to end up in the strtab and aren't e.g. strings without refs,
strings with external refs etc, and move them into the parent) with one
wrinkle: we sorta-abuse the csa_external_offset field in the deduplicated
child atom (normally used to indicate that this string is located in the ELF
strtab) to indicate that this atom is in the *parent*.  If you think of
"external" as meaning simply "is in some other strtab, we don't care which
one", this still makes enough sense to not need to change the name, I hope.

This is still not called from anywhere, so strings are (still!) not
deduplicated, and none of the dedup machinery added in earlier commits does
anything yet.

libctf/
	* ctf-dedup.c (ctf_dedup_emit_struct_members): Note that strtab
	dedup happens (well) after struct member emission.
	(ctf_dedup_strings): New.
	* ctf-impl.h (ctf_dedup_strings): Declare.
2025-02-28 14:47:24 +00:00
Nick Alcock
3bec4f1f3c include, libctf: string lookup and writeout of a parent-shared strtab
The next stage of strtab sharing is actual lookup of strings in such
strtabs, interning of strings in such strtabs and writing out of
such strtabs (but not actually figuring out which strings should
be shared: that's next).

We introduce several new internal ctf_str_* API functions to augment the
existing rather large set: ctf_str_add_copy, which adds a string and always
makes a copy of it (used when deduplicating to stop dedupped strings holding
permanent references on the input dicts), and ctf_str_add_no_dedup_ref (which
adds a ref to a string while preventing it from ever being deduplicated,
used for header fields like the parent name, which is the same for almost
all child dicts but had still better not be stored in the parent!).

ctf_strraw_explicit, the ultimate underlying "look up a string" function
that backs ctf_strptr et al, gains the ability to automatically find strings
in the parent if the offset is < cth_parent_strlen, and generally make all
offsets parent-relative (so something at offset 1 in the child strlen will
need to be looked up at offset 257 if cth_parent_strlen is 256).  This
suffices to paste together the parent and child from the perspective
of lookup.
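
The pasting-together can be sketched like so.  This is a hypothetical
helper, not ctf_strraw_explicit itself: it ignores external strtabs and
error reporting, and takes the parent length as a parameter where the real
code would consult cth_parent_strlen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up the string at parent-relative offset OFF.  Offsets below
   PARENT_LEN land in the parent strtab; everything else lands in the
   child, after subtracting PARENT_LEN.  Returns NULL if OFF is past the
   end of the child strtab.  */
static const char *
toy_strptr (const char *parent, size_t parent_len,
	    const char *child, size_t child_len, size_t off)
{
  assert (parent != NULL && child != NULL);

  if (off < parent_len)
    return parent + off;

  off -= parent_len;
  if (off >= child_len)
    return NULL;
  return child + off;
}
```

With a parent strtab of length 256, child-local offset 1 is reached via
offset 257, exactly as in the example above.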

We do quite a lot of new checks in here, simply because it's called all over
the place and it's preferable to emit a nice error into the ctf_err_warning
stream if things go wrong.  Among other things this traps cases where you
accidentally added a string to the parent, throwing off all the offsets.
Completely invalid offsets also now add a message to the err_warning
stream.

Insertion of new atoms (the deduplicated entities underlying strings in a
given dict), already a flag-heavy operation, gains more flags, corresponding
to the new ctf_str_add_copy and ctf_str_add_no_dedup_ref functions: atom
addition also checks the ctf_max_children set by ctf_import and prevents
addition of new atoms to any dicts with ctf_imported children and an
already-serialized strtab.

strtab writeout gains more checks as well: you can't write out a strtab for
a child dict whose parent hasn't been serialized yet (and thus doesn't have
a serialized strtab itself); you can't write it out if the child already
depended on a shared parent strtab and that strtab has changed length.  The
null atom at offset 0 is only written to the parent strtab; and ref updating
changes to look up offsets in the parent's atoms table iff a new
CTF_STR_ATOM_IN_PARENT flag is set on the atom (this will be set by
deduplication to ensure that serializing a dict will update all its refs
properly even though a bunch of them have moved to the parent dict).

None of this actually has any *effect* yet because no string deduplication
is being carried out, and the cth_parent_strlen is still locked at 0.

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_NOTSERIALIZED]: New.
        (ECTF_NERR): Updated.

libctf/
	* ctf-impl.h (CTF_STR_ATOM_IN_PARENT): New.
	(CTF_STR_ATOM_NO_DEDUP): Likewise.
	(ctf_str_add_no_dedup_ref): New.
	(ctf_str_add_copy): New.
	* ctf-string.c (ctf_strraw_explicit): Look in parents if necessary:
        use parent-relative offsets.
	(ctf_strptr_validate): Avoid duplicating errors.
	(ctf_str_create_atoms): Update comment.
	(CTF_STR_COPY): New.
	(CTF_STR_NO_DEDUP): Likewise.
	(ctf_str_add_ref_internal): Use them, setting the corresponding
        csa_flags, prohibiting addition to serialized parents, and copying
        strings if so requested.
	(ctf_str_add): Turn into a wrapper around...
	(ctf_str_add_flagged): ... this new function.  The offset is now
        parent-relative.
	(ctf_str_add_ref): Likewise.
	(ctf_str_add_movable_ref): Likewise.
	(ctf_str_add_copy): New.
	(ctf_str_add_no_dedup_ref): New.
	(ctf_str_write_strtab): Prohibit writes when the parent has
        changed length or is not serialized.  Only write the null atom
        to parent strtabs.  Chase refs to the parent if necessary.
2025-02-28 14:47:24 +00:00
Nick Alcock
a14fb397b2 libctf: tear opening and serialization in two
The next stage in sharing the strtab involves tearing two core parts
of libctf into two pieces.

Large parts of init_static_types, called at open time, involve traversing
the types table and initializing the hashtabs used by the type name lookup
functions and the enumerator conflicting checks.  If the string table is
partly located in the parent dict, this is obviously not going to work: so
split out that code into a new init_static_types_names function (which
also means moving the wrapper around init_static_types that was used
to simplify the enumerator code into being a wrapper around
init_static_types_names instead) and call that from init_static_types
(for parent dicts, and < v4 dicts), and from ctf_import (for v4 dicts).

At the same time as doing this we arrange to set LCTF_NO_STR (recently
introduced) iff this is a v4 child dict with a nonzero cth_parent_strlen:
this then blocks more or less everything that involves string operations
until a ctf_import has actually imported the strtab it depends on.  (No
string operations that actually use this have been introduced yet, but
since no string deduplication is happening yet either this is harmless.)

For v4 dicts, at import time we also validate that the cth_parent_strlen has
the same value as the parent's strlen (zero is also a valid value,
indicating a non-shared strtab, as is commonplace in older dicts, dicts
emitted by the compiler, parent dicts etc).  This makes ctf_import more
complex, so we simplify things again by dropping all the repeated code in
the obscure used-only-by-ctf_link ctf_import_unref and turning both into
wrappers around an internal function.  We prohibit repeated ctf_imports
(except of NULL or the same dict repeatedly), and set up some new fields
which will be used later to prevent people from adding strings to parent
dicts with pre-existing serialized strtabs once they have children imported
into them (which would change their string length and corrupt all those
strtabs).
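
A rough sketch of the import-time validation described above, under
simplifying assumptions: struct toy_dict, its fields and toy_import are
hypothetical, the error is reduced to a bare -1, and the handling of NULL
parents is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a dict: only the fields needed to
   illustrate the check.  Names are illustrative, not libctf's.  */
struct toy_dict
{
  size_t str_len;		/* Length of this dict's own strtab.  */
  size_t parent_strlen;		/* Model of cth_parent_strlen; 0 = unshared.  */
  const struct toy_dict *parent;	/* Currently imported parent, if any.  */
};

/* Returns 0 on success, -1 if PARENT is not acceptable for CHILD.  */
static int
toy_import (struct toy_dict *child, const struct toy_dict *parent)
{
  assert (child != NULL && parent != NULL);

  /* Re-importing the same parent is harmless; a different one is refused.  */
  if (child->parent != NULL)
    return child->parent == parent ? 0 : -1;

  /* A nonzero parent_strlen must match the parent's strtab length exactly:
     otherwise every parent-relative offset in the child would be wrong.  */
  if (child->parent_strlen != 0 && child->parent_strlen != parent->str_len)
    return -1;

  child->parent = parent;
  return 0;
}
```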

Serialization also needs to be torn in two.  The problem here is that
currently serialization does too much: it emits everything including the
strtab, does things that depend on the strtab being finalized (notably
variable table sorting), and then writes it out.  Much of this emission
itself involves strtab writes, so the strtab is not actually complete until
halfway through ctf_serialize.  But when deduplicating, we want to use
machinery in ctf-link and ctf-dedup to deduplicate the strtab after it is
complete, and only then write it out.

We could do this via having ctf_serialize call some sort of horrible
callback, but it seems much simpler to just cut ctf_serialize in two,
and introduce a new ctf_preserialize which can optionally be called to do
all this "everything but the strtab" work.  (If it's not called,
ctf_serialize calls it itself.)

This means pulling some internal variables out of ctf_serialize into the
ctf_dict_t, and slightly abusing LCTF_NO_STR to mean (in addition to its
"no, you can't do much between opening a child dict and importing its
parent" semantics), "no, you can't do much between calling ctf_preserialize
and ctf_serialize". The requirements of both are not quite identical -- you
definitely can do things that involve string lookups after ctf_preserialize
-- but it serves to stop callers from accidentally adding more types after
the types table has been written out, and that's good enough.
ctf_preserialize isn't public API anyway.

libctf/
	* ctf-impl.h (struct ctf_dict) [ctf_serializing_buf]: New.
        [ctf_serializing_buf_size]: Likewise.
        [ctf_serializing_vars]: Likewise.
        [ctf_serializing_nvars]: Likewise.
        [ctf_max_children]: Likewise.
	(LCTF_PRESERIALIZED): New.
	(ctf_preserialize): New.
	(ctf_depreserialize): New.
	* ctf-open.c (init_static_types): Rename to...
	(init_static_types_names): ... this, wrapping a different
        function.
        (init_static_types_internal): Rename to...
        (init_static_types): ... this, and set LCTF_NO_STR if necessary.
        Tear out the name-lookup guts into...
	(init_static_types_names_internal): ... this new function. Fix a few
        comment typos.
	(ctf_bufopen): Emphasise that you cannot rely on looking up strings
        at any point in ctf_bufopen any more.
	(ctf_dict_close): Free ctf_serializing_buf.
	(ctf_import): Turn into a wrapper, calling...
	(ctf_import_internal): ... this.  Prohibit repeated ctf_imports of
        different parent dicts, or "unimporting" by setting it back to NULL
        again.  Validate the parent we do import using cth_parent_strlen.
        Call init_static_types_names if the strtab is shared with the
        parent.
	(ctf_import_unref): Turn into a wrapper.
	* ctf-serialize.c (ctf_serialize): Split out everything before
        strtab serialization into...
	(ctf_preserialize): ... this new function.
	(ctf_depreserialize): New, undo preserialization on error.
2025-02-28 14:47:24 +00:00
Nick Alcock
70d05ab0b2 libctf: add mechanism to prohibit most operations without a strtab
We are about to add machinery that deduplicates a child dict's strtab
against its parent.  Obviously if you open such a dict but do not import its
parent, all strtab lookups must fail: so add an LCTF_NO_STR flag that is set
in that window and make most operations fail if it is set.  (Two more
that will be made to fail in future commits are serialization and string
lookup itself.)

Notably, not all symbol lookup is impossible in this window: you can still
look up by symbol index, as long as this dict is not using an indexed
symtypetab (which obviously requires string lookups to get the symbol name).

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_HASPARENT]: New.
        [ECTF_WRONGPARENT]: Likewise.
	(ECTF_NERR): Update.
        Update comments to note the new limitations on ctf_import et al.

libctf/
	* ctf-impl.h (LCTF_NO_STR): New.
	* ctf-create.c (ctf_rollback): Error out when LCTF_NO_STR.
	(ctf_add_generic): Likewise.
	(ctf_add_struct_sized): Likewise.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_forward): Likewise.
	(ctf_add_unknown): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable): Likewise.
	(ctf_add_funcobjt_sym_forced): Likewise.
	(ctf_add_type): Likewise (on either dict).
	* ctf-dump.c (ctf_dump): Likewise.
	* ctf-lookup.c (ctf_lookup_by_name): Likewise.
	(ctf_lookup_variable): Likewise.
	(ctf_lookup_enumerator): Likewise.
	(ctf_lookup_enumerator_next): Likewise.
	(ctf_symbol_next): Likewise.
	(ctf_lookup_by_sym_or_name): Likewise, if doing indexed lookups.
	* ctf-types.c (ctf_member_next): Likewise.
	(ctf_enum_next): Likewise.
	(ctf_type_aname): Likewise.
	(ctf_type_name_raw): Likewise.
	(ctf_type_compat): Likewise, for either dict.
	(ctf_member_info): Likewise.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	(ctf_type_rvisit): Likewise.
	(ctf_variable_next): Note that we don't need to test LCTF_NO_STR.
2025-02-28 14:47:24 +00:00
Nick Alcock
9a74ab12c8 include, libctf: start work on libctf v4
This format is a superset of BTF, but for now we just do the minimum to
declare a new file format version, without actually introducing any format
changes.

From now on, we refuse to reserialize CTFv1 dicts: these have a distinct
parent/child boundary which obviously cannot change upon reserialization
(that would change the type IDs): instead, we encoded this by stuffing in
a unique CTF version for such dicts.  We can't do that now we have one
version for all CTFv4 dicts, and testing such old dicts is very hard these
days anyway, and is not automated: so just drop support for writing them out
entirely. (You still *can* write them out, but you have to do a full-blown
ctf_link, which generates an all-new fresh dict and recomputes type IDs as
part of deduplication.)

To prevent this extremely-not-ready format escaping into the wild, add a
new mechanism whereby any format version higher than the new #define
CTF_STABLE_VERSION cannot be serialized unless I_KNOW_LIBCTF_IS_UNSTABLE is
set in the environment.
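
The gate amounts to something like the sketch below.  TOY_STABLE_VERSION
and toy_may_serialize are hypothetical stand-ins, and treating any value of
the variable (even an empty one) as "set" is an assumption: the real check
in ctf_serialize may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for CTF_STABLE_VERSION; the value is illustrative only.  */
#define TOY_STABLE_VERSION 4

/* Returns nonzero if a dict of format VERSION may be written out.  */
static int
toy_may_serialize (int version)
{
  assert (version > 0);

  /* Stable formats can always be serialized.  */
  if (version <= TOY_STABLE_VERSION)
    return 1;

  /* Unstable formats need an explicit opt-in from the environment.  */
  return getenv ("I_KNOW_LIBCTF_IS_UNSTABLE") != NULL;
}
```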

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_CTFVERS_NO_SERIALIZE]: New.
        [ECTF_UNSTABLE]: New.
	(ECTF_NERR): Update.
	* ctf.h: Small comment improvements.
        (ctf_header_v3): New, copy of ctf_header.
	(CTF_VERSION_4): New.
	(CTF_VERSION): Now CTF_VERSION_4.
	(CTF_STABLE_VERSION): Still 4, CTF_VERSION_3.

ld/
	* testsuite/ld-ctf/*.d: Update to CTF_VERSION_4.

libctf/
	* ctf-impl.h (LCTF_NO_SERIALIZE): New.
	* ctf-dump.c (ctf_dump_header): Add CTF_VERSION_4.
	* ctf-open.c (ctf_dictops): Likewise.
        (upgrade_header): Rename to...
	(upgrade_header_v2): ... this.
	(upgrade_header_v3): New.
	(upgrade_types): Support upgrading from CTF_VERSION_3.
        Turn on LCTF_NO_SERIALIZE for CTFv1.
	(init_static_types_internal): Upgrade all types tables older than
	CTF_VERSION_4.
	(ctf_bufopen): Support CTF_VERSION_4: error out if we forget to
	update this switch in future.  Add header upgrading from v3 and
	below.  Improve comments slightly.
	* ctf-serialize.c (ctf_serialize): Block serialization of unstable
	file formats, and of file formats for which LCTF_NO_SERIALIZE is
	turned on (v1).
2025-02-28 14:47:24 +00:00
Alan Modra
e8e7cf2abe Update year range in copyright notice of binutils files
2025-01-01 18:29:57 +10:30
Nick Alcock
6da9267482 libctf, include: add ctf_dict_set_flag: less enum dup checking by default
The recent change to detect duplicate enum values and return ECTF_DUPLICATE
when found turns out to perturb a great many callers.  In particular, the
pahole-created kernel BTF has the same problem we historically did, and
gleefully emits duplicated enum constants in profusion.  Handling the
resulting duplicate errors from BTF -> CTF converters reasonably is
unreasonably difficult (it amounts to forcing them to skip some types or
reimplement the deduplicator).

So let's step back a bit.  What we care about mostly is that the
deduplicator treat enums with conflicting enumeration constants as
conflicting types: programs that want to look up enumeration constant ->
value mappings using the new APIs to do so might well want the same checks
to apply to any ctf_add_* operations they carry out (and since they're
*using* the new APIs, added at the same time as this restriction was
imposed, there is likely to be no negative consequence of this).

So we want some way to allow processes that know about duplicate detection
to opt into it, while allowing everyone else to stay clear of it: but we
want ctf_link to get this behaviour even if its caller has opted out.

So add a new concept to the API: dict-wide CTF flags, set via
ctf_dict_set_flag, obtained via ctf_dict_get_flag.  They are not bitflags
but simple arbitrary integers and an on/off value, stored in an unspecified
manner (the one current flag, we translate into an LCTF_* flag value in the
internal ctf_dict ctf_flags word). If you pass in an invalid flag or value
you get a new ECTF_BADFLAG error, so the caller can easily tell whether
flags added in future are valid with a particular libctf or not.

We check this flag in ctf_add_enumerator, and set it around the link
(including on child per-CU dicts).  The newish enumerator-iteration test is
souped up to check the semantics of the flag as well.

The fact that the flag can be set and unset at any time has curious
consequences. You can unset the flag, insert a pile of duplicates, then set
it and expect the new duplicates to be detected, not only by
ctf_add_enumerator but also by ctf_lookup_enumerator.  This means we now
have to maintain the ctf_names and conflicting_enums enum-duplication
tracking as new enums are added, not purely as the dict is opened.
Move that code out of init_static_types_internal and into a new
ctf_track_enumerator function that addition can also call.

(None of this affects the file format or serialization machinery, which has
to be able to handle duplicate enumeration constants no matter what.)

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_BADFLAG]: New.
	(ECTF_NERR): Update.
	(CTF_STRICT_NO_DUP_ENUMERATORS): New flag.
	(ctf_dict_set_flag): New function.
	(ctf_dict_get_flag): Likewise.

libctf/
	* ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag.
	(ctf_track_enumerator): Declare.
	* ctf-dedup.c (ctf_dedup_emit_type): Set it.
	* ctf-link.c (ctf_create_per_cu): Likewise.
	(ctf_link_deduplicating_per_cu): Likewise.
	(ctf_link): Likewise.
	(ctf_link_write): Likewise.
	* ctf-subr.c (ctf_dict_set_flag): New function.
	(ctf_dict_get_flag): New function.
	* ctf-open.c (init_static_types_internal): Move enum tracking to...
	* ctf-create.c (ctf_track_enumerator): ... this new function.
	(ctf_add_enumerator): Call it.
	* libctf.ver: Add the new functions.
	* testsuite/libctf-lookup/enumerator-iteration.c: Test them.
2024-07-31 21:02:05 +01:00
Nick Alcock
36c771b179 libctf: fix CTF dict compression
Commit 483546ce4f ("libctf: make ctf_serialize() actually serialize")
accidentally broke dict compression.  There were two bugs:

 - ctf_arc_write_one_ctf was still making its own decision about
   whether to compress the dict via direct ctf_size comparison.  This is
   unfortunate because now that it no longer calls ctf_serialize itself,
   ctf_size is always zero when it does this.  It should let the writing
   functions decide on the threshold: they already contain code to do so,
   which went unused for lack of one trivial wrapper that writes to an fd
   and also takes a compression threshold.

 - ctf_write_mem, the function underlying all writing as of the commit
   above, was calling zlib's compressBound and avoiding compression if this
   returned a value larger than the input.  Unfortunately compressBound does
   not do a trial compression and determine whether the result is
   compressible: it just adds zlib header sizes to the value passed in, so
   our test would *always* have concluded that the value was incompressible!
   Avoid by simply always compressing if the raw size is larger than the
   threshold: zlib is quite clever enough to avoid actually compressing
   if the data is incompressible.
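
The second bug can be illustrated without zlib itself.  In this sketch,
toy_compress_bound is a hypothetical stand-in that shares only the property
that matters with compressBound (its result is always larger than its
input); the two decision functions are likewise illustrative, not the code
in ctf_write_mem.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a worst-case bound: input size plus fixed and proportional
   overhead.  The exact constants are illustrative.  */
static size_t
toy_compress_bound (size_t n)
{
  return n + (n >> 12) + 13;
}

/* Old logic: compress only if the bound is no larger than the input.
   The bound always exceeds the input, so this never says yes.  */
static int
toy_old_should_compress (size_t raw_size, size_t threshold)
{
  assert (toy_compress_bound (raw_size) > raw_size);
  return raw_size > threshold && toy_compress_bound (raw_size) <= raw_size;
}

/* New logic: compress whenever the raw size exceeds the threshold, and
   leave incompressible data for the compressor to deal with.  */
static int
toy_new_should_compress (size_t raw_size, size_t threshold)
{
  return raw_size > threshold;
}
```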

Add a testcase for this.

libctf/
	* ctf-impl.h (ctf_write_thresholded): New...
	* ctf-serialize.c (ctf_write_thresholded): ... defined here,
        a wrapper around...
        (ctf_write_mem): ... this.  Don't check compressibility.
	(ctf_compress_write): Reimplement as a ctf_write_thresholded
        wrapper.
	(ctf_write): Likewise.
	* ctf-archive.c (arc_write_one_ctf): Just call
        ctf_write_thresholded rather than trying to work out whether
        to compress.
	* testsuite/libctf-writable/ctf-compressed.*: New test.
2024-07-31 21:02:05 +01:00
Nick Alcock
b404bf7270 libctf, string: split the movable refs out of the ref list
In commit 149ce5c263 we introduced the concept of "movable" refs,
which are refs that can be moved in batches, to let us maintain valid ref
lists even when adding refs to blocks of memory that can be realloced (which
is any type containing a vlen which can expand, like names contained within
enum or struct members).  Movable refs need a backpointer to the movable
refs dynhash for this dict; since non-movable refs are very common, we tried
to save memory by having a slightly bigger struct for movable refs with a
backpointer in it, and casting appropriately, indicating which sort of ref
we were dealing with via a flag on the atom.

Unfortunately this doesn't work reliably, because you can perfectly well
have a string ("foo", say) which has both non-movable refs (say, an external
symbol and a variable name) and movable refs (say, a structure member name)
to the same atom.  Indicate which struct we're dealing with with an atom
flag and suddenly you're casting a ctf_str_atom_ref to a
ctf_str_atom_ref_movable (which is bigger) and dereferencing random memory
off the end of it and interpreting it as a backpointer to the movable refs
dynhash.  This is unlikely to work well.

So bite the bullet and split refs into two separate lists, one for movable
refs, one for immovable refs. It means some annoying code duplication, but
there's not very much of it, and it means we can keep the movable refs
hashtab (which in turn means we don't have to do linear searches to find all
relevant refs when moving refs, which in turn means that
structure/union/enum member additions remain amortized O(n) time, not
O(n^2)).

Callers can now purge movable and non-movable refs independently of each
other.  We don't use this yet, but a use is coming.

libctf/
	* ctf-impl.h (CTF_STR_ATOM_MOVABLE): Delete.
        (struct ctf_str_atom) [csa_movable_refs]: New.
	(struct ctf_dict): Adjust comment.
	(ctf_str_purge_refs): Add MOVABLE arg.
	* ctf-string.c (ctf_str_purge_movable_atom_refs): Split out of...
        (ctf_str_purge_atom_refs): ... this.
	(ctf_str_free_atom): Call it.
	(ctf_str_purge_one_atom_refs): Likewise.
	(aref_create): Adjust accordingly.
	(ctf_str_move_refs): Likewise.
	(ctf_str_remove_ref): Remove movable refs too, including
	deleting the ref from ctf_str_movable_refs.
	(ctf_str_purge_refs): Add MOVABLE arg.
	(ctf_str_update_refs): Update movable refs.
	(ctf_str_write_strtab): Check, and purge, movable refs.
2024-07-31 21:02:04 +01:00
Nick Alcock
68720e03f5 libctf, dedup: drop unnecessary arg from ctf_dedup()
The PARENTS arg is carefully passed down through all the layers of hash
functions and then never used for anything.  (In the distant past it was
used for cycle detection, but the algorithm eventually committed doesn't
need to do cycle detection...)

The PARENTS arg is still used by ctf_dedup_emit(), but even there we can
loosen the requirements and state that you can just leave entries
corresponding to dicts with no parents at zero (which will be useful
in an upcoming commit).

libctf/
	* ctf-dedup.c (ctf_dedup_hash_type): Drop PARENTS arg.
	(ctf_dedup_rhash_type): Likewise.
	(ctf_dedup): Likewise.
	(ctf_dedup_emit_struct_members): Mention what you can do to
        PARENTS entries for parent dicts.
	* ctf-impl.h (ctf_dedup): Adjust accordingly.
	* ctf-link.c (ctf_link_deduplicating_per_cu): Likewise.
	(ctf_link_deduplicating): Likewise.
2024-07-31 21:02:04 +01:00