binutils-gdb

Author	SHA1	Message	Date
Nick Alcock	7bea1097ec	libctf: dedup: conflicting CU names and merging into the parent The last two dedup changes are, firstly, to use ctf_add_conflicting() to arrange that conflicting types that are hidden because they are added to the same dict as the types they conflict with (e.g. conflicting types in modules) are properly marked with the CU name that the type comes from. This could of course not be done with the old non-root flag, but now that we have proper prefix types, we can record it, and consumers can find out what CU any type comes from via ctf_type_conflicting (or, for non-kernel CTF generated by GNU ld, via the ctf_cuname of the per-cu dict). Secondly, we add a new kind of CU mapping for cu-mapped (two-stage) links (as a reminder, these carry out a second stage of dedupping in which they squash specific CUs down to a named set of child dicts, fusing named inputs into particular named outputs: the kernel linker uses this to make child dicts that represent modules rather than translation units). You can now map any CU name to "" (the null string). This indicates that types that would land in the CU in question should not be emitted into any sort of per-module dict but should instead just be emitted into the shared dict, possibly being marked conflicting as they do so. The usual popcount mechanism will be used to pick the type which is left unhidden. The usual forwarding stubs you would expect to find for conflicting structs and unions will not be emitted: instead, real structs and unions will take their place. Consumers must take care when chasing parent types that point to tagged structs to make sure that there isn't a correspondingly-named struct in the child they're looking at (but this is generally a problem with type chasing in children anyway, which I have a TODO open to find some sort of solution to: this should be being done automatically, and isn't).	2025-04-25 21:23:07 +01:00
Nick Alcock	f38832b398	libctf: dedup: decl tag support. Decl tags to types and to functions and function arguments are relatively straightforward, as are decl tags to structures as a whole or to members of untagged structures; but decl tags to specific members of tagged structs and unions have two separate nasty problems, entirely down to the use of tagged structures to break cycles in the type graph. The first is that we have to mark decl tags conflicting if their associated struct is conflicting, but traversal from types to their parents halts at tagged structs and unions, because the type graph is sharded via stubs at those points and conflictedness ceases. But we don't want to do that here: a decl_tag to member 10 of some struct is only valid if that struct has ten members, and if the struct is conflicted, some may have only one. The decl tag is only valid for the specific struct-with-ten-members it was originally pointing at, anyway: other structs-with-ten-members may have entirely different members there, which are not tagged or which are tagged with something else. So we track this by keeping track of the only thing that is knowable about struct/union stubs: their decorated name. The citers graph gains mappings from decorated SoU names to decl tags (where the decl tag has a component_idx), and conflictedness marking chases that and marks accordingly, via the new ctf_dedup_mark_conflicting_hash_citers. The second problem is that we have to emit decl tags to struct members of all kinds after the members are emitted, but the members are emitted later than core type deduplication because they might refer to any types in the dict, including types added after the struct was added. So we need to accumulate decl tags to struct members in a new hashtab (cd_emission_struct_decl_tags) and add yet another pass that traverses that and emits all the decl tags in it. 
(If it turns out that decl tags to other things can similarly appear before the type they refer to, we'll either have to sort them earlier or emit them at the end as well -- but this seems unlikely.) None of this complexity is properly tested, because we're not yet emitting decl tags (as far as I know). But at least it doesn't break anything else, and it's somewhere to start.	2025-04-25 21:23:07 +01:00
Nick Alcock	bf735030ac	libctf: dedup: type tags Another trivial case: they're just like pointers except that they have a name (and we don't need to care about that, because names are hashed in, if present, anyway).	2025-04-25 21:23:07 +01:00
Nick Alcock	4db605353c	libctf: dedup: datasecs and vars These are a bit trickier than previous things. Datasecs are unusual: the content they contain for a given variable is conceptually part of that variable, in that a variable can only appear in one datasec: so if two TUs have different datasec values for a variable, you'll want to emit two conflicting variables with different datasec entries. Equally, if they have entries in different datasecs, they're conflicting. But the index of a variable in a datasec has nothing to do with the variable: it's just a property of how many other variables are in the datasec. So we turn the type graph upside down for them. We track the variable -> datasec mappings for every variable we are dedupping, and use this to hash variables with datasec entries twice: firstly, as purely variable type, name, and promoted-to-non-extern linkage, and secondly with all of that plus the datasec name, offset and size: we indicate that the non-extern hash replaces the extern one, and use this later on. The datasec itself is not hashed at all! We skip it at both hashing and emission time (without breaking anything else, because nothing points at datasecs, so nothing will ever recurse down into one). The popcount code (used to find the "most popular" type, the one to put in the shared dict) changes to say that replaced types (extern vars) popcounts are added to the counts of the types that replace them (the corresponding non-extern vars). At emission time, replaced variables (extern variables) are skipped, ensuring that extern vars with non-conflicting non-extern counterparts are skipped in favour of the non-extern ones. ctf_add_section_variable then takes care of emitting both the var and its corresponding datasec for us.	2025-04-25 21:23:07 +01:00
Nick Alcock	6b8885cfc9	libctf: dedup: structs with bitfields, BTF floats The last two trivial cases. Hash in the bitfieldness of structs and the bit-width of members (their bit-offset is already being hashed in), and emit them accordingly. BTF floats hardly have any state: emitting them is even easier.	2025-04-25 21:23:07 +01:00
Nick Alcock	95eb77bddb	libctf: dedup: enums, enum64s, functions, func linkage These are all fairly simple and are handled together because some of the diffs are annoyingly entwined. enum and enum64 are trivial: it's just like enums used to be, except that we hash in the unsignedness value, and emit signed or unsigned enums or enum64s appropriately. (The signedness stuff on the emission side is fairly invisible: it's automatically handled for us by ctf_type_encoding and ctf_add_enum*_encoded, via the CTF_INT_SIGNED encoding.) Functions are also fairly simple: we hash in all the parameter names as well as the args, and emit them accordingly. Linkage is more difficult. We want to deduplicate extern and non-extern declarations together, while leaving static ones separate. We do this by promoting extern linkage to global at hashing time, and maintaining a cd_linkages hashmap which maps from type hash values of func linkages (and vars) to the best linkage known so far, then updating it if a better one ("less extern") comes along (relying on the fact that we are already unifying the hashes of otherwise-identical extern and non-extern types). At emission time, we use this hashtab to figure out what linkage to emit.	2025-04-25 21:23:07 +01:00
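The "best linkage known so far" bookkeeping described in the commit above can be sketched in isolation. This is only an illustration under stated assumptions: the `sk_` names, the two-value linkage ranking, and the fixed-size linear table are all hypothetical stand-ins for this sketch, not libctf's real constants, `cd_linkages` hashtab, or hashing code.

```c
#include <stddef.h>

/* Hypothetical linkage ranks for this sketch only (not libctf constants).
   A lower value is "less extern", i.e. a better linkage to emit.  */
enum sk_linkage { SK_GLOBAL = 0, SK_EXTERN = 1 };

#define SK_MAX_ENTRIES 16

struct sk_entry { unsigned long hash; enum sk_linkage best; };
struct sk_linkages { struct sk_entry e[SK_MAX_ENTRIES]; size_t n; };

/* Record LINKAGE for the type with hash HASH, keeping only the best
   linkage seen so far.  Returns 0 on success, -1 if the toy table is full.  */
int
sk_note_linkage (struct sk_linkages *t, unsigned long hash,
                 enum sk_linkage linkage)
{
  for (size_t i = 0; i < t->n; i++)
    if (t->e[i].hash == hash)
      {
        if (linkage < t->e[i].best)   /* A "less extern" linkage wins.  */
          t->e[i].best = linkage;
        return 0;
      }

  if (t->n == SK_MAX_ENTRIES)
    return -1;

  t->e[t->n].hash = hash;
  t->e[t->n].best = linkage;
  t->n++;
  return 0;
}

/* At emission time, look up the linkage to emit for HASH.  Falls back to
   FALLBACK if the hash was never recorded.  */
enum sk_linkage
sk_emit_linkage (const struct sk_linkages *t, unsigned long hash,
                 enum sk_linkage fallback)
{
  for (size_t i = 0; i < t->n; i++)
    if (t->e[i].hash == hash)
      return t->e[i].best;
  return fallback;
}
```

The sketch relies on the same precondition the commit message names: extern and non-extern declarations of otherwise-identical types must already share a hash, so both land on the same table entry.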
Nick Alcock	81b9312ac4	libctf: dedup: comment fixes, debug indentation changes, and a tiny leak Getting these out of the way to avoid them wrecking the diffs for the next commits.	2025-04-25 21:23:07 +01:00
Nick Alcock	adc6ca003a	libctf: dedup: fix a broken error path in string dedup If we run out of memory updating the string counts, set the right errno: ctf_dynhash_insert returns a negative error value, and we want a positive one in the ctf_errno.	2025-04-25 21:23:07 +01:00
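The sign convention this fix restores can be shown with a minimal sketch. Everything here is a hypothetical stand-in: `sk_insert` only mimics a helper that reports failure as a negative errno value, and `struct sk_dict` only mimics a dict's error slot. Neither is the real `ctf_dynhash_insert` nor a real `ctf_dict_t`.

```c
#include <errno.h>

/* Stand-in for a dict's error slot (hypothetical, for this sketch only).  */
struct sk_dict { int err; };

/* Stand-in for an insertion helper that reports failure as a negative
   errno value, as the commit message says ctf_dynhash_insert does.  */
int
sk_insert (int simulate_oom)
{
  return simulate_oom ? -ENOMEM : 0;
}

/* Update a count; on failure, store a positive error code in the dict.  */
int
sk_update_count (struct sk_dict *d, int simulate_oom)
{
  int ret = sk_insert (simulate_oom);

  if (ret < 0)
    {
      d->err = -ret;   /* Negate: the error slot wants a positive value.  */
      return -1;
    }
  return 0;
}
```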
Nick Alcock	3a6e1f87e7	libctf: dedup: chase API changes: use the public API more To get ready for the deduplicator changes, we chase the API changes to things like ctf_member_next, and add support for prefix types (using the suffix where appropriate, etc). We use the ctf-types API for things like forward lookup, using the private _tp functions to reduce overhead while centralizing knowledge of things like the encoding of enum forwards outside the deduplicator. No functional changes yet.	2025-04-25 21:23:07 +01:00
Nick Alcock	f170154176	libctf: drop unnecessary macro Every use of this macro has been deleted.	2025-04-25 21:23:07 +01:00
Nick Alcock	27d5d0ccc7	libctf: open-bfd: open BTF dicts Teaching ctf_open and ctf_fdopen to open BTF dicts if passed is quite simple: we just need to check the magic number and allow BTF dicts into the lower-level ctf_simple_open machinery (which ultimately calls ctf_bufopen).	2025-04-25 21:23:07 +01:00
Nick Alcock	0a283f3d7a	libctf: link: drop unnecessary back-compatibility code We no longer need to ensure that inputs have a new-format func info section: no such sections exist in CTFv4 (and the v3 compatibility code will throw away old-format sections).	2025-04-25 21:23:07 +01:00
Nick Alcock	9ea8bea7f0	libctf: link: BTF support This is in two parts, one new API function and one change. New API: +int ctf_link_output_is_btf (ctf_dict_t *); Changed API: unsigned char *ctf_link_write (ctf_dict_t *, size_t *size, - size_t threshold); + size_t threshold, int *is_btf); The idea here is that callers can call ctf_link_output_is_btf on a ctf_link()ed (deduplicated) dict to tell whether a link will yield BTF-compatible output before actually generating that output, so they can e.g. decide whether to avoid trying to compress the dict if they know it would be BTF otherwise (since compressing a dict renders it non-BTF-compatible). ctf_link_write() gains an optional is_btf output parameter that reports whether the dict that was finally generated is actually BTF after all, perhaps because the caller didn't call ctf_link_output_is_btf or wants to be robust against possible future changes that may add other reasons why a written-out dict can't be BTF at the last minute. These are simple wrappers around already-existing machinery earlier in this series.	2025-04-25 21:23:07 +01:00
Nick Alcock	343de78445	libctf: strings: don't check for non-deduplicable atoms in the parent Callers of ctf_str_add_no_dedup_ref are indicating that they would like the string they have added a reference to to appear in the current dict and not be deduplicated into the parent. This is true even if the string already exists in the parent, so we should not check for strings in the parent and reuse them in this case.	2025-04-25 18:17:33 +01:00
Nick Alcock	3520fb4568	libctf: serialize: finish off the serializer The only remaining part of serialization that needs fixing up is ctf_preserialize, which despite its name does nearly all the work of serialization: the only bit it doesn't do is write the string tables (since that has to happen across dicts after all the dicts have otherwise been laid out, in order to deduplicate the strtabs). As usual in this series, there's adjustment for various field name changes (maxtypes -> ntypes, the move into ctf_serialize, etc), and extra work to figure out whether we're emitting BTF or not and to handle the distinction between CTF and BTF headers, and not try to emit CTF-only stuff like the symtypetabs into BTF dicts; we can also throw out a bunch of old code that sets compatibility flags, everything to do with forcing variables into the dynamic state in case they changed (we're going to handle that more generally for everything in the types table at a later date, outside serialization), and everything to do with special handling of variables in general. But much of that is only a couple of lines each, and most of the changes are mechanical: this is probably the simplest serialization commit in this series.	2025-04-25 18:12:47 +01:00
Nick Alcock	176afc3c8b	libctf: open: fix closing of children with imported parents Closing a parent dict for the last time erases all its types and strings, which makes type and string lookups in any surviving children impossible from then on. Since children hold a reference to their parent, this can only happen in ctf_dict_close of the last child, after the parent has been closed by the caller as well. Since DTD deletion now involves doing type and string lookups in order to clean out the name tables, close the parent only after the child DTDs have been deleted.	2025-04-25 18:09:02 +01:00
Nick Alcock	908a7e7167	libctf: open, types: ctf_import for BTF ctf_import needs a bunch of fixes to work with pure BTF dicts -- and, for that matter, importing newly-created parent dicts that have never been written out, which may have a bunch of nonprovisional types (if types were added to it before any imports were done) or may not (if at least one ctf_import into it was done before any types were added). So we adjust things so that the values that are checked against are the nonprovisional-types values: the header revisions actually changed the name of cth_parent_typemax to cth_parent_ntypes to make this clearer, so catch up with that. In the parent, we have to use ctf_idmax, not ctf_typemax. One thing we must prohibit is adding a bunch of types to a child and then importing a parent into it: the type IDs will all be wrong and the string offsets more so. This was partly prohibited: prohibit it entirely (excepting only that the not-actually-written-out void type we might add to new BTF dicts does not influence this check). Since BTF children don't have a cth_parent_ntypes or a cth_parent_strlen, we cannot check this stuff, but just set them and hope.	2025-04-25 18:07:44 +01:00
Nick Alcock	d5012389a4	libctf: serialize: handle CTF-versus-BTF output format checks The internal function ctf_serialize_output_format centralizes all the checks for BTF-versus-CTF, checking to see if the type section, active suppressions, and BTF-emission mode permit BTF emission, setting ctf_serialize.cs_is_btf if we are actually BTF, and raising ECTF_NOTBTF if we are requiring BTF emission but the type section is such that we can't emit it. (There is a forcing parameter in place, as with most of these serialization functions, to allow for the caller to force CTF emission if it knows the output will be compressed or will be part of multi-member archives or something else external to the type section that BTF does not support.)	2025-04-25 18:07:44 +01:00
Nick Alcock	585f569a2d	libctf: serialize: size and emit the type section As with sizing, this needs to support type suppression and CTF_K_BIG elision, and adapt to the DTD representation changes. Those changes cause a general complexity reduction because we no longer have to memcpy the vlen into place separately for every type kind, but can do it all at once using shared code above the per-kind switch statement. That statement's only job now is generating refs out of type IDs and string offsets, and translating the struct offset from gap- into non-gap representation for non-big structs. We do three distinct things: - check whether all the types in a section are BTF-compatible, after suppression of unwanted type kinds (including types with unwanted prefixes), and elision of unneeded struct/union CTF_K_BIGs - size the type section, taking suppression and CTF_K_BIG elision into account - actually emit it, again taking all the above into account These all have to come to the same conclusions for every type: if the first one gets things wrong we might try to emit something as BTF when we can't; if the latter two are inconsistent, we might have a buffer overrun. So the type emission code double-checks BTF-compatibility and raises ECTF_NOTBTF if necessary; we also aggressively check for potential overruns before every memcpy() into the buffer and raise an ECTF_INTERNAL assertion failure if need be. Thankfully there are a lot fewer memcpy()s than there used to be: there are only four places we need to check, all close to each other, which is pretty maintainable. We add a bit of debugging when --enable-libctf-hash-debugging is on, printing the translation from provisional to final type ID so that you can use it to map back to the provisional ID again when trying to track down deduplicator problems, since the IDs the deduplicator will report at its emission time are only provisional (the final parent-relative IDs are not assigned until now).	2025-04-25 18:07:44 +01:00
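The "check for a potential overrun before every memcpy() into the buffer" discipline described above can be sketched as a small helper. This is a minimal illustration, not the serializer's code: `struct sk_buf` and `sk_emit` are hypothetical names, and returning -1 stands in for raising an ECTF_INTERNAL assertion failure.

```c
#include <stddef.h>
#include <string.h>

/* Output buffer whose SIZE was computed by an earlier sizing pass.
   Invariant: used <= size.  (Hypothetical type for this sketch.)  */
struct sk_buf
{
  unsigned char *base;
  size_t size;
  size_t used;
};

/* Append LEN bytes from SRC, refusing to write past the size the sizing
   pass computed.  Returns 0 on success, -1 if sizing and emission
   disagree (where the real code would raise an internal error).  */
int
sk_emit (struct sk_buf *b, const void *src, size_t len)
{
  if (len > b->size - b->used)   /* Cannot wrap, because used <= size.  */
    return -1;

  memcpy (b->base + b->used, src, len);
  b->used += len;
  return 0;
}
```

Funnelling every write through one checked helper is one way to keep the number of places that need auditing small, which is the maintainability point the commit message makes about having only four memcpy() sites.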
Nick Alcock	67cd167767	libctf: serialize: type section sizing This is made much simpler by the fact that the DTD representation now tracks the size of each vlen, so we don't need per-type-kind code to track it ourselves any more. There's extra code to handle type suppression, CTF_K_BIG elision, and prefixes.	2025-04-25 18:07:44 +01:00
Nick Alcock	db98972145	libctf: serialize: check the type section for BTF-incompatible types We add a new ctf_type_sect_is_btf function (internal to ctf-serialize.c) to check the type section against the write prohibitions list and (after write-suppression) against the set of types allowed in BTF, and determine whether this type section contains any types BTF does not allow. CTF-specific type kinds like CTF_K_FLOAT are obviously prohibited in BTF, as are CTF-specific prefixes, except that CTF_K_BIG is allowed if and only if both its ctt_size and vlen are still zero: in that case it will be elided by type section writeout and will never appear in the BTF at all. Structs are checked to make sure they don't use any nameless padding members and that (if they are bitfields) all their offsets will still fit after conversion from CTF_K_BIG gap-between-struct-members representation (if they are not bitfields, we know they will fit, but for bitfields, they might be too big).	2025-04-25 18:07:44 +01:00
Nick Alcock	5ec23dfb74	libctf: strings: no external strings in BTF One of the things BTF doesn't have is the concept of external strings which can be shared with the ELF strtab. Therefore, even if the linker has reported strings which the dict is reusing, when we generate the strtab for a BTF dict we should emit those strings into it (and we should certainly not cause the presence of external strings to prevent BTF emission!) Note that since already-written strtab entries are never erased, writing a dict as BTF and then CTF will cause external strings to be emitted even for the CTF. This sort of repeated writing in different formats seems to be very rare: in any case, the problem can be avoided by simply doing the CTF writeout first (the following BTF writeout will spot the missing external-in-CTF strings and add them). We also throw away the internal-only function ctf_strraw_explicit(), which was used to add strings with a hardwired strtab: it was only ever used to write out the variable section, which is gone in v4.	2025-04-25 18:07:44 +01:00
Nick Alcock	c14bdfc7a4	libctf: serialize: kind suppression and prohibition The CTF serialization machinery decides whether to write out a dict as BTF or CTF (or, in LIBCTF_BTM_BTF mode, whether to write out a dict or fail with ECTF_NOTBTF) in part by looking at the type kinds in the dictionary. It is possible that you'd like to extend this check and ban specific type kinds from the dictionary (possibly even if it's CTF); it's also possible that you'd like to not fail even if a CTF-only kind is found, but rather replace it with a still-valid stub (CTF_K_UNKNOWN / BTF_KIND_UNKNOWN) and keep going. (The kernel's btfarchive machinery does this to ensure that the compiler and previous link stages have emitted only valid BTF type kinds.) ctf_write_suppress_kind supports both these use cases: +int ctf_write_suppress_kind (ctf_dict_t *fp, int kind, int prohibited); This commit adds only the core population code: the actual suppression is spread across the serializer and will be added in the next commits.	2025-04-25 18:07:44 +01:00
Nick Alcock	2c5f74300a	libctf: serialize: user control over BTF-versus-CTF writeout We need some way for users to declare that they want BTF or CTF in particular to be written out when they ask for it, or that they don't mind which. Adding this to all the ctf_write functions (like the compression threshold already is) would be a bit of a nightmare: there are a great many of them and this doesn't seem like something people would want to change on a per-dict basis (even if we did, we'd need to think about archives and linking, which work on a higher level than single dicts). So we repurpose an unused, vestigial existing function, ctf_version(), which was originally intended to do some sort of rather unclear API switching at runtime, to allow switching between different CTF file format versions (not yet supported, you have to pass CTF_VERSION) and BTF writeout modes: /* BTF/CTF writeout version info. ctf_btf_mode has three levels: - LIBCTF_BTM_ALWAYS writes out full-blown CTFv4 at all times - LIBCTF_BTM_POSSIBLE writes out CTFv4 if needed to avoid information loss, BTF otherwise. If compressing, the same as LIBCTF_BTM_ALWAYS. - LIBCTF_BTM_BTF writes out BTF always, and errors otherwise. Note that no attempt is made to downgrade existing CTF dicts to BTF: if you read in a CTF dict and turn on LIBCTF_BTM_POSSIBLE, you'll get a CTF dict; if you turn on LIBCTF_BTM_BTF, you'll get an unconditional error. Thus, this is really useful only when reading in BTF dicts or when creating new dicts. */ typedef enum ctf_btf_mode { LIBCTF_BTM_BTF = 0, LIBCTF_BTM_POSSIBLE = 1, LIBCTF_BTM_ALWAYS = 2 } ctf_btf_mode_t; /* Set the CTF library client version to the specified version: this is the version of dicts written out by the ctf_write* functions. If version is zero, we just return the default library version number. The BTF version (for CTFv4 and above) is indicated via btf_hdr_len, also zero for "no change". 
You can influence what type kinds are written out to a CTFv4 dict via the ctf_write_suppress_kind() function. */ extern int ctf_version (int ctf_version_, size_t btf_hdr_len, ctf_btf_mode_t btf_mode); (We retain the ctf_version_ stuff to leave space in the API to let the library possibly do file format downgrades in future, since we've already had requests for such things from users.)	2025-04-25 18:07:44 +01:00
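The three writeout modes described above can be read as a small decision function. The sketch below is an interpretation of the comment in the commit message, not libctf code: the `sk_` names are hypothetical, the enum values only mirror the ones quoted above, and treating "compressing in BTF-only mode" as an error is an assumption drawn from the statement elsewhere in this series that a compressed dict can never be BTF.

```c
/* Mirrors the values of ctf_btf_mode_t quoted in the commit message;
   renamed so this sketch does not pretend to be the real header.  */
typedef enum sk_btf_mode
{
  SK_BTM_BTF = 0,        /* BTF always, error otherwise.  */
  SK_BTM_POSSIBLE = 1,   /* CTF only if needed to avoid information loss.  */
  SK_BTM_ALWAYS = 2      /* Full-blown CTF at all times.  */
} sk_btf_mode_t;

typedef enum sk_format { SK_OUT_CTF, SK_OUT_BTF, SK_OUT_ERROR } sk_format_t;

/* Decide the output format.  TYPES_ARE_BTF says whether the type section
   is BTF-compatible; COMPRESSING says whether the output will be
   compressed (which rules out BTF).  */
sk_format_t
sk_choose_format (sk_btf_mode_t mode, int types_are_btf, int compressing)
{
  switch (mode)
    {
    case SK_BTM_ALWAYS:
      return SK_OUT_CTF;
    case SK_BTM_POSSIBLE:
      /* If compressing, behave the same as SK_BTM_ALWAYS.  */
      if (compressing || !types_are_btf)
        return SK_OUT_CTF;
      return SK_OUT_BTF;
    case SK_BTM_BTF:
      /* Assumption: compression makes BTF impossible, hence an error.  */
      if (compressing || !types_are_btf)
        return SK_OUT_ERROR;
      return SK_OUT_BTF;
    }
  return SK_OUT_ERROR;
}
```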
Nick Alcock	f782340ba5	libctf, serialize: preparatory steps The new serializer is quite a lot more customizable than the old, because it can write out BTF as well as CTF: you can ask to write out BTF or fail, write out CTF if required to avoid information loss, otherwise BTF, or always write out CTF. Callers often need to find out whether a dict could be written out as BTF before deciding how to write it out (because a dict can never be written out as BTF if it is compressed, a caller might well want to ask if there is anything else that prevents BTF writeout -- say, slices, conflicting types, or CTF_K_BIG -- before deciding whether to compress it). GNU ld will do this whenever it is passed only BTF sections on the input. Figuring out whether a dict can be written out as BTF is quite expensive: we have to traverse all the types and check them, including every member of every struct. So we'd rather do that work only once. This means making a lot of state once private to ctf_preserialize public enough that another function can initialize it; and since the whole API is available after calling this function and before serializing, we should probably arrange that if we do things we know will invalidate the results of all this checking, we are forced to do it again. This commit does that, moving all the existing serialization state into a new ctf_serialize_t and adding to it. Several functions grow force_ctf arguments that allow the caller to force CTF emission even if the type section looks BTFish: the writeout code and archive creation use this to force CTF emission if we are compressing, and archive creation uses it to force CTF emission if a CTF multi-member archive is in use, because BTF doesn't support archives at all so there's no point maintaining BTF compatibility in that case. The ctf_write* functions gain support for writing out BTF headers as well as CTF, depending on whether what was ultimately written out was actually BTF or not. 
Even more than most commits in this series, there is no way this is going to compile right now: we're in the middle of a major transition, completed in the next few commits.	2025-04-25 18:07:44 +01:00
Nick Alcock	3c5eb5b20a	libctf: lookup, open: chase header field changes Nothing exciting here, just header fields slightly changing name and a couple of new comments and indentation fixes.	2025-04-25 18:07:43 +01:00
Nick Alcock	f7f72bcca6	libctf, open: new API for getting the size of CTF/BTF file sections I wrote this for BTF type size querying programs, but it might be of more general use and it's impossible to get this info in any other way, so we might want to keep it. New API: +size_t ctf_sect_size (ctf_dict_t *, ctf_sect_names_t sect);	2025-04-25 18:07:43 +01:00
Nick Alcock	4837852527	libctf: types: access to raw type data This new API lets users ask for the raw type data associated with a type (either the whole lot including prefixes, or just the suffix if this is not a CTF_K_BIG type), and then they can manipulate it using ctf.h functions or whatever else they like. Doing this does not preclude using libctf querying functions at the same time (just don't change the type! It's const for a reason). New API: +const ctf_type_t *ctf_type_data (ctf_dict_t *, ctf_id_t, int prefix); This function was unimplementable before the DTD changes, because the ctf_type_t and vlen were separated in memory: but now they're always stored in a single buffer, it's reliable and simple, indeed trivial.	2025-04-25 18:07:43 +01:00
Nick Alcock	33326f571f	libctf: types: recursive type visiting ctf_type_visit and ctf_type_rvisit have to adapt to the internal API changes, but also to the change in the representation of structures. The new code is quite a lot simpler than the old, because we don't need to roll our own iterator but can just use ctf_member_next. API changes, the usual for the _f typedefs and anything to do with structures: -typedef int ctf_visit_f (const char *name, ctf_id_t type, unsigned long offset, - int depth, void *arg); +typedef int ctf_visit_f (ctf_dict_t *, const char *name, ctf_id_t type, + size_t offset, int bit_width, int depth, void *arg);	2025-04-25 18:07:43 +01:00
Nick Alcock	1ece8c93c0	libctf, create: the unknown type Just as for typedefs, this is just catching up with API changes on the type-addition side.	2025-04-25 18:07:43 +01:00
Nick Alcock	83e9ca77b2	libctf, create: typedefs Nothing here but adjustment to internal API changes. Typedefs have no special properties that need querying, so there are no changes to ctf-types.c at all.	2025-04-25 18:07:43 +01:00
Nick Alcock	bd0c033b29	libctf, create, types: slices Nothing difficult for this CTF-specific type kind, just the usual adjustment to internal API changes.	2025-04-25 18:07:43 +01:00
Nick Alcock	0cd5118024	libctf: create, types: arrays The same internal API changes for arrays. There is one ABI change here, to ctf_arinfo_t: - uint32_t ctr_nelems; /* Number of elements. */ + size_t ctr_nelems; /* Number of elements. */	2025-04-25 18:07:43 +01:00
Nick Alcock	d65d03bec4	libctf: create, types: reftypes and pointers This is pure adjustment for internal API changes, and a change to the type-compatibility of pointers to type 0 now that it can be void as well as "unrepresentable". By now this dance should be quite familiar.	2025-04-25 18:07:43 +01:00
Nick Alcock	d5dd8997b3	libctf: create, types: conflicting types The conflicting type kind is a CTF-specific prefix kind consisting purely of an optional translation unit name. It takes the place of the old hidden bit: we have already seen it used to prefix types added with a CTF_ADD_NONROOT flag. The deduplicator will also use them to label conflicting types from different TUs smushed into the same dict by the CU-mapping mechanism: unlike the hidden bit, with this scheme users can tell which CUs the conflicting types came from. New API: +int ctf_type_conflicting (ctf_dict_t , ctf_id_t, const char cuname); +int ctf_set_conflicting (ctf_dict_t , ctf_id_t, const char *); (Frankly I expect ctf_set_conflicting to be used only by deduplicators and things like that, but if we provide an option to query something we should also provide an option to produce it...)	2025-04-25 18:07:43 +01:00
Nick Alcock	fb8917ac21	libctf, create, types: type and decl tags These are a little more fiddly than previous kinds, because their namespacing rules are odd: they have names (so presumably we want an API to look them up by name), but the names are not unique (they don't need to be, because they are not entities you can refer to from C), so many distinct tags in the same TU can have the same name. Type tags only refer to a type ID: decl tags refer to a specific function parameter or structure member via a zero-indexed "component index". The name tables for these things are a hash of name to a set of type IDs; rather different from all the other named entities in libctf. As a consequence, they can presently be looked up only using their own dedicated functions, not using ctf_lookup_by_name et al. (It's not clear if this restriction could ever be lifted: ctf_lookup_by_name and friends return a type ID, not a set of them.) They are similar enough to each other that we can at least have one function to look up both type and decl tags if you don't care about their component_idx and only want a type ID: ctf_tag. (And one to iterate over them, ctf_tag_next). (A caveat: because tags aren't widely used or generated yet, much of this is more or less untested and/or supposition and will need testing later.) New API, more or less the minimum needed because it's not entirely clear how these things will be used: +ctf_id_t ctf_tag (ctf_dict_t , ctf_id_t tag); +ctf_id_t ctf_decl_tag (ctf_dict_t , ctf_id_t decl_tag, + int64_t component_idx); +ctf_id_t ctf_tag_next (ctf_dict_t , const char tag, ctf_next_t ); +ctf_id_t ctf_add_type_tag (ctf_dict_t , uint32_t, ctf_id_t, const char ); +ctf_id_t ctf_add_decl_type_tag (ctf_dict_t , uint32_t, ctf_id_t, const char ); +ctf_id_t ctf_add_decl_tag (ctf_dict_t , uint32_t, ctf_id_t, const char *, + int component_idx);	2025-04-25 18:07:43 +01:00
Nick Alcock	39cdb3e395	libctf: create, types: functions, linkage, arg names (API ADVICE) Functions change in CTFv4 by growing argument names as well as argument types; the representation changes into a two-element array of (type, string offset) rather than a simple array of arg types. Functions also gain an explicit linkage in a different type kind (CTF_K_FUNC_LINKAGE, which corresponds to BTF_KIND_FUNC). New API: typedef struct ctf_funcinfo { /* ... / - uint32_t ctc_argc; / Number of typed arguments to function. / + size_t ctc_argc; / Number of typed arguments to function. / }; int ctf_func_arg_names (ctf_dict_t , unsigned long, uint32_t, const char *); int ctf_func_type_arg_names (ctf_dict_t , ctf_id_t, uint32_t, const char *names); +extern int ctf_type_linkage (ctf_dict_t , ctf_id_t); -extern ctf_id_t ctf_add_function (ctf_dict_t , uint32_t, - const ctf_funcinfo_t , const ctf_id_t ); +extern ctf_id_t ctf_add_function (ctf_dict_t , uint32_t, + const ctf_funcinfo_t , const ctf_id_t , + const char *arg_names); +extern ctf_id_t ctf_add_function_linkage (ctf_dict_t , uint32_t, + ctf_id_t, const char *, int linkage); Adding this is fairly straightforward; the only annoying part is the way the callers need to allocate space for the arg name and type arrays. Maybe we should rethink these into something like ctf_type_aname(), allocating space for the caller so the caller doesn't need to? It would certainly make all the callers in libctf much less complex... While we're at it, adjust ctf_type_reference, ctf_type_align, and ctf_type_size for the new internal API changes (they also all have special-case code for functions).	2025-04-25 18:07:43 +01:00
Nick Alcock	a632f3ed33	libctf, types: ctf_type_kind_{iter,next} et al These new functions let you iterate over types by kind, letting you get all variables, all enums, all datasecs, etc. (This is amenable to future optimization, and some is expected shortly.) We also add new internal functions ctf_type_kind_{forwarded_,unsliced_,}tp which are like the corresponding non-_tp functions except that they take a ctf_type_t rather than a type ID: doing this allows the deduplicator to use these nearly-public functions more. The public ctf_type_kind* functions are reimplemented in terms of these. This machinery is the principal place where the magic encoding of forwards is encoded.	2025-04-25 18:07:43 +01:00
Nick Alcock	ea21a1b2ae	libctf: create, types: variables and datasecs (REVIEW NEEDED) This is an area of significant difference from CTFv3. The API changes significantly, with quite a few additions to allow creation and querying of these new datasec entities: -typedef int ctf_variable_f (const char name, ctf_id_t type, void arg); +typedef int ctf_variable_f (ctf_dict_t , const char name, ctf_id_t type, + void arg); +typedef int ctf_datasec_var_f (ctf_dict_t fp, ctf_id_t type, size_t offset, + size_t datasec_size, void arg); +/ Search a datasec for a variable covering a given offset. + + Errors with ECTF_NODATASEC if not found. / + +ctf_id_t ctf_datasec_var_offset (ctf_dict_t fp, ctf_id_t datasec, + uint32_t offset); + +/* Return the datasec that a given variable appears in, or ECTF_NODATASEC if + none. / + +ctf_id_t ctf_variable_datasec (ctf_dict_t fp, ctf_id_t var); +int ctf_datasec_var_iter (ctf_dict_t , ctf_id_t, ctf_datasec_var_f , + void ); +ctf_id_t ctf_datasec_var_next (ctf_dict_t , ctf_id_t, ctf_next_t *, + size_t size, size_t offset); -int ctf_add_variable (ctf_dict_t , const char , ctf_id_t); +/ ctf_add_variable adds variables to no datasec at all; + ctf_add_section_variable adds them to the given datasec, or to no datasec at + all if the datasec is NULL. / + +ctf_id_t ctf_add_variable (ctf_dict_t , const char , int linkage, ctf_id_t); +ctf_id_t ctf_add_section_variable (ctf_dict_t , uint32_t, + const char datasec, const char name, + int linkage, ctf_id_t type, + size_t size, size_t offset); We tie datasecs quite closely to variables at addition (and, as should become clear later, dedup) time: you never create datasecs, you only create variables in datasecs, and the datasec springs into existence when you do so: datasecs are always found in the same dict as the variables they contain (the variables are never in the parent if the datasec is in a child or anything). 
We keep track of the variable->datasec mapping in ctf_var_datasecs (populating it at addition and open time), to allow ctf_variable_datasec to work at reasonable speed. (But, as yet, there are no tests of this function at all.) The datasecs are created unsorted (to avoid variable addition becoming O(n^2)) and sorted at serialization time, and when ctf_datasec_var_offset is invoked. We reuse the natural-alignment code from struct addition to get a plausible offset in datasecs if an alignment of -1 is specified: maybe this is unnecessary now (it was originally added when ctf_add_variable added variables to a "default datasec", while now it just leaves them out of all datasecs, like externs are). One constraint of this is that we currently prohibit the addition of nonrepresentable-typed variables, because we can't tell what their natural alignment is: if we dropped the whole "align" and just required everyone adding a variable to a datasec to specify an offset, we could drop that restriction. WDYT? One additional caveat: right now, ctf_lookup_variable() looks up the type of a variable (because when it was invented, variables were not entities in themselves that you could look up). This name is confusing as hell as a result. It might be less confusing to make it return the CTF_K_VAR, but that would be awful to adapt callers to, since both are represented with ctf_id_t's, so the compiler wouldn't warn about the needed change at all... I've vacillated on this three or four times now.	2025-04-25 18:07:43 +01:00
Nick Alcock	097ff012e4	libctf: decl, types: revise ctf_decl, ctf_type_name These all need fairly trivial revisions for prefix types. While we're at it, we can add explicit handling of nonrepresentable types, returning CTF_K_UNKNOWN for such types rather than throwing an error, so that type printing prints (nonrepresentable type) for such types as it always intended to.	2025-04-25 18:07:42 +01:00
Nick Alcock	2c1a0a70d1	libctf, create, types: encoding, BTF floats This adds support for the nearly useless BTF_KIND_FLOAT, under the name CTF_K_BTF_FLOAT. At the same time we fix up the ctf_add_encoding and ctf_type_encoding machinery for the new API changes. I expect this to change a bit: Ali Bahrami reckons I've oversimplified the CTFv4 encoding representation and need to reintroduce at least a width. New API: ctf_id_t ctf_add_btf_float (ctf_dict_t *, uint32_t, const char *, const ctf_encoding_t *);	2025-04-25 18:07:42 +01:00
Nick Alcock	03609073b0	libctf: create, types: enums and enum64s; type encoding This commit adapts most aspects of enum handling: querying and iteration, enumerator querying and iteration, ctf_type_add, etc. We have to adapt to enum64s and to signed versus unsigned enums, to our vlen and DTD changes and other internal API changes to handle prefix types etc, and fix the types of things to allow for 64-bit enumerators. We can also (finally!) get useful info on enum size rather than being restricted to a value hardwired into libctf. We also adjust all the type-encoding functions for the internal API changes, since enums are the first encodable entities we have covered. API changes: -typedef int ctf_enum_f (const char name, int val, void arg); +typedef int ctf_enum_f (const char name, int64_t val, void arg); +typedef int ctf_unsigned_enum_f (const char name, uint64_t val, void arg); -extern const char ctf_enum_name (ctf_dict_t , ctf_id_t, int); -extern int ctf_enum_value (ctf_dict_t , ctf_id_t, const char , int ); +extern const char ctf_enum_name (ctf_dict_t , ctf_id_t, int64_t); +extern int ctf_enum_value (ctf_dict_t , ctf_id_t, const char , int64_t ); +extern int ctf_enum_unsigned_value (ctf_dict_t , ctf_id_t, const char , uint64_t ); + +/ Return 1 if this enum's contents are unsigned, so you can tell which of the + above functions to use. / + +extern int ctf_enum_unsigned (ctf_dict_t , ctf_id_t); -/* Return all enumeration constants in a given enum type. / -extern int ctf_enum_iter (ctf_dict_t , ctf_id_t, ctf_enum_f , void ); +/* Return all enumeration constants in a given enum type. The return value, and + VAL argument, may need to be cast to uint64_t: see ctf_enum_unsigned(). / +extern int64_t ctf_enum_iter (ctf_dict_t , ctf_id_t, ctf_enum_f , void ); extern const char ctf_enum_next (ctf_dict_t , ctf_id_t, ctf_next_t *, - int ); + int64_t ); + +/ enums are created signed by default. 
If you want an unsigned enum, + use ctf_add_enum_encoded() with an encoding of 0 (CTF_INT_SIGNED and + everything else off). This will not create a slice, unlike all other + uses of ctf_add_enum_encoded(), and the result is still representable + as BTF. / + +extern ctf_id_t ctf_add_enum64_encoded (ctf_dict_t , uint32_t, const char , + const ctf_encoding_t ); +extern ctf_id_t ctf_add_enum64 (ctf_dict_t , uint32_t, const char ); -extern int ctf_add_enumerator (ctf_dict_t , ctf_id_t, const char , int); +extern int ctf_add_enumerator (ctf_dict_t , ctf_id_t, const char , int64_t); The only aspects of enums that are not now handled are forwards to enums, dumping of enums, and deduplication of enums.	2025-04-25 18:07:42 +01:00
Nick Alcock	0a3ee49dd0	libctf: types: add ctf_struct_bitfield (NEEDS REVIEW) This new public API function allows you to find out if a struct has the bitfield flag set or not. (There are no other properties specific to a struct, so we needed a new function for it. I am open to a ctf_struct_info() function handing back a struct if people prefer.) New API: int ctf_struct_bitfield (ctf_dict_t *, ctf_id_t);	2025-04-25 18:07:42 +01:00
Nick Alcock	ceb15ece5e	libctf: create: ctf_add_type modifications This adapts ctf_add_type a little, adding support for prefix types, shifting away from hardwired things towards API functions that can adapt to the CTFv4 changes, adapting to the structure/union API changes, and adding bitfielded structures and structure members as needed.	2025-04-25 18:07:42 +01:00
Nick Alcock	4a4312b684	libctf: types: ctf_type_resolve_nonrepresentable This new internal function allows us to say "resolve a type to its base type, but treat type 0 like BTF, returning 0 if it is found rather than erroring with ECTF_NONREPRESENTABLE". Used in the next commit.	2025-04-25 18:07:42 +01:00
Nick Alcock	20e6f72dc7	libctf: create: structure and union member addition There is one API addition here: int ctf_add_member_bitfield (ctf_dict_t *, ctf_id_t souid, const char *, ctf_id_t type, unsigned long bit_offset, int bit_width); SoU addition handles the representational changes for bitfields and for CTF_K_BIG structs (i.e. all structs you can add members to), errors out if you add bitfields to structs that aren't created with the CTF_ADD_STRUCT_BITFIELDS flag, and arranges to add padding as needed if there is too much of a gap for the offsets to encode in one hop (that part is still untested).	2025-04-25 18:07:42 +01:00
Nick Alcock	cd8ea31666	libctf: create: struct/union addition There's one API addition here: the existing CTF_ADD_ROOT / CTF_ADD_NONROOT flags can have a new flag ORed with them, CTF_ADD_STRUCT_BITFIELDS, indicating that the newly-added struct/union is capable of having bitfields added to it via the new ctf_add_member_bitfield function (see a later commit). Without this, you can only add bitfields via the deprecated slice or base type encoding representations (the former will force CTF output). Implementation notes: structs and unions are always added with a CTF_K_BIG prefix: if promoting from a forward, one is added. These are elided at serialization time if they are not needed to encode this size of struct / this number of members. (This means you don't have to figure out in advance if your struct will be too big for BTF: you can just add members to it, and libctf will figure it out and upgrade the dict as needed, or tell you it can't if you've forbidden such things.) We take advantage of this to merge a couple of very similar functions, saving a bit of code.	2025-04-25 18:07:42 +01:00
Nick Alcock	d5bb2772c6	libctf: create: DTD addition and deletion; ctf_rollback DTD deletion changes mostly relate to the changes to the ctf_dtdef_t, but also we no longer need to have special handling for forwards (we can just use ctf_type_kind_forwarded like everyone else). Rollback no longer needs to delete things by hand (it hasn't needed to for years): it can just call ctf_dtd_delete. ctf_add_generic changes substantially, mostly to allow for the ctf_dtdef_t changes. Rather than returning a type ID it now returns the DTD it just allocated: it can also be asked to add some prefixes, and return the first prefix added (which may not be the first prefix in the type, because if it is asked to add a non-root-visible type it will additionally allocate a CTF_K_CONFLICTING prefix to encode that). Finally, duplicate name detection is suppressed for type and decl tags.	2025-04-25 18:07:42 +01:00
Nick Alcock	05a2970ad1	libctf: create, lookup: delete DVDs; ctf_lookup_by_kind Variable handling in BTF and CTFv4 works quite differently from in CTFv3. Rather than a separate section containing sorted, bsearchable variables, they are simply named entities like types, stored in CTF_K_VARs. As a first stage towards migrating to this, delete most references to the ctf_varent_t and ctf_dvdef_t, including the DVD lookup code, all the linking code, and quite a lot of the serialization code. Note: CTF_LINK_OMIT_VARIABLES_SECTION, and the whole "delete variables that already exist in the symtypetabs section" stuff, has yet to be reimplemented. We can implement CTF_LINK_OMIT_VARIABLES_SECTION by simply excising all CTF_K_VARs at deduplication time if requested. (Note: symtypetabs should still point directly at the type, not at the CTF_K_VAR.) (Symtypetabs in general need a bit more thought -- perhaps we can now store them in a separate .ctf.symtypetab section with its own little four-entry header for the symtypetabs and their indexes, making .ctf even more like .BTF; the only difference would then be that .ctf could include prefix types, CTF_K_FLOAT, and external string refs. For later discussion.) We also add ctf_lookup_by_kind() at this stage (because it is hopelessly diff-entangled with ctf_lookup_variable): this looks up a type of a particular kind, without needing a per-kind lookup function for it, nor needing to hack around adding string prefixes (so you can do ctf_lookup_by_kind (fp, CTF_K_STRUCT, "foo") rather than having to do ctf_lookup_by_name (fp, "struct foo"): often this is more convenient, and anything that reduces string buffer manipulation in C is good.)	2025-04-25 18:07:42 +01:00
Nick Alcock	a93ad066f7	libctf: create: vlen growth and prefix addition (NEEDS REVIEW) This commit modifies ctf_grow_vlen to account for the recent changes to ctf_dtdef_t, and adds a new ctf_add_prefix function to add a prefix to an existing type, moving the dtd_data and dtd_vlen up accordingly. It deserves close review, since this is probably the single greatest bug cluster in libctf: the number of times I added to a variable of type ctf_type_t and assumed it would move it in bytes rather than ctf_type_t units is hard to believe.	2025-04-25 18:07:42 +01:00
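
The pointer-arithmetic pitfall described in a93ad066f7 can be illustrated in isolation. The following is a minimal sketch, not libctf code: fake_type_t is a hypothetical three-word record standing in for a fixed-size type header, and the helper names are invented for illustration. It shows that adding N to a record pointer moves N whole records, while a byte-wise move has to go through a character pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed-size type record.  This is NOT the
   real ctf_type_t layout; it only needs to be larger than one byte.  */
typedef struct fake_type
{
  uint32_t name;
  uint32_t info;
  uint32_t size;
} fake_type_t;

/* The buggy pattern when COUNT is really a byte count: arithmetic on a
   fake_type_t * scales by sizeof (fake_type_t), so this moves COUNT
   whole records, not COUNT bytes.  */
static fake_type_t *
advance_in_records (fake_type_t *tp, size_t count)
{
  return tp + count;
}

/* Byte-wise advance: convert to unsigned char * first so that the
   addition is performed in bytes, then convert back.  */
static fake_type_t *
advance_in_bytes (fake_type_t *tp, size_t bytes)
{
  return (fake_type_t *) ((unsigned char *) tp + bytes);
}

/* Distance in bytes between two pointers into the same buffer.  */
static size_t
byte_distance (const fake_type_t *from, const fake_type_t *to)
{
  return (size_t) ((const unsigned char *) to - (const unsigned char *) from);
}
```

Given a buffer of records, advance_in_records (buf, 1) and advance_in_bytes (buf, sizeof (fake_type_t)) land on the same address; passing a byte count to the first helper overshoots by a factor of sizeof (fake_type_t).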

467 Commits