binutils-gdb

Author	SHA1	Message	Date
Nick Alcock	3520fb4568	libctf: serialize: finish off the serializer The only remaining part of serialization that needs fixing up is ctf_preserialize, which despite its name does nearly all the work of serialization: the only bit it doesn't do is write the string tables (since that has to happen across dicts after all the dicts have otherwise been laid out, in order to deduplicate the strtabs). As usual in this series, there's adjustment for various field name changes (maxtypes -> ntypes, the move into ctf_serialize, etc), and extra work to figure out whether we're emitting BTF or not and to handle the distinction between CTF and BTF headers, and not try to emit CTF-only stuff like the symtypetabs into BTF dicts; we can also throw out a bunch of old code that sets compatibility flags, everything to do with forcing variables into the dynamic state in case they changed (we're going to handle that more generally for everything in the types table at a later date, outside serialization), and everything to do with special handling of variables in general. But much of that is only a couple of lines each, and most of the changes are mechanical: this is probably the simplest serialization commit in this series.	2025-04-25 18:12:47 +01:00
Nick Alcock	d5012389a4	libctf: serialize: handle CTF-versus-BTF output format checks The internal function ctf_serialize_output_format centralizes all the checks for BTF-versus-CTF, checking to see if the type section, active suppressions, and BTF-emission mode permit BTF emission, setting ctf_serialize.cs_is_btf if we are actually BTF, and raising ECTF_NOTBTF if we are requiring BTF emission but the type section is such that we can't emit it. (There is a forcing parameter in place, as with most of these serialization functions, to allow for the caller to force CTF emission if it knows the output will be compressed or will be part of multi-member archives or something else external to the type section that BTF does not support.)	2025-04-25 18:07:44 +01:00
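A minimal C sketch of the decision ctf_serialize_output_format is described as centralizing. Every name in it is a hypothetical stand-in rather than libctf's API; the three modes follow the description in f782340ba5, and how forcing interacts with the BTF-or-fail mode is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three BTF-emission modes; not libctf names. */
enum btf_mode
{
  MODE_BTF_OR_FAIL,      /* emit BTF or fail (like LIBCTF_BTM_BTF) */
  MODE_BTF_IF_POSSIBLE,  /* emit CTF only if needed to avoid information loss */
  MODE_ALWAYS_CTF        /* always emit CTF */
};

enum output_format
{
  OUT_CTF,
  OUT_BTF,
  OUT_NOTBTF             /* the case libctf reports as ECTF_NOTBTF */
};

/* TYPE_SECT_IS_BTF is the result of checking the type section after
   suppression; FORCE_CTF means the caller knows of something outside the
   type section (compression, a multi-member archive) that BTF cannot
   represent.  Assumption: forcing wins whenever the type section itself
   is BTF-compatible.  */
enum output_format
choose_output_format (enum btf_mode mode, bool type_sect_is_btf, bool force_ctf)
{
  if (mode == MODE_BTF_OR_FAIL && !type_sect_is_btf)
    return OUT_NOTBTF;

  if (mode == MODE_ALWAYS_CTF || force_ctf || !type_sect_is_btf)
    return OUT_CTF;

  return OUT_BTF;
}
```

Keeping the whole decision in one function means the writeout and archive code only supply the force flag, instead of each repeating the mode logic.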
Nick Alcock	585f569a2d	libctf: serialize: size and emit the type section As with sizing, this needs to support type suppression and CTF_K_BIG elision, and adapt to the DTD representation changes. Those changes cause a general complexity reduction because we no longer have to memcpy the vlen into place separately for every type kind, but can do it all at once using shared code above the per-kind switch statement. That statement's only job now is generating refs out of type IDs and string offsets, and translating the struct offset from gap- into non-gap representation for non-big structs. We do three distinct things: - check whether all the types in a section are BTF-compatible, after suppression of unwanted type kinds (including types with unwanted prefixes), and elision of unneeded struct/union CTF_K_BIGs - size the type section, taking suppression and CTF_K_BIG elision into account - actually emit it, again taking all the above into account These all have to come to the same conclusions for every type: if the first one gets things wrong we might try to emit something as BTF when we can't; if the latter two are inconsistent, we might have a buffer overrun. So the type emission code double-checks BTF-compatibility and raises ECTF_NOTBTF if necessary; we also aggressively check for potential overruns before every memcpy() into the buffer and raise an ECTF_INTERNAL assertion failure if need be. Thankfully there are a lot fewer memcpy()s than there used to be: there are only four places we need to check, all close to each other, which is pretty maintainable. We add a bit of debugging when --enable-libctf-hash-debugging is on, printing the translation from provisional to final type ID so that you can use it to map back to the provisional ID again when trying to track down deduplicator problems, since the IDs the deduplicator will report at its emission time are only provisional (the final parent-relative IDs are not assigned until now).	2025-04-25 18:07:44 +01:00
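The overrun check before each memcpy() can be pictured as a small bounds-checked copy helper. This is a sketch: struct out_buf and emit_bytes are hypothetical, and libctf reports the failure as an ECTF_INTERNAL assertion failure rather than a -1 return.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output cursor; not a libctf structure.  */
struct out_buf
{
  unsigned char *buf;
  size_t size;   /* size computed by the sizing pass */
  size_t pos;    /* bytes emitted so far */
};

/* Copy LEN bytes into the buffer, refusing (rather than overrunning) if the
   sizing pass and the emission pass disagree.  Returns 0 on success, -1 on
   what would be an internal-error report.  */
int
emit_bytes (struct out_buf *out, const void *src, size_t len)
{
  if (out->pos > out->size || len > out->size - out->pos)
    return -1;
  memcpy (out->buf + out->pos, src, len);
  out->pos += len;
  return 0;
}
```

Writing the comparison as `len > size - pos` rather than `pos + len > size` avoids wraparound in the check itself.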
Nick Alcock	67cd167767	libctf: serialize: type section sizing This is made much simpler by the fact that the DTD representation now tracks the size of each vlen, so we don't need per-type-kind code to track it ourselves any more. There's extra code to handle type suppression, CTF_K_BIG elision, and prefixes.	2025-04-25 18:07:44 +01:00
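A sketch of why sizing gets simpler once each DTD knows its own vlen size: the total becomes a single loop with no per-kind switch. The struct, the fixed header and prefix sizes, and the assumption that a suppressed type is written as a vlen-less stub (per the kind-suppression commit) are illustrative, not libctf's real layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-type summary: real DTDs carry much more.  */
struct dtd_summary
{
  bool suppressed;        /* replaced by a stub at writeout */
  bool has_big_prefix;    /* carries a CTF_K_BIG-style prefix */
  bool big_elidable;      /* prefix size and vlen are both zero */
  size_t vlen_bytes;      /* size of the vlen, tracked by the DTD */
};

/* Sum the bytes the type section will occupy.  HEADER_SIZE and PREFIX_SIZE
   are parameters because this sketch does not model the real layouts.  */
size_t
type_sect_size (const struct dtd_summary *types, size_t ntypes,
                size_t header_size, size_t prefix_size)
{
  size_t total = 0;

  for (size_t i = 0; i < ntypes; i++)
    {
      if (types[i].suppressed)
        {
          total += header_size;   /* stub with no vlen and no prefix */
          continue;
        }
      total += header_size + types[i].vlen_bytes;
      if (types[i].has_big_prefix && !types[i].big_elidable)
        total += prefix_size;
    }
  return total;
}
```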
Nick Alcock	db98972145	libctf: serialize: check the type section for BTF-incompatible types We add a new ctf_type_sect_is_btf function (internal to ctf-serialize.c) to check the type section against the write prohibitions list and (after write-suppression) against the set of types allowed in BTF, and determine whether this type section contains any types BTF does not allow. CTF-specific type kinds like CTF_K_FLOAT are obviously prohibited in BTF, as are CTF-specific prefixes, except that CTF_K_BIG is allowed if and only if both its ctt_size and vlen are still zero: in that case it will be elided by type section writeout and will never appear in the BTF at all. Structs are checked to make sure they don't use any nameless padding members and that (if they are bitfields) all their offsets will still fit after conversion from CTF_K_BIG gap-between-struct-members representation (if they are not bitfields, we know they will fit, but for bitfields, they might be too big).	2025-04-25 18:07:44 +01:00
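Two of these checks, sketched in C. The BTF fact relied on is that when a struct's kind_flag is set, each member's offset word holds the bit offset in its low 24 bits and the bitfield size in its high 8, so absolute bitfield offsets must stay below 2^24. How CTF stores the gap between members is an assumption here (each member records its distance in bits from the previous member's offset), and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* BTF bitfield structs keep the member bit offset in the low 24 bits.  */
#define BTF_BITFIELD_OFFSET_MAX ((UINT32_C (1) << 24) - 1)

/* Sketch assumption: GAPS[i] is the distance in bits from member i-1's
   offset to member i's (member 0's gap is its absolute offset).  */
bool
bitfield_offsets_fit_btf (const uint32_t *gaps, size_t nmembers)
{
  uint64_t offset = 0;   /* the early return keeps this far from overflow */

  for (size_t i = 0; i < nmembers; i++)
    {
      offset += gaps[i];
      if (offset > BTF_BITFIELD_OFFSET_MAX)
        return false;
    }
  return true;
}

/* A CTF_K_BIG-style prefix is acceptable in BTF only if writeout will
   elide it, i.e. both its size and vlen are still zero.  */
bool
big_prefix_btf_ok (uint32_t size, uint32_t vlen)
{
  return size == 0 && vlen == 0;
}
```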
Nick Alcock	c14bdfc7a4	libctf: serialize: kind suppression and prohibition The CTF serialization machinery decides whether to write out a dict as BTF or CTF (or, in LIBCTF_BTM_BTF mode, whether to write out a dict or fail with ECTF_NOTBTF) in part by looking at the type kinds in the dictionary. It is possible that you'd like to extend this check and ban specific type kinds from the dictionary (possibly even if it's CTF); it's also possible that you'd like to not fail even if a CTF-only kind is found, but rather replace it with a still-valid stub (CTF_K_UNKNOWN / BTF_KIND_UNKNOWN) and keep going. (The kernel's btfarchive machinery does this to ensure that the compiler and previous link stages have emitted only valid BTF type kinds.) ctf_write_suppress_kind supports both these use cases: +int ctf_write_suppress_kind (ctf_dict_t *fp, int kind, int prohibited); This commit adds only the core population code: the actual suppression is spread across the serializer and will be added in the next commits.	2025-04-25 18:07:44 +01:00
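One plausible shape for the policy table behind ctf_write_suppress_kind, sketched with hypothetical names: each kind is allowed, suppressed (written as an unknown-kind stub), or prohibited (writeout fails). The kind count and the stub's kind number are placeholders, not the real CTF_K_* values.

```c
#include <stdbool.h>

#define SKETCH_NKINDS 32      /* placeholder kind count */
#define SKETCH_K_UNKNOWN 0    /* placeholder for CTF_K_UNKNOWN / BTF_KIND_UNKNOWN */

enum kind_policy { KIND_ALLOWED = 0, KIND_SUPPRESSED, KIND_PROHIBITED };

struct write_policy
{
  enum kind_policy kinds[SKETCH_NKINDS];   /* zero-initialized: all allowed */
};

/* Modeled on the shape of ctf_write_suppress_kind (fp, kind, prohibited).  */
int
policy_suppress_kind (struct write_policy *p, int kind, bool prohibited)
{
  if (kind < 0 || kind >= SKETCH_NKINDS)
    return -1;
  p->kinds[kind] = prohibited ? KIND_PROHIBITED : KIND_SUPPRESSED;
  return 0;
}

/* The kind actually written for KIND; -1 means writeout must fail.  */
int
policy_output_kind (const struct write_policy *p, int kind)
{
  if (kind < 0 || kind >= SKETCH_NKINDS)
    return -1;
  switch (p->kinds[kind])
    {
    case KIND_PROHIBITED:
      return -1;
    case KIND_SUPPRESSED:
      return SKETCH_K_UNKNOWN;
    case KIND_ALLOWED:
    default:
      return kind;
    }
}
```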
Nick Alcock	f782340ba5	libctf, serialize: preparatory steps The new serializer is quite a lot more customizable than the old, because it can write out BTF as well as CTF: you can ask to write out BTF or fail, write out CTF if required to avoid information loss, otherwise BTF, or always write out CTF. Callers often need to find out whether a dict could be written out as BTF before deciding how to write it out (because a dict can never be written out as BTF if it is compressed, a caller might well want to ask if there is anything else that prevents BTF writeout -- say, slices, conflicting types, or CTF_K_BIG -- before deciding whether to compress it). GNU ld will do this whenever it is passed only BTF sections on the input. Figuring out whether a dict can be written out as BTF is quite expensive: we have to traverse all the types and check them, including every member of every struct. So we'd rather do that work only once. This means making a lot of state once private to ctf_preserialize public enough that another function can initialize it; and since the whole API is available after calling this function and before serializing, we should probably arrange that if we do things we know will invalidate the results of all this checking, we are forced to do it again. This commit does that, moving all the existing serialization state into a new ctf_serialize_t and adding to it. Several functions grow force_ctf arguments that allow the caller to force CTF emission even if the type section looks BTFish: the writeout code and archive creation use this to force CTF emission if we are compressing, and archive creation uses it to force CTF emission if a CTF multi-member archive is in use, because BTF doesn't support archives at all so there's no point maintaining BTF compatibility in that case. The ctf_write* functions gain support for writing out BTF headers as well as CTF, depending on whether what was ultimately written out was actually BTF or not. 
Even more than most commits in this series, there is no way this is going to compile right now: we're in the middle of a major transition, completed in the next few commits.	2025-04-25 18:07:44 +01:00
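The "check once, and redo only after invalidation" idea can be sketched with a generation counter. This is only an illustration: the commit moves state into ctf_serialize_t, but it does not describe how libctf detects invalidating operations, and every name below is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical dict state; libctf's real ctf_serialize_t differs.  */
struct dict_sketch
{
  unsigned long generation;          /* bumped by every mutating operation */
  unsigned long checked_generation;  /* generation the cached result is for */
  bool check_valid;
  bool is_btf;                       /* cached result of the expensive check */
  unsigned long expensive_checks;    /* counts full traversals, for illustration */
};

static bool
expensive_type_sect_is_btf (struct dict_sketch *d)
{
  d->expensive_checks++;
  return true;   /* placeholder for walking every type and struct member */
}

void
dict_mutated (struct dict_sketch *d)
{
  d->generation++;
}

bool
dict_is_btf (struct dict_sketch *d)
{
  if (!d->check_valid || d->checked_generation != d->generation)
    {
      d->is_btf = expensive_type_sect_is_btf (d);
      d->checked_generation = d->generation;
      d->check_valid = true;
    }
  return d->is_btf;
}
```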
Nick Alcock	05a2970ad1	libctf: create, lookup: delete DVDs; ctf_lookup_by_kind Variable handling in BTF and CTFv4 works quite differently from in CTFv3. Rather than a separate section containing sorted, bsearchable variables, they are simply named entities like types, stored in CTF_K_VARs. As a first stage towards migrating to this, delete most references to the ctf_varent_t and ctf_dvdef_t, including the DVD lookup code, all the linking code, and quite a lot of the serialization code. Note: CTF_LINK_OMIT_VARIABLES_SECTION, and the whole "delete variables that already exist in the symtypetabs section" stuff, has yet to be reimplemented. We can implement CTF_LINK_OMIT_VARIABLES_SECTION by simply excising all CTF_K_VARs at deduplication time if requested. (Note: symtypetabs should still point directly at the type, not at the CTF_K_VAR.) (Symtypetabs in general need a bit more thought -- perhaps we can now store them in a separate .ctf.symtypetab section with its own little four-entry header for the symtypetabs and their indexes, making .ctf even more like .BTF; the only difference would then be that .ctf could include prefix types, CTF_K_FLOAT, and external string refs. For later discussion.) We also add ctf_lookup_by_kind() at this stage (because it is hopelessly diff-entangled with ctf_lookup_variable): this looks up a type of a particular kind, without needing a per-kind lookup function for it, nor needing to hack around adding string prefixes (so you can do ctf_lookup_by_kind (fp, CTF_K_STRUCT, "foo") rather than having to do ctf_lookup_by_name (fp, "struct foo"): often this is more convenient, and anything that reduces string buffer manipulation in C is good.)	2025-04-25 18:07:42 +01:00
Nick Alcock	b5d3790c66	libctf: consecutive ctf_id_t assignment This change modifies type ID assignment in CTF so that it works like BTF: rather than flipping the high bit on for types in child dicts, types ascend directly from IDs in the parent to IDs in the child, without interruption (so type 0x4 in the parent is immediately followed by 0x5 in all children). Doing this while retaining useful semantics for modification of parents is challenging. By definition, child type IDs are not known until the parent is written out, but we don't want to find ourselves constrained to adding types to the parent in one go, followed by all child types: that would make the deduplicator a nightmare and would frankly make the entire ctf_add*() interface next to useless: all existing clients that add types at all add types to both parents and children without regard for ordering, and breaking that would probably necessitate redesigning all of them. So we have to be a little cleverer. We approach this the same way as we approach strings in the recent refs rework: if a parent has children attached (or has ever had them attached since it was created or last read in), any new types created in the parent are assigned provisional IDs starting at the very top of the type space and working down. (Their indexes in the internal libctf arrays remain unchanged, so we don't suddenly need multigigabyte indexes!) At writeout (preserialization) time, we traverse the type table (and all other tables containing type IDs) and assign refs to every type ID in exactly the same way we assign refs to every string offset (just a different set of refs -- we don't want to update type IDs with string offset values!). For a parent dict with children, these refs are real entities in memory: pointers to the memory locations where type IDs are stored, tracked in the DTD of each type. 
As we traverse the type table, we assign real IDs to each type (by simple incrementation), storing those IDs in a new dtd_final_type field in the DTD for each type. Once the type table and all other tables containing type IDs are fully traversed, we update all the refs and overwrite the IDs currently residing in each with the final IDs for each type. That fixes up IDs in the parent dict itself (including forward references in structs and the like: that's why the ref updates only happen at the end); but what about child dicts' references, both to parent types and to their own? We add armouring to enforce that parent dicts are always serialized before their children (which ctf-link.c already does, because it's a precondition for strtab deduplication), and then arrange that when a ref is added to a type whose ID has been assigned (has a dtd_final_type), we just immediately do an update rather than storing a ref for later updating. Since the parent is already serialized, all parent type IDs have a dtd_final_type by this point, and all parent IDs in the children are properly updated. The child types can now be renumbered, now that we know the number of types in the parent, and their refs updated identically to what was just done with the parent. One wrinkle: before the child refs are updated, while we are working over the child's type section, the type IDs in the child start from 1 (or something like that), which might seem to overlap the parent IDs. But this is not the case: when you serialize the parent, the IDs written out to disk are changed, but the only change to the representation in memory is that we remember a dtd_final_type for each type (and use it to update all the child type refs): its ID in memory is the same as it always was, a nonoverlapping provisional ID higher than any other valid ID. 
We enforce all of this by asserting that when you add a ref to a type, the memory location that is modified must be in the buffer being serialized: the code will not let you accidentally modify the actual DTDs in memory. We track the number of types in the parent in a new CTFv4 (not BTF) header field (the dumper is updated): we will also use this to open CTFv3 child dicts without change by simply declaring for them that the parent dict has 2^31 types in it (or 2^15, for v2 and below): the IDs in the children then naturally come out right with no other changes needed. (Right now, opening CTFv3 child dicts requires extra compatibility code that has not been written, but that code will no longer need to worry about type ID differences.) Various things are newly forbidden: - you cannot ctf_import() a child into a parent if you already ctf_add()ed types to the child, because all its IDs would change (and since you already cannot ctf_add() types to a child that hasn't had its parent imported, this in practice means only that ctf_create() must be followed immediately by a ctf_import() if this is a new child, which all sane clients were doing anyway). - You cannot import a child into a parent which has the wrong number of (non-provisional) types, again because all its IDs would be wrong: because parents only add types in the provisional space if children are attached to it, this would break the not unknown case of opening an archive, adding types to the parent, and only then importing children into it, so we add a special case: archive members which are not children in an archive with more than one member always pretend to have at least one child, so type additions in them are always provisional even before you ctf_import anything. In practice, this does exactly what we want, since all archives so far are created by the linker and have one parent and N children of that parent. 
Because this introduces huge gaps between index and type ID for provisional types, some extra assertions are added to ensure that the internal ctf_type_to_index() is only ever called on types in the current dict (never a parent dict): before now, this was just taken on trust, and it was often wrong (which at best led to wrong results, as wrong array indexes were used, and at worst to a buffer overflow). When hash debugging is on (suggesting that the user doesn't mind expensive checks), every ctf_type_to_index() triggers a ctf_index_to_type() to make sure that the operations are proper inverses. Lots and lots of tests are added to verify that assignment works and that updating of every type kind works fine -- existing tests suffice for type IDs in the variable and symtypetab sections. The ld-ctf tests get a bunch of largely display-based updates: various tests refer to 0x8... type IDs, which no longer exist, and because the IDs are shorter all the spacing and alignment has changed.	2025-03-16 15:25:27 +00:00
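A toy model of the provisional-ID scheme, with a made-up top-of-space constant and hypothetical names: provisional IDs count down from the top while array indexes count up, final IDs are handed out by simple incrementation, and refs into the serialized copy are patched only once the traversal is complete.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants; libctf's real type-space limits differ.  */
#define TYPE_SPACE_TOP UINT32_C (0x7fffffff)
#define MAX_PROVISIONAL 16

struct provisional_types
{
  size_t ntypes;                        /* index I has provisional ID TOP - I */
  uint32_t final_type[MAX_PROVISIONAL]; /* plays the role of dtd_final_type */
};

/* New parent types get IDs counting down from the top of the type space;
   their array indexes still count up from zero.  Returns 0 if full.  */
uint32_t
add_provisional_type (struct provisional_types *t)
{
  if (t->ntypes >= MAX_PROVISIONAL)
    return 0;
  return TYPE_SPACE_TOP - (uint32_t) t->ntypes++;
}

/* At serialization, give each type its final ID by simple incrementation.  */
void
assign_final_ids (struct provisional_types *t, uint32_t first_final)
{
  for (size_t i = 0; i < t->ntypes; i++)
    t->final_type[i] = first_final + (uint32_t) i;
}

/* IDs outside the provisional range (types that already had real IDs)
   are returned unchanged.  */
uint32_t
final_id (const struct provisional_types *t, uint32_t id)
{
  if (id > TYPE_SPACE_TOP || TYPE_SPACE_TOP - id >= t->ntypes)
    return id;
  return t->final_type[TYPE_SPACE_TOP - id];
}

/* Each ref points at a type ID inside the buffer being serialized, never at
   the in-memory DTDs; all are patched only after the whole traversal.  */
void
update_refs (const struct provisional_types *t, uint32_t **refs, size_t nrefs)
{
  for (size_t i = 0; i < nrefs; i++)
    *refs[i] = final_id (t, *refs[i]);
}
```

For example, if the parent already has four types, two provisional additions become types 5 and 6, and a forward reference to the second one is fixed up along with everything else at the end.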
Nick Alcock	a480362d88	libctf: string: refs rework This commit moves provisional (not-yet-serialized) string refs towards the scheme to be used for CTF IDs in the future. In particular - provisional string offsets now count downwards from just under the external string offset space (all bits on but the high bit). This makes it possible to detect an overflowing strtab, and also makes it trivial to determine whether any string offset (ref) updates were missed -- where before we might get a slightly corrupted or incorrect string, we now get a huge high strtab offset corresponding to no string, and an error is emitted at read time. - refs are emitted at serialization time during the pass through the types. They are strictly associated with the newly-written-out buffer: the existing opened CTF dict is not changed, though it does still get the new strtab so that new refs to the same string can just refer directly to it. The provisional strtab hash table that contains these strings is not deleted after serialization (because we might serialize again): instead, we keep track in the parent of the lowest-yet-used ("latest") provisional strtab offset, and any strtab offset above that, but not external (high-bit-on) is considered provisional. This is sort-of-enforced by moving most of the ref-addition function declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is not included by ctf-create.c or ctf-open.c. - because we don't add refs when adding types, we don't need to handle the case where we add things to expanding vlens (enums, struct members) and have to realloc() them. So the entire painful movable refs system can just be deleted, along with the ability to remove refs piecemeal at all (purging all of them is still possible). Strings added during type addition are added via ctf_str_add(), which adds no refs: the strings are picked up at serialization time and refs to their final, serialized resting place added. 
The DTDs never have any refs in them, and their provisional strtab offsets are never updated by the ref system. This caused several bugs to fall out of the earlier work and get fixed. In particular, attempts to look up a string in a child dict now search the parent's provisional strtab too: we add some extra special casing for the null string so we don't need to worry about deduplication moving it somewhere other than offset zero. Finally, the optimization that removes an unreferenced synthetic external strtab (the record of the strings the linker has told us about, kept around internally for lookup during late serialization) is faulty: references to a strtab entry will only produce CTF-level refs if their value might change, and an external string's offset won't change, so it produces no refs: worse yet, even if we did get a ref (say, if the string was originally believed to be internal and only later were we told that the linker knew about it too), when we serialize a strtab, all its refs are dropped (since they've been updated and can no longer change); so if we serialized it a second time, its synthetic external strtab would be considered empty and dropped, even though the same external strings as before still exist, referencing it. We must keep the synthetic external strtab around as long as external strings exist that reference it, i.e. for the life of the dict. One benefit of all this: now we're emitting provisional string offsets at a really high value, it's out of the way of the consecutive, deduplicated string offsets in child dicts. So we can drop the constraint that you cannot add strings to a dict with children, which allows us to add types freely to parent dicts again. What you can't do is write that dict out again: when we serialize, we currently update the dict being serialized with the updated strtabs: when you write a dict out, its provisional strings become real strings, and suddenly the offsets would overlap once more. 
But opening a dict and its children, adding to it, and then writing it out again is rare indeed, and we have a workaround: anyone wanting to do this can just use ctf_link instead.	2025-02-28 15:13:24 +00:00
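A sketch of the offset classification described above, assuming 32-bit offsets where the high bit marks external strings and provisional offsets start at 0x7fffffff (all bits on but the high bit) and count down. The struct and functions are hypothetical; only the layout of the offset space comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define STR_EXTERNAL_BIT UINT32_C (0x80000000)

struct strtab_sketch
{
  uint32_t latest_provisional;  /* lowest provisional offset handed out */
  uint32_t real_len;            /* length of the serialized strtab */
};

void
strtab_init (struct strtab_sketch *s, uint32_t real_len)
{
  s->latest_provisional = STR_EXTERNAL_BIT;  /* none handed out yet */
  s->real_len = real_len;
}

/* Hand out the next provisional offset, failing if the downward-growing
   provisional space would run into the real strtab.  */
bool
strtab_add_provisional (struct strtab_sketch *s, uint32_t *offset)
{
  if (s->latest_provisional - 1 < s->real_len)
    return false;
  *offset = --s->latest_provisional;
  return true;
}

enum str_kind { STR_REAL, STR_PROVISIONAL, STR_EXTERNAL, STR_INVALID };

enum str_kind
classify_offset (const struct strtab_sketch *s, uint32_t off)
{
  if (off & STR_EXTERNAL_BIT)
    return STR_EXTERNAL;
  if (off >= s->latest_provisional)
    return STR_PROVISIONAL;
  if (off < s->real_len)
    return STR_REAL;
  /* E.g. a provisional offset left behind by a missed ref update, seen by
     a reader with no provisional strings: a huge offset naming no string.  */
  return STR_INVALID;
}
```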
Nick Alcock	ba66e0cc32	libctf: do not deduplicate strings in the header It is unreasonable to expect users to ctf_import the parent before being able to understand the header -- doubly so because the only string in the header which is likely to be deduplicable is the parent name, which is the same in every child, yet without the parent name being available in the child's strtab you cannot call ctf_parent_name to figure out which parent to import! libctf/ * ctf-serialize.c (ctf_preserialize): Prevent deduplication of header string fields. * ctf-open.c (ctf_set_base): Note this. * ctf-string.c (ctf_str_free_atom): Likewise.	2025-02-28 14:47:24 +00:00
Nick Alcock	a14fb397b2	libctf: tear opening and serialization in two The next stage in sharing the strtab involves tearing two core parts of libctf into two pieces. Large parts of init_static_types, called at open time, involve traversing the types table and initializing the hashtabs used by the type name lookup functions and the enumerator conflict checks. If the string table is partly located in the parent dict, this is obviously not going to work: so split out that code into a new init_static_types_names function (which also means moving the wrapper around init_static_types that was used to simplify the enumerator code into being a wrapper around init_static_types_names instead) and call that from init_static_types (for parent dicts, and < v4 dicts), and from ctf_import (for v4 dicts). At the same time as doing this we arrange to set LCTF_NO_STR (recently introduced) iff this is a v4 child dict with a nonzero cth_parent_strlen: this then blocks more or less everything that involves string operations until a ctf_import has actually imported the strtab it depends on. (No string operations that actually use this have been introduced yet, but since no string deduplication is happening yet either, this is harmless.) For v4 dicts, at import time we also validate that the cth_parent_strlen has the same value as the parent's strlen (zero is also a valid value, indicating a non-shared strtab, as is commonplace in older dicts, dicts emitted by the compiler, parent dicts etc). This makes ctf_import more complex, so we simplify things again by dropping all the repeated code in the obscure used-only-by-ctf_link ctf_import_unref and turning both into wrappers around an internal function. 
We prohibit repeated ctf_imports (except of NULL or the same dict repeatedly), and set up some new fields which will be used later to prevent people from adding strings to parent dicts with pre-existing serialized strtabs once they have children imported into them (which would change their string length and corrupt all those strtabs). Serialization also needs to be torn in two. The problem here is that currently serialization does too much: it emits everything including the strtab, does things that depend on the strtab being finalized (notably variable table sorting), and then writes it out. Much of this emission itself involves strtab writes, so the strtab is not actually complete until halfway through ctf_serialize. But when deduplicating, we want to use machinery in ctf-link and ctf-dedup to deduplicate the strtab after it is complete, and only then write it out. We could do this via having ctf_serialize call some sort of horrible callback, but it seems much simpler to just cut ctf_serialize in two, and introduce a new ctf_preserialize which can optionally be called to do all this "everything but the strtab" work. (If it's not called, ctf_serialize calls it itself.) This means pulling some internal variables out of ctf_serialize into the ctf_dict_t, and slightly abusing LCTF_NO_STR to mean (in addition to its "no, you can't do much between opening a child dict and importing its parent" semantics), "no, you can't do much between calling ctf_preserialize and ctf_serialize". The requirements of both are not quite identical -- you definitely can do things that involve string lookups after ctf_preserialize -- but it serves to stop callers from accidentally adding more types after the types table has been written out, and that's good enough. ctf_preserialize isn't public API anyway. libctf/ * ctf-impl.h (struct ctf_dict) [ctf_serializing_buf]: New. [ctf_serializing_buf_size]: Likewise. [ctf_serializing_vars]: Likewise. [ctf_serializing_nvars]: Likewise. 
[ctf_max_children]: Likewise. (LCTF_PRESERIALIZED): New. (ctf_preserialize): New. (ctf_depreserialize): New. * ctf-open.c (init_static_types): Rename to... (init_static_types_names): ... this, wrapping a different function. (init_static_types_internal): Rename to... (init_static_types): ... this, and set LCTF_NO_STR if necessary. Tear out the name-lookup guts into... (init_static_types_names_internal): ... this new function. Fix a few comment typos. (ctf_bufopen): Emphasise that you cannot rely on looking up strings at any point in ctf_bufopen any more. (ctf_dict_close): Free ctf_serializing_buf. (ctf_import): Turn into a wrapper, calling... (ctf_import_internal): ... this. Prohibit repeated ctf_imports of different parent dicts, or "unimporting" by setting it back to NULL again. Validate the parent we do import using cth_parent_strlen. Call init_static_types_names if the strtab is shared with the parent. (ctf_import_unref): Turn into a wrapper. * ctf-serialize.c (ctf_serialize): Split out everything before strtab serialization into... (ctf_preserialize): ... this new function. (ctf_depreserialize): New, undo preserialization on error.	2025-02-28 14:47:24 +00:00
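The import-time checks can be sketched as follows. All names are hypothetical, and the reading that importing NULL is harmless only while no parent is attached (reconciling "except of NULL" with the prohibition on "unimporting") is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dict model; not libctf's ctf_dict_t.  */
struct dict
{
  struct dict *parent;
  uint32_t str_len;             /* this dict's own strtab length */
  uint32_t cth_parent_strlen;   /* header field: 0 means strtab not shared */
};

enum import_result { IMPORT_OK, IMPORT_REPEATED, IMPORT_BAD_STRLEN };

enum import_result
import_parent (struct dict *child, struct dict *parent)
{
  /* Re-importing the same parent is fine; a different one, or NULL once a
     parent is attached ("unimporting"), is not.  */
  if (child->parent != NULL && child->parent != parent)
    return IMPORT_REPEATED;

  /* A shared strtab must match the parent's length exactly.  */
  if (parent != NULL && child->cth_parent_strlen != 0
      && child->cth_parent_strlen != parent->str_len)
    return IMPORT_BAD_STRLEN;

  child->parent = parent;
  return IMPORT_OK;
}
```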
Nick Alcock	6c77689963	include, libctf: add cth_parent_strlen CTFv4 header field The first format difference between v3 and v4 is a cth_parent_strlen header field. This field (obviously not present in BTF) is populated from the string table length of the parent at serialization time (protection against a child being serialized before its parent will be added in a later commit in this series), and will be used at open time to prohibit opening of dicts with a different strlen (which would corrupt the child's string table if it was shared with the parent). For now, just add the field, populate it at serialization time when linking (when not linking, no deduplication is done and the correct value remains unchanged), and dump it. include/ * ctf.h (ctf_header) [cth_parent_strlen]: New. libctf/ * ctf-dump.c (ctf_dump_header_sizefield): New. (ctf_dump_header): Use to dump the cth_parent_strlen. * ctf-open.c (upgrade_header_v2): Populate cth_parent_strlen. (upgrade_header_v3): Likewise. (ctf_flip_header): Flip it. (ctf_bufopen): Drop unnecessary initialization. * ctf-serialize.c (ctf_serialize): Write it out when linking. ld/ * testsuite/ld-ctf/data-func-conflicted-vars.d: Skip the new dump output. * testsuite/ld-ctf/data-func-conflicted.d: Likewise.	2025-02-28 14:47:24 +00:00
Nick Alcock	9a74ab12c8	include, libctf: start work on libctf v4 This format is a superset of BTF, but for now we just do the minimum to declare a new file format version, without actually introducing any format changes. From now on, we refuse to reserialize CTFv1 dicts: these have a distinct parent/child boundary which obviously cannot change upon reserialization (that would change the type IDs): previously, we encoded this by stuffing in a unique CTF version for such dicts. We can't do that now that we have one version for all CTFv4 dicts, and testing such old dicts is very hard these days anyway, and is not automated: so just drop support for writing them out entirely. (You still can write them out, but you have to do a full-blown ctf_link, which generates an all-new fresh dict and recomputes type IDs as part of deduplication.) To prevent this extremely-not-ready format escaping into the wild, add a new mechanism whereby any format version higher than the new #define CTF_STABLE_VERSION cannot be serialized unless I_KNOW_LIBCTF_IS_UNSTABLE is set in the environment. include/ * ctf-api.h (_CTF_ERRORS) [ECTF_CTFVERS_NO_SERIALIZE]: New. [ECTF_UNSTABLE]: New. (ECTF_NERR): Update. * ctf.h: Small comment improvements. (ctf_header_v3): New, copy of ctf_header. (CTF_VERSION_4): New. (CTF_VERSION): Now CTF_VERSION_4. (CTF_STABLE_VERSION): Still 4, CTF_VERSION_3. ld/ * testsuite/ld-ctf/.d: Update to CTF_VERSION_4. libctf/ * ctf-impl.h (LCTF_NO_SERIALIZE): New. * ctf-dump.c (ctf_dump_header): Add CTF_VERSION_4. * ctf-open.c (ctf_dictops): Likewise. (upgrade_header): Rename to... (upgrade_header_v2): ... this. (upgrade_header_v3): New. (upgrade_types): Support upgrading from CTF_VERSION_3. Turn on LCTF_NO_SERIALIZE for CTFv1. (init_static_types_internal): Upgrade all types tables older than CTF_VERSION_4. (ctf_bufopen): Support CTF_VERSION_4: error out if we forget to update this switch in future. Add header upgrading from v3 and below. Improve comments slightly. 
* ctf-serialize.c (ctf_serialize): Block serialization of unstable file formats, and of file formats for which LCTF_NO_SERIALIZE is turned on (v1).	2025-02-28 14:47:24 +00:00
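A sketch of the serialization gate. CTF_STABLE_VERSION, LCTF_NO_SERIALIZE, and I_KNOW_LIBCTF_IS_UNSTABLE come from the commit message; the functions are hypothetical, the mapping onto ECTF_CTFVERS_NO_SERIALIZE and ECTF_UNSTABLE is presumed, and the sketch treats the environment variable as set whenever it exists, whatever its value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* STABLE_VERSION plays the role of CTF_STABLE_VERSION and NO_SERIALIZE the
   role of LCTF_NO_SERIALIZE (set for CTFv1 dicts).  */
bool
may_serialize_version (int version, int stable_version, bool no_serialize,
                       const char *unstable_override)
{
  if (no_serialize)
    return false;                     /* presumably ECTF_CTFVERS_NO_SERIALIZE */
  if (version <= stable_version)
    return true;
  return unstable_override != NULL;   /* otherwise presumably ECTF_UNSTABLE */
}

/* Convenience wrapper that consults the real environment.  */
bool
may_serialize (int version, int stable_version, bool no_serialize)
{
  return may_serialize_version (version, stable_version, no_serialize,
                                getenv ("I_KNOW_LIBCTF_IS_UNSTABLE"));
}
```

Taking the override as a parameter keeps the policy testable without touching the process environment.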
Alan Modra	e8e7cf2abe	Update year range in copyright notice of binutils files	2025-01-01 18:29:57 +10:30
Nick Alcock	36c771b179	libctf: fix CTF dict compression Commit `483546ce4f` ("libctf: make ctf_serialize() actually serialize") accidentally broke dict compression. There were two bugs: - ctf_arc_write_one_ctf was still making its own decision about whether to compress the dict via direct ctf_size comparison, which is unfortunate because now that it no longer calls ctf_serialize itself, ctf_size is always zero when it does this: it should let the writing functions decide on the threshold, since they already contain code to do so, which simply went unused for lack of one trivial wrapper to write to an fd while also providing a compression threshold; - ctf_write_mem, the function underlying all writing as of the commit above, was calling zlib's compressBound and avoiding compression if this returned a value larger than the input. Unfortunately compressBound does not do a trial compression and determine whether the result is compressible: it just adds zlib header sizes to the value passed in, so our test would always have concluded that the value was incompressible! Avoid this by simply always compressing if the raw size is larger than the threshold: zlib is quite clever enough to avoid actually compressing if the data is incompressible. Add a testcase for this. libctf/ * ctf-impl.h (ctf_write_thresholded): New... * ctf-serialize.c (ctf_write_thresholded): ... defined here, a wrapper around... (ctf_write_mem): ... this. Don't check compressibility. (ctf_compress_write): Reimplement as a ctf_write_thresholded wrapper. (ctf_write): Likewise. * ctf-archive.c (arc_write_one_ctf): Just call ctf_write_thresholded rather than trying to work out whether to compress. * testsuite/libctf-writable/ctf-compressed.*: New test.	2024-07-31 21:02:05 +01:00
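Why the old check never compressed, in a few lines of C. bound_like_compressBound mirrors the formula in zlib 1.2.x's compress.c (check it against your zlib version); the helper names are hypothetical, and the threshold value is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same shape as zlib's compressBound(): the input length plus overhead.
   It never looks at the data, so it is always larger than its argument.  */
size_t
bound_like_compressBound (size_t len)
{
  return len + (len >> 12) + (len >> 14) + (len >> 25) + 13;
}

/* The buggy test: compress only if the bound is not larger than the input.
   Since the bound always exceeds the input, this is always false.  */
bool
buggy_should_compress (size_t raw_size)
{
  return bound_like_compressBound (raw_size) <= raw_size;
}

/* The fix: compress whenever the raw size exceeds the threshold.  */
bool
should_compress (size_t raw_size, size_t threshold)
{
  return raw_size > threshold;
}
```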
Nick Alcock	483546ce4f	libctf: make ctf_serialize() actually serialize ctf_serialize() evolved from the old ctf_update(), which mutated the in-memory CTF dict to make all the dynamic in-memory types into static, unchanging written-to-the-dict types (by deserializing and reserializing it): back in the days when you could only do type lookups on static types, this meant you could see all the types you added recently, at the small, small cost of making it impossible to change those older types ever again and inducing an amortized O(n^2) cost if you actually wanted to add references to types you added at arbitrary times to later types. It also reset things so that ctf_discard() would throw away only types you added after the most recent ctf_update() call. Some time ago this was all changed so that you could look up dynamic types just as easily as static types: ctf_update() changed so that only its visible side-effect of affecting ctf_discard() remained: the old ctf_update() was renamed to ctf_serialize(), made internal to libctf, and called from the various functions that wrote files out. ... but it was still working by serializing and deserializing the entire dict, swapping out its guts with the newly-serialized copy in an invasive and horrible fashion that coupled ctf_serialize() to almost every field in the ctf_dict_t. This is totally useless, and fixing it is easy: just rip all that code out and have ctf_serialize return a serialized representation, and let everything use that directly. This simplifies most of its callers significantly. (It also points up another bug: ctf_gzwrite() failed to call ctf_serialize() at all, so it would only ever work for a dict you just ctf_write_mem()ed yourself, just for its invisible side-effect of serializing the dict!) 
This lets us simplify away a bunch of internal-only open-side functionality for overriding the syn_ext_strtab and some just-added functionality for forcing in an existing atoms table, without loss of functionality, and lets us lift the restriction on reserializing a dict that was ctf_open()ed rather than being ctf_create()d: it's now perfectly OK to open a dict, modify it (except for adding members to existing structs, unions, or enums, which fails with -ECTF_RDONLY), and write it out again, just as one would expect. libctf/ * ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos. (ctf_type_sect_size): Add static type sizes too. (ctf_serialize): Return the new dict rather than updating the existing dict. No longer fail for dicts with static types; copy them onto the start of the new types table. (ctf_gzwrite): Actually serialize before gzwriting. (ctf_write_mem): Improve forced (test-mode) endian-flipping: flip dicts even if they are too small to be compressed. Improve confusing variable naming. * ctf-archive.c (arc_write_one_ctf): Don't bother to call ctf_serialize: both the functions we call do so. * ctf-string.c (ctf_str_create_atoms): Drop serializing case (atoms arg). * ctf-open.c (ctf_simple_open): Call ctf_bufopen directly. (ctf_simple_open_internal): Delete. (ctf_bufopen_internal): Delete/rename to ctf_bufopen: no longer bother with syn_ext_strtab or forced atoms table, serialization no longer needs them. * ctf-create.c (ctf_create): Call ctf_bufopen directly. * ctf-impl.h (ctf_str_create_atoms): Drop atoms arg. (ctf_simple_open_internal): Delete. (ctf_bufopen_internal): Likewise. (ctf_serialize): Adjust. * testsuite/libctf-lookup/add-to-opened.c: Adjust now that this is supposed to work.	2024-04-19 16:14:47 +01:00
Nick Alcock	cf9da3b0b6	libctf: rethink strtab writeout This commit finally adjusts strtab writeout so that repeated writeouts, or writeouts of a dict that was read in earlier, only sort the portion of the strtab that was newly added. There are three intertwined changes here: - pull the contents of strtabs from newly ctf_bufopened dicts into the atoms table, so that future additions will reuse the existing offset etc rather than adding new identical strings - allow the internal ctf_bufopen done by serialization to contribute its existing atoms table, so that existing atoms can be used for the remainder of the open process (like name table construction): this atoms table currently gets thrown away in the mass reassignment done later in ctf_serialize in any case, but it needs to be there during the open. - rewrite ctf_str_write_strtab so that a) it uses iterators rather than ctf_*_iter, reducing pointless structures which serve no other purpose than to implement ordinary variable scope, but more clunkily, and b) retains the existing strtab on the front of the new one, with its sort retained, rather than resorting, so all existing already-written strtab offsets remain valid across the call. This latter change finally permits repeated serializations, and reserializations of ctf_open()ed dicts, to work, but for now we keep the code that prevents that because serialization is about to change again in a way that will make it more obvious that doing such things is safe, and we can take it out then. (There are also some smaller changes like moving the purge of the refs table into ctf_str_write_strtab(), since that's where the changes happen that invalidate it, rather than doing it in ctf_serialize(). We also prohibit something that has never worked, opening a dict and then reporting symbols to it via ctf_link_add_strtab() et al: you must do that to newly-created dicts which have had stuff ctf_link()ed into them. 
This is very unlikely ever to be a problem in practice: linkers just don't do that sort of thing.) libctf/ ctf-create.c (ctf_create): Add (temporary) atoms arg. * ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New. (ctf_str_create_atoms): Adjust. (ctf_str_write_strtab): Likewise. (ctf_simple_open_internal): Likewise. * ctf-open.c (ctf_simple_open_internal): Add atoms arg. (ctf_bufopen): Likewise. (ctf_bufopen_internal): Initialize just enough of an atoms table: pre-init from the atoms arg if supplied. (ctf_simple_open): Adjust. * ctf-serialize.c (ctf_serialize): Constify the strtab. Move ref list purging into ctf_str_write_strtab. Initialize the new dict with the old dict's atoms table. Accept the new strtab from ctf_str_write_strtab. Adjust for addition of ctf_dynstrtab. * ctf-string.c (ctf_strraw_explicit): Improve comments. (ctf_str_create_atoms): Prepopulate from an existing atoms table, or alternatively pull in all strings from the strtab and turn them into atoms. (ctf_str_free_atoms): Free the dynstrtab and its strtab. (struct ctf_strtab_write_state): Remove. (ctf_str_count_strtab): Fold this... (ctf_str_populate_sorttab): ... and this... (ctf_str_write_strtab): ... into this. Prepend existing strings to the strtab rather than resorting them (and wrecking their offsets). Keep the dynstrtab updated. Update refs for all atoms with refs, whether or not they are strings newly added to the strtab.	2024-04-19 16:14:47 +01:00
Nick Alcock	149ce5c263	libctf: replace 'pending refs' abstraction A few years ago we introduced a 'pending refs' abstraction to fix one problem: serializing a dict, then changing it would tend to corrupt the dict because the strtab sort we do on strtab writeout (to improve compression efficiency) would modify the offset of any strings that sorted lexicographically earlier in the strtab: so we added a new restriction that all strings are added only at serialization time, and maintained a set of 'pending' refs that were added earlier, whose offsets we could update (like other refs) at writeout time. This was in hindsight seriously problematic for maintenance (because serialization has to traverse all strings in all datatypes in the entire dict), and has become impossible to sustain now that we can read in existing dicts, modify them, and reserialize them again. We really don't want to have to dig through the entire dict we just read in just in order to dig out all its strtab offsets, then change it, just for the sake of a sort that adds a frankly trivial amount of compression efficiency. Sorting is still worthwhile -- but it sacrifices very little to only sort newly-added portions of the strtab, reusing older portions as necessary. As a first stage in this, discard the whole "pending refs" abstraction and replace it with "movable" refs, which are exactly like all other refs (addresses containing the strtab offset of some string, which are updated with the final strtab offset on serialization) except that we track them in a reverse dict so that we can move the refs around (which we do whenever we realloc() a buffer containing a bunch of structure members or something when we add members to the structure). libctf/ * ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add a movable ref. (ctf_add_member_offset): Likewise. * ctf-util.c (ctf_realloc): Delete. * ctf-serialize.c (ctf_serialize): No longer use it. Adjust to new fields. 
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs. (ctf_str_free_atom): Free freeable atoms' strings. (ctf_str_create_atoms): Create the movable refs dynhash if needed. (ctf_str_free_atoms): Destroy it. (CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous reversion). Add new flag. (aref_create): New, populate movable refs if need be. (ctf_str_add_ref_internal): Switch back to flags, update refs directly for nonprovisional strings (with already-known fixed offsets); create refs via aref_create. Allocate strings only if not within an mmapped strtab. (ctf_str_add_movable_ref): New. (ctf_str_add): Adjust to CTF_STR_* reintroduction. (ctf_str_add_external): Likewise. (ctf_str_move_refs): New, move refs via ctf_str_movable_refs backpointer. (ctf_str_purge_refs): Drop ctf_str_num_refs. (ctf_str_update_refs): Fix indentation. * ctf-impl.h (struct ctf_str_atom_movable): New. (struct ctf_dict.ctf_str_num_refs): Drop. (struct ctf_dict.ctf_str_movable_refs): New. (ctf_str_add_movable_ref): Declare. (ctf_str_move_refs): Likewise. (ctf_realloc): Drop.	2024-04-19 16:14:46 +01:00
Nick Alcock	3301ddba1b	Revert "libctf: do not corrupt strings across ctf_serialize" This reverts commit `986e9e3aa0`. (We do not revert the testcase -- it remains valid -- but we are taking a different, less complex and more robust approach.) This also deletes the pending refs abstraction without (yet) replacing it, so some tests will fail for a commit or two.	2024-04-19 16:14:46 +01:00
Nick Alcock	4fa4e3d92a	libctf: delete LCTF_DIRTY This flag was meant as an optimization to avoid reserializing dicts unnecessarily. It was critically necessary back when serialization was done by ctf_update() and you had to call that every time you wanted any new modifications to the type table to be usable by other types, but that has been unnecessary for years now, and serialization is only done once when writing out, which one would naturally assume would always serialize the dict. Worse, it never really worked: it only tracked newly-added types, not things like added symbols which might equally well require reserialization, and it gets in the way of an upcoming change. Delete entirely. libctf/ * ctf-create.c (ctf_create): Drop LCTF_DIRTY. (ctf_discard): Likewise. (ctf_rollback): Likewise. (ctf_add_generic): Likewise. (ctf_set_array): Likewise. (ctf_add_enumerator): Likewise. (ctf_add_member_offset): Likewise. (ctf_add_variable_forced): Likewise. * ctf-link.c (ctf_link_intern_extern_string): Likewise. (ctf_link_add_strtab): Likewise. * ctf-serialize.c (ctf_serialize): Likewise. * ctf-impl.h (LCTF_DIRTY): Likewise. (LCTF_LINKING): Renumber.	2024-04-19 16:14:46 +01:00
Nick Alcock	8a60c93096	libctf: support addition of types to dicts read via ctf_open() libctf has long declared deserialized dictionaries (out of files or ELF sections or memory buffers or whatever) to be read-only: back in the furthest prehistory this was not the case, in that you could add a few sorts of type to such dicts, but attempting to do so often caused horrible memory corruption, so I banned the lot. But it turns out real consumers want it (notably DTrace, which synthesises pointers to types that don't have them and adds them to the ctf_open()ed dicts if it needs them). Let's bring it back again, but without the memory corruption and without the massive code duplication required in days of yore to distinguish between static and dynamic types: the representation of both types has been identical for a few years, with the only difference being that types as a whole are stored in a big buffer for types read in via ctf_open and per-type hashtables for newly-added types. So we discard the internally-visible concept of "readonly dictionaries" in favour of declaring the range of types that were already present when the dict was read in to be read-only: you can't modify them (say, by adding members to them if they're structs, or calling ctf_set_array on them), but you can add more types and point to them. (The API remains the same, with calls sometimes returning ECTF_RDONLY, but now they do so less often.) This is a fairly invasive change, mostly because code written since the ban was introduced didn't take the possibility of a static/dynamic split into account. Some of these irregularities were hard to define as anything but bugs. Notably: - The symbol handling was assuming that symbols only needed to be looked for in dynamic hashtabs or static linker-laid-out indexed/nonindexed layouts, but now we want to check both in case people added more symbols to a dict they opened. 
- The code that handles type additions wasn't checking to see if types with the same name existed at all (so you could do ctf_add_typedef (fp, "foo", bar) repeatedly without error). This seems reasonable for types you just added, but we probably do want to ban addition of types with names that override names we already used in the ctf_open()ed portion, since that would probably corrupt existing type relationships. (Doing things this way also avoids causing new errors for any existing code that was doing this sort of thing.) - ctf_lookup_variable entirely failed to work for variables just added by ctf_add_variable: you had to write the dict out and read it back in again before they appeared. - The symbol handling remembered what symbols you looked up but didn't remember their types, so you could look up an object symbol and then find it popping up when you asked for function symbols, which seems less than ideal. Since we had to rejig things enough to be able to distinguish function and object symbols internally anyway (in order to give suitable errors if you try to add a symbol with a name that already existed in the ctf_open()ed dict), this bug suddenly became more visible and was easily fixed. We do not (yet) support writing out dicts that have been previously read in via ctf_open() or other deserializer (you can look things up in them, but not write them out a second time). This never worked, so there is no incompatibility; if it is needed at a later date, the serializer is a little bit closer to having it work now (the only table we don't deal with is the types table, and that's because the upcoming CTFv4 changes are likely to make major changes to the way that table is represented internally, so adding more code that depends on its current form seems like a bad idea). There is a new testcase that tests much of this, in particular that modification of existing types is still banned and that you can add new ones and chase them without error. 
libctf/ * ctf-impl.h (struct ctf_dict.ctf_symhash): Split into... (ctf_dict.ctf_symhash_func): ... this and... (ctf_dict.ctf_symhash_objt): ... this. (ctf_dict.ctf_stypes): New, counts static types. (LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR. (LCTF_RDWR): Deleted. (LCTF_DIRTY): Renumbered. (LCTF_LINKING): Likewise. (ctf_lookup_variable_here): New. (ctf_lookup_by_sym_or_name): Likewise. (ctf_symbol_next_static): Likewise. (ctf_add_variable_forced): Likewise. (ctf_add_funcobjt_sym_forced): Likewise. (ctf_simple_open_internal): Adjust. (ctf_bufopen_internal): Likewise. * ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with. (ctf_create): Migrate a bunch of initializations into bufopen. Force recreation of name tables. Do not forcibly override the model, let ctf_bufopen do it. (ctf_static_type): New. (ctf_update): Drop LCTF_RDWR check. (ctf_dynamic_type): Likewise. (ctf_add_function): Likewise. (ctf_add_type_internal): Likewise. (ctf_rollback): Check ctf_stypes, not LCTF_RDWR. (ctf_set_array): Likewise. (ctf_add_struct_sized): Likewise. (ctf_add_union_sized): Likewise. (ctf_add_enum): Likewise. (ctf_add_enumerator): Likewise (only on the target dict). (ctf_add_member_offset): Likewise. (ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types with colliding names. (ctf_add_forward): Note safety under the new rules. (ctf_add_variable): Split all but the existence check into... (ctf_add_variable_forced): ... this new function. (ctf_add_funcobjt_sym): Likewise... (ctf_add_funcobjt_sym_forced): ... for this new function. * ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts with any stypes. (ctf_link_add_strtab): Likewise. (ctf_link_shuffle_syms): Likewise. (ctf_link_intern_extern_string): Note pre-existing prohibition. * ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check. (ctf_lookup_variable): Split out looking in a dict but not its parent into... (ctf_lookup_variable_here): ... this new function. 
(ctf_lookup_symbol_idx): Track whether looking up a function or object: cache them separately. (ctf_symbol_next): Split out looking in non-dynamic symtypetab entries to... (ctf_symbol_next_static): ... this new function. Don't get confused by the simultaneous presence of static and dynamic symtypetab entries. (ctf_try_lookup_indexed): Don't waste time looking up symbols by index before there can be any idea how symbols are numbered. (ctf_lookup_by_sym_or_name): Distinguish between function and data object lookups. Drop LCTF_RDWR. (ctf_lookup_by_symbol): Adjust. (ctf_lookup_by_symbol_name): Likewise. * ctf-open.c (init_types): Rename to... (init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes. (ctf_simple_open): Drop writable arg. (ctf_simple_open_internal): Likewise. (ctf_bufopen): Likewise. (ctf_bufopen_internal): Populate fields only used for writable dicts. Drop LCTF_RDWR. (ctf_dict_close): Cater for symhash cache split. * ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR. * ctf-types.c (ctf_variable_next): Drop LCTF_RDWR. * testsuite/libctf-lookup/add-to-opened*: New test.	2024-04-19 16:14:46 +01:00
Nick Alcock	54a0219150	libctf: remove static/dynamic name lookup distinction libctf internally maintains a set of hash tables for type name lookups, one for each valid C type namespace (struct, union, enum, and everything else). Or, rather, it maintains two sets of hash tables: one, a ctf_hash, is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the other, a ctf_dynhash, is meant for lookups in ctf_create()d dicts. This distinction was somewhat valuable in the far pre-binutils past when two different hashtable implementations were used (one expanding, the other fixed-size), but those days are long gone: the hash table implementations are almost identical, both wrappers around the libiberty hashtab. The ctf_dynhash has many more capabilities than the ctf_hash (iteration, deletion, etc etc) and has no downsides other than starting at a fixed, arbitrary small size. That limitation is easy to lift (via a new ctf_dynhash_create_sized()), following which we can throw away nearly all the ctf_hash implementation, and all the code to choose between readable and writable hashtabs; the few convenience functions that are still useful (for insertion of name -> type mappings) can also be generalized a bit so that the extra string verification they do is potentially available to other string lookups as well. (libctf still has two hashtable implementations, ctf_dynhash, above, and ctf_dynset, which is a key-only hashtab that can avoid a great many malloc()s, used for high-volume applications in the deduplicator.) libctf/ * ctf-create.c (ctf_create): Eliminate ctn_writable. (ctf_dtd_insert): Likewise. (ctf_dtd_delete): Likewise. (ctf_rollback): Likewise. (ctf_name_table): Eliminate ctf_names_t. * ctf-hash.c (ctf_dynhash_create): Comment update. Reimplement in terms of... (ctf_dynhash_create_sized): ... this new function. (ctf_hash_create): Remove. (ctf_hash_size): Remove. (ctf_hash_define_type): Remove. (ctf_hash_destroy): Remove. 
(ctf_hash_lookup_type): Rename to... (ctf_dynhash_lookup_type): ... this. (ctf_hash_insert_type): Rename to... (ctf_dynhash_insert_type): ... this, moving validation to... * ctf-string.c (ctf_strptr_validate): ... this new function. * ctf-impl.h (struct ctf_names): Extirpate. (struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t. (struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t. (ctf_name_table): Now returns a ctf_dynhash_t. (ctf_lookup_by_rawhash): Remove. (ctf_hash_create): Likewise. (ctf_hash_insert_type): Likewise. (ctf_hash_define_type): Likewise. (ctf_hash_lookup_type): Likewise. (ctf_hash_size): Likewise. (ctf_hash_destroy): Likewise. (ctf_dynhash_create_sized): New. (ctf_dynhash_insert_type): New. (ctf_dynhash_lookup_type): New. (ctf_strptr_validate): New. * ctf-lookup.c (ctf_lookup_by_name_internal): Adapt. * ctf-open.c (init_types): Adapt. (ctf_set_ctl_hashes): Adapt. (ctf_dict_close): Adapt. * ctf-serialize.c (ctf_serialize): Adapt. * ctf-types.c (ctf_lookup_by_rawhash): Remove.	2024-04-19 16:14:46 +01:00
Alan Modra	59497587af	libctf warnings Seen with every compiler I have if using -fno-inline: /home/alan/src/binutils-gdb/libctf/ctf-create.c: In function ‘ctf_add_encoded’: /home/alan/src/binutils-gdb/libctf/ctf-create.c:555:3: warning: ‘encoding’ may be used uninitialized [-Wmaybe-uninitialized] 555 | memcpy (dtd->dtd_vlen, &encoding, sizeof (encoding)); Seen with gcc-4.9 and probably others at lower optimisation levels: /home/alan/src/binutils-gdb/libctf/ctf-serialize.c: In function 'symtypetab_density': /home/alan/src/binutils-gdb/libctf/ctf-serialize.c:211:18: warning: 'sym' may be used uninitialized in this function [-Wmaybe-uninitialized] if (max < sym->st_symidx) Seen with gcc-4.5 and probably others at lower optimisation levels: /home/alan/src/binutils-gdb/libctf/ctf-types.c:1649:21: warning: 'tp' may be used uninitialized in this function /home/alan/src/binutils-gdb/libctf/ctf-link.c:765:16: warning: 'parent_i' may be used uninitialized in this function Also with gcc-4.5: In file included from /home/alan/src/binutils-gdb/libctf/ctf-endian.h:25:0, from /home/alan/src/binutils-gdb/libctf/ctf-archive.c:24: /home/alan/src/binutils-gdb/libctf/swap.h:70:0: warning: "_Static_assert" redefined /usr/include/sys/cdefs.h:568:0: note: this is the location of the previous definition swap.h (_Static_assert): Don't define if already defined. * ctf-serialize.c (symtypetab_density): Merge two CTF_SYMTYPETAB_FORCE_INDEXED blocks. * ctf-create.c (ctf_add_encoded): Avoid "encoding" may be used uninitialized warning. * ctf-link.c (ctf_link_deduplicating_open_inputs): Avoid "parent_i" may be used uninitialized warning. * ctf-types.c (ctf_type_rvisit): Avoid "tp" may be used uninitialized warning.	2024-04-17 09:24:36 +09:30
Alan Modra	fd67aa1129	Update year range in copyright notice of binutils files Adds two new external authors to etc/update-copyright.py to cover bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then updates copyright messages as follows: 1) Update cgen/utils.scm emitted copyrights. 2) Run "etc/update-copyright.py --this-year" with an extra external author I haven't committed, 'Kalray SA.', to cover gas testsuite files (which should have their copyright message removed). 3) Build with --enable-maintainer-mode --enable-cgen-maint=yes. 4) Check out */po/*.pot which we don't update frequently.	2024-01-04 22:58:12 +10:30
Alan Modra	d87bef3a7b	Update year range in copyright notice of binutils files The newer update-copyright.py fixes file encoding too, removing cr/lf on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and embedded cr in binutils/testsuite/binutils-all/ar.exp string match.	2023-01-01 21:50:11 +10:30
Nick Alcock	3ec2b3c058	libctf: avoid mingw warning A missing paren led to an intended cast to avoid dependence on the size of size_t in one argument of ctf_err_warn applying to the wrong type by mistake. libctf/ChangeLog: * ctf-serialize.c (ctf_write_mem): Fix cast.	2022-06-21 19:27:15 +01:00
Nick Alcock	faf5e6ace8	libctf: add LIBCTF_WRITE_FOREIGN_ENDIAN debugging option libctf has always handled endianness differences by detecting foreign-endian CTF dicts on the input and endian-flipping them: dicts are always written in native endianness. This makes endian-awareness very low overhead, but it means that the foreign-endian code paths almost never get routinely tested, since "make check" usually reads in dicts ld has just written out: only a few corrupted-CTF tests are actually in fixed endianness, and even they only test the foreign-endian code paths when you run make check on a big-endian machine. (And the fix is surely not to add more .s-based tests like that, because they are a nightmare to maintain compared to the C-code-based ones.) To improve on this, add a new environment variable, LIBCTF_WRITE_FOREIGN_ENDIAN, which causes libctf to unconditionally endian-flip at ctf_write time, so the output is always in the wrong endianness. This then tests the foreign-endian read paths properly at open time. Make this easier by restructuring the writeout code in ctf-serialize.c, which duplicates the maybe-gzip-and-write-out code three times (once for ctf_write_mem, with thresholding, and once each for ctf_compress_write and ctf_write just so those can avoid thresholding and/or compression). Instead, have the latter two call the former with thresholds of 0 or (size_t) -1, respectively. The endian-flipping code itself gains a bit of complexity, because one single endian-flipper (flip_types) was assuming the input to be in foreign-endian form and assuming it could pull things out of the input once they had been flipped and make sense of them. At the cost of a few lines of duplicated initializations, teach it to read before flipping if we're flipping to foreign-endianness instead of away from it. libctf/ * ctf-impl.h (ctf_flip_header): No longer static. (ctf_flip): Likewise. * ctf-open.c (flip_header): Rename to... (ctf_flip_header): ... 
this, now it is not private to one file. (flip_ctf): Rename... (ctf_flip): ... this too. Add FOREIGN_ENDIAN arg. (flip_types): Likewise. Use it. (ctf_bufopen_internal): Adjust calls. * ctf-serialize.c (ctf_write_mem): Add flip_endian path via a newly-allocated bounce buffer. (ctf_compress_write): Move below ctf_write_mem and reimplement in terms of it. (ctf_write): Likewise. (ctf_gzwrite): Note that this obscure writeout function does not support endian-flipping.	2022-03-23 13:48:32 +00:00
Nick Alcock	203bfa2f6b	include, libctf, ld: extend variable section to contain functions too The CTF variable section is an optional (usually-not-present) section in the CTF dict which contains name -> type mappings corresponding to data symbols that are present in the linker input but not in the output symbol table: the idea is that programs that use their own symbol-resolution mechanisms can use this section to look up the types of symbols they have found using their own mechanism. Because these removed symbols (mostly static variables, functions, etc) all have names that are unlikely to appear in the ELF symtab and because very few programs have their own symbol-resolution mechanisms, a special linker flag (--ctf-variables) is needed to emit this section. Historically, we emitted only removed data symbols into the variable section. This seemed to make sense at the time, but in hindsight it really doesn't: functions are symbols too, and a C program can look them up just like any other type. So extend the variable section so that it contains all static function symbols too (if it is emitted at all), with types of kind CTF_K_FUNCTION. This is a little fiddly. We relied on compiler assistance for data symbols: the compiler simply emits all data symbols twice, once into the symtypetab as an indexed symbol and once into the variable section. Rather than wait for a suitably adjusted compiler that does the same for function symbols, we can pluck unreported function symbols out of the symtab and add them to the variable section ourselves. While we're at it, we do the same with data symbols: this is redundant right now because the compiler does it, but it costs very little time and lets the compiler drop this kludge and save a little space in .o files. include/ * ctf.h: Mention the new things we can see in the variable section. ld/ * testsuite/ld-ctf/data-func-conflicted-vars.d: New test. 
libctf/ * ctf-link.c (ctf_link_deduplicating_variables): Duplicate symbols into the variable section too. * ctf-serialize.c (symtypetab_delete_nonstatic_vars): Rename to... (symtypetab_delete_nonstatics): ... this. Check the funchash when pruning redundant variables. (ctf_symtypetab_sect_sizes): Adjust accordingly. * NEWS: Describe this change.	2022-03-23 13:48:32 +00:00
Alan Modra	a2c5833233	Update year range in copyright notice of binutils files The result of running etc/update-copyright.py --this-year, fixing all the files whose mode is changed by the script, plus a build with --enable-maintainer-mode --enable-cgen-maint=yes, then checking out */po/*.pot which we don't update frequently. The copy of cgen was with commit d1dd5fcc38ead reverted as that commit breaks building of bpf opcodes files.	2022-01-02 12:04:28 +10:30
Nick Alcock	86f64bf43f	libctf, serialize: functions with no args have a NULL dtd_vlen Every place that accesses a function's dtd_vlen accesses it only if the number of args is nonzero, except the serializer, which always tries to memcpy it. The number of bytes it memcpys in this case is zero, but it is still undefined behaviour to copy zero bytes from a null pointer. So check for this case explicitly. libctf/ChangeLog 2021-03-25 Nick Alcock <nick.alcock@oracle.com> PR libctf/27628 * ctf-serialize.c (ctf_emit_type_sect): Allow for a NULL vlen in CTF_K_FUNCTION types.	2021-03-25 16:32:48 +00:00
Nick Alcock	08c428aff4	libctf: eliminate dtd_u, part 5: structs / unions Eliminate the dynamic member storage for structs and unions as we have for other dynamic types. This is much like the previous enum elimination, except that structs and unions are the only types for which a full-sized ctf_type_t might be needed. Up to now, this decision has been made in the individual ctf_add_{struct,union}_sized functions and duplicated in ctf_add_member_offset. The vlen machinery lets us simplify this, always allocating a ctf_lmember_t and setting the dtd_data's ctt_size to CTF_LSIZE_SENT: we figure out whether this is really justified and (almost always) repack things down into a ctf_stype_t at ctf_serialize time. This allows us to eliminate the dynamic member paths from the iterators and query functions in ctf-types.c in favour of always using the large-structure vlen stuff for dynamic types (the diff is ugly but that's just because of the volume of reindentation this calls for). This also means the large-structure vlen stuff gets more heavily tested, which is nice because it was an almost totally unused code path before now (it only kicked in for structures of size >4GiB, and how often do you see those?) The only extra complexity here is ctf_add_type. Back in the days of the nondeduplicating linker this was called a ridiculous number of times for countless identical copies of structures: eschewing the repeated lookups of the dtd in ctf_add_member_offset and adding the members directly saved an amazing amount of time. Now the nondeduplicating linker is gone, this is extreme overoptimization: we can rip out the direct addition and use ctf_member_next and ctf_add_member_offset, just like ctf_dedup_emit does. We augment a ctf_add_type test to try adding a self-referential struct, the only thing the ctf_add_type part of this change really perturbs. This completes the elimination of dtd_u. 
libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtu_members>: Remove. <dtd_u>: Likewise. (ctf_dmdef_t): Remove. (struct ctf_next) <u.ctn_dmd>: Remove. * ctf-create.c (INITIAL_VLEN): New, more-or-less arbitrary initial vlen size. (ctf_add_enum): Use it. (ctf_dtd_delete): Do not free the (removed) dmd; remove string refs from the vlen on struct deletion. (ctf_add_struct_sized): Populate the vlen: do it by hand if promoting forwards. Always populate the full-size lsizehi/lsizelo members. (ctf_add_union_sized): Likewise. (ctf_add_member_offset): Set up the vlen rather than the dmd. Expand it as needed, repointing string refs via ctf_str_move_pending. Add the member names as pending strings. Always populate the full-size lsizehi/lsizelo members. (membadd): Remove, folding back into... (ctf_add_type_internal): ... here, adding via an ordinary ctf_add_struct_sized and *_next iteration rather than doing everything by hand. * ctf-serialize.c (ctf_copy_smembers): Remove this... (ctf_copy_lmembers): ... and this... (ctf_emit_type_sect): ... folding into here. Figure out if a ctf_stype_t is needed here, not in ctf_add_*_sized. (ctf_type_sect_size): Figure out the ctf_stype_t stuff the same way here. ctf-types.c (ctf_member_next): Remove the dmd path and always use the vlen. Force large-structure usage for dynamic types. (ctf_type_align): Likewise. (ctf_member_info): Likewise. (ctf_type_rvisit): Likewise. * testsuite/libctf-regression/type-add-unnamed-struct-ctf.c: Add a self-referential type to this test. * testsuite/libctf-regression/type-add-unnamed-struct.c: Adjusted accordingly. * testsuite/libctf-regression/type-add-unnamed-struct.lk: Likewise.	2021-03-18 12:40:40 +00:00
Nick Alcock	77d724a7ec	libctf: eliminate dtd_u, part 4: enums This is the first tricky one, the first complex multi-entry vlen containing strings. To handle this in vlen form, we have to handle pending refs moving around on realloc. We grow vlen regions using a new ctf_grow_vlen function, and iterate through the existing enums every time a grow happens, telling the string machinery the distance between the old and new vlen region and letting it adjust the pending refs accordingly. (This avoids traversing all outstanding refs to find the refs that need adjusting, at the cost of having to traverse one enum: an obvious major performance win.) Addition of enums themselves (and also structs/unions later) is a bit trickier than earlier forms, because the type might be being promoted from a forward, and forwards have no vlen: so we have to spot that and create it if needed. Serialization of enums simplifies down to just telling the string machinery about the string refs; all the enum type-lookup code loses all its dynamic member lookup complexity entirely. A new test is added that iterates over (and gets values of) an enum with enough members to force a round of vlen growth. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_vlen_alloc>: New. (ctf_str_move_pending): Declare. * ctf-string.c (ctf_str_add_ref_internal): Fix error return. (ctf_str_move_pending): New. * ctf-create.c (ctf_grow_vlen): New. (ctf_dtd_delete): Zero out the vlen_alloc after free. Free the vlen later: iterate over it and free enum name refs first. (ctf_add_generic): Populate dtd_vlen_alloc from vlen. (ctf_add_enum): Populate the vlen; do it by hand if promoting forwards. (ctf_add_enumerator): Set up the vlen rather than the dmd. Expand it as needed, repointing string refs via ctf_str_move_pending. Add the enumerand names as pending strings. * ctf-serialize.c (ctf_copy_emembers): Remove. (ctf_emit_type_sect): Copy the vlen into place and ref the strings. 
* ctf-types.c (ctf_enum_next): The dynamic portion now uses the same code as the non-dynamic. (ctf_enum_name): Likewise. (ctf_enum_value): Likewise. * testsuite/libctf-lookup/enum-many-ctf.c: New test. * testsuite/libctf-lookup/enum-many.lk: New test.	2021-03-18 12:40:40 +00:00
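The grow-and-repoint step can be sketched as follows. Everything here is hypothetical (the sketch_* names, a doubling growth policy, and a flat array standing in for libctf's pending-ref set); only the idea comes from the commit: after a moving realloc, walk one enum's entries and shift each pending ref by the distance the region moved. Because the flat array is searched linearly, this sketch does not reproduce the performance benefit the commit describes; a hash set would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical enumerator, loosely modelled on ctf_enum_t: NAME will
   eventually hold a strtab offset.  */
typedef struct { uint32_t name; int32_t value; } sketch_enum_t;

/* Hypothetical pending-ref table: addresses of name fields still awaiting
   their strtab offset.  Kept as uintptr_t so stale addresses are only
   compared as integers, never dereferenced.  */
#define SKETCH_MAX_REFS 256
static uintptr_t pending_refs[SKETCH_MAX_REFS];
static size_t npending;

static void
add_pending (uint32_t *ref)
{
  assert (npending < SKETCH_MAX_REFS);
  pending_refs[npending++] = (uintptr_t) ref;
}

/* Repoint the pending ref recorded at OLD_ADDR, if any, to NEW_REF.  */
static void
move_pending (uintptr_t old_addr, uint32_t *new_ref)
{
  for (size_t i = 0; i < npending; i++)
    if (pending_refs[i] == old_addr)
      {
        pending_refs[i] = (uintptr_t) new_ref;
        return;
      }
}

typedef struct { sketch_enum_t *vlen; size_t n, alloc; } sketch_dyn_enum_t;

/* Grow E's vlen to hold at least WANT entries, repointing the pending refs
   of existing entries if the region moved.  Returns 0, or -1 on OOM.  */
static int
grow_vlen (sketch_dyn_enum_t *e, size_t want)
{
  if (want <= e->alloc)
    return 0;

  size_t new_alloc = e->alloc ? e->alloc : 4;
  while (new_alloc < want)
    new_alloc *= 2;

  uintptr_t old_base = (uintptr_t) e->vlen;
  sketch_enum_t *nv = realloc (e->vlen, new_alloc * sizeof (sketch_enum_t));
  if (nv == NULL)
    return -1;

  /* Assumes a flat address space, so old entry I lived at
     old_base + I * sizeof (sketch_enum_t).  */
  if ((uintptr_t) nv != old_base)
    for (size_t i = 0; i < e->n; i++)
      move_pending (old_base + i * sizeof (sketch_enum_t)
                    + offsetof (sketch_enum_t, name), &nv[i].name);

  e->vlen = nv;
  e->alloc = new_alloc;
  return 0;
}

static int
add_enumerator (sketch_dyn_enum_t *e, int32_t value)
{
  if (grow_vlen (e, e->n + 1) < 0)
    return -1;
  sketch_enum_t *ent = &e->vlen[e->n++];
  ent->name = 0;              /* placeholder until the strtab is laid out */
  ent->value = value;
  add_pending (&ent->name);
  return 0;
}
```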
Nick Alcock	986e9e3aa0	libctf: do not corrupt strings across ctf_serialize The preceding change revealed a new bug: the string table is sorted for better compression, so repeated serialization with type (or member) additions in the middle can move strings around. But every serialization flushes the set of refs (the memory locations that are automatically updated with a final string offset when the strtab is updated), so if we are not to have string offsets go stale, we must do all ref additions within the serialization code (which walks the complete set of types and symbols anyway). Unfortunately, we were adding one ref in another place: the type name in the dynamic type definitions, which has a ref added to it by ctf_add_generic. So adding a type, serializing (via, say, one of the ctf_write functions), adding another type with a name that sorts earlier, and serializing again will corrupt the name of the first type, because it no longer had a ref pointing to its dtd entry's name when its string offset was shifted later in the strtab to make way for the other type. To ensure that we don't miss strings, we also maintain a set of pending refs that will be added later (during serialization), and remove entries from that set when the ref is finally added. We always use ctf_str_add_pending outside ctf-serialize.c, ensure that ctf_serialize adds all strtab offsets as refs (even those in the dtds) on every serialization, and mandate that no refs are live on entry to ctf_serialize and that all pending refs are gone before strtab finalization. (Of necessity ctf_serialize has to traverse all strtab offsets in the dtds in order to serialize them, so adding them as refs at the same time is easy.) (Note that we still can't erase unused atoms when we roll back, though we can erase unused refs: members and enums are still not removed by rollbacks and might reference strings added after the snapshot.) 
libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-hash.c (ctf_dynset_elements): New. * ctf-impl.h (ctf_dynset_elements): Declare it. (ctf_str_add_pending): Likewise. (ctf_dict_t) <ctf_str_pending_ref>: New, set of refs that must be added during serialization. * ctf-string.c (ctf_str_create_atoms): Initialize it. (CTF_STR_ADD_REF): New flag. (CTF_STR_MAKE_PROVISIONAL): Likewise. (CTF_STR_PENDING_REF): Likewise. (ctf_str_add_ref_internal): Take a flags word rather than int params. Populate, and clear out, ctf_str_pending_ref. (ctf_str_add): Adjust accordingly. (ctf_str_add_external): Likewise. (ctf_str_add_pending): New. (ctf_str_remove_ref): Also remove the potential ref if it is a pending ref. * ctf-serialize.c (ctf_serialize): Prohibit addition of strings with ctf_str_add_ref before serialization. Ensure that the ctf_str_pending_ref set is empty before strtab finalization. (ctf_emit_type_sect): Add a ref to the ctt_name. * ctf-create.c (ctf_add_generic): Add the ctt_name as a pending ref. * testsuite/libctf-writable/reserialize-strtab-corruption.*: New test.	2021-03-18 12:40:40 +00:00
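The failure mode and the invariant that fixes it can be modelled in a few lines. This is a hypothetical sketch, not libctf's string machinery: it has no atoms table and no pending-ref set, and simply re-derives every ref from the types on each serialization, which is the property the commit establishes. The sketch_* names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_TYPES 16

/* Hypothetical dynamic type: its name, plus the slot (like ctt_name) that
   receives the name's strtab offset at serialization time.  */
typedef struct { const char *name; uint32_t ctt_name; } sketch_dtd_t;

typedef struct
{
  sketch_dtd_t types[SKETCH_MAX_TYPES];
  size_t ntypes;
  char strtab[256];
  size_t strtab_len;
} sketch_dict_t;

static void
add_type (sketch_dict_t *d, const char *name)
{
  assert (d->ntypes < SKETCH_MAX_TYPES);
  d->types[d->ntypes].name = name;
  d->types[d->ntypes].ctt_name = 0;   /* no offset until serialized */
  d->ntypes++;
}

static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Build a sorted, deduplicated strtab and patch every ctt_name.  Refs are
   gathered afresh on each call: offsets recorded by an earlier call go
   stale as soon as a new string sorts in front of them.  */
static void
serialize (sketch_dict_t *d)
{
  const char *sorted[SKETCH_MAX_TYPES];
  for (size_t i = 0; i < d->ntypes; i++)
    sorted[i] = d->types[i].name;
  qsort (sorted, d->ntypes, sizeof sorted[0], cmp_str);

  d->strtab[0] = '\0';                /* offset 0: the empty string */
  d->strtab_len = 1;
  for (size_t i = 0; i < d->ntypes; i++)
    {
      if (i > 0 && strcmp (sorted[i], sorted[i - 1]) == 0)
        continue;                     /* already laid out */
      size_t len = strlen (sorted[i]) + 1;
      assert (d->strtab_len + len <= sizeof d->strtab);
      uint32_t off = (uint32_t) d->strtab_len;
      memcpy (d->strtab + d->strtab_len, sorted[i], len);
      d->strtab_len += len;

      for (size_t j = 0; j < d->ntypes; j++)   /* update every ref */
        if (strcmp (d->types[j].name, sorted[i]) == 0)
          d->types[j].ctt_name = off;
    }
}
```

Adding "zebra", serializing, then adding "apple" and serializing again moves "zebra" from offset 1 to offset 7; a ref kept only from the first serialization would still point at offset 1, which now holds "apple".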
Nick Alcock	2a05d50e90	libctf: don't lose track of all valid types upon serialization One pattern which is rarely used with libctf but is meant to work is this: ctf_create(); ctf_add_*(); // add stuff ctf_type_*() // look stuff up ctf_write_*(); ctf_add_*(); // should still work ctf_type_*() // so should this ctf_write_*(); // and this i.e., writing out a dict should not break it and you should be able to do everything you could do with it before, including writing it out again. Unfortunately this has been broken for a while because the field which indicates the maximum valid type ID was not preserved across serialization: so type additions after serialization would overwrite types (obviously disastrous) and type lookups would just fail. The fix is trivial. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-serialize.c (ctf_serialize): Preserve ctf_typemax across serialization.	2021-03-18 12:40:40 +00:00
Nick Alcock	81982d20fa	libctf: eliminate dtd_u, part 3: functions One more member vanishes from the dtd_u, leaving only the member for struct/union/enum members. There's not much to do here, since as of commit `afd78bd6f0` we use the same representation (type sizes, etc) in the dtu_argv as we will use in the final vlen, with one exception: the vlen has alignment padding, and the dtu_argv did not. Simplify things by adding suitable padding in both cases. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_argv>: Remove. * ctf-create.c (ctf_dtd_delete): No longer free it. (ctf_add_function): Use the dtd_vlen, not dtu_argv. Properly align. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-types.c (ctf_func_type_info): Just use the vlen. (ctf_func_type_args): Likewise.	2021-03-18 12:40:40 +00:00
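The padded function vlen can be sketched like this. The sizing rule (one uint32_t type ID per argument, plus one zero word when the argument count is odd) is how I understand CTF v3 lays out function types, so treat it as an assumption; the helper names are hypothetical. Building the padded form once at add time means lookup and serialization can both use the same bytes unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed CTF v3 rule: argc type IDs, padded to an even count.  */
static size_t
func_vlen_size (uint32_t argc)
{
  return sizeof (uint32_t) * (argc + (argc & 1));
}

/* Hypothetical ctf_add_function-style helper: build the final, padded
   vlen at add time.  The padding word is left zeroed by calloc.  */
static uint32_t *
build_func_vlen (const uint32_t *argv, uint32_t argc)
{
  assert (argc == 0 || argv != NULL);
  size_t nwords = argc + (argc & 1);
  uint32_t *vlen = calloc (nwords ? nwords : 1, sizeof (uint32_t));
  if (vlen != NULL && argc > 0)
    memcpy (vlen, argv, argc * sizeof (uint32_t));
  return vlen;
}
```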
Nick Alcock	534444b1ee	libctf: eliminate dtd_u, part 2: arrays This is even simpler than ints, floats and slices, with the only extra complication being the need to manually transfer the array parameter in the rarely-used function ctf_set_array. (Arrays are unique in libctf in that they can be modified post facto, not just created and appended to. I'm not sure why they got this exemption, but it's easy to maintain.) libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_arr>: Remove. * ctf-create.c (ctf_add_array): Use the dtd_vlen, not dtu_arr. (ctf_set_array): Likewise. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-types.c (ctf_array_info): Just use the vlen.	2021-03-18 12:40:40 +00:00
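Once the array description lives in the vlen in its final form, modifying an array after creation reduces to overwriting that vlen, and one decoding path serves both dynamic and serialized types. The sketch below assumes a three-word record modelled on ctf_array_t (element type, index type, element count); the sketch_* names are hypothetical and are not libctf's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array record modelled on ctf_array_t.  */
typedef struct { uint32_t contents, index, nelems; } sketch_array_t;

/* Hypothetical dynamic type definition that owns its vlen directly.  */
typedef struct { unsigned char *vlen; size_t vlen_len; } sketch_dtd_t;

static int
add_array (sketch_dtd_t *dtd, const sketch_array_t *ar)
{
  dtd->vlen = malloc (sizeof *ar);
  if (dtd->vlen == NULL)
    return -1;
  dtd->vlen_len = sizeof *ar;
  memcpy (dtd->vlen, ar, sizeof *ar);   /* final form, written at add time */
  return 0;
}

/* Arrays may be changed after creation: just overwrite the vlen.  */
static void
set_array (sketch_dtd_t *dtd, const sketch_array_t *ar)
{
  assert (dtd->vlen_len == sizeof *ar);
  memcpy (dtd->vlen, ar, sizeof *ar);
}

/* One decoding path for dynamic and serialized types alike.  */
static sketch_array_t
array_info (const unsigned char *vlen)
{
  sketch_array_t ar;
  memcpy (&ar, vlen, sizeof ar);
  return ar;
}
```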
Nick Alcock	7879dd88ef	libctf: eliminate dtd_u, part 1: int/float/slice This series eliminates a lot of special-case code to handle dynamic types (types added to writable dicts and not yet serialized). Historically, when such types have variable-length data in their final CTF representations, libctf has always worked by adding such types to a special union (ctf_dtdef_t.dtd_u) in the dynamic type definition structure, then picking the members out of this structure at serialization time and packing them into their final form. This has the advantage that the ctf_add_* code doesn't need to know anything about the final CTF representation, but the significant disadvantage that all code that looks up types in any way needs two code paths, one for dynamic types, one for all others. Historically libctf "handled" this by not supporting most type lookups on dynamic types at all until ctf_update was called to do a complete reserialization of the entire dict (it didn't emit an error, it just emitted wrong results). Since commit `676c3ecbad`, which eliminated ctf_update in favour of the internal-only ctf_serialize function, all the type-lookup paths grew an extra branch to handle dynamic types. We can eliminate this branch again by dropping the dtd_u stuff and simply writing out the vlen in (close to) its final form at ctf_add_* time: type lookup for types using this approach is then identical for types in writable dicts and types that are in read-only ones, and serialization is also simplified (we just need to write out the vlen we already created). The only complexity lies in type kinds for which multiple vlen representations are valid depending on properties of the type, e.g. structures. But we can start simple, adjusting ints, floats, and slices to work this way, and leaving everything else as is. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_enc>: Remove. <dtd_u.dtu_slice>: Likewise. <dtd_vlen>: New. 
* ctf-create.c (ctf_add_generic): Perhaps allocate it. All callers adjusted. (ctf_dtd_delete): Free it. (ctf_add_slice): Use the dtd_vlen, not dtu_enc. (ctf_add_encoded): Likewise. Assert that this must be an int or float. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-dedup.c (ctf_dedup_rhash_type): Use the dtd_vlen, not dtu_slice. * ctf-types.c (ctf_type_reference): Likewise. (ctf_type_encoding): Remove most dynamic-type-specific code: just get the vlen from the right place. Report failure to look up the underlying type's encoding.	2021-03-18 12:40:36 +00:00
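For ints and floats, "writing out the vlen in its final form at ctf_add_* time" amounts to packing the encoding into a single 32-bit word immediately, so ctf_type_encoding can decode it the same way for dynamic and serialized types. The bit layout below (encoding flags in bits 24 to 31, bit offset in bits 16 to 23, width in bits 0 to 15) is my understanding of CTF_INT_DATA and its companion macros in include/ctf.h; the SKETCH_ macros and functions are local, hypothetical copies, so check them against the real header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed copies of the CTF int-encoding word layout.  */
#define SKETCH_INT_DATA(enc, off, bits) \
  (((uint32_t) (enc) << 24) | ((uint32_t) (off) << 16) | (uint32_t) (bits))
#define SKETCH_INT_ENCODING(data) (((data) & 0xff000000u) >> 24)
#define SKETCH_INT_OFFSET(data)   (((data) & 0x00ff0000u) >> 16)
#define SKETCH_INT_BITS(data)     ((data) & 0x0000ffffu)

typedef struct { uint32_t encoding, offset, bits; } sketch_encoding_t;

/* Add time: produce the final vlen word straight away.  */
static uint32_t
encode_int_vlen (const sketch_encoding_t *ep)
{
  assert (ep->encoding <= 0xff && ep->offset <= 0xff && ep->bits <= 0xffff);
  return SKETCH_INT_DATA (ep->encoding, ep->offset, ep->bits);
}

/* Lookup time: a single decoding path, whether the word came from a
   dynamic type's vlen or from a serialized dict.  */
static sketch_encoding_t
decode_int_vlen (uint32_t data)
{
  sketch_encoding_t e = { SKETCH_INT_ENCODING (data),
                          SKETCH_INT_OFFSET (data),
                          SKETCH_INT_BITS (data) };
  return e;
}
```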
Nick Alcock	b9a964318a	libctf: split up ctf_serialize ctf_serialize and its various pieces may be split out into a separate file now, but ctf_serialize is still far too long and disordered, mixing header initialization, sizing of multiple CTF sections, sorting and emission of multiple CTF sections, strtab construction and ctf_dict_t copying into a single ugly organically-grown mess. Fix the worst of this by migrating all section sizing and emission into separate functions, two per section (or class of section in the case of the symtypetabs). Only the variable section is now sized and emitted directly in ctf_serialize (because it only takes about three lines to do so). The section sizes themselves are still maintained by ctf_serialize so that it can work out the header offsets, but ctf_symtypetab_sect_sizes and ctf_emit_symtypetab_sects share a lot of extra state: migrate that into a shared structure, emit_symtypetab_state_t. (Test results unchanged.) libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-serialize.c: General reshuffling, and... (emit_symtypetab_state_t): New, migrated from local variables in ctf_serialize. (ctf_serialize): Split out most section sizing and emission. (ctf_symtypetab_sect_sizes): New (split out). (ctf_emit_symtypetab_sects): Likewise. (ctf_type_sect_size): Likewise. (ctf_emit_type_sect): Likewise.	2021-03-18 12:37:55 +00:00
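The size-then-emit structure this commit moves to can be sketched generically: one function sizes each section (possibly recording decisions in shared state), the driver derives header offsets from the sizes, and a matching function emits each section at its offset. All names here are hypothetical, and the "skip untyped symbols" rule is an invented stand-in for the kind of decision emit_symtypetab_state_t carries between the two phases; it is not libctf's actual logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
  const uint32_t *funcs; size_t nfuncs;     /* per-symbol type IDs, 0 = untyped */
  const unsigned char *types; size_t types_len;
} sketch_input_t;

/* Hypothetical state shared by sizing and emission.  */
typedef struct { size_t nemit; } sketch_symstate_t;

typedef struct { uint32_t func_off, type_off, end; } sketch_header_t;

static size_t
func_sect_size (const sketch_input_t *in, sketch_symstate_t *st)
{
  st->nemit = 0;
  for (size_t i = 0; i < in->nfuncs; i++)
    if (in->funcs[i] != 0)
      st->nemit++;
  return st->nemit * sizeof (uint32_t);
}

static void
emit_func_sect (const sketch_input_t *in, const sketch_symstate_t *st,
                unsigned char *dst)
{
  size_t n = 0;
  for (size_t i = 0; i < in->nfuncs; i++)
    if (in->funcs[i] != 0)
      memcpy (dst + n++ * sizeof (uint32_t), &in->funcs[i], sizeof (uint32_t));
  assert (n == st->nemit);                  /* emission must match sizing */
}

static size_t
type_sect_size (const sketch_input_t *in)
{
  return in->types_len;
}

static void
emit_type_sect (const sketch_input_t *in, unsigned char *dst)
{
  if (in->types_len > 0)
    memcpy (dst, in->types, in->types_len);
}

/* Size every section, derive the header offsets, then emit each one.  */
static unsigned char *
serialize (const sketch_input_t *in, sketch_header_t *hdr)
{
  sketch_symstate_t st;
  size_t func_size = func_sect_size (in, &st);
  size_t type_size = type_sect_size (in);

  hdr->func_off = 0;
  hdr->type_off = (uint32_t) func_size;
  hdr->end = (uint32_t) (func_size + type_size);

  unsigned char *buf = malloc (hdr->end ? hdr->end : 1);
  if (buf == NULL)
    return NULL;
  emit_func_sect (in, &st, buf + hdr->func_off);
  emit_type_sect (in, buf + hdr->type_off);
  return buf;
}
```

Keeping each section's sizing and emission side by side makes it easy to check that they agree, which is the main hazard once the two phases are separated.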
Nick Alcock	bf4c3185a5	libctf: split serialization and file writeout into its own file The code to serialize CTF dicts just gets bigger and bigger as the dictionary's complexity grows: adding symtypetabs almost doubled it on its own. It's long past time to split this out into its own source file, accompanied by the functions that do the actual writeout. This leaves ctf-create.c populated exclusively by functions related to actual writable dict creation (ctf_add_*, ctf_create, etc.), and leaves both files a much more reasonable size. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-create.c (symtypetab_delete_nonstatic_vars): Move into ctf-serialize.c. (ctf_symtab_skippable): Likewise. (CTF_SYMTYPETAB_EMIT_FUNCTION): Likewise. (CTF_SYMTYPETAB_EMIT_PAD): Likewise. (CTF_SYMTYPETAB_FORCE_INDEXED): Likewise. (symtypetab_density): Likewise. (emit_symtypetab): Likewise. (emit_symtypetab_index): Likewise. (ctf_copy_smembers): Likewise. (ctf_copy_lmembers): Likewise. (ctf_copy_emembers): Likewise. (ctf_sort_var): Likewise. (ctf_serialize): Likewise. (ctf_gzwrite): Likewise. (ctf_compress_write): Likewise. (ctf_write_mem): Likewise. (ctf_write): Likewise. * ctf-serialize.c: New file. * Makefile.am (libctf_nobfd_la_SOURCES): Add it. * Makefile.in: Regenerate.	2021-03-18 12:37:53 +00:00
