binutils-gdb

Author	SHA1	Message	Date
Nick Alcock	beccf36b88	libctf: move string deduplication into ctf-archive This means that any archive containing dicts can get its strings dedupped together, rather than only those that are ctf_linked. (For now, we are still constrained to ctf_linked archives, since fixing that requires further changes to ctf_dedup_strings: but this gives us the first half of what is necessary.) libctf/ * ctf-link.c (ctf_link_write): Move string dedup into... * ctf-archive.c (ctf_arc_preserialize): ... this new function. (ctf_arc_write_fd): Call it.	2025-02-28 15:13:24 +00:00
Nick Alcock	a480362d88	libctf: string: refs rework This commit moves provisional (not-yet-serialized) string refs towards the scheme to be used for CTF IDs in the future. In particular - provisional string offsets now count downwards from just under the external string offset space (all bits on but the high bit). This makes it possible to detect an overflowing strtab, and also makes it trivial to determine whether any string offset (ref) updates were missed -- where before we might get a slightly corrupted or incorrect string, we now get a huge high strtab offset corresponding to no string, and an error is emitted at read time. - refs are emitted at serialization time during the pass through the types. They are strictly associated with the newly-written-out buffer: the existing opened CTF dict is not changed, though it does still get the new strtab so that new refs to the same string can just refer directly to it. The provisional strtab hash table that contains these strings is not deleted after serialization (because we might serialize again): instead, we keep track in the parent of the lowest-yet-used ("latest") provisional strtab offset, and any strtab offset above that, but not external (high-bit-on) is considered provisional. This is sort-of-enforced by moving most of the ref-addition function declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is not included by ctf-create.c or ctf-open.c. - because we don't add refs when adding types, we don't need to handle the case where we add things to expanding vlens (enums, struct members) and have to realloc() them. So the entire painful movable refs system can just be deleted, along with the ability to remove refs piecemeal at all (purging all of them is still possible). Strings added during type addition are added via ctf_str_add(), which adds no refs: the strings are picked up at serialization time and refs to their final, serialized resting place added. The DTDs never have any refs in them, and their provisional strtab offsets are never updated by the ref system. This caused several bugs to fall out of the earlier work and get fixed. In particular, attempts to look up a string in a child dict now search the parent's provisional strtab too: we add some extra special casing for the null string so we don't need to worry about deduplication moving it somewhere other than offset zero. Finally, the optimization that removes an unreferenced synthetic external strtab (the record of the strings the linker has told us about, kept around internally for lookup during late serialization) is faulty: references to a strtab entry will only produce CTF-level refs if their value might change, and an external string's offset won't change, so it produces no refs: worse yet, even if we did get a ref (say, if the string was originally believed to be internal and only later were we told that the linker knew about it too), when we serialize a strtab, all its refs are dropped (since they've been updated and can no longer change); so if we serialized it a second time, its synthetic external strtab would be considered empty and dropped, even though the same external strings as before still exist, referencing it. We must keep the synthetic external strtab around as long as external strings exist that reference it, i.e. for the life of the dict. One benefit of all this: now we're emitting provisional string offsets at a really high value, it's out of the way of the consecutive, deduplicated string offsets in child dicts. So we can drop the constraint that you cannot add strings to a dict with children, which allows us to add types freely to parent dicts again. What you can't do is write that dict out again: when we serialize, we currently update the dict being serialized with the updated strtabs: when you write a dict out, its provisional strings become real strings, and suddenly the offsets would overlap once more. But opening a dict and its children, adding to it, and then writing it out again is rare indeed, and we have a workaround: anyone wanting to do this can just use ctf_link instead.	2025-02-28 15:13:24 +00:00
Nick Alcock	97a72b2a35	libctf: create: fix vlen / vbytes confusion The initial_vlen parameter to ctf_add_generic is misnamed: it's not the initial vlen (the initial number of members of a struct, etc), but rather the initial size of the vlen region. We have a term for that, vbytes: use it. Amazingly this doesn't seem to have caused any bugs to creep in.	2025-02-28 15:13:24 +00:00
Nick Alcock	dc93d01ff2	libctf: de-macroize LCTF_TYPE_TO_INDEX / LCTF_INDEX_TO_TYPE Making these functions is unnecessary right now, but will become much clearer shortly. While we're at it, we can drop the third child argument to LCTF_INDEX_TO_TYPE: it's only used for nontrivial purposes that aren't literally the same as getting the result from the fp in one place, in ctf_lookup_by_name_internal, and that place is easily fixed by just looking in the right dictionary in the first place.	2025-02-28 15:13:24 +00:00
Nick Alcock	b875301e74	libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILD Parent/child determination is about to become rather more complex, making a macro impractical. Use the ctf_type_isparent/ischild function calls everywhere and remove the macro. Make them more const-correct too, to make them more widely usable. While we're about it, change several places that hand-implemented ctf_get_dict() to call it instead, and armour several functions against the null returns that were always possible in this case (but previously unprotected-against).	2025-02-28 15:13:24 +00:00
Nick Alcock	9835747b21	libctf: generalize the ref system Despite the removal of the separate movable ref list, the ref system as a whole is more than complex enough to be worth generalizing now that we are adding different kinds of ref. Refs now are lists of uint32_t * which can be updated through the pointer for all entries in the list and moved to new sites for all pointers in a given range: they are no longer references to string offsets in particular and can be references to other uint32_t-sized things instead (note that ctf_id_t is a typedef to a uint32_t). ctf-string.c has been adjusted accordingly (the adjustments are tiny, more or less just turning a bunch of references to atom into &atom->csa_refs).	2025-02-28 15:13:24 +00:00
Nick Alcock	21f748e1e3	libctf, string: delete separate movable ref storage again This was added last year to let us maintain a backpointer to the movable refs dynhash in movable ref atoms without spending space for the backpointer on the majority of (non-movable) refs and also without causing an atom which had some refs movable and some refs not movable to dereference unallocated storage when freed. The backpointer's only purpose was to let us locate the ctf_str_movable_refs dynhash during item freeing, when we had nothing but a pointer to the atom being freed. Now we have a proper freeing arg, we don't need the backpointer at all: we can just pass a pointer to the dict in to the atoms dynhash as a freeing arg for the atom freeing functions, and throw the whole backpointer and separate movable ref list complexity away.	2025-02-28 15:13:23 +00:00
Nick Alcock	6a6a3cc9c2	libctf, hash: add support for freeing functions taking an arg There are a bunch of places in libctf where the code is complicated by the fact that freeing a hash key or value requires access to the dict: more generally, they want an arg pointer to something. But for the sake of being able to use free() as a freeing function, we can't do this at all times. We also don't want to bloat up the hash itself with an arg value unless necessary (in the same way we already avoid storing the key or value freeing functions unless at least one of them is specified). So from the outside this change is simple: add a new ctf_dynhash_create_arg which takes a new sort of freeing function which takes an argument. Internally, we store the arg only when the key or owner is set, and cast from the one freeing function to the other iff the arg is non-NULL. This means it's impossible to pass a value that may or may not be NULL to the freeing function, but that's harmless for all current uses, and allows significant simplifications elsewhere.	2025-02-28 15:13:23 +00:00
Nick Alcock	4d2d5afa60	libctf: actually deduplicate the strtab This commit finally implements strtab deduplication, putting together all the pieces assembled in the earlier commits. The magic is entirely localized to ctf_link_write, which preserializes all the dicts (parent first), and calls ctf_dedup_strings on the parent. (The error paths get tweaked a bit too.) Calling ctf_dedup_strings has implications elsewhere: the lifetime rules for the inputs versus outputs change a bit now that the child output dicts contain references to the parent dict's atoms table. We also pre-purge movable refs from all the deduplicated strings before freeing any of this because movable refs contain backreferences into the dict they came from, which means the parent contains references to all the children! Purging the refs first makes those references go away so we can free the children without creating any wild pointers, even temporarily. There's a new testcase that identifies a regression whereby offset 0 (the null string) and index 0 (in children now often the parent dict name, ".ctf") got mixed up, leading to anonymous structs and unions getting the not entirely C-valid name ".ctf" instead. May other testcases get adjusted to no longer depend on the precise layout of the strtab. TODO: add new tests to verify that strings are actually being deduplicated. libctf/ * ctf-link.c (ctf_link_write): Deduplicate strings. * ctf-open.c (ctf_dict_close): Free refs, then the link outputs, then the out cu_mapping, then the inputs, in that order. * ctf-string.c (ctf_str_purge_refs): Not static any more. * ctf-impl.h: Declare it. ld/ * testsuite/ld-ctf/conflicting-cycle-2.A-1.d: Don't depend on strtab contents. * testsuite/ld-ctf/conflicting-cycle-2.A-2.d: Likewise. * testsuite/ld-ctf/conflicting-cycle-2.parent.d: Likewise. * testsuite/ld-ctf/conflicting-cycle-3.C-1.d: Likewise. * testsuite/ld-ctf/conflicting-cycle-3.C-2.d: Likewise. * testsuite/ld-ctf/anonymous-conflicts*: New test.	2025-02-28 14:47:24 +00:00
Nick Alcock	9daceda796	libctf: dedup: add strtab deduplicator This is a pretty simple two-phase process (count duplicates that are actually going to end up in the strtab and aren't e.g. strings without refs, strings with external refs etc, and move them into the parent) with one wrinkle: we sorta-abuse the csa_external_offset field in the deduplicated child atom (normally used to indicate that this string is located in the ELF strtab) to indicate that this atom is in the parent. If you think of "external" as meaning simply "is in some other strtab, we don't care which one", this still makes enough sense to not need to change the name, I hope. This is still not called from anywhere, so strings are (still!) not deduplicated, and none of the dedup machinery added in earlier commits does anything yet. libctf/ * ctf-dedup.c (ctf_dedup_emit_struct_members): Note that strtab dedup happens (well) after struct member emission. (ctf_dedup_strings): New. * ctf-impl.h (ctf_dedup_strings): Declare.	2025-02-28 14:47:24 +00:00
Nick Alcock	3bec4f1f3c	include, libctf: string lookup and writeout of a parent-shared strtab The next stage of strtab sharing is actual lookup of strings in such strtabs, interning of strings in such strtabs and writing out of such strtabs (but not actually figuring out which strings should be shared: that's next). We introduce several new internal ctf_str_* API functions to augment the existing rather large set: ctf_str_add_copy, which adds a string and always makes a copy of it (used when deduplicating to stop dedupped strings holding permanent references on the input dicts), and ctf_str_no_dedup_ref (which adds a ref to a string while preventing it from ever being deduplicated, used for header fields like the parent name, which is the same for almost all child dicts but had still better not be stored in the parent!). ctf_strraw_explicit, the ultimate underlying "look up a string" function that backs ctf_strptr et al, gains the ability to automatically find strings in the parent if the offset is < cth_parent_strlen, and generally make all offsets parent-relative (so something at offset 1 in the child strlen will need to be looked up at offset 257 if cth_parent_strlen is 256). This suffices to paste together the parent and child from the perspective of lookup. We do quite a lot of new checks in here, simply because it's called all over the place and it's preferable to emit a nice error into the ctf_err_warning stream if things go wrong. Among other things this traps cases where you accidentally added a string to the parent, throwing off all the offsets. Completely invalid offsets also now add a message to the err_warning stream. Insertion of new atoms (the deduplicated entities underlying strings in a given dict), already a flag-heavy operation, gains more flags, corresponding to the new ctf_str_add_copy and ctf_str_no_dedup_ref functions: atom addition also checks the ctf_max_children set by ctf_import and prevents addition of new atoms to any dicts with ctf_imported children and an already-serialized strtab. strtab writeout gains more checks as well: you can't write out a strtab for a child dict whose parent hasn't been serialized yet (and thus doesn't have a serialized strtab itself); you can't write it out if the child already depended on a shared parent strtab and that strtab has changed length. The null atom at offset 0 is only written to the parent strtab; and ref updating changes to look up offsets in the parent's atoms table iff a new CTF_STR_ATOM_IN_PARENT flag is set on the atom (this will be set by deduplication to ensure that serializing a dict will update all its refs properly even though a bunch of them have moved to the parent dict). None of this actually has any effect yet because no string deduplication is being carried out, and the cth_parent_strlen is still locked at 0. include/ * ctf-api.h (_CTF_ERRORS) [ECTF_NOTSERIALIZED]: New. (ECTF_NERR): Updated. libctf/ * ctf-impl.h (CTF_STR_ATOM_IN_PARENT): New. (CTF_STR_ATOM_NO_DEDUP): Likewise. (ctf_str_add_no_dedup_ref): New. (ctf_str_add_copy): New. * ctf-string.c (ctf_strraw_explicit): Look in parents if necessary: use parent-relative offsets. (ctf_strptr_validate): Avoid duplicating errors. (ctf_str_create_atoms): Update comment. (CTF_STR_COPY): New. (CTF_STR_NO_DEDUP): Likewise. (ctf_str_add_ref_internal): Use them, setting the corresponding csa_flags, prohibiting addition to serialized parents, and copying strings if so requested. (ctf_str_add): Turn into a wrapper around... (ctf_str_add_flagged): ... this new function. The offset is now parent-relative. (ctf_str_add_ref): Likewise. (ctf_str_add_movable_ref): Likewise. (ctf_str_add_copy): New. (ctf_str_add_no_dedup_ref): New. (ctf_str_write_strtab): Prohibit writes when the parent has changed length or is not serialized. Only write the null atom to parent strtabs. Chase refs to the parent if necessary.	2025-02-28 14:47:24 +00:00
Nick Alcock	a14fb397b2	libctf: tear opening and serialization in two The next stage in sharing the strtab involves tearing two core parts of libctf into two pieces. Large parts of init_static_types, called at open time, involve traversing the types table and initializing the hashtabs used by the type name lookup functions and the enumerator conflicting checks. If the string table is partly located in the parent dict, this is obviously not going to work: so split out that code into a new init_static_types_names function (which also means moving the wrapper around init_static_types that was used to simplify the enumerator code into being a wrapper around init_static_types_names instead) and call that from init_static_types (for parent dicts, and < v4 dicts), and from ctf_import (for v4 dicts). At the same time as doing this we arrange to set LCTF_NO_STR (recently introduced) iff this is a v4 child dict with a nonzero cth_parent_strlen: this then blocks more or less everything that involves string operations until a ctf_import has actually imported the strtab it depends on. (No string oeprations that actually use this have been introduced yet, but since no string deduplication is happening yet either this is harmless.) For v4 dicts, at import time we also validate that the cth_parent_strlen has the same value as the parent's strlen (zero is also a valid value, indicating a non-shared strtab, as is commonplace in older dicts, dicts emitted by the compiler, parent dicts etc). This makes ctf_import more complex, so we simplify things again by dropping all the repeated code in the obscure used-only-by-ctf_link ctf_import_unref and turning both into wrappers around an internal function. We prohibit repeated ctf_imports (except of NULL or the same dict repeatedly), and set up some new fields which will be used later to prevent people from adding strings to parent dicts with pre-existing serialized strtabs once they have children imported into them (which would change their string length and corrupt all those strtabs). Serialization also needs to be torn in two. The problem here is that currently serialization does too much: it emits everything including the strtab, does things that depend on the strtab being finalized (notably variable table sorting), and then writes it out. Much of this emission itself involves strtab writes, so the strtab is not actually complete until halfway through ctf_serialize. But when deduplicating, we want to use machinery in ctf-link and ctf-dedup to deduplicate the strtab after it is complete, and only then write it out. We could do this via having ctf_serialize call some sort of horrible callback, but it seems much simpler to just cut ctf_serialize in two, and introduce a new ctf_preserialize which can optionally be called to do all this "everything but the strtab" work. (If it's not called, ctf_serialize calls it itself.) This means pulling some internal variables out of ctf_serialize into the ctf_dict_t, and slightly abusing LCTF_NO_STR to mean (in addition to its "no, you can't do much between opening a child dict and importing its parent" semantics), "no, you can't do much between calling ctf_preserialize and ctf_serialize". The requirements of both are not quite identical -- you definitely can do things that involve string lookups after ctf_preserialize -- but it serves to stop callers from accidentally adding more types after the types table has been written out, and that's good enough. ctf_preserialize isn't public API anyway. libctf/ * ctf-impl.h (struct ctf_dict) [ctf_serializing_buf]: New. [ctf_serializing_buf_size]: Likewise. [ctf_serializing_vars]: Likewise. [ctf_serializing_nvars]: Likewise. [ctf_max_children]: Likewise. (LCTF_PRESERIALIZED): New. (ctf_preserialize): New. (ctf_depreserialize): New. * ctf-open.c (init_static_types): Rename to... (init_static_types_names): ... this, wrapping a different function. (init_static_types_internal): Rename to... (init_static_types): ... this, and set LCTF_NO_STR if neecessary. Tear out the name-lookup guts into... (init_static_types_names_internal): ... this new function. Fix a few comment typos. (ctf_bufopen): Emphasise that you cannot rely on looking up strings at any point in ctf_bufopen any more. (ctf_dict_close): Free ctf_serializing_buf. (ctf_import): Turn into a wrapper, calling... (ctf_import_internal): ... this. Prohibit repeated ctf_imports of different parent dicts, or "unimporting" by setting it back to NULL again. Validate the parent we do import using cth_parent_strlen. Call init_static_types_names if the strtab is shared with the parent. (ctf_import_unref): Turn into a wrapper. * ctf-serialize.c (ctf_serialize): Split out everything before strtab serialization into... (ctf_preserialize): ... this new function. (ctf_depreserialize): New, undo preserialization on error.	2025-02-28 14:47:24 +00:00
Nick Alcock	70d05ab0b2	libctf: add mechanism to prohibit most operations without a strtab We are about to add machinery that deduplicates a child dict's strtab against its parent. Obviously if you open such a dict but do not import its parent, all strtab lookups must fail: so add an LCTF_NO_STR flag that is set in that window and make most operations fail if it's not set. (Two more that will be set in future commits are serialization and string lookup itself.) Notably, not all symbol lookup is impossible in this window: you can still look up by symbol index, as long as this dict is not using an indexed strtypetab (which obviously requires string lookups to get the symbol name). include/ * ctf-api.h (_CTF_ERRORS) [ECTF_HASPARENT]: New. [ECTF_WRONGPARENT]: Likewise. (ECTF_NERR): Update. Update comments to note the new limitations on ctf_import et al. libctf/ * ctf-impl.h (LCTF_NO_STR): New. * ctf-create.c (ctf_rollback): Error out when LCTF_NO_STR. (ctf_add_generic): Likewise. (ctf_add_struct_sized): Likewise. (ctf_add_union_sized): Likewise. (ctf_add_enum): Likewise. (ctf_add_forward): Likewise. (ctf_add_unknown): Likewise. (ctf_add_enumerator): Likewise. (ctf_add_member_offset): Likewise. (ctf_add_variable): Likewise. (ctf_add_funcobjt_sym_forced): Likewise. (ctf_add_type): Likewise (on either dict). * ctf-dump.c (ctf_dump): Likewise. * ctf-lookup.c (ctf_lookup_by_name): Likewise. (ctf_lookup_variable): Likewise. Likewise. (ctf_lookup_enumerator): Likewise. (ctf_lookup_enumerator_next): Likewise. (ctf_symbol_next): Likewise. (ctf_lookup_by_sym_or_name): Likewise, if doing indexed lookups. * ctf-types.c (ctf_member_next): Likewise. (ctf_enum_next): Likewise. (ctf_type_aname): Likewise. (ctf_type_name_raw): Likewise. (ctf_type_compat): Likewise, for either dict. (ctf_member_info): Likewise. (ctf_enum_name): Likewise. (ctf_enum_value): Likewise. (ctf_type_rvisit): Likewise. (ctf_variable_next): Note that we don't need to test LCTF_NO_STR.	2025-02-28 14:47:24 +00:00
Nick Alcock	9a74ab12c8	include, libctf: start work on libctf v4 This format is a superset of BTF, but for now we just do the minimum to declare a new file format version, without actually introducing any format changes. From now on, we refuse to reserialize CTFv1 dicts: these have a distinct parent/child boundary which obviously cannot change upon reserialization (that would change the type IDs): instead, we encoded this by stuffing in a unique CTF version for such dicts. We can't do that now we have one version for all CTFv4 dicts, and testing such old dicts is very hard these days anyway, and is not automated: so just drop support for writing them out entirely. (You still can write them out, but you have to do a full-blown ctf_link, which generates an all-new fresh dict and recomputes type IDs as part of deduplication.) To prevent this extremely-not-ready format escaping into the wild, add a new mechanism whereby any format version higher than the new #define CTF_STABLE_VERSION cannot be serialized unless I_KNOW_LIBCTF_IS_UNSTABLE is set in the environment. include/ * ctf-api.h (_CTF_ERRORS) [ECTF_CTFVERS_NO_SERIALIZE]: New. [ECTF_UNSTABLE]: New. (ECTF_NERR): Update. * ctf.h: Small comment improvements.. (ctf_header_v3): New, copy of ctf_header. (CTF_VERSION_4): New. (CTF_VERSION): Now CTF_VERSION_4. (CTF_STABLE_VERSION): Still 4, CTF_VERSION_3. ld/ * testsuite/ld-ctf/.d: Update to CTF_VERSION_4. libctf/ ctf-impl.h (LCTF_NO_SERIALIZE): New. * ctf-dump.c (ctf_dump_header): Add CTF_VERSION_4. * ctf-open.c (ctf_dictops): Likewise. (upgrade_header): Rename to... (upgrade_header_v2): ... this. (upgrade_header_v3): New. (upgrade_types): Support upgrading from CTF_VERSION_3. Turn on LCTF_NO_SERIALIZE for CTFv1. (init_static_types_internal): Upgrade all types tables older than * CTF_VERSION_4. (ctf_bufopen): Support CTF_VERSION_4: error out if we forget to update this switch in future. Add header upgrading from v3 and below. Improve comments slightly. * ctf-serialize.c (ctf_serialize): Block serialization of unstable file formats, and of file formats for which LCTF_NO_SERIALIZE is turned on (v1).	2025-02-28 14:47:24 +00:00
Alan Modra	e8e7cf2abe	Update year range in copyright notice of binutils files	2025-01-01 18:29:57 +10:30
Nick Alcock	6da9267482	libctf, include: add ctf_dict_set_flag: less enum dup checking by default The recent change to detect duplicate enum values and return ECTF_DUPLICATE when found turns out to perturb a great many callers. In particular, the pahole-created kernel BTF has the same problem we historically did, and gleefully emits duplicated enum constants in profusion. Handling the resulting duplicate errors from BTF -> CTF converters reasonably is unreasonably difficult (it amounts to forcing them to skip some types or reimplement the deduplicator). So let's step back a bit. What we care about mostly is that the deduplicator treat enums with conflicting enumeration constants as conflicting types: programs that want to look up enumeration constant -> value mappings using the new APIs to do so might well want the same checks to apply to any ctf_add_* operations they carry out (and since they're using the new APIs, added at the same time as this restriction was imposed, there is likely to be no negative consequence of this). So we want some way to allow processes that know about duplicate detection to opt into it, while allowing everyone else to stay clear of it: but we want ctf_link to get this behaviour even if its caller has opted out. So add a new concept to the API: dict-wide CTF flags, set via ctf_dict_set_flag, obtained via ctf_dict_get_flag. They are not bitflags but simple arbitrary integers and an on/off value, stored in an unspecified manner (the one current flag, we translate into an LCTF_* flag value in the internal ctf_dict ctf_flags word). If you pass in an invalid flag or value you get a new ECTF_BADFLAG error, so the caller can easily tell whether flags added in future are valid with a particular libctf or not. We check this flag in ctf_add_enumerator, and set it around the link (including on child per-CU dicts). The newish enumerator-iteration test is souped up to check the semantics of the flag as well. The fact that the flag can be set and unset at any time has curious consequences. You can unset the flag, insert a pile of duplicates, then set it and expect the new duplicates to be detected, not only by ctf_add_enumerator but also by ctf_lookup_enumerator. This means we now have to maintain the ctf_names and conflicting_enums enum-duplication tracking as new enums are added, not purely as the dict is opened. Move that code out of init_static_types_internal and into a new ctf_track_enumerator function that addition can also call. (None of this affects the file format or serialization machinery, which has to be able to handle duplicate enumeration constants no matter what.) include/ * ctf-api.h (CTF_ERRORS) [ECTF_BADFLAG]: New. (ECTF_NERR): Update. (CTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_dict_set_flag): New function. (ctf_dict_get_flag): Likewise. libctf/ * ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_track_enumerator): Declare. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_create_per_cu): Likewise. (ctf_link_deduplicating_per_cu): Likewise. (ctf_link): Likewise. (ctf_link_write): Likewise. * ctf-subr.c (ctf_dict_set_flag): New function. (ctf_dict_get_flag): New function. * ctf-open.c (init_static_types_internal): Move enum tracking to... * ctf-create.c (ctf_track_enumerator): ... this new function. (ctf_add_enumerator): Call it. * libctf.ver: Add the new functions. * testsuite/libctf-lookup/enumerator-iteration.c: Test them.	2024-07-31 21:02:05 +01:00
Nick Alcock	36c771b179	libctf: fix CTF dict compression Commit `483546ce4f` ("libctf: make ctf_serialize() actually serialize") accidentally broke dict compression. There were two bugs: - ctf_arc_write_one_ctf was still making its own decision about whether to compress the dict via direct ctf_size comparison, which is unfortunate because now that it no longer calls ctf_serialize itself, ctf_size is always zero when it does this: it should let the writing functions decide on the threshold, which they contain code to do which is simply not used for lack of one trivial wrapper to write to an fd and also provide a compression threshold - ctf_write_mem, the function underlying all writing as of the commit above, was calling zlib's compressBound and avoiding compression if this returned a value larger than the input. Unfortunately compressBound does not do a trial compression and determine whether the result is compressible: it just adds zlib header sizes to the value passed in, so our test would always have concluded that the value was incompressible! Avoid by simply always compressing if the raw size is larger than the threshold: zlib is quite clever enough to avoid actually compressing if the data is incompressible. Add a testcase for this. libctf/ * ctf-impl.h (ctf_write_thresholded): New... * ctf-serialize.c (ctf_write_thresholded): ... defined here, a wrapper around... (ctf_write_mem): ... this. Don't check compressibility. (ctf_compress_write): Reimplement as a ctf_write_thresholded wrapper. (ctf_write): Likewise. * ctf-archive.c (arc_write_one_ctf): Just call ctf_write_thresholded rather than trying to work out whether to compress. * testsuite/libctf-writable/ctf-compressed.*: New test.	2024-07-31 21:02:05 +01:00
Nick Alcock	b404bf7270	libctf, string: split the movable refs out of the ref list In commit `149ce5c263` we introduced the concept of "movable" refs, which are refs that can be moved in batches, to let us maintain valid ref lists even when adding refs to blocks of memory that can be realloced (which is any type containing a vlen which can expand, like names contained within enum or struct members). Movable refs need a backpointer to the movable refs dynhash for this dict; since non-movable refs are very common, we tried to save memory by having a slightly bigger struct for moveable refs with a backpointer in it, and casting appropriately, indicating which sort of ref we were dealing with via a flag on the atom. Unfortunately this doesn't work reliably, because you can perfectly well have a string ("foo", say) which has both non-movable refs (say, an external symbol and a variable name) and movable refs (say, a structure member name) to the same atom. Indicate which struct we're dealing with with an atom flag and suddenly you're casting a ctf_str_atom_ref to a ctf_str_atom_ref_movable (which is bigger) and dereferencing random memory off the end of it and interpreting it as a backpointer to the movable refs dynhash. This is unlikely to work well. So bite the bullet and split refs into two separate lists, one for movable refs, one for immovable refs. It means some annoying code duplication, but there's not very much of it, and it means we can keep the movable refs hashtab (which in turn means we don't have to do linear searches to find all relevant refs when moving refs, which in turn means that structure/union/enum member additions remain amortized O(n) time, not O(n^2). Callers can now purge movable and non-movable refs independently of each other. We don't use this yet, but a use is coming. libctf/ * ctf-impl.h (CTF_STR_ATOM_MOVABLE): Delete. (struct ctf_str_atom) [csa_movable_refs]: New. (struct ctf_dict): Adjust comment. (ctf_str_purge_refs): Add MOVABLE arg. * ctf-string.c (ctf_str_purge_movable_atom_refs): Split out of... (ctf_str_purge_atom_refs): ... this. (ctf_str_free_atom): Call it. (ctf_str_purge_one_atom_refs): Likewise. (aref_create): Adjust accordingly. (ctf_str_move_refs): Likewise. (ctf_str_remove_ref): Remove movable refs too, including deleting the ref from ctf_str_movable_refs. (ctf_str_purge_refs): Add MOVABLE arg. (ctf_str_update_refs): Update movable refs. (ctf_str_write_strtab): Check, and purge, movable refs.	2024-07-31 21:02:04 +01:00
Nick Alcock	68720e03f5	libctf, dedup: drop unnecessary arg from ctf_dedup() The PARENTS arg is carefully passed down through all the layers of hash functions and then never used for anything. (In the distant past it was used for cycle detection, but the algorithm eventually committed doesn't need to do cycle detection...) The PARENTS arg is still used by ctf_dedup_emit(), but even there we can loosen the requirements and state that you can just leave entries corresponding to dicts with no parents at zero (which will be useful in an upcoming commit). libctf/ * ctf-dedup.c (ctf_dedup_hash_type): Drop PARENTS arg. (ctf_dedup_rhash_type): Likewise. (ctf_dedup): Likewise. (ctf_dedup_emit_struct_members): Mention what you can do to PARENTS entries for parent dicts. * ctf-impl.h (ctf_dedup): Adjust accordingly. * ctf-link.c (ctf_link_deduplicating_per_cu): Likewise. (ctf_link_deduplicating): Likewise.	2024-07-31 21:02:04 +01:00
Nick Alcock	2fa4b6e6df	libctf, include: new functions for looking up enumerators Three new functions for looking up the enum type containing a given enumeration constant, and optionally that constant's value. The simplest, ctf_lookup_enumerator, looks up a root-visible enumerator by name in one dict: if the dict contains multiple such constants (which is possible for dicts created by older versions of the libctf deduplicator), ECTF_DUPLICATE is returned. The next simplest, ctf_lookup_enumerator_next, is an iterator which returns all enumerators with a given name in a given dict, whether root-visible or not. The most elaborate, ctf_arc_lookup_enumerator_next, finds all enumerators with a given name across all dicts in an entire CTF archive, whether root-visible or not, starting looking in the shared parent dict; opened dicts are cached (as with all other ctf_arc_lookup functions) so that repeated use does not incur repeated opening costs. All three of these return enumerator values as int64_t: unfortunately, API compatibility concerns prevent us from doing the same with the other older enum-related functions, which all return enumerator constant values as ints. We may be forced to add symbol-versioning compatibility aliases that fix the other functions in due course, bumping the soname for platforms that do not support such things. ctf_arc_lookup_enumerator_next is implemented as a nested ctf_archive_next iterator, and inside that, a nested ctf_lookup_enumerator_next iterator within each dict. To aid in this, add support to ctf_next_t iterators for iterators that are implemented in terms of two simultaneous nested iterators at once. (It has always been possible for callers to use as many nested or semi-overlapping ctf_next_t iterators as they need, which is one of the advantages of this style over the _iter style that calls a function for each thing iterated over: the iterator change here permits ctf_next_t iterators themselves* to be implemented by iterating using multiple other iterators as part of their internal operation, transparently to the caller.) Also add a testcase that tests all these functions (which is fairly easy because ctf_arc_lookup_enumerator_next is implemented in terms of ctf_lookup_enumerator_next) in addition to enumeration addition in ctf_open()ed dicts, ctf_add_enumerator duplicate enumerator addition, and conflicting enumerator constant deduplication. include/ * ctf-api.h (ctf_lookup_enumerator): New. (ctf_lookup_enumerator_next): Likewise. (ctf_arc_lookup_enumerator_next): Likewise. libctf/ * libctf.ver: Add them. * ctf-impl.h (ctf_next_t) <ctn_next_inner>: New. * ctf-util.c (ctf_next_copy): Copy it. (ctf_next_destroy): Destroy it. * ctf-lookup.c (ctf_lookup_enumerator): New. (ctf_lookup_enumerator_next): New. * ctf-archive.c (ctf_arc_lookup_enumerator_next): New. * testsuite/libctf-lookup/enumerator-iteration.: New test. testsuite/libctf-lookup/enum-ctf-2.c: New test CTF, used by the above.	2024-06-18 13:20:32 +01:00
Nick Alcock	4bbc4b1f5c	libctf: make the ctf_next ctn_fp non-const This was always an error, because the ctn_fp routinely has errors set on it, which is not something you can (or should) do to a const object. libctf/ * ctf-impl.h (ctf_next_) <cu.ctn_fp>: Make non-const.	2024-06-18 13:20:32 +01:00
Nick Alcock	6e09d4a6e6	libctf: prohibit addition of enums with overlapping enumerator constants libctf has long prohibited addition of enums with overlapping constants in a single enum, but now that we are properly considering enums with overlapping constants to be conflciting types, we can go further and prohibit addition of enumeration constants to a dict if they already exist in any enum in that dict: the same rules as C itself. We do this in a fashion vaguely similar to what we just did in the deduplicator, by considering enumeration constants as identifiers and adding them to the core type/identifier namespace, ctf_dict_t.ctf_names. This is a little fiddly, because we do not want to prohibit opening of existing dicts into which the deduplicator has stuffed enums with overlapping constants! We just want to prohibit the addition of new enumerators that violate that rule. Even then, it's fine to add overlapping enumerator constants as long as at least one of them is in a non-root type. (This is essential for proper deduplicator operation in cu-mapped mode, where multiple compilation units can be smashed into one dict, with conflicting types marked as hidden: these types may well contain overlapping enumerators.) So, at open time, keep track of all enums observed, then do a third pass through the enums alone, adding each enumerator either to the ctf_names table as a mapping from the enumerator name to the enum it is part of (if not already present), or to a new ctf_conflicting_enums hashtable that tracks observed duplicates. (The latter is not used yet, but will be soon.) (We need to do a third pass because it's quite possible to have an enum containing an enumerator FOO followed by a type FOO: since they're processed in order, the enumerator would be processed before the type, and at that stage it seems nonconflicting. The easiest fix is to run through the enumerators after all type names are interned.) At ctf_add_enumerator time, if the enumerator to which we are adding a type is root-visible, check for an already-present name and error out if found, then intern the new name in the ctf_names table as is done at open time. (We retain the existing code which scans the enum itself for duplicates because it is still an error to add an enumerator twice to a non-root-visible enum type; but we only need to do this if the enum is non-root-visible, so the cost of enum addition is reduced.) Tested in an upcoming commit. libctf/ * ctf-impl.h (ctf_dict_t) <ctf_names>: Augment comment. <ctf_conflicting_enums>: New. (ctf_dynset_elements): New. * ctf-hash.c (ctf_dynset_elements): Implement it. * ctf-open.c (init_static_types): Split body into... (init_static_types_internal): ... here. Count enumerators; keep track of observed enums in pass 2; populate ctf_names and ctf_conflicting_enums with enumerators in a third pass. (ctf_dict_close): Free ctf_conflicting_enums. * ctf-create.c (ctf_add_enumerator): Prohibit addition of duplicate enumerators in root-visible enum types. include/ * ctf-api.h (CTF_ADD_NONROOT): Describe what non-rootness means for enumeration constants. (ctf_add_enumerator): The name is not a misnomer. We now require that enumerators have unique names. Document the non-rootness of enumerators.	2024-06-18 13:20:32 +01:00
Nick Alcock	f8da1a05db	libctf: dedup: enums with overlapping enumerators are conflicting The CTF deduplicator was not considering enumerators inside enum types to be things that caused type conflicts, so if the following two TUs were linked together, you would end up with the following in the resulting dict: 1.c: enum foo { A, B }; 2.c: enum bar { A, B }; linked: enum foo { A, B }; enum bar { A, B }; This does work -- but it's not something that's valid C, and the general point of the shared dict is that it is something that you could potentially get from any valid C TU. So consider such types to be conflicting, but obviously don't consider actually identical enums to be conflicting, even though they too have (all) their identifiers in common. This involves surprisingly little code. The deduplicator detects conflicting types by counting types in a hash table of hash tables: decorated identifier -> (type hash -> count) where the COUNT is the number of times a given hash has been observed: any name with more than one hash associated with it is considered conflicting (the count is used to identify the most common such name for promotion to the shared dict). Before now, those identifiers were all the identifiers of types (possibly decorated with their namespace on the front for enumerator identifiers), but we can equally well put enumeration constant names in there, undecorated like the identifiers of types in the global namespace, with the type hash being the hash of each enum containing that enumerator. The existing conflicting-type-detection code will then accurately identify distinct enums with enumeration constants in common. The enum that contains the most commonly-appearing enumerators will be promoted to the shared dict. libctf/ * ctf-impl.h (ctf_dedup_t) <cd_name_counts>: Extend comment. * ctf-dedup.c (ctf_dedup_count_name): New, split out of... (ctf_dedup_populate_mappings): ... here. Call it for all * enumeration constants in an enum as well as types. ld/ * testsuite/ld-ctf/enum-3.c: New test CTF. * testsuite/ld-ctf/enum-4.c: Likewise. * testsuite/ld-ctf/overlapping-enums.d: New test. * testsuite/ld-ctf/overlapping-enums-2.d: Likewise.	2024-06-18 13:20:31 +01:00
Nick Alcock	483546ce4f	libctf: make ctf_serialize() actually serialize ctf_serialize() evolved from the old ctf_update(), which mutated the in-memory CTF dict to make all the dynamic in-memory types into static, unchanging written-to-the-dict types (by deserializing and reserializing it): back in the days when you could only do type lookups on static types, this meant you could see all the types you added recently, at the small, small cost of making it impossible to change those older types ever again and inducing an amortized O(n^2) cost if you actually wanted to add references to types you added at arbitrary times to later types. It also reset things so that ctf_discard() would throw away only types you added after the most recent ctf_update() call. Some time ago this was all changed so that you could look up dynamic types just as easily as static types: ctf_update() changed so that only its visible side-effect of affecting ctf_discard() remained: the old ctf_update() was renamed to ctf_serialize(), made internal to libctf, and called from the various functions that wrote files out. ... but it was still working by serializing and deserializing the entire dict, swapping out its guts with the newly-serialized copy in an invasive and horrible fashion that coupled ctf_serialize() to almost every field in the ctf_dict_t. This is totally useless, and fixing it is easy: just rip all that code out and have ctf_serialize return a serialized representation, and let everything use that directly. This simplifies most of its callers significantly. (It also points up another bug: ctf_gzwrite() failed to call ctf_serialize() at all, so it would only ever work for a dict you just ctf_write_mem()ed yourself, just for its invisible side-effect of serializing the dict!) This lets us simplify away a bunch of internal-only open-side functionality for overriding the syn_ext_strtab and some just-added functionality for forcing in an existing atoms table, without loss of functionality, and lets us lift the restriction on reserializing a dict that was ctf_open()ed rather than being ctf_create()d: it's now perfectly OK to open a dict, modify it (except for adding members to existing structs, unions, or enums, which fails with -ECTF_RDONLY), and write it out again, just as one would expect. libctf/ * ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos. (ctf_type_sect_size): Add static type sizes too. (ctf_serialize): Return the new dict rather than updating the existing dict. No longer fail for dicts with static types; copy them onto the start of the new types table. (ctf_gzwrite): Actually serialize before gzwriting. (ctf_write_mem): Improve forced (test-mode) endian-flipping: flip dicts even if they are too small to be compressed. Improve confusing variable naming. * ctf-archive.c (arc_write_one_ctf): Don't bother to call ctf_serialize: both the functions we call do so. * ctf-string.c (ctf_str_create_atoms): Drop serializing case (atoms arg). * ctf-open.c (ctf_simple_open): Call ctf_bufopen directly. (ctf_simple_open_internal): Delete. (ctf_bufopen_internal): Delete/rename to ctf_bufopen: no longer bother with syn_ext_strtab or forced atoms table, serialization no longer needs them. * ctf-create.c (ctf_create): Call ctf_bufopen directly. * ctf-impl.h (ctf_str_create_atoms): Drop atoms arg. (ctf_simple_open_internal): Delete. (ctf_bufopen_internal): Likewise. (ctf_serialize): Adjust. * testsuite/libctf-lookup/add-to-opened.c: Adjust now that this is supposed to work.	2024-04-19 16:14:47 +01:00
Nick Alcock	cf9da3b0b6	libctf: rethink strtab writeout This commit finally adjusts strtab writeout so that repeated writeouts, or writeouts of a dict that was read in earlier, only sorts the portion of the strtab that was newly added. There are three intertwined changes here: - pull the contents of strtabs from newly ctf_bufopened dicts into the atoms table, so that future additions will reuse the existing offset etc rather than adding new identical strings - allow the internal ctf_bufopen done by serialization to contribute its existing atoms table, so that existing atoms can be used for the remainder of the open process (like name table construction): this atoms table currente gets thrown away in the mass reassignment done later in ctf_serialize in any case, but it needs to be there during the open. - rewrite ctf_str_write_strtab so that a) it uses iterators rather than ctf__iter, reducing pointless structures which serve no other purpose than to implement ordinary variable scope, but more clunkily, and b) retains the existing strtab on the front of the new one, with its sort retained, rather than resorting, so all existing already-written strtab offsets remain valid across the call. This latter change finally permits repeated serializations, and reserializations of ctf_open()ed dicts, to work, but for now we keep the code that prevents that because serialization is about to change again in a way that will make it more obvious that doing such things is safe, and we can take it out then. (There are also some smaller changes like moving the purge of the refs table into ctf_str_write_strtab(), since that's where the changes happen that invalidate it, rather than doing it in ctf_serialize(). We also prohibit something that has never worked, opening a dict and then reporting symbols to it via ctf_link_add_strtab() et al: you must do that to newly-created dicts which have had stuff ctf_link()ed into them. This is very unlikely ever to be a problem in practice: linkers just don't do that sort of thing.) libctf/ ctf-create.c (ctf_create): Add (temporary) atoms arg. * ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New. (ctf_str_create_atoms): Adjust. (ctf_str_write_strtab): Likewise. (ctf_simple_open_internal): Likewise. * ctf-open.c (ctf_simple_open_internal): Add atoms arg. (ctf_bufopen): Likewise. (ctf_bufopen_internal): Initialize just enough of an atoms table: pre-init from the atoms arg if supplied. (ctf_simple_open): Adjust. * ctf-serialize.c (ctf_serialize): Constify the strtab. Move ref list purging into ctf_str_write_strtab. Initialize the new dict with the old dict's atoms table. Accept the new strtab from ctf_str_write_strtab. Adjust for addition of ctf_dynstrtab. * ctf-string.c (ctf_strraw_explicit): Improve comments. (ctf_str_create_atoms): Prepopulate from an existing atoms table, or alternatively pull in all strings from the strtab and turn them into atoms. (ctf_str_free_atoms): Free the dynstrtab and its strtab. (struct ctf_strtab_write_state): Remove. (ctf_str_count_strtab): Fold this... (ctf_str_populate_sorttab): ... and this... (ctf_str_write_strtab): ... into this. Prepend existing strings to the strtab rather than resorting them (and wrecking their offsets). Keep the dynstrtab updated. Update refs for all atoms with refs, whether or not they are strings newly added to the strtab.	2024-04-19 16:14:47 +01:00
Nick Alcock	149ce5c263	libctf: replace 'pending refs' abstraction A few years ago we introduced a 'pending refs' abstraction to fix one problem: serializing a dict, then changing it would tend to corrupt the dict because the strtab sort we do on strtab writeout (to improve compression efficiency) would modify the offset of any strings that sorted lexicographically earlier in the strtab: so we added a new restriction that all strings are added only at serialization time, and maintained a set of 'pending' refs that were added earlier, whose offsets we could update (like other refs) at writeout time. This was in hindsight seriously problematic for maintenance (because serialization has to traverse all strings in all datatypes in the entire dict), and has become impossible to sustain now that we can read in existing dicts, modify them, and reserialize them again. We really don't want to have to dig through the entire dict we jut read in just in order to dig out all its strtab offsets, then change it, just for the sake of a sort that adds a frankly trivial amount of compression efficiency. Sorting is still worthwhile -- but it sacrifices very little to only sort newly-added portions of the strtab, reusing older portions as necessary. As a first stage in this, discard the whole "pending refs" abstraction and replace it with "movable" refs, which are exactly like all other refs (addresses containing the strtab offset of some string, which are updated wiht the final strtab offset on serialization) except that we track them in a reverse dict so that we can move the refs around (which we do whenever we realloc() a buffer containing a bunch of structure members or something when we add members to the structure). libctf/ * ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add a movable ref. (ctf_add_member_offset): Likewise. * ctf-util.c (ctf_realloc): Delete. * ctf-serialize.c (ctf_serialize): No longer use it. Adjust to new fields. * ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs. (ctf_str_free_atom): Free freeable atoms' strings. (ctf_str_create_atoms): Create the movable refs dynhash if needed. (ctf_str_free_atoms): Destroy it. (CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous reversion). Add new flag. (aref_create): New, populate movable refs if need be. (ctf_str_add_ref_internal): Switch back to flags, update refs directly for nonprovisional strings (with already-known fixed offsets); create refs via aref_create. Allocate strings only if not within an mmapped strtab. (ctf_str_add_movable_ref): New. (ctf_str_add): Adjust to CTF_STR_* reintroduction. (ctf_str_add_external): LIkewise. (ctf_str_move_refs): New, move refs via ctf_str_movable_refs backpointer. (ctf_str_purge_refs): Drop ctf_str_num_refs. (ctf_str_update_refs): Fix indentation. * ctf-impl.h (struct ctf_str_atom_movable): New. (struct ctf_dict.ctf_str_num_refs): Drop. (struct ctf_dict.ctf_str_movable_refs): New. (ctf_str_add_movable_ref): Declare. (ctf_str_move_refs): Likewise. (ctf_realloc): Drop.	2024-04-19 16:14:46 +01:00
Nick Alcock	3301ddba1b	Revert "libctf: do not corrupt strings across ctf_serialize" This reverts commit `986e9e3aa0`. (We do not revert the testcase -- it remains valid -- but we are taking a different, less complex and more robust approach.) This also deletes the pending refs abstraction without (yet) replacing it, so some tests will fail for a commit or two.	2024-04-19 16:14:46 +01:00
Nick Alcock	629acbe4a3	libctf: rename ctf_dict.ctf_{symtab,strtab} These two fields are constantly confusing because CTF dicts contain both a symtypetab and strtab, but these fields are not that: they are the symtab and strtab from the ELF file. We have enough string tables now (internal, external, synthetic external, dynamic) that we need to at least name them better than this to avoid getting totally confused. Rename them to ctf_ext_symtab and ctf_ext_strtab. libctf/ * ctf-dump.c (ctf_dump_objts): Rename ctf_symtab -> ctf_ext_symtab. * ctf-impl.h (struct ctf_dict.ctf_symtab): Rename to... (struct ctf_dict.ctf_ext_strtab): ... this. (struct ctf_dict.ctf_strtab): Rename to... (struct ctf_dict.ctf_ext_strtab): ... this. * ctf-lookup.c (ctf_lookup_symbol_name): Adapt. (ctf_lookup_symbol_idx): Adapt. (ctf_lookup_by_sym_or_name): Adapt. * ctf-open.c (ctf_bufopen_internal): Adapt. (ctf_dict_close): Adapt. (ctf_getsymsect): Adapt. (ctf_getstrsect): Adapt. (ctf_symsect_endianness): Adapt.	2024-04-19 16:14:46 +01:00
Nick Alcock	bb2a9a465e	libctf: fix a comment typo ctf_update has been called ctf_serialize for years now. libctf/ * ctf-impl.h: Fix comment typo.	2024-04-19 16:14:46 +01:00
Nick Alcock	4fa4e3d92a	libctf: delete LCTF_DIRTY This flag was meant as an optimization to avoid reserializing dicts unnecessarily. It was critically necessary back when serialization was done by ctf_update() and you had to call that every time you wanted any new modifications to the type table to be usable by other types, but that has been unnecessary for years now, and serialization is only done once when writing out, which one would naturally assume would always serialize the dict. Worse, it never really worked: it only tracked newly-added types, not things like added symbols which might equally well require reserialization, and it gets in the way of an upcoming change. Delete entirely. libctf/ * ctf-create.c (ctf_create): Drop LCTF_DIRTY. (ctf_discard): Likewise. (ctf_rollback): Likewise. (ctf_add_generic): Likewise. (ctf_set_array): Likewise. (ctf_add_enumerator): Likewise. (ctf_add_member_offset): Likewise. (ctf_add_variable_forced): Likewise. * ctf-link.c (ctf_link_intern_extern_string): Likewise. (ctf_link_add_strtab): Likewise. * ctf-serialize.c (ctf_serialize): Likewise. * ctf-impl.h (LCTF_DIRTY): Likewise. (LCTF_LINKING): Renumber.	2024-04-19 16:14:46 +01:00
Nick Alcock	8a60c93096	libctf: support addition of types to dicts read via ctf_open() libctf has long declared deserialized dictionaries (out of files or ELF sections or memory buffers or whatever) to be read-only: back in the furthest prehistory this was not the case, in that you could add a few sorts of type to such dicts, but attempting to do so often caused horrible memory corruption, so I banned the lot. But it turns out real consumers want it (notably DTrace, which synthesises pointers to types that don't have them and adds them to the ctf_open()ed dicts if it needs them). Let's bring it back again, but without the memory corruption and without the massive code duplication required in days of yore to distinguish between static and dynamic types: the representation of both types has been identical for a few years, with the only difference being that types as a whole are stored in a big buffer for types read in via ctf_open and per-type hashtables for newly-added types. So we discard the internally-visible concept of "readonly dictionaries" in favour of declaring the range of types that were already present when the dict was read in to be read-only: you can't modify them (say, by adding members to them if they're structs, or calling ctf_set_array on them), but you can add more types and point to them. (The API remains the same, with calls sometimes returning ECTF_RDONLY, but now they do so less often.) This is a fairly invasive change, mostly because code written since the ban was introduced didn't take the possibility of a static/dynamic split into account. Some of these irregularities were hard to define as anything but bugs. Notably: - The symbol handling was assuming that symbols only needed to be looked for in dynamic hashtabs or static linker-laid-out indexed/ nonindexed layouts, but now we want to check both in case people added more symbols to a dict they opened. - The code that handles type additions wasn't checking to see if types with the same name existed at all (so you could do ctf_add_typedef (fp, "foo", bar) repeatedly without error). This seems reasonable for types you just added, but we probably do want to ban addition of types with names that override names we already used in the ctf_open()ed portion, since that would probably corrupt existing type relationships. (Doing things this way also avoids causing new errors for any existing code that was doing this sort of thing.) - ctf_lookup_variable entirely failed to work for variables just added by ctf_add_variable: you had to write the dict out and read it back in again before they appeared. - The symbol handling remembered what symbols you looked up but didn't remember their types, so you could look up an object symbol and then find it popping up when you asked for function symbols, which seems less than ideal. Since we had to rejig things enough to be able to distinguish function and object symbols internally anyway (in order to give suitable errors if you try to add a symbol with a name that already existed in the ctf_open()ed dict), this bug suddenly became more visible and was easily fixed. We do not (yet) support writing out dicts that have been previously read in via ctf_open() or other deserializer (you can look things up in them, but not write them out a second time). This never worked, so there is no incompatibility; if it is needed at a later date, the serializer is a little bit closer to having it work now (the only table we don't deal with is the types table, and that's because the upcoming CTFv4 changes are likely to make major changes to the way that table is represented internally, so adding more code that depends on its current form seems like a bad idea). There is a new testcase that tests much of this, in particular that modification of existing types is still banned and that you can add new ones and chase them without error. libctf/ * ctf-impl.h (struct ctf_dict.ctf_symhash): Split into... (ctf_dict.ctf_symhash_func): ... this and... (ctf_dict.ctf_symhash_objt): ... this. (ctf_dict.ctf_stypes): New, counts static types. (LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR. (LCTF_RDWR): Deleted. (LCTF_DIRTY): Renumbered. (LCTF_LINKING): Likewise. (ctf_lookup_variable_here): New. (ctf_lookup_by_sym_or_name): Likewise. (ctf_symbol_next_static): Likewise. (ctf_add_variable_forced): Likewise. (ctf_add_funcobjt_sym_forced): Likewise. (ctf_simple_open_internal): Adjust. (ctf_bufopen_internal): Likewise. * ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with. (ctf_create): Migrate a bunch of initializations into bufopen. Force recreation of name tables. Do not forcibly override the model, let ctf_bufopen do it. (ctf_static_type): New. (ctf_update): Drop LCTF_RDWR check. (ctf_dynamic_type): Likewise. (ctf_add_function): Likewise. (ctf_add_type_internal): Likewise. (ctf_rollback): Check ctf_stypes, not LCTF_RDWR. (ctf_set_array): Likewise. (ctf_add_struct_sized): Likewise. (ctf_add_union_sized): Likewise. (ctf_add_enum): Likewise. (ctf_add_enumerator): Likewise (only on the target dict). (ctf_add_member_offset): Likewise. (ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types with colliding names. (ctf_add_forward): Note safety under the new rules. (ctf_add_variable): Split all but the existence check into... (ctf_add_variable_forced): ... this new function. (ctf_add_funcobjt_sym): Likewise... (ctf_add_funcobjt_sym_forced): ... for this new function. * ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts with any stypes. (ctf_link_add_strtab): Likewise. (ctf_link_shuffle_syms): Likewise. (ctf_link_intern_extern_string): Note pre-existing prohibition. * ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check. (ctf_lookup_variable): Split out looking in a dict but not its parent into... (ctf_lookup_variable_here): ... this new function. (ctf_lookup_symbol_idx): Track whether looking up a function or object: cache them separately. (ctf_symbol_next): Split out looking in non-dynamic symtypetab entries to... (ctf_symbol_next_static): ... this new function. Don't get confused by the simultaneous presence of static and dynamic symtypetab entries. (ctf_try_lookup_indexed): Don't waste time looking up symbols by index before there can be any idea how symbols are numbered. (ctf_lookup_by_sym_or_name): Distinguish between function and data object lookups. Drop LCTF_RDWR. (ctf_lookup_by_symbol): Adjust. (ctf_lookup_by_symbol_name): Likewise. * ctf-open.c (init_types): Rename to... (init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes. (ctf_simple_open): Drop writable arg. (ctf_simple_open_internal): Likewise. (ctf_bufopen): Likewise. (ctf_bufopen_internal): Populate fields only used for writable dicts. Drop LCTF_RDWR. (ctf_dict_close): Cater for symhash cache split. * ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR. * ctf-types.c (ctf_variable_next): Drop LCTF_RDWR. * testsuite/libctf-lookup/add-to-opened*: New test.	2024-04-19 16:14:46 +01:00
Nick Alcock	54a0219150	libctf: remove static/dynamic name lookup distinction libctf internally maintains a set of hash tables for type name lookups, one for each valid C type namespace (struct, union, enum, and everything else). Or, rather, it maintains two sets of hash tables: one, a ctf_hash , is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the other, a ctf_dynhash , is meant for lookups in ctf_create()d dicts. This distinction was somewhat valuable in the far pre-binutils past when two different hashtable implementations were used (one expanding, the other fixed-size), but those days are long gone: the hash table implementations are almost identical, both wrappers around the libiberty hashtab. The ctf_dynhash has many more capabilities than the ctf_hash (iteration, deletion, etc etc) and has no downsides other than starting at a fixed, arbitrary small size. That limitation is easy to lift (via a new ctf_dynhash_create_sized()), following which we can throw away nearly all the ctf_hash implementation, and all the code to choose between readable and writable hashtabs; the few convenience functions that are still useful (for insertion of name -> type mappings) can also be generalized a bit so that the extra string verification they do is potentially available to other string lookups as well. (libctf still has two hashtable implementations, ctf_dynhash, above, and ctf_dynset, which is a key-only hashtab that can avoid a great many malloc()s, used for high-volume applications in the deduplicator.) libctf/ * ctf-create.c (ctf_create): Eliminate ctn_writable. (ctf_dtd_insert): Likewise. (ctf_dtd_delete): Likewise. (ctf_rollback): Likewise. (ctf_name_table): Eliminate ctf_names_t. * ctf-hash.c (ctf_dynhash_create): Comment update. Reimplement in terms of... (ctf_dynhash_create_sized): ... this new function. (ctf_hash_create): Remove. (ctf_hash_size): Remove. (ctf_hash_define_type): Remove. (ctf_hash_destroy): Remove. (ctf_hash_lookup_type): Rename to... (ctf_dynhash_lookup_type): ... this. (ctf_hash_insert_type): Rename to... (ctf_dynhash_insert_type): ... this, moving validation to... * ctf-string.c (ctf_strptr_validate): ... this new function. * ctf-impl.h (struct ctf_names): Extirpate. (struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t. (struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t. (ctf_name_table): Now returns a ctf_dynhash_t. (ctf_lookup_by_rawhash): Remove. (ctf_hash_create): Likewise. (ctf_hash_insert_type): Likewise. (ctf_hash_define_type): Likewise. (ctf_hash_lookup_type): Likewise. (ctf_hash_size): Likewise. (ctf_hash_destroy): Likewise. (ctf_dynhash_create_sized): New. (ctf_dynhash_insert_type): New. (ctf_dynhash_lookup_type): New. (ctf_strptr_validate): New. * ctf-lookup.c (ctf_lookup_by_name_internal): Adapt. * ctf-open.c (init_types): Adapt. (ctf_set_ctl_hashes): Adapt. (ctf_dict_close): Adapt. * ctf-serialize.c (ctf_serialize): Adapt. * ctf-types.c (ctf_lookup_by_rawhash): Remove.	2024-04-19 16:14:46 +01:00
Alan Modra	fd67aa1129	Update year range in copyright notice of binutils files Adds two new external authors to etc/update-copyright.py to cover bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then updates copyright messages as follows: 1) Update cgen/utils.scm emitted copyrights. 2) Run "etc/update-copyright.py --this-year" with an extra external author I haven't committed, 'Kalray SA.', to cover gas testsuite files (which should have their copyright message removed). 3) Build with --enable-maintainer-mode --enable-cgen-maint=yes. 4) Check out /po/.pot which we don't update frequently.	2024-01-04 22:58:12 +10:30
Torbjörn SVENSSON	998a4f589d	libctf: Sanitize error types for PR 30836 Made sure there is no implicit conversion between signed and unsigned return value for functions setting the ctf_errno value. An example of the problem is that in ctf_member_next, the "offset" value is either 0L or (ctf_id_t)-1L, but it should have been 0L or -1L. The issue was discovered while building a 64 bit ld binary to be executed on the Windows platform. Example object file that demonstrates the issue is attached in the PR. libctf/ Affected functions adjusted. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com> Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>	2023-10-17 17:31:20 +02:00
Alan Modra	d87bef3a7b	Update year range in copyright notice of binutils files The newer update-copyright.py fixes file encoding too, removing cr/lf on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and embedded cr in binutils/testsuite/binutils-all/ar.exp string match.	2023-01-01 21:50:11 +10:30
Nick Alcock	6bd2318f32	libctf: fix linking together multiple objects derived from the same source Right now, if you compile the same .c input repeatedly with CTF enabled and different compilation flags, then arrange to link all of these together, then things misbehave in various ways. libctf may conflate either inputs (if the .o files have the same name, say if they are stored in different .a archives), or per-CU outputs when conflicting types are found: the latter can lead to entirely spurious errors when it tries to produce multiple per-CU outputs with the same name (discarding all but the last, but then looking for types in the earlier ones which have just been thrown away). Fixing this is multi-pronged. Both inputs and outputs need to be differentiated in the hashtables libctf keeps them in: inputs with the same cuname and filename need to be considered distinct as long as they have different associated CTF dicts, and per-CU outputs need to be considered distinct as long as they have different associated input dicts. Right now there is nothing tying the two together other than the CU name: fix this by introducing a new field in the ctf_dict_t named ctf_link_in_out, which (for input dicts) points to the associated per-CU output dict (if any), and for output dicts points to the associated input dict. At creation time the name used is completely arbitrary: it's only important that it be distinct if CTF dicts are distinct. So, when a clash is found, adjust the CU name by sticking the number of elements in the input on the end. At output time, the CU name will appear in the linked object, so it matters a little more that it look slightly less ugly: in conflicting cases, append an incrementing integer, starting at 0. This naming scheme is not very helpful, but it's hard to see what else we can do. The input .o name may be the same. The input .a name is not even visible to ctf_link, and even that might be the same, because .a's can contain many members with the same name, all of which participate in the link. All we really know is that the two have distinct dictionaries with distinct types in them, and at least this way they are all represented, any any symbols, variables etc referring to those types are accurately stored. (As a side-effect this also fixes a use-after-free and double-free when errors are found during variable or symbol emission.) Use the opportunity to prevent a couple of sources of problems, to wit changing the active CU mappings when a link has already been done (no effect on ld, which doesn't use CU mappings at all), and causing multiple consecutive ctf_link's to have the same net effect as just doing the last one (no effect on ld, which only ever does one ctf_link) rather than having the links be a sort of half-incremental not-really-intended mess. libctf/ChangeLog: PR libctf/29242 * ctf-impl.h (struct ctf_dict) [ctf_link_in_out]: New. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_link_add_ctf_internal): Set the input CU name uniquely when clashes are found. (ctf_link_add): Document what repeated additions do. (ctf_new_per_cu_name): New, come up with a consistent name for a new per-CU dict. (ctf_link_deduplicating): Use it. (ctf_create_per_cu): Use it, and ctf_link_in_out, and set ctf_link_in_out properly. Don't overwrite per-CU dicts with per-CU dicts relating to different inputs. (ctf_link_add_cu_mapping): Prevent per-CU mappings being set up if we already have per-CU outputs. (ctf_link_one_variable): Adjust ctf_link_per_cu call. (ctf_link_deduplicating_one_symtypetab): Likewise. (ctf_link_empty_outputs): New, delete all the ctf_link_outputs and blank out ctf_link_in_out on the corresponding inputs. (ctf_link): Clarify the effect of multiple ctf_link calls. Empty ctf_link_outputs if it already exists rather than having the old output leak into the new link. Fix a variable name. * testsuite/config/default.exp (AR): Add. (OBJDUMP): Likewise. * testsuite/libctf-regression/libctf-repeat-cu.exp: New test. * testsuite/libctf-regression/libctf-repeat-cu*: Main program, library, and expected results for the test.	2022-06-21 19:27:15 +01:00
Nick Alcock	faf5e6ace8	libctf: add LIBCTF_WRITE_FOREIGN_ENDIAN debugging option libctf has always handled endianness differences by detecting foreign-endian CTF dicts on the input and endian-flipping them: dicts are always written in native endianness. This makes endian-awareness very low overhead, but it means that the foreign-endian code paths almost never get routinely tested, since "make check" usually reads in dicts ld has just written out: only a few corrupted-CTF tests are actually in fixed endianness, and even they only test the foreign- endian code paths when you run make check on a big-endian machine. (And the fix is surely not to add more .s-based tests like that, because they are a nightmare to maintain compared to the C-code-based ones.) To improve on this, add a new environment variable, LIBCTF_WRITE_FOREIGN_ENDIAN, which causes libctf to unconditionally endian-flip at ctf_write time, so the output is always in the wrong endianness. This then tests the foreign-endian read paths properly at open time. Make this easier by restructuring the writeout code in ctf-serialize.c, which duplicates the maybe-gzip-and-write-out code three times (once for ctf_write_mem, with thresholding, and once each for ctf_compress_write and ctf_write just so those can avoid thresholding and/or compression). Instead, have the latter two call the former with thresholds of 0 or (size_t) -1, respectively. The endian-flipping code itself gains a bit of complexity, because one single endian-flipper (flip_types) was assuming the input to be in foreign-endian form and assuming it could pull things out of the input once they had been flipped and make sense of them. At the cost of a few lines of duplicated initializations, teach it to read before flipping if we're flipping to foreign-endianness instead of away from it. libctf/ * ctf-impl.h (ctf_flip_header): No longer static. (ctf_flip): Likewise. * ctf-open.c (flip_header): Rename to... (ctf_flip_header): ... this, now it is not private to one file. (flip_ctf): Rename... (ctf_flip): ... this too. Add FOREIGN_ENDIAN arg. (flip_types): Likewise. Use it. (ctf_bufopen_internal): Adjust calls. * ctf-serialize.c (ctf_write_mem): Add flip_endian path via a newly-allocated bounce buffer. (ctf_compress_write): Move below ctf_write_mem and reimplement in terms of it. (ctf_write): Likewise. (ctf_gzwrite): Note that this obscure writeout function does not support endian-flipping.	2022-03-23 13:48:32 +00:00
Alan Modra	a2c5833233	Update year range in copyright notice of binutils files The result of running etc/update-copyright.py --this-year, fixing all the files whose mode is changed by the script, plus a build with --enable-maintainer-mode --enable-cgen-maint=yes, then checking out /po/.pot which we don't update frequently. The copy of cgen was with commit d1dd5fcc38ead reverted as that commit breaks building of bfp opcodes files.	2022-01-02 12:04:28 +10:30
Alan Modra	4821e618ad	Use htab_eq_string in libctf * ctf-impl.h (ctf_dynset_eq_string): Don't declare. * ctf-hash.c (ctf_dynset_eq_string): Delete function. * ctf-dedup.c (make_set_element): Use htab_eq_string. (ctf_dedup_atoms_init, ADD_CITER, ctf_dedup_init): Likewise. (ctf_dedup_conflictify_unshared): Likewise. (ctf_dedup_walk_output_mapping): Likewise.	2021-05-09 12:28:32 +09:30
Alan Modra	e93388417c	Provide an inline startswith function in bfd.h bfd/ * bfd-in.h (startswith): New inline. (CONST_STRNEQ): Use startswith. * bfd-in2.h: Regenerate. gdbsupport/ * common-utils.h (startswith): Delete version now supplied by bfd.h. libctf/ * ctf-impl.h: Include string.h.	2021-03-21 23:00:32 +10:30
Nick Alcock	d7b1416ef2	libctf: types: unify code dealing with small-vs-large struct members This completes the job of unifying what was once three separate code paths full of duplication for every function dealing with querying the properties of struct and union members. The dynamic code path was already removed: this change removes the distinction between small and large members, by adding a helper that copies out members from the vlen, expanding small members into large ones as it does so. This makes it possible to have more representations of things like structure members without needing to change the querying functions at all. It also lets us check for buffer overruns more effectively, verifying that we don't accidentally overrun the end of the vlen in either the dynamic or static type case. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_next_t) <ctn_tp>: New. <u.ctn_mp>: Remove. <u.ctn_lmp>: Remove. <u.ctn_vlen>: New. * ctf-types.c (ctf_struct_member): New. (ctf_member_next): Use it, dropping separate large/small code paths. (ctf_type_align): Likewise. (ctf_member_info): Likewise. (ctf_type_rvisit): Likewise.	2021-03-18 12:40:41 +00:00
Nick Alcock	08c428aff4	libctf: eliminate dtd_u, part 5: structs / unions Eliminate the dynamic member storage for structs and unions as we have for other dynamic types. This is much like the previous enum elimination, except that structs and unions are the only types for which a full-sized ctf_type_t might be needed. Up to now, this decision has been made in the individual ctf_add_{struct,union}_sized functions and duplicated in ctf_add_member_offset. The vlen machinery lets us simplify this, always allocating a ctf_lmember_t and setting the dtd_data's ctt_size to CTF_LSIZE_SENT: we figure out whether this is really justified and (almost always) repack things down into a ctf_stype_t at ctf_serialize time. This allows us to eliminate the dynamic member paths from the iterators and query functions in ctf-types.c in favour of always using the large-structure vlen stuff for dynamic types (the diff is ugly but that's just because of the volume of reindentation this calls for). This also means the large-structure vlen stuff gets more heavily tested, which is nice because it was an almost totally unused code path before now (it only kicked in for structures of size >4GiB, and how often do you see those?) The only extra complexity here is ctf_add_type. Back in the days of the nondeduplicating linker this was called a ridiculous number of times for countless identical copies of structures: eschewing the repeated lookups of the dtd in ctf_add_member_offset and adding the members directly saved an amazing amount of time. Now the nondeduplicating linker is gone, this is extreme overoptimization: we can rip out the direct addition and use ctf_member_next and ctf_add_member_offset, just like ctf_dedup_emit does. We augment a ctf_add_type test to try adding a self-referential struct, the only thing the ctf_add_type part of this change really perturbs. This completes the elimination of dtd_u. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtu_members>: Remove. <dtd_u>: Likewise. (ctf_dmdef_t): Remove. (struct ctf_next) <u.ctn_dmd>: Remove. * ctf-create.c (INITIAL_VLEN): New, more-or-less arbitrary initial vlen size. (ctf_add_enum): Use it. (ctf_dtd_delete): Do not free the (removed) dmd; remove string refs from the vlen on struct deletion. (ctf_add_struct_sized): Populate the vlen: do it by hand if promoting forwards. Always populate the full-size lsizehi/lsizelo members. (ctf_add_union_sized): Likewise. (ctf_add_member_offset): Set up the vlen rather than the dmd. Expand it as needed, repointing string refs via ctf_str_move_pending. Add the member names as pending strings. Always populate the full-size lsizehi/lsizelo members. (membadd): Remove, folding back into... (ctf_add_type_internal): ... here, adding via an ordinary ctf_add_struct_sized and _next iteration rather than doing everything by hand. * ctf-serialize.c (ctf_copy_smembers): Remove this... (ctf_copy_lmembers): ... and this... (ctf_emit_type_sect): ... folding into here. Figure out if a ctf_stype_t is needed here, not in ctf_add__sized. (ctf_type_sect_size): Figure out the ctf_stype_t stuff the same way here. ctf-types.c (ctf_member_next): Remove the dmd path and always use the vlen. Force large-structure usage for dynamic types. (ctf_type_align): Likewise. (ctf_member_info): Likewise. (ctf_type_rvisit): Likewise. * testsuite/libctf-regression/type-add-unnamed-struct-ctf.c: Add a self-referential type to this test. * testsuite/libctf-regression/type-add-unnamed-struct.c: Adjusted accordingly. * testsuite/libctf-regression/type-add-unnamed-struct.lk: Likewise.	2021-03-18 12:40:40 +00:00
Nick Alcock	77d724a7ec	libctf: eliminate dtd_u, part 4: enums This is the first tricky one, the first complex multi-entry vlen containing strings. To handle this in vlen form, we have to handle pending refs moving around on realloc. We grow vlen regions using a new ctf_grow_vlen function, and iterate through the existing enums every time a grow happens, telling the string machinery the distance between the old and new vlen region and letting it adjust the pending refs accordingly. (This avoids traversing all outstanding refs to find the refs that need adjusting, at the cost of having to traverse one enum: an obvious major performance win.) Addition of enums themselves (and also structs/unions later) is a bit trickier than earlier forms, because the type might be being promoted from a forward, and forwards have no vlen: so we have to spot that and create it if needed. Serialization of enums simplifies down to just telling the string machinery about the string refs; all the enum type-lookup code loses all its dynamic member lookup complexity entirely. A new test is added that iterates over (and gets values of) an enum with enough members to force a round of vlen growth. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_vlen_alloc>: New. (ctf_str_move_pending): Declare. * ctf-string.c (ctf_str_add_ref_internal): Fix error return. (ctf_str_move_pending): New. * ctf-create.c (ctf_grow_vlen): New. (ctf_dtd_delete): Zero out the vlen_alloc after free. Free the vlen later: iterate over it and free enum name refs first. (ctf_add_generic): Populate dtd_vlen_alloc from vlen. (ctf_add_enum): populate the vlen; do it by hand if promoting forwards. (ctf_add_enumerator): Set up the vlen rather than the dmd. Expand it as needed, repointing string refs via ctf_str_move_pending. Add the enumerand names as pending strings. * ctf-serialize.c (ctf_copy_emembers): Remove. (ctf_emit_type_sect): Copy the vlen into place and ref the strings. * ctf-types.c (ctf_enum_next): The dynamic portion now uses the same code as the non-dynamic. (ctf_enum_name): Likewise. (ctf_enum_value): Likewise. * testsuite/libctf-lookup/enum-many-ctf.c: New test. * testsuite/libctf-lookup/enum-many.lk: New test.	2021-03-18 12:40:40 +00:00
Nick Alcock	986e9e3aa0	libctf: do not corrupt strings across ctf_serialize The preceding change revealed a new bug: the string table is sorted for better compression, so repeated serialization with type (or member) additions in the middle can move strings around. But every serialization flushes the set of refs (the memory locations that are automatically updated with a final string offset when the strtab is updated), so if we are not to have string offsets go stale, we must do all ref additions within the serialization code (which walks the complete set of types and symbols anyway). Unfortunately, we were adding one ref in another place: the type name in the dynamic type definitions, which has a ref added to it by ctf_add_generic. So adding a type, serializing (via, say, one of the ctf_write functions), adding another type with a name that sorts earlier, and serializing again will corrupt the name of the first type because it no longer had a ref pointing to its dtd entry's name when its string offset was shifted later in the strtab to mae way for the other type. To ensure that we don't miss strings, we also maintain a set of pending refs that will be added later (during serialization), and remove entries from that set when the ref is finally added. We always use ctf_str_add_pending outside ctf-serialize.c, ensure that ctf_serialize adds all strtab offsets as refs (even those in the dtds) on every serialization, and mandate that no refs are live on entry to ctf_serialize and that all pending refs are gone before strtab finalization. (Of necessity ctf_serialize has to traverse all strtab offsets in the dtds in order to serialize them, so adding them as refs at the same time is easy.) (Note that we still can't erase unused atoms when we roll back, though we can erase unused refs: members and enums are still not removed by rollbacks and might reference strings added after the snapshot.) libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-hash.c (ctf_dynset_elements): New. * ctf-impl.h (ctf_dynset_elements): Declare it. (ctf_str_add_pending): Likewise. (ctf_dict_t) <ctf_str_pending_ref>: New, set of refs that must be added during serialization. * ctf-string.c (ctf_str_create_atoms): Initialize it. (CTF_STR_ADD_REF): New flag. (CTF_STR_MAKE_PROVISIONAL): Likewise. (CTF_STR_PENDING_REF): Likewise. (ctf_str_add_ref_internal): Take a flags word rather than int params. Populate, and clear out, ctf_str_pending_ref. (ctf_str_add): Adjust accordingly. (ctf_str_add_external): Likewise. (ctf_str_add_pending): New. (ctf_str_remove_ref): Also remove the potential ref if it is a pending ref. * ctf-serialize.c (ctf_serialize): Prohibit addition of strings with ctf_str_add_ref before serialization. Ensure that the ctf_str_pending_ref set is empty before strtab finalization. (ctf_emit_type_sect): Add a ref to the ctt_name. * ctf-create.c (ctf_add_generic): Add the ctt_name as a pending ref. * testsuite/libctf-writable/reserialize-strtab-corruption.*: New test.	2021-03-18 12:40:40 +00:00
Nick Alcock	81982d20fa	libctf: eliminate dtd_u, part 3: functions One more member vanishes from the dtd_u, leaving only the member for struct/union/enum members. There's not much to do here, since as of commit `afd78bd6f0` we use the same representation (type sizes, etc) in the dtu_argv as we will use in the final vlen, with one exception: the vlen has alignment padding, and the dtu_argv did not. Simplify things by adding suitable padding in both cases. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_argv>: Remove. * ctf-create.c (ctf_dtd_delete): No longer free it. (ctf_add_function): Use the dtd_vlen, not dtu_argv. Properly align. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-types.c (ctf_func_type_info): Just use the vlen. (ctf_func_type_args): Likewise.	2021-03-18 12:40:40 +00:00
Nick Alcock	534444b1ee	libctf: eliminate dtd_u, part 2: arrays This is even simpler than ints, floats and slices, with the only extra complication being the need to manually transfer the array parameter in the rarely-used function ctf_set_array. (Arrays are unique in libctf in that they can be modified post facto, not just created and appended to. I'm not sure why they got this exemption, but it's easy to maintain.) libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_arr>: Remove. * ctf-create.c (ctf_add_array): Use the dtd_vlen, not dtu_arr. (ctf_set_array): Likewise. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-types.c (ctf_array_info): Just use the vlen.	2021-03-18 12:40:40 +00:00
Nick Alcock	7879dd88ef	libctf: eliminate dtd_u, part 1: int/float/slice This series eliminates a lot of special-case code to handle dynamic types (types added to writable dicts and not yet serialized). Historically, when such types have variable-length data in their final CTF representations, libctf has always worked by adding such types to a special union (ctf_dtdef_t.dtd_u) in the dynamic type definition structure, then picking the members out of this structure at serialization time and packing them into their final form. This has the advantage that the ctf_add_* code doesn't need to know anything about the final CTF representation, but the significant disadvantage that all code that looks up types in any way needs two code paths, one for dynamic types, one for all others. Historically libctf "handled" this by not supporting most type lookups on dynamic types at all until ctf_update was called to do a complete reserialization of the entire dict (it didn't emit an error, it just emitted wrong results). Since commit `676c3ecbad`, which eliminated ctf_update in favour of the internal-only ctf_serialize function, all the type-lookup paths grew an extra branch to handle dynamic types. We can eliminate this branch again by dropping the dtd_u stuff and simply writing out the vlen in (close to) its final form at ctf_add_* time: type lookup for types using this approach is then identical for types in writable dicts and types that are in read-only ones, and serialization is also simplified (we just need to write out the vlen we already created). The only complexity lies in type kinds for which multiple vlen representations are valid depending on properties of the type, e.g. structures. But we can start simple, adjusting ints, floats, and slices to work this way, and leaving everything else as is. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_enc>: Remove. <dtd_u.dtu_slice>: Likewise. <dtd_vlen>: New. * ctf-create.c (ctf_add_generic): Perhaps allocate it. All callers adjusted. (ctf_dtd_delete): Free it. (ctf_add_slice): Use the dtd_vlen, not dtu_enc. (ctf_add_encoded): Likewise. Assert that this must be an int or float. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-dedup.c (ctf_dedup_rhash_type): Use the dtd_vlen, not dtu_slice. * ctf-types.c (ctf_type_reference): Likewise. (ctf_type_encoding): Remove most dynamic-type-specific code: just get the vlen from the right place. Report failure to look up the underlying type's encoding.	2021-03-18 12:40:36 +00:00
Nick Alcock	01cbfcba4b	libctf: fix comment above ctf_dict_t It is perfectly possible to have dynamically allocated data owned by a specific dict: you just have to teach ctf_serialize about it. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dict_t): Fix comment.	2021-03-18 12:37:54 +00:00
Nick Alcock	f5060e5633	libctf: add a deduplicator-specific type mapping table When CTF linking is done, the linker has to track the association between types in the inputs and types in the outputs. The deduplicator does this via the cd_output_emission_hashes, which maps from hashes of types (valid in both the input and output) to the IDs of types in the specific dict in which the cd_emission_hashes is held. However, the nondeduplicating linker and ctf_add_type used a different mechanism, a dedicated hashtab stored in the ctf_link_type_mapping, populated via ctf_add_type_mapping and queried via the ctf_type_mapping function. To allow the same functions to be used for variable and symbol population in both the deduplicating and nondeduplicating linker, the deduplicator carefully transferred all its input->output mappings into this hashtab before returning. This is expensive. The number of entries in this hashtab scales as the number of input types, and unlike the hashing machinery the type mapping machinery (the only other thing which scales that way) has not been much optimized. Now the nondeduplicating linker is gone, we can throw this out, move the existing type mapping machinery to ctf-create.c and dedicate it to ctf_add_type alone, and add a new function ctf_dedup_type_mapping which uses the deduplicator's built-in knowledge of type mappings directly, without requiring an expensive repopulation phase. This speeds up a test link of nouveau.ko (a good worst-case candidate with a lot of types in each of a lot of input files) from 9.11s to 7.15s in my testing, a speedup of over 20%. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dict_t) <ctf_link_type_mapping>: No longer used by the nondeduplicating linker. (ctf_add_type_mapping): Removed, now static. (ctf_type_mapping): Likewise. (ctf_dedup_type_mapping): New. (ctf_dedup_t) <cd_input_nums>: New. * ctf-dedup.c (ctf_dedup_init): Populate it. (ctf_dedup_fini): Free it again. Emphasise that this has to be the last thing called. (ctf_dedup): Populate it. (ctf_dedup_populate_type_mapping): Removed. (ctf_dedup_populate_type_mappings): Likewise. (ctf_dedup_emit): No longer call it. No longer call ctf_dedup_fini either. (ctf_dedup_type_mapping): New. * ctf-link.c (ctf_unnamed_cuname): New. (ctf_create_per_cu): Arguments must be non-null now. (ctf_in_member_cb_arg): Removed. (ctf_link): No longer populate it. No longer discard the mapping table. (ctf_link_deduplicating_one_symtypetab): Use ctf_dedup_type_mapping, not ctf_type_mapping. Use ctf_unnamed_cuname. (ctf_link_one_variable): Likewise. Pass in args individually: no longer a ctf_variable_iter callback. (empty_link_type_mapping): Removed. (ctf_link_deduplicating_variables): Use ctf_variable_next, not ctf_variable_iter. No longer pack arguments to ctf_link_one_variable into a struct. (ctf_link_deduplicating_per_cu): Call ctf_dedup_fini once all link phases are done. (ctf_link_deduplicating): Likewise. (ctf_link_intern_extern_string): Improve comment. (ctf_add_type_mapping): Migrate... (ctf_type_mapping): ... these functions... * ctf-create.c (ctf_add_type_mapping): ... here... (ctf_type_mapping): ... and make static, for the sole use of ctf_add_type.	2021-03-02 15:10:07 +00:00
Nick Alcock	f4f60336da	libctf, include: find types of symbols by name The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions suffice to look up the types of symbols if the caller already has a symbol number. But the caller often doesn't have one of those and only knows the name of the symbol: also, in object files, the caller might not have a useful symbol number in any sense (and neither does libctf: the 'symbol number' we use in that case literally starts at 0 for the lexicographically first-sorted symbol in the symtypetab and counts those symbols, so it corresponds to nothing useful). This means that even though object files have a symtypetab (generated by the compiler or by ld -r), the only way we can look up anything in it is to iterate over all symbols in turn with ctf_symbol_next until we find the one we want. This is unhelpful and pointlessly inefficient. So add a pair of functions to look up symbols by name in a dict and in a whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name. These are identical to the existing functions except that they take symbol names rather than symbol numbers. To avoid insane repetition, we do some refactoring in the process, so that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin wrappers around internal functions that do both lookup by symbol index and lookup by name. This massively reduces code duplication because even the existing lookup-by-index stuff wants to use a name sometimes (when looking up in indexed sections), and the new lookup-by-name stuff has to turn it into an index sometimes (when looking up in non-indexed sections): doing it this way lets us share most of that. The actual name->index lookup is done by ctf_lookup_symbol_idx. We do not anticipate this lookup to be as heavily used as ld.so symbol lookup by many orders of magnitude, so using the ELF symbol hashes would probably take more time to read them than is saved by using the hashes, and it adds a lot of complexity. Instead, do a linear search for the symbol name, caching all the name -> index mappings as we go, so that future searches are likely to hit in the cache. To avoid having to repeat this search over and over in a CTF archive when ctf_arc_lookup_symbol_name is used, have cached archive lookups (the sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator) pick out the first dict they cache in a given archive and store it in a new ctf_archive field, ctfi_crossdict_cache. This can be used to store cross-dictionary cached state that depends on things like the ELF symbol table rather than the contents of any one dict. ctf_lookup_symbol_idx then caches its name->index mappings in the dictionary named in the crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts in the same archive benefit from the previous linear search, and the symtab only needs to be scanned at most once. (Note that if you call ctf_lookup_by_symbol_name in one specific dict, and then follow it with a ctf_arc_lookup_symbol_name, the former will not use the crossdict cache because it's only populated by the dict opens in ctf_arc_lookup_symbol_name. This is harmless except for a small one-off waste of memory and time: it's only a cache, after all. We can fix this later by using the archive caching machinery more aggressively.) In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into a wrapper around a new function that does both index -> ID and name -> ID lookups across all dicts in an archive. We add a new ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that it was found in (so that linear searches for symbols don't need to be repeated): but we also remove a cache, the ctfi_syms cache that was memoizing the actual ctf_id_t returned from every call to ctf_arc_lookup_symbol. This is pointless: all it saves is one call to ctf_lookup_by_symbol, and that's basically an array lookup and nothing more so isn't worth caching. (Equally, given that symbol -> index mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly free after the first call, so there's no point caching the ctf_id_t in that case either.) We fix up one test that was doing manual symbol lookup to use ctf_arc_lookup_symbol instead, and enhance it to check that the caching layer is not totally broken: we also add a new test to do lookups in a .o file, and another to do lookups in an archive with conflicted types and make sure that sort of multi-dict lookup is actually working. include/ChangeLog 2021-02-17 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_arc_lookup_symbol_name): New. (ctf_lookup_by_symbol_name): Likewise. libctf/ChangeLog 2021-02-17 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dict_t) <ctf_symhash>: New. <ctf_symhash_latest>: Likewise. (struct ctf_archive_internal) <ctfi_crossdict_cache>: New. <ctfi_symnamedicts>: New. <ctfi_syms>: Remove. (ctf_lookup_symbol_name): Remove. * ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from parent properly. Make static. (ctf_lookup_symbol_idx): New, linear search for the symbol name, cached in the crossdict cache's ctf_symhash (if available), or this dict's (otherwise). (ctf_try_lookup_indexed): Allow the symname to be passed in. (ctf_lookup_by_symbol): Turn into a wrapper around... (ctf_lookup_by_sym_or_name): ... this, supporting name lookup too, using ctf_lookup_symbol_idx in non-writable dicts. Special-case name lookup in dynamic dicts without reported symbols, which have no symtab or dynsymidx but where name lookup should still work. (ctf_lookup_by_symbol_name): New, another wrapper. * ctf-archive.c (enosym): Note that this is present in ctfi_symnamedicts too. (ctf_arc_close): Adjust for removal of ctfi_syms. Free the ctfi_symnamedicts. (ctf_arc_flush_caches): Likewise. (ctf_dict_open_cached): Memoize the first cached dict in the crossdict cache. (ctf_arc_lookup_symbol): Turn into a wrapper around... (ctf_arc_lookup_sym_or_name): ... this. No longer cache ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but still cache the dicts those lookups succeed in). Add lookup-by-name support, with dicts of successful lookups cached in ctfi_symnamedicts. Refactor the caching code a bit. (ctf_arc_lookup_symbol_name): New, another wrapper. * ctf-open.c (ctf_dict_close): Free the ctf_symhash. * libctf.ver (LIBCTF_1.2): New version. Add ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name. * testsuite/libctf-lookup/enum-symbol.c (main): Use ctf_arc_lookup_symbol rather than looking up the name ourselves. Fish it out repeatedly, to make sure that symbol caching isn't broken. (symidx_64): Remove. (symidx_32): Remove. * testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup in an unlinked object file (indexed symtypetab sections only). * testsuite/libctf-writable/symtypetab-nonlinker-writeout.c (try_maybe_reporting): Check symbol types via ctf_lookup_by_symbol_name as well as ctf_symbol_next. * testsuite/libctf-lookup/conflicting-type-syms.*: New test of lookups in a multi-dict archive.	2021-02-20 16:37:08 +00:00

1 2 3

111 Commits