Slices had a bunch of horrible usability problems. In particular, while
towers of cv-quals are resolved away by functions that need to do it, towers
of cv-quals with slices in the middle are not resolved away by functions
like ctf_enum_value that can see through slices: resolving volatile -> slice
-> const -> enum will leave it with a 'const', which will error pointlessly,
annoying callers, who reasonably expect slices to be more invisible than
this. (The user-callable ctf_type_resolve still does not resolve away
slices, because this is the only way users can see that the slices are there
at all.)
This is induced by a fix for another wart: ctf_add_enumerator does not
resolve anything away at all, so you can't even add enumerators to const or
volatile enums -- and more problematically, you can't add enumerators to
enums with an explicit encoding without resolving away the types by hand,
since ctf_add_enum_encoded works by returning a slice! ctf_add_enumerator
now resolves away all of those, so any cvr-or-typedef-or-slice-qual
terminating in an enum can be added to, exactly as callers likely expect.
(New tests added.)
libctf/
* ctf-create.c (ctf_add_enumerator): Resolve away cvr-qualness.
* ctf-types.c (ctf_type_resolve_unsliced): Don't terminate at
the first slice.
* testsuite/libctf-writable/slice-of-slice.*: New test.
This commit moves provisional (not-yet-serialized) string refs towards the
scheme to be used for CTF IDs in the future. In particular
- provisional string offsets now count downwards from just under the
external string offset space (all bits on but the high bit). This makes
it possible to detect an overflowing strtab, and also makes it trivial to
determine whether any string offset (ref) updates were missed -- where
before we might get a slightly corrupted or incorrect string, we now get
a huge high strtab offset corresponding to no string, and an error is
emitted at read time.
- refs are emitted at serialization time during the pass through the types.
They are strictly associated with the newly-written-out buffer: the
existing opened CTF dict is not changed, though it does still get the new
strtab so that new refs to the same string can just refer directly to it.
The provisional strtab hash table that contains these strings is not
deleted after serialization (because we might serialize again): instead,
we keep track in the parent of the lowest-yet-used ("latest") provisional
strtab offset, and any strtab offset above that, but not external
(high-bit-on) is considered provisional.
This is sort-of-enforced by moving most of the ref-addition function
declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is
not included by ctf-create.c or ctf-open.c.
- because we don't add refs when adding types, we don't need to handle the
case where we add things to expanding vlens (enums, struct members) and
have to realloc() them. So the entire painful movable refs system can
just be deleted, along with the ability to remove refs piecemeal at all
(purging all of them is still possible). Strings added during type
addition are added via ctf_str_add(), which adds no refs: the strings are
picked up at serialization time and refs to their final, serialized
resting place added. The DTDs never have any refs in them, and their
provisional strtab offsets are never updated by the ref system.
This caused several bugs to fall out of the earlier work and get fixed.
In particular, attempts to look up a string in a child dict now search
the parent's provisional strtab too: we add some extra special casing
for the null string so we don't need to worry about deduplication
moving it somewhere other than offset zero.
Finally, the optimization that removes an unreferenced synthetic external
strtab (the record of the strings the linker has told us about, kept around
internally for lookup during late serialization) is faulty: references to a
strtab entry will only produce CTF-level refs if their value might change,
and an external string's offset won't change, so it produces no refs: worse
yet, even if we did get a ref (say, if the string was originally believed
to be internal and only later were we told that the linker knew about it
too), when we serialize a strtab, all its refs are dropped (since they've
been updated and can no longer change); so if we serialized it a second
time, its synthetic external strtab would be considered empty and dropped,
even though the same external strings as before still exist, referencing
it. We must keep the synthetic external strtab around as long as external
strings exist that reference it, i.e. for the life of the dict.
One benefit of all this: now we're emitting provisional string offsets at
a really high value, it's out of the way of the consecutive, deduplicated
string offsets in child dicts. So we can drop the constraint that you
cannot add strings to a dict with children, which allows us to add types
freely to parent dicts again. What you can't do is write that dict out
again: when we serialize, we currently update the dict being serialized
with the updated strtabs: when you write a dict out, its provisional
strings become real strings, and suddenly the offsets would overlap once
more. But opening a dict and its children, adding to it, and then
writing it out again is rare indeed, and we have a workaround: anyone
wanting to do this can just use ctf_link instead.
The initial_vlen parameter to ctf_add_generic is misnamed: it's not the
initial vlen (the initial number of members of a struct, etc), but rather
the initial size of the vlen region. We have a term for that, vbytes: use
it.
Amazingly this doesn't seem to have caused any bugs to creep in.
Making these functions is unnecessary right now, but will become much
clearer shortly.
While we're at it, we can drop the third child argument to
LCTF_INDEX_TO_TYPE: it's only used for nontrivial purposes that aren't
literally the same as getting the result from the fp in one place,
in ctf_lookup_by_name_internal, and that place is easily fixed by just
looking in the right dictionary in the first place.
They're meant to be inverses, which makes it unfortunate that
they check different bounds. No visible effect yet, since
ctf_typemax and ctf_stypes currently cover the entire type ID
space, but will have an effect shortly.
Parent/child determination is about to become rather more complex, making a
macro impractical. Use the ctf_type_isparent/ischild function calls
everywhere and remove the macro. Make them more const-correct too, to
make them more widely usable.
While we're about it, change several places that hand-implemented
ctf_get_dict() to call it instead, and armour several functions against
the null returns that were always possible in this case (but previously
unprotected-against).
Despite the removal of the separate movable ref list, the ref system as
a whole is more than complex enough to be worth generalizing now that
we are adding different kinds of ref.
Refs now are lists of uint32_t * which can be updated through the
pointer for all entries in the list and moved to new sites for all
pointers in a given range: they are no longer references to string
offsets in particular and can be references to other uint32_t-sized
things instead (note that ctf_id_t is a typedef to a uint32_t).
ctf-string.c has been adjusted accordingly (the adjustments are tiny,
more or less just turning a bunch of references to atom into
&atom->csa_refs).
Ever since pending refs were replaced with movable refs, we were failing
to remove movable ref backpointers properly on ctf_remove_ref. I don't
see how this could cause any problem but a memory leak, but since we
do ultimately write down refs, leaking references to refs is still
risky: best to fix this.
This was added last year to let us maintain a backpointer to the movable
refs dynhash in movable ref atoms without spending space for the
backpointer on the majority of (non-movable) refs and also without
causing an atom which had some refs movable and some refs not movable to
dereference unallocated storage when freed.
The backpointer's only purpose was to let us locate the
ctf_str_movable_refs dynhash during item freeing, when we had nothing
but a pointer to the atom being freed. Now we have a proper freeing
arg, we don't need the backpointer at all: we can just pass a pointer to
the dict in to the atoms dynhash as a freeing arg for the atom freeing
functions, and throw the whole backpointer and separate movable ref list
complexity away.
There are a bunch of places in libctf where the code is complicated
by the fact that freeing a hash key or value requires access to the
dict: more generally, they want an arg pointer to *something*.
But for the sake of being able to use free() as a freeing function,
we can't do this at all times. We also don't want to bloat up the
hash itself with an arg value unless necessary (in the same way we
already avoid storing the key or value freeing functions unless at
least one of them is specified).
So from the outside this change is simple: add a new
ctf_dynhash_create_arg which takes a new sort of freeing function
which takes an argument. Internally, we store the arg only when
the key or owner is set, and cast from the one freeing function
to the other iff the arg is non-NULL. This means it's impossible
to pass a value that may or may not be NULL to the freeing
function, but that's harmless for all current uses, and allows
significant simplifications elsewhere.
Everything in ctf-util.c is in some way associated with data
structures in general in some way, except for the ctf_*_to_link_sym
functions, which are straight translators between the Elf*_Sym
type and the ctf_link_sym_t type used by ctf-link.c.
Move them into ctf-link.c where they belong.
This file is a bit of a grab-bag of dict-wide API functions like
ctf_set_open_errno or ctf_errwarning_next and portabilty functions and
wrappers like ctf_mmap or ctf_pread.
Split the latter out, and move other dict-wide functions that got
stuck in ctf-util.c (because it was so hard to tell the two files
apart) into ctf-api.c where they belong.
The distinction between the citer and citers variables in
ctf_dedup_rhash_type is somewhat opaque (it's a micro-optimization to avoid
having to allocate entire sets when we know in advance that we'll only have
to store one value). Add a comment.
libctf/
* ctf-dedup.c (ctf_dedup_rhash_type): Comment on citers variables.
This commit finally implements strtab deduplication, putting together all
the pieces assembled in the earlier commits.
The magic is entirely localized to ctf_link_write, which preserializes all
the dicts (parent first), and calls ctf_dedup_strings on the parent.
(The error paths get tweaked a bit too.)
Calling ctf_dedup_strings has implications elsewhere: the lifetime rules for
the inputs versus outputs change a bit now that the child output dicts
contain references to the parent dict's atoms table. We also pre-purge
movable refs from all the deduplicated strings before freeing any of this
because movable refs contain backreferences into the dict they came from,
which means the parent contains references to all the children! Purging
the refs first makes those references go away so we can free the children
without creating any wild pointers, even temporarily.
There's a new testcase that identifies a regression whereby offset 0 (the
null string) and index 0 (in children now often the parent dict name,
".ctf") got mixed up, leading to anonymous structs and unions getting the
not entirely C-valid name ".ctf" instead.
May other testcases get adjusted to no longer depend on the precise layout
of the strtab.
TODO: add new tests to verify that strings are actually being deduplicated.
libctf/
* ctf-link.c (ctf_link_write): Deduplicate strings.
* ctf-open.c (ctf_dict_close): Free refs, then the link outputs,
then the out cu_mapping, then the inputs, in that order.
* ctf-string.c (ctf_str_purge_refs): Not static any more.
* ctf-impl.h: Declare it.
ld/
* testsuite/ld-ctf/conflicting-cycle-2.A-1.d: Don't depend on
strtab contents.
* testsuite/ld-ctf/conflicting-cycle-2.A-2.d: Likewise.
* testsuite/ld-ctf/conflicting-cycle-2.parent.d: Likewise.
* testsuite/ld-ctf/conflicting-cycle-3.C-1.d: Likewise.
* testsuite/ld-ctf/conflicting-cycle-3.C-2.d: Likewise.
* testsuite/ld-ctf/anonymous-conflicts*: New test.
This is a pretty simple two-phase process (count duplicates that are
actually going to end up in the strtab and aren't e.g. strings without refs,
strings with external refs etc, and move them into the parent) with one
wrinkle: we sorta-abuse the csa_external_offset field in the deduplicated
child atom (normally used to indicate that this string is located in the ELF
strtab) to indicate that this atom is in the *parent*. If you think of
"external" as meaning simply "is in some other strtab, we don't care which
one", this still makes enough sense to not need to change the name, I hope.
This is still not called from anywhere, so strings are (still!) not
deduplicated, and none of the dedup machinery added in earlier commits does
anything yet.
libctf/
* ctf-dedup.c (ctf_dedup_emit_struct_members): Note that strtab
dedup happens (well) after struct member emission.
(ctf_dedup_strings): New.
* ctf-impl.h (ctf_dedup_strings): Declare.
It is unreasonable to expect users to ctf_import the parent before being
able to understand the header -- doubly so because the only string in the
header which is likely to be deduplicable is the parent name, which is the
same in every child, yet without the parent name being *available* in the
child's strtab you cannot call ctf_parent_name to figure out which parent
to import!
libctf/
* ctf-serialize.c (ctf_preserialize): Prevent deduplication of header string
fields.
* ctf-open.c (ctf_set_base): Note this.
* ctf-string.c (ctf_str_free_atom): Likewise.
The next stage of strtab sharing is actual lookup of strings in such
strtabs, interning of strings in such strtabs and writing out of
such strtabs (but not actually figuring out which strings should
be shared: that's next).
We introduce several new internal ctf_str_* API functions to augment the
existing rather large set: ctf_str_add_copy, which adds a string and always
makes a copy of it (used when deduplicating to stop dedupped strings holding
permanent references on the input dicts), and ctf_str_no_dedup_ref (which
adds a ref to a string while preventing it from ever being deduplicated,
used for header fields like the parent name, which is the same for almost
all child dicts but had still better not be stored in the parent!).
ctf_strraw_explicit, the ultimate underlying "look up a string" function
that backs ctf_strptr et al, gains the ability to automatically find strings
in the parent if the offset is < cth_parent_strlen, and generally make all
offsets parent-relative (so something at offset 1 in the child strlen will
need to be looked up at offset 257 if cth_parent_strlen is 256). This
suffices to paste together the parent and child from the perspective
of lookup.
We do quite a lot of new checks in here, simply because it's called all over
the place and it's preferable to emit a nice error into the ctf_err_warning
stream if things go wrong. Among other things this traps cases where you
accidentally added a string to the parent, throwing off all the offsets.
Completely invalid offsets also now add a message to the err_warning
stream.
Insertion of new atoms (the deduplicated entities underlying strings in a
given dict), already a flag-heavy operation, gains more flags, corresponding
to the new ctf_str_add_copy and ctf_str_no_dedup_ref functions: atom
addition also checks the ctf_max_children set by ctf_import and prevents
addition of new atoms to any dicts with ctf_imported children and an
already-serialized strtab.
strtab writeout gains more checks as well: you can't write out a strtab for
a child dict whose parent hasn't been serialized yet (and thus doesn't have
a serialized strtab itself); you can't write it out if the child already
depended on a shared parent strtab and that strtab has changed length. The
null atom at offset 0 is only written to the parent strtab; and ref updating
changes to look up offsets in the parent's atoms table iff a new
CTF_STR_ATOM_IN_PARENT flag is set on the atom (this will be set by
deduplication to ensure that serializing a dict will update all its refs
properly even though a bunch of them have moved to the parent dict).
None of this actually has any *effect* yet because no string deduplication
is being carried out, and the cth_parent_strlen is still locked at 0.
include/
* ctf-api.h (_CTF_ERRORS) [ECTF_NOTSERIALIZED]: New.
(ECTF_NERR): Updated.
libctf/
* ctf-impl.h (CTF_STR_ATOM_IN_PARENT): New.
(CTF_STR_ATOM_NO_DEDUP): Likewise.
(ctf_str_add_no_dedup_ref): New.
(ctf_str_add_copy): New.
* ctf-string.c (ctf_strraw_explicit): Look in parents if necessary:
use parent-relative offsets.
(ctf_strptr_validate): Avoid duplicating errors.
(ctf_str_create_atoms): Update comment.
(CTF_STR_COPY): New.
(CTF_STR_NO_DEDUP): Likewise.
(ctf_str_add_ref_internal): Use them, setting the corresponding
csa_flags, prohibiting addition to serialized parents, and copying
strings if so requested.
(ctf_str_add): Turn into a wrapper around...
(ctf_str_add_flagged): ... this new function. The offset is now
parent-relative.
(ctf_str_add_ref): Likewise.
(ctf_str_add_movable_ref): Likewise.
(ctf_str_add_copy): New.
(ctf_str_add_no_dedup_ref): New.
(ctf_str_write_strtab): Prohibit writes when the parent has
changed length or is not serialized. Only write the null atom
to parent strtabs. Chase refs to the parent if necessary.
The next stage in sharing the strtab involves tearing two core parts
of libctf into two pieces.
Large parts of init_static_types, called at open time, involve traversing
the types table and initializing the hashtabs used by the type name lookup
functions and the enumerator conflicting checks. If the string table is
partly located in the parent dict, this is obviously not going to work: so
split out that code into a new init_static_types_names function (which
also means moving the wrapper around init_static_types that was used
to simplify the enumerator code into being a wrapper around
init_static_types_names instead) and call that from init_static_types
(for parent dicts, and < v4 dicts), and from ctf_import (for v4 dicts).
At the same time as doing this we arrange to set LCTF_NO_STR (recently
introduced) iff this is a v4 child dict with a nonzero cth_parent_strlen:
this then blocks more or less everything that involves string operations
until a ctf_import has actually imported the strtab it depends on. (No
string oeprations that actually use this have been introduced yet, but
since no string deduplication is happening yet either this is harmless.)
For v4 dicts, at import time we also validate that the cth_parent_strlen has
the same value as the parent's strlen (zero is also a valid value,
indicating a non-shared strtab, as is commonplace in older dicts, dicts
emitted by the compiler, parent dicts etc). This makes ctf_import more
complex, so we simplify things again by dropping all the repeated code in
the obscure used-only-by-ctf_link ctf_import_unref and turning both into
wrappers around an internal function. We prohibit repeated ctf_imports
(except of NULL or the same dict repeatedly), and set up some new fields
which will be used later to prevent people from adding strings to parent
dicts with pre-existing serialized strtabs once they have children imported
into them (which would change their string length and corrupt all those
strtabs).
Serialization also needs to be torn in two. The problem here is that
currently serialization does too much: it emits everything including the
strtab, does things that depend on the strtab being finalized (notably
variable table sorting), and then writes it out. Much of this emission
itself involves strtab writes, so the strtab is not actually complete until
halfway through ctf_serialize. But when deduplicating, we want to use
machinery in ctf-link and ctf-dedup to deduplicate the strtab after it is
complete, and only then write it out.
We could do this via having ctf_serialize call some sort of horrible
callback, but it seems much simpler to just cut ctf_serialize in two,
and introduce a new ctf_preserialize which can optionally be called to do
all this "everything but the strtab" work. (If it's not called,
ctf_serialize calls it itself.)
This means pulling some internal variables out of ctf_serialize into the
ctf_dict_t, and slightly abusing LCTF_NO_STR to mean (in addition to its
"no, you can't do much between opening a child dict and importing its
parent" semantics), "no, you can't do much between calling ctf_preserialize
and ctf_serialize". The requirements of both are not quite identical -- you
definitely can do things that involve string lookups after ctf_preserialize
-- but it serves to stop callers from accidentally adding more types after
the types table has been written out, and that's good enough.
ctf_preserialize isn't public API anyway.
libctf/
* ctf-impl.h (struct ctf_dict) [ctf_serializing_buf]: New.
[ctf_serializing_buf_size]: Likewise.
[ctf_serializing_vars]: Likewise.
[ctf_serializing_nvars]: Likewise.
[ctf_max_children]: Likewise.
(LCTF_PRESERIALIZED): New.
(ctf_preserialize): New.
(ctf_depreserialize): New.
* ctf-open.c (init_static_types): Rename to...
(init_static_types_names): ... this, wrapping a different
function.
(init_static_types_internal): Rename to...
(init_static_types): ... this, and set LCTF_NO_STR if neecessary.
Tear out the name-lookup guts into...
(init_static_types_names_internal): ... this new function. Fix a few
comment typos.
(ctf_bufopen): Emphasise that you cannot rely on looking up strings
at any point in ctf_bufopen any more.
(ctf_dict_close): Free ctf_serializing_buf.
(ctf_import): Turn into a wrapper, calling...
(ctf_import_internal): ... this. Prohibit repeated ctf_imports of
different parent dicts, or "unimporting" by setting it back to NULL
again. Validate the parent we do import using cth_parent_strlen.
Call init_static_types_names if the strtab is shared with the
parent.
(ctf_import_unref): Turn into a wrapper.
* ctf-serialize.c (ctf_serialize): Split out everything before
strtab serialization into...
(ctf_preserialize): ... this new function.
(ctf_depreserialize): New, undo preserialization on error.
The first format difference between v3 and v4 is a cth_parent_strlen header
field. This field (obviously not present in BTF) is populated from the
string table length of the parent at serialization time (protection against
being serialized before the parent is will be added in a later commit in
this series), and will be used at open time to prohibit opening of dicts
with a different strlen (which would corrupt the child's string table
if it was shared with the parent).
For now, just add the field, populate it at serialization time when linking
(when not linking, no deduplication is done and the correct value remains
unchanged), and dump it.
include/
* ctf.h (ctf_header) [cth_parent_strlen]: New.
libctf/
* ctf-dump.c (ctf_dump_header_sizefield): New.
(ctf_dump_header): Use to dump the cth_parent_strlen.
* ctf-open.c (upgrade_header_v2): Populate cth_parent_strlen.
(upgrade_header_v3): Likewise.
(ctf_flip_header): Flip it.
(ctf_bufopen): Drop unnecessary initialization.
* ctf-serialize.c (ctf_serialize): Write it out when linking.
ld/
* testsuite/ld-ctf/data-func-conflicted-vars.d: Skip the nwe dump output.
* testsuite/ld-ctf/data-func-conflicted.d: Likewise.
We are about to add machinery that deduplicates a child dict's strtab
against its parent. Obviously if you open such a dict but do not import its
parent, all strtab lookups must fail: so add an LCTF_NO_STR flag that is set
in that window and make most operations fail if it's not set. (Two more
that will be set in future commits are serialization and string lookup
itself.)
Notably, not all symbol lookup is impossible in this window: you can still
look up by symbol index, as long as this dict is not using an indexed
strtypetab (which obviously requires string lookups to get the symbol name).
include/
* ctf-api.h (_CTF_ERRORS) [ECTF_HASPARENT]: New.
[ECTF_WRONGPARENT]: Likewise.
(ECTF_NERR): Update.
Update comments to note the new limitations on ctf_import et al.
libctf/
* ctf-impl.h (LCTF_NO_STR): New.
* ctf-create.c (ctf_rollback): Error out when LCTF_NO_STR.
(ctf_add_generic): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_unknown): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_add_type): Likewise (on either dict).
* ctf-dump.c (ctf_dump): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Likewise.
(ctf_lookup_variable): Likewise. Likewise.
(ctf_lookup_enumerator): Likewise.
(ctf_lookup_enumerator_next): Likewise.
(ctf_symbol_next): Likewise.
(ctf_lookup_by_sym_or_name): Likewise, if doing indexed lookups.
* ctf-types.c (ctf_member_next): Likewise.
(ctf_enum_next): Likewise.
(ctf_type_aname): Likewise.
(ctf_type_name_raw): Likewise.
(ctf_type_compat): Likewise, for either dict.
(ctf_member_info): Likewise.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_variable_next): Note that we don't need to test LCTF_NO_STR.
We are about to move to a regime where there are very few things you can do
with most dicts before you ctf_import them. So emit a warning if
ctf_archive_next()'s convenience ctf_import of parents fails. Rip out the
buggy code in ctf_link_deduplicating_open_inputs which opened the parent by
hand (with a hardwired name), and instead rely on ctf_archive_next to do it
for us (which also means we don't end up opening it twice, once in
ctf_archive_next, once in ctf_link_deduplicating_open_inputs).
While we're there, arrange to close the inputs we already opened if opening
of some inputs fails, rather than leaking them. (There are still some leaks
here, so add a comment to remind us to clean them up later.)
libctf/
* ctf-archive.c (ctf_arc_import_parent): Emit a warning if importing
fails.
* ctf-link.c (ctf_link_deduplicating_open_inputs): Rely on the
ctf_archive_next to open parent dicts.
This format is a superset of BTF, but for now we just do the minimum to
declare a new file format version, without actually introducing any format
changes.
From now on, we refuse to reserialize CTFv1 dicts: these have a distinct
parent/child boundary which obviously cannot change upon reserialization
(that would change the type IDs): instead, we encoded this by stuffing in
a unique CTF version for such dicts. We can't do that now we have one
version for all CTFv4 dicts, and testing such old dicts is very hard these
days anyway, and is not automated: so just drop support for writing them out
entirely. (You still *can* write them out, but you have to do a full-blown
ctf_link, which generates an all-new fresh dict and recomputes type IDs as
part of deduplication.)
To prevent this extremely-not-ready format escaping into the wild, add a
new mechanism whereby any format version higher than the new #define
CTF_STABLE_VERSION cannot be serialized unless I_KNOW_LIBCTF_IS_UNSTABLE is
set in the environment.
include/
* ctf-api.h (_CTF_ERRORS) [ECTF_CTFVERS_NO_SERIALIZE]: New.
[ECTF_UNSTABLE]: New.
(ECTF_NERR): Update.
* ctf.h: Small comment improvements..
(ctf_header_v3): New, copy of ctf_header.
(CTF_VERSION_4): New.
(CTF_VERSION): Now CTF_VERSION_4.
(CTF_STABLE_VERSION): Still 4, CTF_VERSION_3.
ld/
* testsuite/ld-ctf/*.d: Update to CTF_VERSION_4.
libctf/
* ctf-impl.h (LCTF_NO_SERIALIZE): New.
* ctf-dump.c (ctf_dump_header): Add CTF_VERSION_4.
* ctf-open.c (ctf_dictops): Likewise.
(upgrade_header): Rename to...
(upgrade_header_v2): ... this.
(upgrade_header_v3): New.
(upgrade_types): Support upgrading from CTF_VERSION_3.
Turn on LCTF_NO_SERIALIZE for CTFv1.
(init_static_types_internal): Upgrade all types tables older than
* CTF_VERSION_4.
(ctf_bufopen): Support CTF_VERSION_4: error out if we forget to
update this switch in future. Add header upgrading from v3 and
below. Improve comments slightly.
* ctf-serialize.c (ctf_serialize): Block serialization of unstable
file formats, and of file formats for which LCTF_NO_SERIALIZE is
turned on (v1).
When building gdb, I run into:
...
ctf-spec.texi:809: warning: @xref should not appear on @multitable line
...
The line in question is:
...
@multitable {Kind} {@code{CTF_K_VOLATILE}} {Indicates a type that cannot be represented in CTF, or that} {@xref{Pointers typedefs and cvr-quals}}
...
which defines a prototype row with 4 columns.
However, the table only has 3 colums:
...
@headitem Kind @tab Macro @tab Purpose
...
Fix the warning by removing the item in the prototype row representing a fourth column.
Tested on aarch64-linux.
PR libctf/32044
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32044
Previous code included the full $srcdir pathnames in the individual
subtest PASS/FAIL names, which makes it difficult to compute
comparisons or regressions between test runs on different machines.
This version switches to the basename only, which are common.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
This failed to properly byteswap its return value.
The ctf_archive format predates the idea of "just write natively and
flip on open", and byteswaps all over the place. It's too easy to
forget one. The next revision of the archive format (not versioned,
so we just tweak the magic number instead) should be native-endianned
like the dicts inside it are.
libctf/
* ctf-archive.c (ctf_archive_count): Byteswap return value.
If you asprintf something and then use it only as input to another asprintf,
it helps to free it afterwards.
libctf/
* ctf-dump.c (ctf_dump_header): Free the flagstr after use.
(ctf_dump): Make a NULL return slightly clearer.
A bug in ctf_dtd_delete led to refs in the string table to the
names of non-root-visible types not being removed when the DTD
was. This seems harmless, but actually it would lead to a write
down a pointer into freed memory if such a type was ctf_rollback()ed
over and then the dict was serialized (updating all the refs as the
strtab was serialized in turn).
Bug introduced in commit fe4c2d5563
("libctf: create: non-root-visible types should not appear in name tables")
which is included in binutils 2.35.
libctf/
* ctf-create.c (ctf_dtd_delete): Remove refs for all types
with names, not just root-visible ones.
The dict and archive opening code in libctf is somewhat unusual, because
unlike everything else, it cannot report errors by setting an error on the
dict, because in case of error there isn't one. They get passed an error
integer pointer that is set on error instead.
Inside ctf_bufopen this is implemented by calling ctf_set_open_errno and
passing it a positive error value. In turn this means that most things it
calls (including init_static_types) return zero on success and a *positive*
ECTF_* or errno value on error.
This trickles down to ctf_dynhash_insert_type, which is used by
init_static_types to add newly-detected types to the name tables. This was
returning the error value it received from a variety of functions without
alteration. ctf_dynhash_insert conformed to this contract by returning a
positive value on error (usually OOM), which is unfortunate for multiple
reasons:
- ctf_dynset_insert returns a *negative* value
- ctf_dynhash_insert and ctf_dynset_insert don't take an fp, so the value
they return is turned into the errno, so it had better be right, callers
don't just check for != 0 here
- more or less every single caller of ctf_dyn*_insert in libctf other than
ctf_dynhash_insert_type (and there are a *lot*, mostly in the
deduplicator) assumes that ctf_dynhash_insert returns a negative value
on error, even though it doesn't. In practice the only possible error is
OOM, but if OOM does happen we end up with a nonsense error value.
The simplest fix for this seems to be to make ctf_dynhash_insert and
ctf_dynset_insert conform to the usual interface contract: negative
values are errors. This in turn means that ctf_dynhash_insert_type
needs to change: let's make it consistent too, returning a negative
value on error, putting the error on the fp in non-negated form.
init_static_types_internal adapts to this by negating the error return from
ctf_dynhash_insert_type, so the value handed back to ctf_bufopen is still
positive: the new call site in ctf_track_enumerator does not need to change.
(The existing tests for this reliably detect when I get it wrong.
I know, because they did.)
libctf/
* ctf-hash.c (ctf_dynhash_insert): Negate return value.
(ctf_dynhash_insert_type): Set de-negated error on the dict:
return negated error.
* ctf-open.c (init_static_types_internal): Adapt to this change.
The recent change to detect duplicate enum values and return ECTF_DUPLICATE
when found turns out to perturb a great many callers. In particular, the
pahole-created kernel BTF has the same problem we historically did, and
gleefully emits duplicated enum constants in profusion. Handling the
resulting duplicate errors from BTF -> CTF converters reasonably is
unreasonably difficult (it amounts to forcing them to skip some types or
reimplement the deduplicator).
So let's step back a bit. What we care about mostly is that the
deduplicator treat enums with conflicting enumeration constants as
conflicting types: programs that want to look up enumeration constant ->
value mappings using the new APIs to do so might well want the same checks
to apply to any ctf_add_* operations they carry out (and since they're
*using* the new APIs, added at the same time as this restriction was
imposed, there is likely to be no negative consequence of this).
So we want some way to allow processes that know about duplicate detection
to opt into it, while allowing everyone else to stay clear of it: but we
want ctf_link to get this behaviour even if its caller has opted out.
So add a new concept to the API: dict-wide CTF flags, set via
ctf_dict_set_flag, obtained via ctf_dict_get_flag. They are not bitflags
but simple arbitrary integers and an on/off value, stored in an unspecified
manner (the one current flag, we translate into an LCTF_* flag value in the
internal ctf_dict ctf_flags word). If you pass in an invalid flag or value
you get a new ECTF_BADFLAG error, so the caller can easily tell whether
flags added in future are valid with a particular libctf or not.
We check this flag in ctf_add_enumerator, and set it around the link
(including on child per-CU dicts). The newish enumerator-iteration test is
souped up to check the semantics of the flag as well.
The fact that the flag can be set and unset at any time has curious
consequences. You can unset the flag, insert a pile of duplicates, then set
it and expect the new duplicates to be detected, not only by
ctf_add_enumerator but also by ctf_lookup_enumerator. This means we now
have to maintain the ctf_names and conflicting_enums enum-duplication
tracking as new enums are added, not purely as the dict is opened.
Move that code out of init_static_types_internal and into a new
ctf_track_enumerator function that addition can also call.
(None of this affects the file format or serialization machinery, which has
to be able to handle duplicate enumeration constants no matter what.)
include/
* ctf-api.h (CTF_ERRORS) [ECTF_BADFLAG]: New.
(ECTF_NERR): Update.
(CTF_STRICT_NO_DUP_ENUMERATORS): New flag.
(ctf_dict_set_flag): New function.
(ctf_dict_get_flag): Likewise.
libctf/
* ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag.
(ctf_track_enumerator): Declare.
* ctf-dedup.c (ctf_dedup_emit_type): Set it.
* ctf-link.c (ctf_create_per_cu): Likewise.
(ctf_link_deduplicating_per_cu): Likewise.
(ctf_link): Likewise.
(ctf_link_write): Likewise.
* ctf-subr.c (ctf_dict_set_flag): New function.
(ctf_dict_get_flag): New function.
* ctf-open.c (init_static_types_internal): Move enum tracking to...
* ctf-create.c (ctf_track_enumerator): ... this new function.
(ctf_add_enumerator): Call it.
* libctf.ver: Add the new functions.
* testsuite/libctf-lookup/enumerator-iteration.c: Test them.
We set this flag at the top of ctf_link_write (to tell ctf_serialize, way
down under the archive file writing functions, to do the various link- time
serialization things like symbol filtering and the like), but we never
remember to clear it except on error. This is probably bad if you want to
serialize the dict yourself directly in the future after linking it (which
is... definitely a *possible* use of the API, if rather strange).
libctf/
* ctf-link.c (ctf_link_write): Clear LCTF_LINKING before exit.
libctf's dynsets are a straight wrapper around libiberty hashtab, storing
the key directly in the hashtab slot. However, we'd often like to be able
to store 0 and 1 (HTAB_EMPTY_ENTRY and HTAB_DELETED_ENTRY) in there, so we
move them out of the way and replace them with huge unlikely values
instead. Unfortunately we failed to do this replacement in one place, so
insertion of 0 or 1 ended up misinforming the hashtab machinery that an
entry was empty or deleted when it wasn't.
libctf/
* ctf-hash.c (ctf_dynset_insert): Call key_to_internal properly.
Drop an unnecessary variable, and fix a buggy comment.
No effect on generated code.
libctf/
* ctf-dedup.c (ctf_dedup_detect_name_ambiguity): Drop unnecessary
variable.
(ctf_dedup_rwalk_output_mapping): Fix comment.
Commit 483546ce4f ("libctf: make ctf_serialize() actually serialize")
accidentally broke dict compression. There were two bugs:
- ctf_arc_write_one_ctf was still making its own decision about
whether to compress the dict via direct ctf_size comparison, which is
unfortunate because now that it no longer calls ctf_serialize itself,
ctf_size is always zero when it does this: it should let the writing
functions decide on the threshold, which they contain code to do which is
simply not used for lack of one trivial wrapper to write to an fd and
also provide a compression threshold
- ctf_write_mem, the function underlying all writing as of the commit
above, was calling zlib's compressBound and avoiding compression if this
returned a value larger than the input. Unfortunately compressBound does
not do a trial compression and determine whether the result is
compressible: it just adds zlib header sizes to the value passed in, so
our test would *always* have concluded that the value was incompressible!
Avoid by simply always compressing if the raw size is larger than the
threshold: zlib is quite clever enough to avoid actually compressing
if the data is incompressible.
Add a testcase for this.
libctf/
* ctf-impl.h (ctf_write_thresholded): New...
* ctf-serialize.c (ctf_write_thresholded): ... defined here,
a wrapper around...
(ctf_write_mem): ... this. Don't check compressibility.
(ctf_compress_write): Reimplement as a ctf_write_thresholded
wrapper.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Just call
ctf_write_thresholded rather than trying to work out whether
to compress.
* testsuite/libctf-writable/ctf-compressed.*: New test.
If you deduplicate non-root-visible types, the resulting type should still
be non-root-visible! We were promoting all such types to root-visible, and
re-demoting them only if their names collided (which might happen on
cu-mapped links if multiple compilation units with conflicting types are
fused into one child dict).
This "worked" before now, in that linking at least didn't fail (if you don't
mind having your non-root flag value destroyed if you're adding
non-root-visible types), but now that conflicting enumerators cause their
containing enums to become conflicted (enums which might have *different
names*), this caused the linker to crash when it hit two enumerators with
conflicting values.
Not testable in ld because cu-mapped links are not exposed to ld, but can be
tested via direct creation of libraries and calls to ctf_link directly.
(This also tests the ctf_dump non-root type printout, which before now
was untested.)
libctf/
* ctf-dedup.c (ctf_dedup_emit_type): Non-root-visible input types
should be emitted as non-root-visible output types.
* testsuite/libctf-writable/ctf-nonroot-linking.c: New test.
* testsuite/libctf-writable/ctf-nonroot-linking.lk: New test.
The flag test when dumping non-root-visible tyeps was doubly wrong: the
flags word is a *bitfield* containing CTF_ADD_ROOT as one possible
value, so needs | and & testing, not just ==, and CTF_ADD_NONROOT is 0,
so cannot be tested for this way: one must check for the non-presence of
CTF_ADD_ROOT.
libctf/
* ctf-dump.c (ctf_dump_format_type): Fix non-root flag test.
In commit 149ce5c263 we introduced the concept of "movable" refs,
which are refs that can be moved in batches, to let us maintain valid ref
lists even when adding refs to blocks of memory that can be realloced (which
is any type containing a vlen which can expand, like names contained within
enum or struct members). Movable refs need a backpointer to the movable
refs dynhash for this dict; since non-movable refs are very common, we tried
to save memory by having a slightly bigger struct for moveable refs with a
backpointer in it, and casting appropriately, indicating which sort of ref
we were dealing with via a flag on the atom.
Unfortunately this doesn't work reliably, because you can perfectly well
have a string ("foo", say) which has both non-movable refs (say, an external
symbol and a variable name) and movable refs (say, a structure member name)
to the same atom. Indicate which struct we're dealing with with an atom
flag and suddenly you're casting a ctf_str_atom_ref to a
ctf_str_atom_ref_movable (which is bigger) and dereferencing random memory
off the end of it and interpreting it as a backpointer to the movable refs
dynhash. This is unlikely to work well.
So bite the bullet and split refs into two separate lists, one for movable
refs, one for immovable refs. It means some annoying code duplication, but
there's not very much of it, and it means we can keep the movable refs
hashtab (which in turn means we don't have to do linear searches to find all
relevant refs when moving refs, which in turn means that
structure/union/enum member additions remain amortized O(n) time, not
O(n^2).
Callers can now purge movable and non-movable refs independently of each
other. We don't use this yet, but a use is coming.
libctf/
* ctf-impl.h (CTF_STR_ATOM_MOVABLE): Delete.
(struct ctf_str_atom) [csa_movable_refs]: New.
(struct ctf_dict): Adjust comment.
(ctf_str_purge_refs): Add MOVABLE arg.
* ctf-string.c (ctf_str_purge_movable_atom_refs): Split out of...
(ctf_str_purge_atom_refs): ... this.
(ctf_str_free_atom): Call it.
(ctf_str_purge_one_atom_refs): Likewise.
(aref_create): Adjust accordingly.
(ctf_str_move_refs): Likewise.
(ctf_str_remove_ref): Remove movable refs too, including
deleting the ref from ctf_str_movable_refs.
(ctf_str_purge_refs): Add MOVABLE arg.
(ctf_str_update_refs): Update movable refs.
(ctf_str_write_strtab): Check, and purge, movable refs.
The PARENTS arg is carefully passed down through all the layers of hash
functions and then never used for anything. (In the distant past it was
used for cycle detection, but the algorithm eventually committed doesn't
need to do cycle detection...)
The PARENTS arg is still used by ctf_dedup_emit(), but even there we can
loosen the requirements and state that you can just leave entries
corresponding to dicts with no parents at zero (which will be useful
in an upcoming commit).
libctf/
* ctf-dedup.c (ctf_dedup_hash_type): Drop PARENTS arg.
(ctf_dedup_rhash_type): Likewise.
(ctf_dedup): Likewise.
(ctf_dedup_emit_struct_members): Mention what you can do to
PARENTS entries for parent dicts.
* ctf-impl.h (ctf_dedup): Adjust accordingly.
* ctf-link.c (ctf_link_deduplicating_per_cu): Likewise.
(ctf_link_deduplicating): Likewise.
The worry that caused this to not be supported was because we don't
bother endian-flipping version-related fields before checking them.
But they're all unsigned chars anyway, and don't need any flipping at
all.
This should be supported and should already work. Enable it.
libctf/
* ctf-open.c (ctf_bufopen): Don't prohibit foreign-endian
upgrades.
Most of these are harmless, but some of the type confusions and especially
a missing ctf_strerror() on an error path were actual bugs that could
have resulted in test failures crashing rather than printing an error
message.
libctf/
* testsuite/libctf-lookup/enumerator-iteration.c: Fix type
confusion, signedness confusion and a missing ctf_errmsg().
* testsuite/libctf-regression/libctf-repeat-cu-main.c: Return 0 from
the test function.
* testsuite/libctf-regression/open-error-free.c: Fix signedness
confusion.
* testsuite/libctf-regression/zrewrite.c: Remove unused label.
Three new functions for looking up the enum type containing a given
enumeration constant, and optionally that constant's value.
The simplest, ctf_lookup_enumerator, looks up a root-visible enumerator by
name in one dict: if the dict contains multiple such constants (which is
possible for dicts created by older versions of the libctf deduplicator),
ECTF_DUPLICATE is returned.
The next simplest, ctf_lookup_enumerator_next, is an iterator which returns
all enumerators with a given name in a given dict, whether root-visible or
not.
The most elaborate, ctf_arc_lookup_enumerator_next, finds all
enumerators with a given name across all dicts in an entire CTF archive,
whether root-visible or not, starting looking in the shared parent dict;
opened dicts are cached (as with all other ctf_arc_*lookup functions) so
that repeated use does not incur repeated opening costs.
All three of these return enumerator values as int64_t: unfortunately, API
compatibility concerns prevent us from doing the same with the other older
enum-related functions, which all return enumerator constant values as ints.
We may be forced to add symbol-versioning compatibility aliases that fix the
other functions in due course, bumping the soname for platforms that do not
support such things.
ctf_arc_lookup_enumerator_next is implemented as a nested ctf_archive_next
iterator, and inside that, a nested ctf_lookup_enumerator_next iterator
within each dict. To aid in this, add support to ctf_next_t iterators for
iterators that are implemented in terms of two simultaneous nested iterators
at once. (It has always been possible for callers to use as many nested or
semi-overlapping ctf_next_t iterators as they need, which is one of the
advantages of this style over the _iter style that calls a function for each
thing iterated over: the iterator change here permits *ctf_next_t iterators
themselves* to be implemented by iterating using multiple other iterators as
part of their internal operation, transparently to the caller.)
Also add a testcase that tests all these functions (which is fairly easy
because ctf_arc_lookup_enumerator_next is implemented in terms of
ctf_lookup_enumerator_next) in addition to enumeration addition in
ctf_open()ed dicts, ctf_add_enumerator duplicate enumerator addition, and
conflicting enumerator constant deduplication.
include/
* ctf-api.h (ctf_lookup_enumerator): New.
(ctf_lookup_enumerator_next): Likewise.
(ctf_arc_lookup_enumerator_next): Likewise.
libctf/
* libctf.ver: Add them.
* ctf-impl.h (ctf_next_t) <ctn_next_inner>: New.
* ctf-util.c (ctf_next_copy): Copy it.
(ctf_next_destroy): Destroy it.
* ctf-lookup.c (ctf_lookup_enumerator): New.
(ctf_lookup_enumerator_next): New.
* ctf-archive.c (ctf_arc_lookup_enumerator_next): New.
* testsuite/libctf-lookup/enumerator-iteration.*: New test.
* testsuite/libctf-lookup/enum-ctf-2.c: New test CTF, used by the
above.