libctf: dedup: datasecs and vars

These are a bit trickier than previous things.  Datasecs are unusual: the
content they contain for a given variable is conceptually part of that
variable, because a variable can only appear in one datasec.  So if two TUs
have different datasec values for a variable, you'll want to emit two
conflicting variables with different datasec entries.  Equally, if the
variable has entries in different datasecs, those are conflicting too.  But
the *index* of a variable in a datasec has nothing to do with the variable:
it's just a property of how many other variables are in the datasec.

So we turn the type graph upside down for them.  We track the variable ->
datasec mappings for every variable we are deduplicating, and use this to
hash variables with datasec entries *twice*: firstly, purely as variable
type, name, and promoted-to-non-extern linkage, and secondly with all of
that plus the datasec name, offset and size.  We indicate that the
non-extern hash *replaces* the extern one, and use this later on.  The
datasec itself is not hashed at all!  We skip it at both hashing and
emission time (without breaking anything else, because nothing points at
datasecs, so nothing will ever recurse down into one).
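
As a rough illustration of the double hashing (not the libctf code: the
struct, the function names, the use of FNV-1a, and the choice to promote
extern to global linkage are all assumptions made for this sketch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical linkage kinds; libctf's real constants differ.  */
enum linkage { LINK_STATIC, LINK_GLOBAL, LINK_EXTERN };

/* Hypothetical, simplified view of one input variable.  */
struct var_desc
{
  const char *type_hash;	/* Hash of the variable's type.  */
  const char *name;
  enum linkage linkage;
  const char *datasec;		/* Datasec name, or NULL if none.  */
  uint32_t offset;		/* Offset within the datasec.  */
  uint32_t size;		/* Size within the datasec.  */
};

/* FNV-1a stands in for the real hash function here.  */
static uint64_t
fnv1a (uint64_t h, const void *buf, size_t len)
{
  const unsigned char *p = buf;

  while (len--)
    {
      h ^= *p++;
      h *= 0x100000001b3ULL;
    }
  return h;
}

/* First hash: type, name, and linkage promoted to non-extern.  */
uint64_t
hash_var_plain (const struct var_desc *v)
{
  /* Assumption: "promotion" turns extern into global.  */
  unsigned char link = (v->linkage == LINK_EXTERN) ? LINK_GLOBAL : v->linkage;
  uint64_t h = 0xcbf29ce484222325ULL;

  /* Hash the terminating NUL too, so adjacent fields cannot run together.  */
  h = fnv1a (h, v->type_hash, strlen (v->type_hash) + 1);
  h = fnv1a (h, v->name, strlen (v->name) + 1);
  return fnv1a (h, &link, 1);
}

/* Second hash: all of the above plus datasec name, offset and size.
   The variable's index within the datasec is deliberately left out.  */
uint64_t
hash_var_with_datasec (const struct var_desc *v)
{
  uint64_t h = hash_var_plain (v);

  assert (v->datasec != NULL);
  h = fnv1a (h, v->datasec, strlen (v->datasec) + 1);
  h = fnv1a (h, &v->offset, sizeof (v->offset));
  return fnv1a (h, &v->size, sizeof (v->size));
}
```

In this sketch an extern declaration and a global definition of the same
variable share the first hash, while two definitions at different datasec
offsets share the first hash but differ in the second, which is what makes
them conflict.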

The popcount code (used to find the "most popular" type, the one to put in
the shared dict) changes so that the popcounts of replaced types (extern
vars) are added to the counts of the types that replace them (the
corresponding non-extern vars).
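
A minimal sketch of that folding, using array indexes in place of hash
values (the names, the NOT_REPLACED sentinel, and the assumption that
replacements never chain are all hypothetical, not taken from libctf):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: this hash is not replaced by anything.  */
#define NOT_REPLACED ((size_t) -1)

/* Add the count of every replaced hash to the hash that replaces it.
   replaced_by[i] is the index of the replacing hash, or NOT_REPLACED.  */
void
fold_replaced_counts (unsigned long *counts, const size_t *replaced_by,
		      size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (replaced_by[i] != NOT_REPLACED)
      {
	assert (replaced_by[i] < n);
	counts[replaced_by[i]] += counts[i];
      }
}

/* Pick the most popular hash among those that are not replaced.
   Returns NOT_REPLACED if every hash is replaced (or n is zero).  */
size_t
most_popular (const unsigned long *counts, const size_t *replaced_by,
	      size_t n)
{
  size_t best = NOT_REPLACED;

  for (size_t i = 0; i < n; i++)
    if (replaced_by[i] == NOT_REPLACED
	&& (best == NOT_REPLACED || counts[i] > counts[best]))
      best = i;
  return best;
}
```

With counts {5, 2, 4} and hash 0 (an extern var) replaced by hash 1, hash 2
is the most popular before folding and hash 1 is afterwards, because it
picks up the five citations of its extern counterpart.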

At emission time, replaced variables (extern variables) are skipped, so
that extern vars with non-conflicting non-extern counterparts give way to
the non-extern ones.  ctf_add_section_variable then takes care of emitting
both the var and its corresponding datasec for us.
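
The emission-time filter can be pictured roughly like this (again with
array indexes standing in for hash values; the function name and the
NOT_REPLACED sentinel are hypothetical, and the real emission of each
surviving variable is not shown):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: this hash is not replaced by anything.  */
#define NOT_REPLACED ((size_t) -1)

/* Collect the indexes of variables that should be emitted, skipping every
   variable whose hash has been replaced.  OUT must have room for N
   entries.  Returns the number of entries written.  */
size_t
select_vars_to_emit (const size_t *replaced_by, size_t n, size_t *out)
{
  size_t n_out = 0;

  assert (out != NULL);
  for (size_t i = 0; i < n; i++)
    if (replaced_by[i] == NOT_REPLACED)
      out[n_out++] = i;
  return n_out;
}
```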
Nick Alcock
2025-04-25 19:22:42 +01:00
parent 6b8885cfc9
commit 4db605353c
2 changed files with 380 additions and 23 deletions

@@ -285,8 +285,9 @@ typedef struct ctf_dedup
/* Atoms tables of decorated names: maps undecorated name to decorated name.
(The actual allocations are in the CTF dict for the former and the real
atoms table for the latter). Uses the same namespaces as ctf_lookups,
-below, but has no need for null-termination. */
-ctf_dynhash_t *cd_decorated_names[4];
+below, with the addition of type and decl tags, and with no need for
+null-termination. */
+ctf_dynhash_t *cd_decorated_names[6];
/* Map type names to a hash from type hash value -> number of times each value
has appeared. Enumeration constants are tracked via the enum they appear
@@ -308,11 +309,21 @@ typedef struct ctf_dedup
can be cited from multiple TUs. Only populated in that link mode. */
ctf_dynhash_t *cd_struct_origin;
+/* Maps from the GID of variables on the input to a (type ID, component_idx)
+pair identifying the corresponding datasec row. */
+ctf_dynhash_t *cd_var_datasec;
/* Maps the type hash values of things with linkages (vars, functions) to the
intended final linkage of that type, accumulated from all types with that
ID across all inputs (so far). Subject to hash replacement (see below). */
ctf_dynhash_t *cd_linkages;
+/* Maps from the type hash values of hashes that should be considered
+"replaced" with the hash values that replace them. Used to merge types
+together at conflict-marking and emission time. Only works for some type
+kinds: right now, CTF_K_VAR. */
+ctf_dynhash_t *cd_replacing_hashes;
/* Maps type hash values to a set of hash values of the types that cite them:
i.e., pointing backwards up the type graph. Used for recursive conflict
marking. Citations from tagged structures, unions, and forwards do not