libctf: dedup: datasecs and vars

These are a bit trickier than previous things. Datasecs are unusual: the content they contain for a given variable is conceptually part of that variable, in that a variable can only appear in one datasec. So if two TUs have different datasec values for a variable, we want to emit two conflicting variables with different datasec entries. Equally, if they have entries in different datasecs, they're conflicting. But the *index* of a variable in a datasec has nothing to do with the variable: it's just a property of how many other variables are in the datasec.

So we turn the type graph upside down for them. We track the variable -> datasec mappings for every variable we are deduplicating, and use this to hash variables with datasec entries *twice*: firstly, as purely variable type, name, and promoted-to-non-extern linkage, and secondly with all of that plus the datasec name, offset and size. We record that the non-extern hash *replaces* the extern one, and use this later on.

The datasec itself is not hashed at all! We skip it at both hashing and emission time (without breaking anything else, because nothing points at datasecs, so nothing will ever recurse down into one).

The popcount code (used to find the "most popular" type, the one to put in the shared dict) changes so that the popcounts of replaced types (extern vars) are added to the counts of the types that replace them (the corresponding non-extern vars).

At emission time, replaced variables (extern variables) are skipped, ensuring that extern vars with non-conflicting non-extern counterparts are dropped in favour of the non-extern ones. ctf_add_section_variable then takes care of emitting both the var and its corresponding datasec for us.
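
As a rough illustration of the replacement idea (not the code in this commit), the popcount folding and emission-time skip could be modelled as below. Everything here is hypothetical: struct var_hash, find_hash, fold_replaced_popcounts and should_emit are invented for the sketch, a flat array stands in for the real hash tables, and leaving the replaced entry's own count untouched is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one hashed variable: not a libctf structure.  */
struct var_hash
{
  const char *hash;		/* Type hash value of this variable.  */
  const char *replaced_by;	/* Hash that replaces this one, or NULL.  */
  unsigned long popcount;	/* Number of times this hash was seen.  */
};

/* Find the entry with the given hash, or NULL if there is none.  */
static struct var_hash *
find_hash (struct var_hash *vars, size_t n, const char *hash)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (vars[i].hash, hash) == 0)
      return &vars[i];
  return NULL;
}

/* Add the popcount of every replaced (extern) hash to the count of the
   hash that replaces it.  The replaced entry's own count is left alone
   (an assumption of this sketch).  */
static void
fold_replaced_popcounts (struct var_hash *vars, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      struct var_hash *target;

      if (vars[i].replaced_by == NULL)
	continue;

      target = find_hash (vars, n, vars[i].replaced_by);
      assert (target != NULL && target != &vars[i]);
      target->popcount += vars[i].popcount;
    }
}

/* Replaced hashes are skipped at emission time; everything else is
   emitted.  */
static int
should_emit (const struct var_hash *var)
{
  return var->replaced_by == NULL;
}
```

With an extern hash seen three times and its non-extern replacement seen once, folding gives the non-extern hash a count of four, and only the non-extern entry passes should_emit.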
2025-04-25 19:22:42 +01:00
parent 6b8885cfc9
commit 4db605353c
2 changed files with 380 additions and 23 deletions
--- a/libctf/ctf-impl.h
+++ b/libctf/ctf-impl.h
@@ -285,8 +285,9 @@ typedef struct ctf_dedup
  /* Atoms tables of decorated names: maps undecorated name to decorated name.
     (The actual allocations are in the CTF dict for the former and the real
     atoms table for the latter).  Uses the same namespaces as ctf_lookups,
-     below, but has no need for null-termination.  */
-  ctf_dynhash_t *cd_decorated_names[4];
+     below, with the addition of type and decl tags, and with no need for
+     null-termination.  */
+  ctf_dynhash_t *cd_decorated_names[6];

  /* Map type names to a hash from type hash value -> number of times each value
     has appeared.  Enumeration constants are tracked via the enum they appear
@@ -308,11 +309,21 @@ typedef struct ctf_dedup
     can be cited from multiple TUs.  Only populated in that link mode.  */
  ctf_dynhash_t *cd_struct_origin;

+  /* Maps from the GID of variables on the input to a (type ID, component_idx)
+     pair identifying the corresponding datasec row.  */
+  ctf_dynhash_t *cd_var_datasec;
+
  /* Maps the type hash values of things with linkages (vars, functions) to the
     intended final linkage of that type, accumulated from all types with that
     ID across all inputs (so far).  Subject to hash replacement (see below).  */
  ctf_dynhash_t *cd_linkages;

+  /* Maps from the type hash values of hashes that should be considered
+     "replaced" with the hash values that replace them.  Used to merge types
+     together at conflict-marking and emission time.  Only works for some type
+     kinds: right now, CTF_K_VAR.  */
+  ctf_dynhash_t *cd_replacing_hashes;
+
  /* Maps type hash values to a set of hash values of the types that cite them:
     i.e., pointing backwards up the type graph.  Used for recursive conflict
     marking.  Citations from tagged structures, unions, and forwards do not