libctf: tear opening and serialization in two

The next stage in sharing the strtab involves tearing two core parts
of libctf into two pieces.

Large parts of init_static_types, called at open time, involve traversing
the types table and initializing the hashtabs used by the type name lookup
functions and the enumerator conflicting checks.  If the string table is
partly located in the parent dict, this is obviously not going to work: so
split out that code into a new init_static_types_names function (which
also means moving the wrapper around init_static_types that was used
to simplify the enumerator code into being a wrapper around
init_static_types_names instead) and call that from init_static_types
(for parent dicts, and < v4 dicts), and from ctf_import (for v4 dicts).

At the same time as doing this we arrange to set LCTF_NO_STR (recently
introduced) iff this is a v4 child dict with a nonzero cth_parent_strlen:
this then blocks more or less everything that involves string operations
until a ctf_import has actually imported the strtab it depends on.  (No
string oeprations that actually use this have been introduced yet, but
since no string deduplication is happening yet either this is harmless.)

For v4 dicts, at import time we also validate that the cth_parent_strlen has
the same value as the parent's strlen (zero is also a valid value,
indicating a non-shared strtab, as is commonplace in older dicts, dicts
emitted by the compiler, parent dicts etc).  This makes ctf_import more
complex, so we simplify things again by dropping all the repeated code in
the obscure used-only-by-ctf_link ctf_import_unref and turning both into
wrappers around an internal function.  We prohibit repeated ctf_imports
(except of NULL or the same dict repeatedly), and set up some new fields
which will be used later to prevent people from adding strings to parent
dicts with pre-existing serialized strtabs once they have children imported
into them (which would change their string length and corrupt all those
strtabs).

Serialization also needs to be torn in two.  The problem here is that
currently serialization does too much: it emits everything including the
strtab, does things that depend on the strtab being finalized (notably
variable table sorting), and then writes it out.  Much of this emission
itself involves strtab writes, so the strtab is not actually complete until
halfway through ctf_serialize.  But when deduplicating, we want to use
machinery in ctf-link and ctf-dedup to deduplicate the strtab after it is
complete, and only then write it out.

We could do this via having ctf_serialize call some sort of horrible
callback, but it seems much simpler to just cut ctf_serialize in two,
and introduce a new ctf_preserialize which can optionally be called to do
all this "everything but the strtab" work.  (If it's not called,
ctf_serialize calls it itself.)

This means pulling some internal variables out of ctf_serialize into the
ctf_dict_t, and slightly abusing LCTF_NO_STR to mean (in addition to its
"no, you can't do much between opening a child dict and importing its
parent" semantics), "no, you can't do much between calling ctf_preserialize
and ctf_serialize". The requirements of both are not quite identical -- you
definitely can do things that involve string lookups after ctf_preserialize
-- but it serves to stop callers from accidentally adding more types after
the types table has been written out, and that's good enough.
ctf_preserialize isn't public API anyway.

libctf/
	* ctf-impl.h (struct ctf_dict) [ctf_serializing_buf]: New.
        [ctf_serializing_buf_size]: Likewise.
        [ctf_serializing_vars]: Likewise.
        [ctf_serializing_nvars]: Likewise.
        [ctf_max_children]: Likewise.
	(LCTF_PRESERIALIZED): New.
	(ctf_preserialize): New.
	(ctf_depreserialize): New.
	* ctf-open.c (init_static_types): Rename to...
	(init_static_types_names): ... this, wrapping a different
        function.
        (init_static_types_internal): Rename to...
        (init_static_types): ... this, and set LCTF_NO_STR if neecessary.
        Tear out the name-lookup guts into...
	(init_static_types_names_internal): ... this new function. Fix a few
        comment typos.
	(ctf_bufopen): Emphasise that you cannot rely on looking up strings
        at any point in ctf_bufopen any more.
	(ctf_dict_close): Free ctf_serializing_buf.
	(ctf_import): Turn into a wrapper, calling...
	(ctf_import_internal): ... this.  Prohibit repeated ctf_imports of
        different parent dicts, or "unimporting" by setting it back to NULL
        again.  Validate the parent we do import using cth_parent_strlen.
        Call init_static_types_names if the strtab is shared with the
        parent.
	(ctf_import_unref): Turn into a wrapper.
	* ctf-serialize.c (ctf_serialize): Split out everything before
        strtab serialization into...
	(ctf_preserialize): ... this new function.
	(ctf_depreserialize): New, undo preserialization on error.
This commit is contained in:
Nick Alcock
2024-07-15 21:11:40 +01:00
parent 6c77689963
commit a14fb397b2
3 changed files with 278 additions and 135 deletions

View File

@@ -399,6 +399,10 @@ struct ctf_dict
unsigned char *ctf_dynbase; /* Freeable CTF file pointer. */
unsigned char *ctf_buf; /* Uncompressed CTF data buffer. */
size_t ctf_size; /* Size of CTF header + uncompressed data. */
unsigned char *ctf_serializing_buf; /* CTF buffer in mid-serialization. */
size_t ctf_serializing_buf_size; /* Length of that buffer. */
ctf_varent_t *ctf_serializing_vars; /* Unsorted vars in mid-serialization. */
size_t ctf_serializing_nvars; /* Number of those vars. */
uint32_t *ctf_sxlate; /* Translation table for unindexed symtypetab
entries. */
unsigned long ctf_nsyms; /* Number of entries in symtab xlate table. */
@@ -440,6 +444,7 @@ struct ctf_dict
uint32_t ctf_parmax; /* Highest type ID of a parent type. */
uint32_t ctf_refcnt; /* Reference count (for parent links). */
uint32_t ctf_flags; /* Libctf flags (see below). */
uint32_t ctf_max_children; /* Max number of child dicts. */
int ctf_errno; /* Error code for most recent error. */
int ctf_version; /* CTF data version. */
ctf_dynhash_t *ctf_dthash; /* Hash of dynamic type definitions. */
@@ -606,6 +611,7 @@ struct ctf_next
#define LCTF_STRICT_NO_DUP_ENUMERATORS 0x0004 /* Duplicate enums prohibited. */
#define LCTF_NO_STR 0x0008 /* No string lookup possible yet. */
#define LCTF_NO_SERIALIZE 0x0010 /* Serialization of this dict prohibited. */
#define LCTF_PRESERIALIZED 0x0020 /* Already serialized all but the strtab. */
extern ctf_dynhash_t *ctf_name_table (ctf_dict_t *, int);
extern const ctf_type_t *ctf_lookup_by_id (ctf_dict_t **, ctf_id_t);
@@ -753,6 +759,9 @@ extern void ctf_str_remove_ref (ctf_dict_t *, const char *, uint32_t *ref);
extern void ctf_str_rollback (ctf_dict_t *, ctf_snapshot_id_t);
extern const ctf_strs_writable_t *ctf_str_write_strtab (ctf_dict_t *);
extern int ctf_preserialize (ctf_dict_t *fp);
extern void ctf_depreserialize (ctf_dict_t *fp);
extern struct ctf_archive_internal *
ctf_new_archive_internal (int is_archive, int unmap_on_close,
struct ctf_archive *, ctf_dict_t *,