include, libctf: string lookup and writeout of a parent-shared strtab

The next stage of strtab sharing is actual lookup of strings in such
strtabs, interning of strings in such strtabs and writing out of
such strtabs (but not actually figuring out which strings should
be shared: that's next).

We introduce several new internal ctf_str_* API functions to augment the
existing rather large set: ctf_str_add_copy, which adds a string and always
makes a copy of it (used when deduplicating, to stop deduplicated strings
holding permanent references on the input dicts), and
ctf_str_add_no_dedup_ref, which adds a ref to a string while preventing it
from ever being deduplicated (used for header fields like the parent name,
which is the same for almost all child dicts but had still better not be
stored in the parent!).

ctf_strraw_explicit, the ultimate underlying "look up a string" function
that backs ctf_strptr et al, gains the ability to automatically find strings
in the parent if the offset is < cth_parent_strlen, and generally makes all
offsets parent-relative (so something at offset 1 in the child strtab will
need to be looked up at offset 257 if cth_parent_strlen is 256).  This
suffices to paste together the parent and child from the perspective
of lookup.
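
The offset arithmetic above can be sketched in isolation.  This is a toy
model only, not the libctf implementation: struct toy_dict and toy_strraw
are hypothetical names, the parent_strlen field merely models
cth_parent_strlen, and all the error reporting described below is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a dict; this is NOT the real ctf_dict_t.  */
struct toy_dict
{
  const char *strtab;            /* This dict's own string table.  */
  size_t len;                    /* Length of that table in bytes.  */
  const struct toy_dict *parent; /* NULL for a parent dict.  */
  uint32_t parent_strlen;        /* Models cth_parent_strlen.  */
};

/* Resolve OFFSET as if the parent and child strtabs were concatenated.
   Returns NULL for offsets that fall outside both tables.  */
const char *
toy_strraw (const struct toy_dict *d, uint32_t offset)
{
  assert (d != NULL);

  if (d->parent != NULL && d->parent_strlen != 0)
    {
      /* Low offsets belong to the parent's strtab.  */
      if (offset < d->parent_strlen)
        return toy_strraw (d->parent, offset);

      /* Everything else is rebased into this dict's own strtab.  */
      offset -= d->parent_strlen;
    }

  if (offset >= d->len)
    return NULL;

  return d->strtab + offset;
}
```

With a parent_strlen of 256, offset 257 resolves to byte 1 of the child's
own table, while offset 1 resolves to byte 1 of the parent's.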

We do quite a lot of new checks in here, simply because it's called all over
the place and it's preferable to emit a nice error into the ctf_err_warning
stream if things go wrong.  Among other things this traps cases where you
accidentally added a string to the parent, throwing off all the offsets.
Completely invalid offsets also now add a message to the err_warning
stream.

Insertion of new atoms (the deduplicated entities underlying strings in a
given dict), already a flag-heavy operation, gains more flags, corresponding
to the new ctf_str_add_copy and ctf_str_add_no_dedup_ref functions.  Atom
addition also checks the ctf_max_children set by ctf_import and prevents
addition of new atoms to any dict with imported children and an
already-serialized strtab.
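
As a rough illustration of the flag handling and the new prohibition, here
is a minimal sketch.  Everything in it is an assumption made for
illustration: the sk_ types and function, the SK_ flag names, and the
max_children and serialized fields are hypothetical simplifications, and
marking a copied string as freeable is a guess at how the copy is tracked,
not a statement about ctf_str_add_ref_internal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical caller-side flags (modelled on CTF_STR_COPY and
   CTF_STR_NO_DEDUP).  */
#define SK_STR_COPY      0x1
#define SK_STR_NO_DEDUP  0x2

/* Hypothetical per-atom flags (modelled on the CTF_STR_ATOM_* values).  */
#define SK_ATOM_FREEABLE 0x1
#define SK_ATOM_NO_DEDUP 0x4

struct sk_atom { char *str; int flags; };
struct sk_dict { int max_children; int serialized; };

/* Fill in ATOM for STR.  Returns 0 on success, -1 if the dict may no
   longer take new atoms or if memory runs out.  */
int
sk_atom_init (const struct sk_dict *d, struct sk_atom *atom,
              const char *str, int flags)
{
  assert (d != NULL && atom != NULL && str != NULL);

  /* A serialized strtab that children depend on must not change.  */
  if (d->max_children > 0 && d->serialized)
    return -1;

  atom->flags = 0;

  if (flags & SK_STR_COPY)
    {
      size_t len = strlen (str) + 1;
      char *copy = malloc (len);

      if (copy == NULL)
        return -1;
      memcpy (copy, str, len);
      atom->str = copy;
      atom->flags |= SK_ATOM_FREEABLE; /* We own the copy.  */
    }
  else
    atom->str = (char *) str;          /* Borrow the caller's string.  */

  if (flags & SK_STR_NO_DEDUP)
    atom->flags |= SK_ATOM_NO_DEDUP;

  return 0;
}
```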

strtab writeout gains more checks as well: you can't write out a strtab for
a child dict whose parent hasn't been serialized yet (and thus doesn't have
a serialized strtab itself); you can't write it out if the child already
depended on a shared parent strtab and that strtab has changed length.  The
null atom at offset 0 is only written to the parent strtab; and ref updating
changes to look up offsets in the parent's atoms table iff a new
CTF_STR_ATOM_IN_PARENT flag is set on the atom (this will be set by
deduplication to ensure that serializing a dict will update all its refs
properly even though a bunch of them have moved to the parent dict).
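
The two writeout preconditions can be modelled as below.  This is a sketch
under stated assumptions, not the code in ctf_str_write_strtab: struct
wr_dict, wr_check and the wr_status values are hypothetical, and a
parent_strlen of zero is taken to mean "not yet sharing a parent strtab".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for the precondition check.  */
enum wr_status
{
  WR_OK,
  WR_PARENT_NOT_SERIALIZED,
  WR_PARENT_LEN_CHANGED
};

/* Toy dict; this is NOT the real ctf_dict_t.  */
struct wr_dict
{
  const struct wr_dict *parent; /* NULL for a parent dict.  */
  int serialized;               /* Has this dict's strtab been written?  */
  uint32_t strtab_len;          /* Length of this dict's written strtab.  */
  uint32_t parent_strlen;       /* Parent strtab length this dict was
                                   built against; 0 if not sharing.  */
};

/* Decide whether D's strtab may be written out.  */
enum wr_status
wr_check (const struct wr_dict *d)
{
  assert (d != NULL);

  if (d->parent == NULL)
    return WR_OK;

  /* The parent must already have a serialized strtab.  */
  if (!d->parent->serialized)
    return WR_PARENT_NOT_SERIALIZED;

  /* If we already depend on the parent strtab, it must not have changed
     length, or every parent-relative offset would be wrong.  */
  if (d->parent_strlen != 0
      && d->parent_strlen != d->parent->strtab_len)
    return WR_PARENT_LEN_CHANGED;

  return WR_OK;
}
```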

None of this actually has any *effect* yet because no string deduplication
is being carried out, and the cth_parent_strlen is still locked at 0.

include/
	* ctf-api.h (_CTF_ERRORS) [ECTF_NOTSERIALIZED]: New.
	(ECTF_NERR): Updated.

libctf/
	* ctf-impl.h (CTF_STR_ATOM_IN_PARENT): New.
	(CTF_STR_ATOM_NO_DEDUP): Likewise.
	(ctf_str_add_no_dedup_ref): New.
	(ctf_str_add_copy): New.
	* ctf-string.c (ctf_strraw_explicit): Look in parents if necessary:
	use parent-relative offsets.
	(ctf_strptr_validate): Avoid duplicating errors.
	(ctf_str_create_atoms): Update comment.
	(CTF_STR_COPY): New.
	(CTF_STR_NO_DEDUP): Likewise.
	(ctf_str_add_ref_internal): Use them, setting the corresponding
	csa_flags, prohibiting addition to serialized parents, and copying
	strings if so requested.
	(ctf_str_add): Turn into a wrapper around...
	(ctf_str_add_flagged): ... this new function.  The offset is now
	parent-relative.
	(ctf_str_add_ref): Likewise.
	(ctf_str_add_movable_ref): Likewise.
	(ctf_str_add_copy): New.
	(ctf_str_add_no_dedup_ref): New.
	(ctf_str_write_strtab): Prohibit writes when the parent has
	changed length or is not serialized.  Only write the null atom
	to parent strtabs.  Chase refs to the parent if necessary.
Nick Alcock
2024-07-15 21:56:15 +01:00
parent a14fb397b2
commit 3bec4f1f3c
3 changed files with 233 additions and 74 deletions


@@ -202,12 +202,21 @@ typedef struct ctf_err_warning
string, so that ctf_serialize() can instantiate all the strings using the
ctf_str_atoms and then reassociate them with the real string later.
The csa_offset is the offset within *this particular strtab*: no matter
how many strings the parent has, the childrens' csa_offsets are unchanged.
So csa_offset may not be the value actually returned as the offset of this
string.
Strings can be interned into ctf_str_atom without having refs associated
with them, for values that are returned to callers, etc. Items are only
removed from this table on ctf_close(), but on every ctf_serialize(), all
the csa_refs in all entries are purged. Refs may also be removed if they are
migrated from one atoms table to another as a consequence of strtab
deduplication. */
#define CTF_STR_ATOM_FREEABLE 0x1
#define CTF_STR_ATOM_IN_PARENT 0x2
#define CTF_STR_ATOM_NO_DEDUP 0x4
typedef struct ctf_str_atom
{
@@ -228,7 +237,7 @@ typedef struct ctf_str_atom_ref
uint32_t *caf_ref; /* A single ref to this string. */
} ctf_str_atom_ref_t;
/* Like a ctf_str_atom_ref_t, but specific to movable refs. */
typedef struct ctf_str_atom_ref_movable
{
@@ -750,7 +759,10 @@ extern const char *ctf_strptr_validate (ctf_dict_t *, uint32_t);
extern int ctf_str_create_atoms (ctf_dict_t *);
extern void ctf_str_free_atoms (ctf_dict_t *);
extern uint32_t ctf_str_add (ctf_dict_t *, const char *);
extern uint32_t ctf_str_add_copy (ctf_dict_t *, const char *);
extern uint32_t ctf_str_add_ref (ctf_dict_t *, const char *, uint32_t *ref);
extern uint32_t ctf_str_add_no_dedup_ref (ctf_dict_t *, const char *,
uint32_t *ref);
extern uint32_t ctf_str_add_movable_ref (ctf_dict_t *, const char *,
uint32_t *ref);
extern int ctf_str_move_refs (ctf_dict_t *fp, void *src, size_t len, void *dest);