libctf: archive: format v2

This commit does a bunch of things, all tangled together tightly enough that
disentangling them seemed not to be worth doing.

The biggest is a new archive format, v2, identified by a magic number which
is one higher than the v1 format's magic number.  As usual with libctf we
can only write out the new format, but can still read the old one.
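
As a rough sketch of the detection this implies (not the code in this
commit): the v2 magic may turn up in either byte order, while the v1 magic
is only ever little-endian.  The enum and the helper names below are
hypothetical; the two magic values are the ones in the header diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTFA_MAGIC    0x8b47f2a4d7623eecULL /* v2: the v1 magic, incremented.  */
#define CTFA_V1_MAGIC 0x8b47f2a4d7623eebULL /* v1: little-endian on disk.  */

/* Hypothetical result codes, for illustration only.  */
enum arc_kind { ARC_NOT_ARCHIVE, ARC_V1, ARC_V2_NATIVE, ARC_V2_SWAPPED };

/* Portable 64-bit byteswap (stands in for bswap_64).  */
static uint64_t
swap64 (uint64_t v)
{
  uint64_t r = 0;

  for (int i = 0; i < 8; i++)
    {
      r = (r << 8) | (v & 0xff);
      v >>= 8;
    }
  return r;
}

/* Read eight bytes as a little-endian integer, whatever the host order.  */
static uint64_t
read_le64 (const unsigned char *p)
{
  uint64_t r = 0;

  for (int i = 7; i >= 0; i--)
    r = (r << 8) | p[i];
  return r;
}

/* Classify the first eight bytes of a candidate archive.  */
enum arc_kind
classify_magic (const unsigned char *buf)
{
  uint64_t native;

  assert (buf != NULL);
  memcpy (&native, buf, sizeof (native));

  if (native == CTFA_MAGIC)
    return ARC_V2_NATIVE;
  if (swap64 (native) == CTFA_MAGIC)
    return ARC_V2_SWAPPED;
  if (read_le64 (buf) == CTFA_V1_MAGIC)
    return ARC_V1;
  return ARC_NOT_ARCHIVE;
}
```

A caller would read the first eight bytes of the file and dispatch on the
result; only the ARC_V2_SWAPPED and (on big-endian hosts) ARC_V1 cases
should need any byteswapping afterwards.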

The new format has multiple improvements over the old:

 - It is written native-endian and aggressively endian-swapped at open time,
   just like CTF and BTF dicts; format v1 was little-endian, necessitating
   byteswapping all over the place at read and write time rather than
   localized in one pair of functions at read time.

 - The modent array of name-offset -> archive-offset mappings for the CTF
   archives is explicitly pointed at via a new ctfa_modents header member
   rather than just starting after the end of the header.

 - The length that prepends each archive member actually indicates its
   length rather than always being sizeof (uint64_t) bytes too high (this
   was an outright bug).

 - There is a new shared properties table which in future we may be able to
   use to unify common values from the constituent CTF headers, reducing the
   size overhead of these (repeated, uncompressed) entities.  Right now it
   only contains one value, parent_name, which is the parent dict name if
   one is common across all dicts in the archive (always true for any
   archives derived from ctf_link()).  This is used to let
   ctf_archive_next() et al reliably open dicts in the archive even if they
   are child BTF dicts (which do not contain a header name).

   The property names live in the same name table as the CTF member names,
   and the property values use the same format (and shared code) as CTF
   archive members: length-prepended.  Properties and CTF dicts have
   distinct name->value tables ("modents") and distinct value tables, to
   ensure they are spatially separated in the file, to maximize
   compressibility if we end up with a lot of properties and people compress
   the whole thing.
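
To make the length-prefix fix concrete, here is a minimal sketch of reading
one length-prepended entry (a CTF dict or a property value) out of a value
table.  member_payload() is a hypothetical helper, not a libctf function,
and it assumes the length has already been converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return a pointer to the payload of the length-prepended entry at offset
   OFF in TABLE (TABLE_LEN bytes long), storing its true length in *LENP.
   Returns NULL if the entry does not fit in the table.  Hypothetical
   helper: assumes the stored length is already in host byte order.  */
const unsigned char *
member_payload (const unsigned char *table, size_t table_len, uint64_t off,
                int is_v1, uint64_t *lenp)
{
  uint64_t len;

  assert (table != NULL && lenp != NULL);

  /* The length field itself must fit.  */
  if (off > table_len || table_len - off < sizeof (len))
    return NULL;
  memcpy (&len, table + off, sizeof (len));

  /* v1 counted the length field as part of the length: strip that off.  */
  if (is_v1)
    {
      if (len < sizeof (len))
        return NULL;
      len -= sizeof (len);
    }

  /* The payload must fit in what remains of the table.  */
  if (len > table_len - off - sizeof (len))
    return NULL;

  *lenp = len;
  return table + off + sizeof (len);
}
```

With this shape, the v1 quirk is confined to one branch, and everything
downstream sees the same payload length for v1 and v2 archives.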

We can also restrict various old bug-workaround kludges so that they only
apply to dicts found in v1 archives.  In particular, we needed to dig out
the preamble of some CTF dicts without opening them to figure out whether
they used the .dynstr or .strtab sections; this whole workaround is now
unnecessary for v2 and above.

There are other changes for readability and consistency:

 - The archive wrapper data structure, known outside ctf-archive.c as
   ctf_archive_t, is now consistently referred to inside ctf-archive.c as
   'struct ctf_archive_internal' and given the parameter name 'arci' rather
   than sometimes using ctf_archive_t and sometimes using 'wrapper' or 'arc'
   as parameter names.  The archive itself is always called 'struct
   ctf_archive' to emphasise that it is *not* a ctf_archive_t.
   ctf_archive_t remains the public typedef: the fact that it's not actually
   the same thing as the archive file format is an internal implementation
   detail.

 - We keep the archive header around in a new ctfi_hdr member, distinct
   from the actual archive itself, to make upgrading from v1 and cross-
   endianness support easier.  The archive itself is now kept as a char *
   and used only to root pointer arithmetic.
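
One way to picture the separately-kept header: a v1 header can be upgraded
into a freestanding v2-shaped header, after which the rest of the code need
only consult that.  The sketch below is an illustration under assumptions,
not the code in this commit: struct sketch_hdr and upgrade_v1_header() are
hypothetical names, with the field order taken from the header diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* v2-shaped header, fields in the order given in the header diff.  */
struct sketch_hdr
{
  uint64_t ctfa_magic, ctfa_model, ctfa_ndicts, ctfa_nprops, ctfa_names,
    ctfa_ctfs, ctfa_prop_values, ctfa_modents, ctfa_propents;
};

/* The v1 header is five uint64_t fields with no padding.  */
#define V1_HDR_SIZE (5 * sizeof (uint64_t))

/* Read eight bytes as a little-endian integer, whatever the host order.  */
static uint64_t
load_le64 (const unsigned char *p)
{
  uint64_t r = 0;

  for (int i = 7; i >= 0; i--)
    r = (r << 8) | p[i];
  return r;
}

/* Fill *OUT from the little-endian v1 header at V1.  Returns 0 on success,
   -1 if the buffer is too short to hold a v1 header.  */
int
upgrade_v1_header (const unsigned char *v1, size_t v1_len,
                   struct sketch_hdr *out)
{
  assert (v1 != NULL && out != NULL);

  if (v1_len < V1_HDR_SIZE)
    return -1;

  out->ctfa_magic = load_le64 (v1);     /* Left as the v1 magic here.  */
  out->ctfa_model = load_le64 (v1 + 8);
  out->ctfa_ndicts = load_le64 (v1 + 16);
  out->ctfa_names = load_le64 (v1 + 24);
  out->ctfa_ctfs = load_le64 (v1 + 32);

  /* v1 has no properties, and its modents directly follow the header.  */
  out->ctfa_nprops = 0;
  out->ctfa_prop_values = 0;
  out->ctfa_propents = 0;
  out->ctfa_modents = V1_HDR_SIZE;
  return 0;
}
```

The raw archive bytes are never modified here, which fits with keeping the
archive as a plain byte pointer used only as the base for offsets.
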
Nick Alcock
2025-05-28 13:42:11 +01:00
parent 4bdc7aed03
commit 16e0dd9aab
5 changed files with 636 additions and 358 deletions


@@ -829,17 +829,36 @@ typedef struct ctf_enum64
 greater care taken with integral types. All CTF files in an archive
 must have the same data model. (This is not validated.)
-All integers in this structure are stored in little-endian byte order.
+All integers in the ctfa_archive_v1 structure are stored in little-endian byte
+order.
-The code relies on the fact that everything in this header is a uint64_t
-and thus the header needs no padding (in particular, that no padding is
-needed between ctfa_ctfs and the unnamed ctfa_archive_modent array
-that follows it).
+The generation code relies on the fact that everything in this header is a
+uint64_t and thus the header needs no padding (in particular, that no padding
+is needed between ctfa_ctfs and the unnamed ctfa_modent array that follows
+it. However, this is only an assumption of the generation code: the
+read-side code in libctf and the file format do not have any such
+requirements).
+The shared properties and CTF dict storage have the same (length-prepended)
+format and identical string/value mapping via struct ctf_archive_modent, but
+are pointed to by different header fields: ctfa_modents for CTFs,
+ctfa_propents for properties: their names are intermingled in ctfa_names but
+the CTF dicts and property values are stashed in distinct tables, ctfa_ctfs
+and ctfa_prop_values. Implementations may interpret properties however they
+wish, and their presence must not be mandatory (though dictionaries may be
+modified given the presence of a particular property, making use of that
+property mandatory for reading those dicts: the intent here is to allow
+optional movement of shared header fields into the shared properties table in
+the future. For now, only parent_name=... is present.)
+In format v1, the dict size uint64_t prepended to dictionaries is one
+uint64_t too long: it contains the length of the size byte too. In dict v2,
+this is corrected (at open time, libctf fixes up v1 dicts too).
 This is *not* the same as the data structure returned by the ctf_arc_*()
 functions: this is the low-level on-disk representation. */
-#define CTFA_MAGIC 0x8b47f2a4d7623eeb /* Random. */
+#define CTFA_MAGIC 0x8b47f2a4d7623eec /* V1, below, incremented. */
 struct ctf_archive
 {
 /* Magic number. (In loaded files, overwritten with the file size
@@ -852,6 +871,43 @@ struct ctf_archive
 /* Number of CTF dicts in the archive. */
 uint64_t ctfa_ndicts;
+/* Number of shared properties. */
+uint64_t ctfa_nprops;
+/* Offset of the name table, used for both CTF member names and property
+names. */
+uint64_t ctfa_names;
+/* Offset of the CTF table. Each element starts with a size (a little-
+endian uint64_t) then a ctf_dict_t of that size. */
+uint64_t ctfa_ctfs;
+/* Offset of the shared properties value table: identical format, except the
+size is followed by an arbitrary (property-dependent) binary blob. */
+uint64_t ctfa_prop_values;
+/* Offset of the modent table mapping names to CTFs. */
+uint64_t ctfa_modents;
+/* Offset of the modent table mapping names to properties. Ignored if
+nprops is 0. */
+uint64_t ctfa_propents;
+};
+#define CTFA_V1_MAGIC 0x8b47f2a4d7623eeb /* Random. */
+struct ctf_archive_v1
+{
+/* Magic number. (In loaded files, overwritten with the file size
+so ctf_arc_close() knows how much to munmap()). */
+uint64_t ctfa_magic;
+/* CTF data model. */
+uint64_t ctfa_model;
+/* Number of CTF dicts in the archive. */
+uint64_t ctfa_ndicts;
 /* Offset of the name table. */
 uint64_t ctfa_names;
@@ -860,9 +916,16 @@ struct ctf_archive
 uint64_t ctfa_ctfs;
 };
-/* An array of ctfa_ndicts of this structure lies at
-ctf_archive[sizeof(struct ctf_archive)] and gives the ctfa_ctfs or
-ctfa_names-relative offsets of each name or ctf_dict_t. */
+/* An array of ctfa_ndicts of this structure lies at the offset given by
+ctfa_modents (or, in v1, at ctf_archive[sizeof(struct ctf_archive)]) and gives
+the ctfa_ctfs or ctfa_names-relative offsets of each name or ctf_dict_t.
+Another array of ctfa_nprops of this structure lies at the ctfa_propents
+offset: for this, the ctf_offset is the ctfa_propents-relative offset of
+proprty values.
+Both property values and CTFs are prepended by a uint64 giving their length.
+The names are just a strtab (\0-separated). */
 typedef struct ctf_archive_modent
 {

File diff suppressed because it is too large


@@ -552,7 +552,10 @@ struct ctf_archive_internal
 int ctfi_is_archive;
 int ctfi_unmap_on_close;
 ctf_dict_t *ctfi_dict;
-struct ctf_archive *ctfi_archive;
+unsigned char *ctfi_archive;
+struct ctf_archive *ctfi_hdr; /* Always malloced. Header only. */
+size_t ctfi_hdr_len;
+int ctfi_archive_v1; /* If set, this is a v1 archive. */
 ctf_dynhash_t *ctfi_dicts; /* Dicts we have opened and cached. */
 ctf_dict_t *ctfi_crossdict_cache; /* Cross-dict caching. */
 ctf_dict_t **ctfi_symdicts; /* Array of index -> ctf_dict_t *. */
@@ -815,13 +818,12 @@ extern int ctf_preserialize (ctf_dict_t *fp, int force_ctf);
 extern void ctf_depreserialize (ctf_dict_t *fp);
 extern struct ctf_archive_internal *
-ctf_new_archive_internal (int is_archive, int unmap_on_close,
-struct ctf_archive *, ctf_dict_t *,
-const ctf_sect_t *symsect,
+ctf_new_archive_internal (int is_archive, int is_v1, int unmap_on_close,
+struct ctf_archive *, size_t,
+ctf_dict_t *, const ctf_sect_t *symsect,
 const ctf_sect_t *strsect, int *errp);
-extern struct ctf_archive *ctf_arc_open_internal (const char *, int *);
-extern void ctf_arc_close_internal (struct ctf_archive *);
-extern const ctf_preamble_t *ctf_arc_bufpreamble (const ctf_sect_t *);
+extern struct ctf_archive_internal *ctf_arc_open_internal (const char *, int *);
+extern const ctf_preamble_t *ctf_arc_bufpreamble_v1 (const ctf_sect_t *);
 extern void *ctf_set_open_errno (int *, int);
 extern int ctf_flip_header (void *, int, int);
 extern int ctf_flip (ctf_dict_t *, ctf_header_t *, unsigned char *,


@@ -1177,7 +1177,7 @@ ctf_link_deduplicating_per_cu (ctf_dict_t *fp)
 equal to the CU name. We have to wrap it in an archive wrapper
 first. */
-if ((in_arc = ctf_new_archive_internal (0, 0, NULL, outputs[0], NULL,
+if ((in_arc = ctf_new_archive_internal (0, 0, 0, NULL, 0, outputs[0], NULL,
 NULL, &err)) == NULL)
 {
 ctf_set_errno (fp, err);


@@ -119,9 +119,20 @@ ctf_bfdopen_ctfsect (struct bfd *abfd _libctf_unused_,
 bfderrstr = N_("CTF section is NULL");
 goto err;
 }
-preamble = ctf_arc_bufpreamble (ctfsect);
-if (preamble->ctp_flags & CTF_F_DYNSTR)
+/* v3 dicts may cite the symtab or the dynsymtab, without using sh_link to
+indicate which: pick the right one. v4 dicts always use the dynsymtab (for
+now). */
+errno = 0;
+preamble = ctf_arc_bufpreamble_v1 (ctfsect);
+if (!preamble && errno == EOVERFLOW)
+{
+bfderrstr = N_("section too short to be CTF or BTF");
+goto err;
+}
+if (!preamble || (preamble && preamble->ctp_flags & CTF_F_DYNSTR))
 {
 symhdr = &elf_tdata (abfd)->dynsymtab_hdr;
 strtab_name = ".dynstr";
@@ -301,21 +312,16 @@ ctf_fdopen (int fd, const char *filename, const char *target, int *errp)
 fp->ctf_data_mmapped = data;
 fp->ctf_data_mmapped_len = (size_t) st.st_size;
-return ctf_new_archive_internal (0, 1, NULL, fp, NULL, NULL, errp);
+return ctf_new_archive_internal (0, 0, 1, NULL, 0, fp, NULL, NULL, errp);
 }
 if ((nbytes = ctf_pread (fd, &arc_magic, sizeof (arc_magic), 0)) <= 0)
 return (ctf_set_open_errno (errp, nbytes < 0 ? errno : ECTF_FMT));
-if ((size_t) nbytes >= sizeof (uint64_t) && le64toh (arc_magic) == CTFA_MAGIC)
-{
-struct ctf_archive *arc;
-if ((arc = ctf_arc_open_internal (filename, errp)) == NULL)
-return NULL; /* errno is set for us. */
-return ctf_new_archive_internal (1, 1, arc, NULL, NULL, NULL, errp);
-}
+if ((size_t) nbytes >= sizeof (uint64_t)
+&& (arc_magic == CTFA_MAGIC || bswap_64 (arc_magic) == CTFA_MAGIC
+|| le64toh (arc_magic) == CTFA_V1_MAGIC))
+return ctf_arc_open_internal (filename, errp);
 /* Attempt to open the file with BFD. We must dup the fd first, since bfd
 takes ownership of the passed fd. */