mirror of
https://github.com/seL4/seL4.git
synced 2026-03-27 10:29:57 +00:00
tools: add manual for bitfield_gen.py
Add a user manual for the bitfield generator that documents syntax, semantics and basic concepts. Should also be able to serve as main spec for what the tool is supposed to do. Signed-off-by: Gerwin Klein <gerwin.klein@proofcraft.systems>
This commit is contained in:
673
tools/bitfield_gen.md
Normal file
673
tools/bitfield_gen.md
Normal file
@@ -0,0 +1,673 @@
|
||||
<!--
|
||||
Copyright 2023 Proofcraft Pty Ltd
|
||||
SPDX-License-Identifier: CC-BY-SA-4.0
|
||||
-->
|
||||
|
||||
# Bitfield Generator Manual
|
||||
|
||||
## Introduction
|
||||
|
||||
The tool `bitfield_gen.py` in the `seL4/tools/` directory generates bitfield
|
||||
accessor code in C with formal specifications and proofs in Isabelle/HOL.
|
||||
|
||||
It takes specifications of bitfield layouts in memory such as the following
|
||||
|
||||
```bf_gen
|
||||
block VMFault {
|
||||
field address 32
|
||||
|
||||
field FSR 5
|
||||
padding 7
|
||||
field instructionFault 1
|
||||
padding 15
|
||||
field seL4_FaultType 4
|
||||
}
|
||||
```
|
||||
|
||||
and generates a C struct with constructor and getter/setter functions with
|
||||
call-by-value and call-by-pointer semantics. It also, separately, generates a
|
||||
corresponding data type definition in Isabelle/HOL with corresponding functions
|
||||
in HOL for that data type, as well as proofs that the generated C code correctly
|
||||
implements these functions. This includes absence of undefined behaviour in C.
|
||||
|
||||
The purpose of the tool is to provide bit-level access to structures in memory
|
||||
with precise, deterministic and reliable layout semantics, with proofs of
|
||||
correctness, and without the need for manual masking and shifting operations.
|
||||
The C standard leaves C bitfields too underspecified to be fit for that purpose.
|
||||
|
||||
In addition to [basic blocks](#blocks) of bitfields such as above, the tool
|
||||
supports [tagged unions](#tagged-unions) and various layout options for these
|
||||
blocks.
|
||||
|
||||
## Bit Field Generator by Example
|
||||
|
||||
This section walks through the features of the bitfield specification languages
|
||||
by example. The [full syntax](#syntax-reference) is described in the next
|
||||
section for reference.
|
||||
|
||||
### Blocks
|
||||
|
||||
The basic concepts of the specification languages are blocks which contain a
|
||||
list of fields. The example used in the introduction declares a block with
|
||||
name `VMFault`, which contains four fields and two padding entries:
|
||||
|
||||
```bf_gen
|
||||
block VMFault {
|
||||
field address 32
|
||||
|
||||
field FSR 5
|
||||
padding 7
|
||||
field instructionFault 1
|
||||
padding 15
|
||||
field seL4_FaultType 4
|
||||
}
|
||||
```
|
||||
|
||||
This will generate a type `VMFault_t` in C. Assuming that the word size of the
|
||||
underlying architecture is 32 bits, this `VMFault_t` type will have a size of
|
||||
two machine words. The sizes of the content and padding fields in a block must
|
||||
add up to a multiple of the machine word size.
|
||||
|
||||
The bits in the block are laid out in memory in the order they are mentioned in
|
||||
the specification. That is, the left-most 32 bits (i.e. bits 63 to 32) of
|
||||
`VMFault` will correspond to the `address` field, the next 5 bits to `FSR`, the
|
||||
next 7 bits are inaccessible padding which is set to 0 on creation of the
|
||||
bitfield, then one bit for `instructionFault`, 15 bits of padding, and finally 4
|
||||
bits (bits 0 to 3 of the entire type) for `seL4_FaultType`.
|
||||
|
||||
The following diagram corresponds to the block specification above.
|
||||
|
||||
```bf_dia
|
||||
│------------------------ address ----------------------------│
|
||||
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
|
||||
│ │ ..
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
63 32
|
||||
|
||||
instruction fault seL4_FaultType
|
||||
│-- FSR --│ │-| │-------|
|
||||
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
|
||||
.. │ │ │ │ │ │
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
31 27 19 3 0
|
||||
```
|
||||
|
||||
For creating new values of the type `VMFault`, the generator will provide a
|
||||
function `VMFault` that takes an argument of the word size of the machine for
|
||||
the value of each field -- for the example above, this is `uint32_t`
|
||||
|
||||
```c
|
||||
VMFault_t VMFault(uint32_t address, uint32_t FSR, uint32_t instructionFault,
|
||||
uint32_t seL4_FaultType);
|
||||
```
|
||||
|
||||
Values that do not fit in the bit size of the target field will be cut off
|
||||
accordingly, i.e., as if using `val & mask(size)`.
|
||||
|
||||
To read the value of each field, the generator will provide the following
|
||||
functions:
|
||||
|
||||
```c
|
||||
uint32_t VMFault_get_address(VMFault_t vm_fault);
|
||||
uint32_t VMFault_get_FSR(VMFault_t vm_fault);
|
||||
uint32_t VMFault_get_instructionFault(VMFault_t vm_fault);
|
||||
uint32_t VMFault_get_seL4_FaultType(VMFault_t vm_fault);
|
||||
```
|
||||
|
||||
For setting one of the fields in an existing value of type `VMFault_t`:
|
||||
|
||||
```c
|
||||
VMFault_t VMFault_set_address(VMFault_t vm_fault, uint32_t address);
|
||||
VMFault_t VMFault_set_FSR(VMFault_t vm_fault, uint32_t FSR);
|
||||
VMFault_t VMFault_set_instructionFault(VMFault_t vm_fault, uint32_t instructionFault);
|
||||
VMFault_t VMFault_set_seL4_FaultType(VMFault_t vm_fault, uint32_t seL4_FaultType);
|
||||
```
|
||||
|
||||
All of the above (constructor, getter, setter) are pure functions without side
|
||||
effects or dependency on global state. They do contain `assert` statements that
|
||||
are in effect in debug configurations. The generator declares all of them as
|
||||
`static inline CONST` to enable corresponding compiler optimisations.
|
||||
|
||||
Often we expect such bitfields to be accessed via pointers, e.g. as part of a
|
||||
buffer or a larger struct in memory or for instance a page table. Therefore the
|
||||
generator also produces getter/setter functions with pointer access:
|
||||
|
||||
```c
|
||||
uint32_t VMFault_ptr_get_address(VMFault_t *vm_fault);
|
||||
uint32_t VMFault_ptr_get_FSR(VMFault_t *vm_fault);
|
||||
uint32_t VMFault_ptr_get_instructionFault(VMFault_t *vm_fault);
|
||||
uint32_t VMFault_ptr_get_seL4_FaultType(VMFault_t *vm_fault);
|
||||
```
|
||||
|
||||
These are declared `PURE`, but not `CONST`, since they depend on global state.
|
||||
|
||||
The pointer set functions are:
|
||||
|
||||
```c
|
||||
void VMFault_set_address(VMFault_t *vm_fault, uint32_t address);
|
||||
void VMFault_set_FSR(VMFault_t *vm_fault, uint32_t FSR);
|
||||
void VMFault_set_instructionFault(VMFault_t *vm_fault, uint32_t instructionFault);
|
||||
void VMFault_set_seL4_FaultType(VMFault_t *vm_fault, uint32_t seL4_FaultType);
|
||||
```
|
||||
|
||||
These are neither `PURE` nor `CONST`, but still declared as `static inline`.
|
||||
|
||||
> Note that the generator has a command line option `--prune` to skip functions
|
||||
that are not used, and the seL4 build system makes use of that option. That
|
||||
means, not necessarily all of the combinations above will appear in the
|
||||
generated files for seL4, only the ones that are actually used in the rest of
|
||||
the code.
|
||||
|
||||
### Field High
|
||||
|
||||
The usual access patterns for fields is that if we provide e.g. a value of 5 to
|
||||
a setter function, we expect a corresponding 8-bit field to be set to the bit
|
||||
pattern `0000 0101`.
|
||||
|
||||
In general, on a 32-bit architecture, we are providing a 32-bit value to store
|
||||
in an n-bit field. One way of looking at this is that we know the top 32-n bits
|
||||
of the value are always 0.
|
||||
|
||||
Sometimes we instead know that the *bottom* n bits of the value we provide are
|
||||
0. This can happen for instance when we want to store an address that is always
|
||||
aligned to a page boundary, as it would in a page table entry. This means we are
|
||||
interested in the high bits of the address. The `field_high` tag allows us to
|
||||
store only the significant high bits of such addresses.
|
||||
|
||||
For instance the base pointer stored in a page table cap in seL4 for 32-bit
|
||||
RISC-V is such a case:
|
||||
|
||||
```bf_gen
|
||||
block page_table_cap {
|
||||
field capPTMappedASID 9
|
||||
field_high capPTBasePtr 20
|
||||
padding 3
|
||||
|
||||
padding 7
|
||||
field capPTIsMapped 1
|
||||
field_high capPTMappedAddress 20
|
||||
field capType 4
|
||||
}
|
||||
```
|
||||
|
||||
The generated access methods of the `field_high capPTBasePtr` have the same type
|
||||
as the ones for `field`, but only the top 20 bits of the provided value will be
|
||||
stored in the setter methods as opposed to the bottom 20 bits for normal fields.
|
||||
|
||||
The getter methods will automatically correctly return a value that has the top
|
||||
20 bits set to the stored value and the bottom 12 bits set to 0. That is, getter
|
||||
and setter remain inverses of each other as long as the values provided to the
|
||||
setter have the bottom 12 bits set to 0.
|
||||
|
||||
On most 64-bit architectures, the top bits of pointers are also determined,
|
||||
because not all 64 bits are available (e.g. only 48 of them on x64). The
|
||||
generator assumes `field_high` values to be used for pointers and to need to store
|
||||
only the significant bits of the pointer, along with potential sign extension.
|
||||
See the [Architecture Parameters][] section below for details.
|
||||
|
||||
For instance, if the platform is declared to have base 64 and 48 significant
|
||||
bits, and we have a pointer that we know is aligned to 12 bits, we need to store
|
||||
only 48-12 = 36 bits to recreate the correct original pointer value. That means
|
||||
if we declare
|
||||
|
||||
```bf_gen
|
||||
block store_more {
|
||||
field some_info 16
|
||||
field_high ptr 36
|
||||
field other_info 12
|
||||
}
|
||||
```
|
||||
|
||||
we can store a 12-aligned pointer with `sm = store_more_set_ptr(sm, p)` and
|
||||
extract it again with `p = store_more_get_ptr(sm)` without losing information.
|
||||
This means, we can store 28 bits of additional information in the space that
|
||||
would be taken up by a single pointer. For this it does not matter where in the
|
||||
`store_more` block the `ptr` field is declared, just that it is a `field_high`
|
||||
that stores 36 bits (on a 64 bit base platform with 48 significant bits).
|
||||
|
||||
[Architecture Parameters]: #architecture-parameters-and-canonical-pointers
|
||||
|
||||
### Visible Field Order
|
||||
|
||||
Sometimes it is useful for the constructor to take its arguments in a particular
|
||||
order that is different from the field layout order in memory. It is possible to
|
||||
specify such an order, enumerating the constructor field order in parentheses
|
||||
after the block name:
|
||||
|
||||
|
||||
```bf_gen
|
||||
block page_table_cap(capPTBasePtr, capPTMappedASID) {
|
||||
field capPTMappedASID 9
|
||||
field_high capPTBasePtr 20
|
||||
padding 3
|
||||
}
|
||||
```
|
||||
|
||||
All fields must be enumerated, including the tag field in tagged unions
|
||||
described in the [next section](#tagged-unions).
|
||||
|
||||
Only the constructor is affected, the memory layout as well as getters and
|
||||
setters remain the same.
|
||||
|
||||
### Tagged Unions
|
||||
|
||||
Multiple blocks that share a type field can be aggregated as a type in C into a
|
||||
tagged union. For example to express an algebraic data type such as the
|
||||
following
|
||||
|
||||
```isabelle
|
||||
datatype cap = NullCap | ObjectCap (ref : "20 word")
|
||||
```
|
||||
|
||||
in C, one might specify the following:
|
||||
|
||||
```bf_gen
|
||||
block null_cap {
|
||||
padding 28
|
||||
cap_type 4
|
||||
}
|
||||
|
||||
block object_cap {
|
||||
field ref 20
|
||||
padding 8
|
||||
cap_type 4
|
||||
}
|
||||
|
||||
tagged_union cap cap_type {
|
||||
tag null_cap 0
|
||||
tag object_cap 1
|
||||
}
|
||||
```
|
||||
|
||||
> Note that both of the block declarations for `null_cap` and `object_cap` contain
|
||||
a field with the name `cap_type` at the same bit position with the same size,
|
||||
and that all blocks have the same size. We will later relax the size of the tag
|
||||
field, but the tool does require all blocks in a tagged union to be of the same
|
||||
size and the tag field to always be at the same position.
|
||||
|
||||
The fragment `tagged_union cap cap_type` means that we want the generator to
|
||||
produce a C type `cap_t` that is a tagged union, and that we want to use
|
||||
`cap_type` as the tag field that distinguishes which variant of the union we are
|
||||
operating on.
|
||||
|
||||
Listing `tag null_cap 0` means that we want the block `null_cap` to be part of
|
||||
the tagged union and that the value `0` in field `cap_type` will indicate that
|
||||
the rest of the fields are `null_cap` fields. Similarly `tag object_cap 1`
|
||||
means that a value of `1` in the tag field makes the fields of the block
|
||||
`object_cap` available.
|
||||
|
||||
This mapping must be unique, i.e., it is an error for a block to be mentioned in
|
||||
more than one tagged union.
|
||||
|
||||
If a block is mentioned in a tagged union, its memory layout stays the same as
|
||||
before, but the code that is generated for it changes types to represent the
|
||||
tagged union instead. All getters and setters for the block will be prefixed
|
||||
with the name of the union, and instead of the block type, they will take and
|
||||
return the union type instead (e.g. `cap_t` instead of `object_cap_t` or
|
||||
`null_cap_t`). The tag field will be omitted from the constructor, because it is
|
||||
determined by the constructor name, and no setter methods for the tag field will
|
||||
be generated. Instead there will be a generic getter function for the tag field
|
||||
which does not need to know which variant of the type it is presented with. It
|
||||
returns an enum whose names correspond to the block names and whose values
|
||||
correspond to the associated tag values.
|
||||
|
||||
For instance for the specification above, we will get:
|
||||
|
||||
```c
|
||||
// variant tags:
|
||||
enum cap_tag {
|
||||
cap_null_cap = 0,
|
||||
cap_object_cap = 1
|
||||
};
|
||||
typedef enum cap_tag cap_tag_t;
|
||||
|
||||
// discrimator:
|
||||
cap_tag_t cap_get_cap_type(cap_t cap);
|
||||
|
||||
// constructors:
|
||||
cap_t cap_null_cap_new(void);
|
||||
cap_t cap_object_cap_new(uint64_t ref);
|
||||
|
||||
// getters and setters for object_cap variant:
|
||||
uint64_t cap_object_cap_get_ref(cap_t cap);
|
||||
cap_t cap_object_cap_set_ref(cap_t cap, uint64_t ref);
|
||||
|
||||
uint64_t cap_object_cap_ptr_get_ref(cap_t *cap);
|
||||
void cap_object_cap_ptr_set_ref(cap_t *cap, uint64_t ref);
|
||||
```
|
||||
|
||||
The getter names are constructed as `<union>_<block>_get_<field>`. Setters and
|
||||
pointer versions have names with `set`, `ptr_get`, and `ptr_set` respectively.
|
||||
All functions for the tagged union `cap` will start with `cap_`.
|
||||
|
||||
> It is a runtime error to use a getter or setter function on the wrong block
|
||||
variant. That means it is the caller's responsibility to know or check which
|
||||
variant a value of `cap_t` represents before using e.g.
|
||||
`cap_object_cap_get_ref()` on it. The tool generates `assert` statements in the
|
||||
getters and setters. Common idioms are to check the tag in a `switch` or `if`
|
||||
statement and work on the contents accordingly.
|
||||
|
||||
### Advanced Tag Fields
|
||||
|
||||
#### Tag Slices
|
||||
|
||||
Occasionally the tag value for a tagged union cannot be encoded in a single
|
||||
field of a block, for instance when the hardware prescribes using e.g. bits 0
|
||||
and 5..6 for the type of a structure and bits 1..4 for something else.
|
||||
|
||||
An example would be the following
|
||||
|
||||
```bf_gen
|
||||
block null {
|
||||
padding 25
|
||||
field typ_h 2
|
||||
padding 4
|
||||
field typ_l 1
|
||||
}
|
||||
|
||||
block content {
|
||||
padding 25
|
||||
field typ_h 2
|
||||
field some 4
|
||||
field typ_l 1
|
||||
}
|
||||
|
||||
tagged_union example ex_type(typ_h, typ_l) {
|
||||
null (0,0)
|
||||
content (2,0)
|
||||
}
|
||||
```
|
||||
|
||||
Instead of a single tag field, the `example` union has two tag fields. We also
|
||||
say that the tag `ex_type` has two *slices* `typ_h` and `typ_l`. The same
|
||||
restrictions as for normal tags apply: they must fit the tag values and all tag
|
||||
slice fields must be mentioned in all blocks of the union.
|
||||
|
||||
The fragment `ex_type(typ_h, typ_l)` means that we want the discriminator
|
||||
function to be called `example_get_ex_type` and that the tag is made up of the
|
||||
two fields `(typ_h, typ_l)`. Any positive number of fields is allowed. The
|
||||
generated enumeration will still be called `example_tag` as it would be for a
|
||||
single tag, and it will still contain a name for each alternative bound to a
|
||||
value unique for that variant. (The values consist of the concatenation of the
|
||||
bit pattern of the tag slice values).
|
||||
|
||||
Similarly to a single tag field, the tool will not generate getters and setters
|
||||
for any of the tag fields mentioned in the union specification.
|
||||
|
||||
The values in the `null (0,0)` and `content (2,0)` are positional, i.e., the
|
||||
example assigns the bit pattern `0b10` in field `typ_h` and `0` in field `typ_l`
|
||||
to mean the `content` block and `0` in both to mean the `null` block.
|
||||
|
||||
#### Masks
|
||||
|
||||
The restriction that all tag fields have to have the same size can be relaxed
|
||||
slightly. One can declare multiple possible tag sizes together with a bit mask
|
||||
each that determines which of these sizes applies.
|
||||
|
||||
Say for instance that we have more than 16 different `cap` types, and that some
|
||||
of the `cap` alternatives only have 4 bits of storage left to store the cap tag,
|
||||
but others have further bits available, e.g. 8. We can declare a bit mask that
|
||||
determines if the tag size is 4 or 8 and use the size 4 tags in the blocks that
|
||||
have fewer bits left.
|
||||
|
||||
The following example achieves this:
|
||||
|
||||
```bf_gen
|
||||
block null_cap {
|
||||
padding 24
|
||||
cap_type 8
|
||||
}
|
||||
|
||||
block object_cap {
|
||||
field ref 28
|
||||
cap_type 4
|
||||
}
|
||||
|
||||
tagged_union cap cap_type {
|
||||
mask 4 0x0e
|
||||
mask 8 0x0e
|
||||
tag null_cap 15
|
||||
tag object_cap 1
|
||||
}
|
||||
```
|
||||
|
||||
The block `null_cap` has an 8-bit tag field and the block `object_cap` has a
|
||||
4-bit tag field of the same name. Note that both still must start at the same
|
||||
bit positions.
|
||||
|
||||
The declarations `mask 4 0x0e` and `mask 8 0x0e` are read from the smallest to
|
||||
the largest mask size. `mask 4 0x0e` declares that the default tag size is 4
|
||||
bits wide, but if the bits `0b1110` (0x0e) are set, i.e. if `tag_value & 0x0e ==
|
||||
0x0e`, then the next larger mask will be used. This means values 0 to 13 are
|
||||
using 4 bit tags, 14 and 15 are using 8 bit tags, and all 8 bit tag values must
|
||||
have at least the bits 1, 2, and 3 set.
|
||||
|
||||
Since there is no larger mask size after 8 in the example, the mask value there
|
||||
is not important. However, the generator expects the mask of each larger level
|
||||
to have at least all bits set that the masks of all smaller sizes have set, so
|
||||
we chose the same value `0x0e` in the example. If we had another level, e.g. 16
|
||||
bits wide afterwards, we might have chosen `0b11110` (0x1e) for the mask value
|
||||
at 8 bit.
|
||||
|
||||
The sub mask inclusion means that the generator can check mask sizes
|
||||
consecutively. I.e. if `tag_value & 0xe != 0xe`, we always know it is a 4 bit
|
||||
tag, no matter how many other tag sizes are declared.
|
||||
|
||||
The tag mask value declared for each alternative in the tagged union must fit
|
||||
the mask for the size of the tag field in the block. Remember that with `mask 4
|
||||
0x0e` the `0x0e` bit pattern means that size 4 does *not* apply, and instead
|
||||
the next larger size applies. So in the example, because `object_cap` has a
|
||||
4-bit tag field, `object_cap` must have a tag value where the bits `0x0e` are
|
||||
not all set (e.g. 1 is a legal value). Conversely, `null_cap` has an 8-bit tag,
|
||||
and so at least the bits `0x0e` must all be set in the tag value (e.g. 14 or 15
|
||||
works, as does 255, but not 1).
|
||||
|
||||
Masks and tag slices are mutually exclusive, i.e. only one of these can be used
|
||||
for any particular tagged union.
|
||||
|
||||
### Raw Value Access
|
||||
|
||||
The C representation of blocks, whether in a tagged union or not, is a struct
|
||||
containing an array of the architecture word type (`uint32` or `uint64`) of the
|
||||
length computed from the number and width of fields in the block. Given that the
|
||||
memory layout is fully specified it is possible to operate on these raw values
|
||||
directly without breaking the semantics of the generated functions.
|
||||
|
||||
This is usually neither recommended nor necessary, and for code undergoing
|
||||
formal verification strongly discouraged, because it introduces additional
|
||||
verification overhead.
|
||||
|
||||
However, one special case is occasionally convenient: when initialising an area
|
||||
of memory with 0, and reinterpreting that memory as a bitfield structure, we are
|
||||
guaranteed to get a value where all fields are 0. Similarly, returning an all-0
|
||||
struct from a function instead of calling the constructor with only 0 parameters
|
||||
may be acceptable, although one should check the generated assembly if the
|
||||
compiler was not able to optimise the constructor away -- if it does then the
|
||||
constructor form is clearer and easier to use in proofs.
|
||||
|
||||
The tool will not access padding fields, but is not guaranteed to preserve them
|
||||
when copying. Writing/reading padding fields is not guaranteed to be stable.
|
||||
|
||||
### Architecture Parameters and Canonical Pointers
|
||||
|
||||
The tool supports architectures with 32 and 64 bit word sizes. The command
|
||||
|
||||
```bf_gen
|
||||
base 32
|
||||
```
|
||||
|
||||
selects a 32 bit native word size. The declaration covers all subsequent blocks
|
||||
and unions in the specification. It is possible to give multiple declarations,
|
||||
each covering the subsequent blocks and unions, but it rarely makes sense to
|
||||
pick any other value than the word size of the architecture.
|
||||
|
||||
Many 64 bit architectures have the concept of canonical pointers where not all
|
||||
64 bits of the available space are usable for pointers, but only for example 48
|
||||
bits. In this setting a pointer value would be canonical if and only if all top
|
||||
bits are set when bit 47 (counting from 0) is set. This is equivalent to sign
|
||||
extension, i.e. treating bit 47 as the sign bit and extending that sign bit up
|
||||
to bit 63.
|
||||
|
||||
A potential use for this is for kernel pointers to always have bit 47 set (and
|
||||
all bits above) and for user pointers to always have bit 47 unset (and all bits
|
||||
above).
|
||||
|
||||
The bitfield generator allows specifying this canonical bit and whether the top
|
||||
bits for pointers should be sign extended or not like this:
|
||||
|
||||
```bf_gen
|
||||
base 64(48,1)
|
||||
```
|
||||
|
||||
This means: word size is 64, pointers are 48 bits `[0..47]`, and bit 48 and
|
||||
above should be set to 1 if bit 47 is set.
|
||||
|
||||
This only affects `field_high` fields which are assumed to be pointers.
|
||||
|
||||
```bf_gen
|
||||
base 64(48,0)
|
||||
```
|
||||
|
||||
means word size is 64, pointers are at most 48 bits, and no sign extension. This
|
||||
is currently equivalent to just `base 64`.
|
||||
|
||||
It may make sense to use the `base` command multiple times for the sign
|
||||
extension case. For instance, if the intention is to usually store pointers on
|
||||
x64, one would declare `base 64(48,1)`, but if there is a specific block that
|
||||
wants to not store a pointer, but some other value as `field_high`, one could
|
||||
declare `base 64` just before and `base 64(48,1)` again just after that block.
|
||||
|
||||
|
||||
## Syntax Reference
|
||||
|
||||
### Lexical Structure
|
||||
|
||||
The generator distinguishes identifiers, integer literals, comments, white
|
||||
space, keywords, and operators. Keywords and operators are shown inline in the
|
||||
grammar in the next section. The rest are defined below.
|
||||
|
||||
A multi-character ident must start with a letter or underscore, followed by
|
||||
letters, underscores or numbers. A one-character ident must consist of one
|
||||
letter. Letters are Latin letters `[a-z]` and `[A-Z]`, digits are `[0-9]`.
|
||||
|
||||
```bnf
|
||||
IDENT ::= [A-Za-z_][a-zA-Z0-9_]+ | [A-Za-z]
|
||||
```
|
||||
|
||||
In addition to the regular expression above, identifiers that consist only of
|
||||
underscores are invalid. Valid examples are `a`, `a_20`, `_a_20_b`. Invalid
|
||||
examples are `0`, `_`, `0a` or `0_a`.
|
||||
|
||||
Integer literals can be decimal, octal, hex, or binary, followed by an optional
|
||||
`l` or `L` specifier.
|
||||
|
||||
```bnf
|
||||
INTLIT ::= ( [1-9][0-9]*
|
||||
| 0[oO]?[0-7]+
|
||||
| 0[xX][0-9a-fA-F]+
|
||||
| 0[bB][01]+
|
||||
| 0 ) [lL]?'
|
||||
```
|
||||
|
||||
Valid examples are `15`, `017`, `0o17`, `0x0F`, `0xF`, `0XF`, `0b1111`, `0`,
|
||||
`0l`, and `15L`. Invalid examples are `10A0`, `0xFG0`, `0b20`, `090`.
|
||||
|
||||
Comments start with `--` or with `#` and go until the end of the line. There
|
||||
are no block comments.
|
||||
|
||||
```bf_gen
|
||||
-- this is a valid comment
|
||||
## as is this
|
||||
```
|
||||
|
||||
Whitespace are spaces and tab characters. They delimit other tokens. Consecutive
|
||||
whitespace is ignored.
|
||||
|
||||
### Syntactic Structure
|
||||
|
||||
```ebnf
|
||||
spec :== ( base | block | tagged_union )*
|
||||
|
||||
base ::= "base" ("32"|"64") canonical_spec?
|
||||
canonical_spec ::= "(" INTLIT "," ("0"|"1") ")"
|
||||
|
||||
block ::= "block" IDENT visible_order_spec_opt? "{" fields "}"
|
||||
visible_order_spec_opt ::= "(" visible_order_spec? ")"
|
||||
visible_order_spec ::= IDENT ("," IDENT)*
|
||||
fields ::= ( "field" IDENT INTLIT | "field_high" IDENT INTLIT | "padding" INTLIT )*
|
||||
|
||||
tagged_union ::= "tagged_union" IDENT IDENT tag_slices? "{" masks tags "}"
|
||||
|
||||
tag_slices ::= "(" IDENT ("," IDENT)* ")"
|
||||
masks ::= ( "mask" INTLIT INTLIT )*
|
||||
tags ::= ( "tag" IDENT tag_value )*
|
||||
tag_value ::= INTLIT | "(" INTLIT ( "," INTLIT )* ")"
|
||||
```
|
||||
|
||||
Forward references to names that are declared later in the file are allowed.
|
||||
|
||||
### Restrictions
|
||||
|
||||
The following restrictions are not represented by the grammar in the previous
|
||||
section, but will be checked by the generator.
|
||||
|
||||
- the canonical bit must be smaller than the `base` size
|
||||
- field sizes must not be larger than the `base` size
|
||||
- the field sizes in a block must add up to multiples of the `base` size
|
||||
- all identifiers mentioned as tags in a union must be blocks
|
||||
- all blocks in a tagged union must have the same size
|
||||
- all blocks in a tagged union must mention the tag field(s)
|
||||
- all blocks in a tagged union must have the tag field(s) at the same bit
|
||||
position
|
||||
- all blocks in a tagged union must have tag fields of one of the sizes declared
|
||||
in the union, or if no mask size is declared, the tag fields must all be of
|
||||
the same size.
|
||||
- each block can be used in at most one tagged union
|
||||
- visible_order_spec must enumerate all fields if used
|
||||
- tag values must fit into the size of the tag fields
|
||||
- tag values must not be larger than an `int` of the underlying platform,
|
||||
because they will be used in an `enum` in C.
|
||||
|
||||
## Correctness
|
||||
|
||||
Since the tool generates both the implementation and the specification, the
|
||||
question arises what kind of correctness property we gain. Even if the generated
|
||||
proofs are checked for correctness by Isabelle/HOL, they could still be
|
||||
expressing properties that are not useful or specify behaviour that we do not
|
||||
want the code to have. What we do know after Isabelle/HOL has checked the proofs
|
||||
is that the code implements the generated specification and that the generated C
|
||||
code is free from undefined behaviour.
|
||||
|
||||
To gain confidence that the generated specification is meaningful and itself has
|
||||
the right properties it needs human inspection or it needs to be used in further
|
||||
proofs. In the seL4 verification both are the case. In particular, the generated
|
||||
specification together with the generated implementation proofs are the
|
||||
interface that the C verification uses when it encounters bitfield functions. If
|
||||
the generated specifications are trivial, not useful, wrong, or do not express
|
||||
the behaviour of the generated C (e.g. if the generated correctness proof was
|
||||
trivial), these proofs could not succeed, because the specification would then
|
||||
not provide the needed semantics of the code (e.g. that loading a value after
|
||||
storing it does not lose information). If the specification provides strong
|
||||
enough properties to make the rest of the C verification succeed, we know that
|
||||
the specification is strong enough for our needs, even though it might still be
|
||||
missing additional facts or properties that the generated C code might have.
|
||||
This is sufficient for strong functional correctness.
|
||||
|
||||
## Limitations
|
||||
|
||||
The tool does not currently support proofs for all combinations of pointer
|
||||
access functions for variants, but it does support all those that are used in
|
||||
seL4. If proof support for new combinations is needed, the tool could be
|
||||
extended relatively easily.
|
||||
|
||||
<!--
|
||||
TODO:
|
||||
|
||||
- semantics (Isabelle?)
|
||||
- pruning
|
||||
- command line options
|
||||
- preprocessing used in seL4
|
||||
- file extension .bf
|
||||
- where can I find the generated code
|
||||
- where can I find the bitfield definitions in seL4
|
||||
-->
|
||||
Reference in New Issue
Block a user