In ISE058, the AVX10.2 imply is removed from AMX-AVX512. This
leads to re-consideration on the imply for AMX-AVX512.
Since it is using zmm register and using zmm register only, we
need to at least imply AVX512F. AVX512VL is not needed since it
is not using xmm/ymms.
On the other hand, if we imply AVX10.1 for AMX-AVX512, disabling
avx10.1 will lead to disabling AMX-AVX512. This would be a surprise
for users.
Based on the two reasons above, the patch is decoupling AMX-AVX512
from AVX10.2 and imply AVX512F.
opcodes/ChangeLog:
* i386-gen.c: Imply AVX512F instead of AVX10.2.
* i386-init.h: Regenerated.
... to be all in one group. This helps code generation for code like
|| is_cpu (&i.tm, CpuPadLock)
|| is_cpu (&i.tm, CpuPadLockRNG2)
|| is_cpu (&i.tm, CpuPadLockPHE2)
|| is_cpu (&i.tm, CpuPadLockXMODX)
that we have (effectively) twice.
Like for RMPUPDATE documentation is about to change as far as operands
are concerned. They're merely the other way around here.
While adjustind gas documentation, also add the missing RMPQUERY
counterparts there.
There are separate CPUID feature bits for SM2 and CCS instructions.
CCS is the acronym of Chinese Cipher System, it includes SM3 and SM4
instructions. This patch adds CpuGMISM2 and CpuGMICCS to replace CpuGMI on
corresponding instructions.
gas/ChangeLog:
* config/tc-i386.c: Add gmism2 and gmiccs to replace gmi.
* doc/c-i386.texi: Ditto.
opcodes/ChangeLog:
* i386-gen.c: Add GMISM2 and GMICCS to replace GMI.
* i386-opc.h (enum i386_cpu): Add CpuGMISM2 and CpuGMICCS to
replace CpuGMI.
* i386-opc.tbl: Replace GMI with GMISM2 on sm2 instruction. Replace GMI
with GMICCS on sm3 and sm4 instructions.
* i386-tbl.h: Regenerated.
* i386-mnem.h: Ditto.
* i386-init.h: Ditto.
This patch will support AMX-MOVRS feature. Unlike all the other
AMX insns in vector space where we pass vex_len_table before
vex_w_table, we first pass vex_w_table for tileloaddrs[,t1] to
align with the order in EVEX space. The reason why we first pass
vex_w_table in EVEX space is due to AMX-AVX512, where tcvtrowd2ps
and tilemovrow with r32 shares the same opcode with tileloaddrs[,t1].
All of them have evex.w = 0 but with different evex.length. Re-doing
that shortly is not ideal.
APX_F extension is also implemented in this patch. The encoding will
be:
- EVEX.128.NP/66.MAP5.W0 F8/F9 !(11):rrr:100 for
T2RPNTLVW[Z0,Z1]RS[,T1] with NF=0.
- EVEX.128.F2/66.0F38.W0 4A !(11):rrr:100 FOR TILELOADDRS[,T1] with
NF=0.
For APX_F extension, we could not use APX_F(AMX_TRANSPOSE&AMX_MOVRS)
since the transformation could not be done. Instead, we will use
AMX_TRANSPOSE & APX_F(AMX_MOVRS). Thus, we should set AMX_TRANSPOSE
for "any" for cpu_flags in assembler. Since it will only affect the
cpu_flags_match, handle that there.
gas/ChangeLog:
* config/tc-i386.c (cpu_arch): Add amx_movrs.
(cpu_flags_match): Set any bitfield for multiple cpuid
enabled insns.
* doc/c-i386.texi: Document .amx_movrs.
* testsuite/gas/i386/x86-64.exp: Run AMX-MOVRS tests.
* testsuite/gas/i386/x86-64-amx-movrs-intel.d: New test.
* testsuite/gas/i386/x86-64-amx-movrs-inval.l: Ditto.
* testsuite/gas/i386/x86-64-amx-movrs-inval.s: Ditto.
* testsuite/gas/i386/x86-64-amx-movrs.d: Ditto.
* testsuite/gas/i386/x86-64-amx-movrs.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex-len.h (EVEX_LEN_0F384A_X86_64_W_0): New.
* i386-dis-evex-w.h (EVEX_W_0F384A_X86_64): Ditto.
* i386-dis-evex-x86-64.h (X86_64_EVEX_0F384A): Ditto.
* i386-dis-evex.h: New entry for AMX-MOVRS.
* i386-dis.c:
(PREFIX_VEX_0F384A_X86_64_L_0_W_0): New.
(PREFIX_VEX_MAP5_F8_X86_64_L_0_W_0): Ditto.
(PREFIX_VEX_MAP5_F9_X86_64_L_0_W_0): Ditto.
(X86_64_VEX_0F384A): Ditto.
(X86_64_VEX_MAP5_F8): Ditto.
(X86_64_VEX_MAP5_F9): Ditto.
(X86_64_EVEX_0F384A): Ditto.
(VEX_LEN_0F384A_X86_64_W_0): Ditto.
(VEX_LEN_MAP5_F8_X86_64): Ditto.
(VEX_LEN_MAP5_F9_X86_64): Ditto.
(EVEX_LEN_0F384A_X86_64_W_0): Ditto.
(VEX_W_0F384A_X86_64): Ditto.
(VEX_W_MAP5_F8_X86_64): Ditto.
(VEX_W_MAP5_F9_X86_64): Ditto.
(EVEX_W_0F384A_X86_64): Ditto.
(prefix_table): New entry for AMX-MOVRS.
(x86_64_table): Ditto.
(vex_len_table): Ditto.
(vex_w_table): Ditto.
(map5_f8_opcode): New.
(map5_f9_opcode): Ditto.
(get_valid_dis386): Handle VEX_MAP5 opcode for AMX-MOVRS.
* i386-gen.c (isa_dependencies): Add AMX_MOVRS.
(cpu_flags): Ditto.
* i386-init.h: Regenerated.
* i386-mnem.h: Ditto.
* i386-opc.h (CpuAMX_MOVRS): New.
(i386_cpu_flags): Add cpuamx_movrs.
* i386-opc.tbl: Add AMX-MOVRS instructions.
* i386-tbl.h: Regenerated.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
In this patch, we will support AMX-FP8 feature. Since in the
foreseeable future, only AMX-MOVRS will also use VEX_MAP5, we
currently will not add a table of 256 entries and handle just
like MAP7.
gas/ChangeLog:
* config/tc-i386.c: Add amx_fp8.
* doc/c-i386.texi: Document .amx_fp8.
* testsuite/gas/i386/x86-64.exp: Run AMX-FP8 tests.
* testsuite/gas/i386/x86-64-amx-fp8-bad.d: New test.
* testsuite/gas/i386/x86-64-amx-fp8-bad.s: Ditto.
* testsuite/gas/i386/x86-64-amx-fp8-intel.d: Ditto.
* testsuite/gas/i386/x86-64-amx-fp8-inval.l: Ditto.
* testsuite/gas/i386/x86-64-amx-fp8-inval.s: Ditto.
* testsuite/gas/i386/x86-64-amx-fp8.d: Ditto.
* testsuite/gas/i386/x86-64-amx-fp8.s: Ditto.
opcodes/ChangeLog:
* i386-dis.c (PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0): New.
(X86_64_VEX_MAP5_FD): Ditto.
(VEX_LEN_MAP5_FD_X86_64): Ditto.
(VEX_W_MAP5_FD_X86_64_L_0):Ditto.
(prefix_table): Add PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0.
(x86_64_table): Add X86_64_VEX_MAP5_FD.
(vex_len_table): Add VEX_LEN_MAP5_FD_X86_64.
(vex_w_table): Add VEX_W_MAP5_FD_X86_64_L_0.
* i386-gen.c: Add CPU_AMX_FP8_FLAGS and
CPU_ANY_AMX_FP8_FLAGS.
* i386-init.h: Regenerated.
* i386-mnem.h: Ditto.
* i386-opc.h: Add cpuamx_fp8.
* i386-opc.tbl: Add AMX_FP8 instructions.
* i386-tbl.h: Regenerated.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
In this patch, we will support AMX-TRANSPOSE. Since AMX-TRANSPOSE
will be used with other CPUIDs very often, we put it into
CPU_FLAGS_COMMON.
To implement TMM pair, we reused ImplicitGroup and adjust the condition
in process_operands for the instructions.
APX_F extension is also handled in this patch, where it extends
T2RPNTLVW[Z0,Z1][,T1] to EVEX.128.NP/66.0F38.W0 6E/6F !(11):rrr:100
with NF=0.
Also, TTDPFP16PS should base on AMX_FP16, not AMX_BF16 in ISE055.
It would be fixed in ISE056.
gas/ChangeLog:
* config/tc-i386.c (cpu_arch): Add amx_transpose.
(_is_cpu): Ditto.
(process_operands): Adjust the condition for AMX-TRANSPOSE.
* doc/c-i386.texi: Document .amx_transpose.
* testsuite/gas/i386/x86-64.exp: Run AMX-TRANSPOSE tests.
* testsuite/gas/i386/x86-64-amx-transpose-bad.d: New test.
* testsuite/gas/i386/x86-64-amx-transpose-bad.s: Ditto.
* testsuite/gas/i386/x86-64-amx-transpose-intel.d: Ditto.
* testsuite/gas/i386/x86-64-amx-transpose-inval.l: Ditto.
* testsuite/gas/i386/x86-64-amx-transpose-inval.s: Ditto.
* testsuite/gas/i386/x86-64-amx-transpose.d: Ditto.
* testsuite/gas/i386/x86-64-amx-transpose.s: Ditto.
opcodes/ChangeLog:
* i386-dis.c (MOD_VEX_0F386E_X86_64_W_0): New.
(MOD_VEX_0F386F_X86_64_W_0): Ditto.
(PREFIX_VEX_0F385F_X86_64_W_0_L_0): Ditto.
(PREFIX_VEX_0F386B_X86_64_W_0_L_0): Ditto.
(PREFIX_VEX_0F386E_X86_64_W_0_M_0_L_0): Ditto.
(PREFIX_VEX_0F386F_X86_64_W_0_M_0_L_0): Ditto.
(X86_64_VEX_0F385F): Ditto.
(X86_64_VEX_0F386B): Ditto.
(X86_64_VEX_0F386E): Ditto.
(X86_64_VEX_0F386F): Ditto.
(VEX_LEN_0F385F_X86_64_W_0): Ditto.
(VEX_LEN_0F386B_X86_64_W_0): Ditto.
(VEX_LEN_0F386E_X86_64_W_0_M_0): Ditto.
(VEX_LEN_0F386F_X86_64_W_0_M_0): Ditto.
(VEX_W_0F385F_X86_64): Ditto.
(VEX_W_0F386B_X86_64): Ditto.
(VEX_W_0F386E_X86_64): Ditto.
(VEX_W_0F386F_X86_64): Ditto.
(mod_table): Add MOD_VEX_0F386E_X86_64_W_0,
MOD_VEX_0F386F_X86_64_W_0.
(prefix_table): Add PREFIX_VEX_0F386E_X86_64_W_0_M_0_L_0,
PREFIX_VEX_0F386F_X86_64_W_0_M_0_L_0.
Add new instructions for PREFIX_VEX_0F386C_X86_64_W_0_L_0.
(x86_64_table): Add X86_64_VEX_0F385F, X86_64_VEX_0F386B,
X86_64_VEX_0F386E, X86_64_VEX_0F386F.
(vex_len_table): Add VEX_LEN_0F385F_X86_64_W_0,
VEX_LEN_0F386B_X86_64_W_0, VEX_LEN_0F386E_X86_64_W_0_M_0,
VEX_LEN_0F386F_X86_64_W_0_M_0.
(vex_w_table): Add VEX_W_0F385F_X86_64, VEX_W_0F386B_X86_64,
VEX_W_0F386E_X86_64, VEX_W_0F386F_X86_64.
* i386-gen.c (cpu_flag_init): Add AMX_TRANSPOSE.
(cpu_flags): Add CpuAMX_TRANSPOSE.
* i386-init.h: Regenerated.
* i386-mnem.h: Ditto.
* i386-opc.h (CpuAMX_TRANSPOSE): New.
(i386_cpu): Add cpuamx_transpose.
* i386-opc.tbl: Add AMX-TRANSPOSE instructions.
* i386-tbl.h: Regenerated.
Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
In this patch, we will support SM4 AVX10.2 extension part. It is
a promotion from VEX encoding to EVEX encoding. The EVEX encoding
is based on AVX10.2, which is the same as the upcoming MOVRS ISA.
Thus, we decide to pull AVX10.2 out to CPU_COMMON_FLAGS.
While I have also tried to merge the table like AVX/AVX512, I
choose to just templatize the table. I am okay to go either way,
but slightly prefer the templatizing one since probably SM4 would
be the only ISA with AVX10.2 needs such VEX to EVEX extension (MOVRS
does not need that). Also, it is a tendancy that we will directly
provide EVEX encodings and no VEX encodings for vector instructions
since AVX10. This will make the adding in gas/config/tc-i386.c not
that worthy.
gas/ChangeLog:
* NEWS: Support Intel SM4 EVEX instructions.
* config/tc-i386.c (_is_cpu): Handle AVX10.2.
* testsuite/gas/i386/i386.exp: Run SM4 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-256-sm4-intel.d: Add SM4 tests.
* testsuite/gas/i386/avx10_2-256-sm4.d: Ditto.
* testsuite/gas/i386/avx10_2-256-sm4.s: Ditto.
* testsuite/gas/i386/avx10_2-512-sm4-intel.d: Ditto.
* testsuite/gas/i386/avx10_2-512-sm4.d: Ditto.
* testsuite/gas/i386/avx10_2-512-sm4.s: Ditto.
* testsuite/gas/i386/avx10_2-sm4-inval.l: Ditto.
* testsuite/gas/i386/avx10_2-sm4-inval.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-sm4-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-sm4.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-sm4.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-sm4-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-sm4.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-sm4.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-sm4-inval.l: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-sm4-inval.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex.h: Add evex table entry for SM4.
* i386-dis.h: Ditto.
* i386-opc.h: (i386_cpu): Move AVX10.2 to CPU_FLAGS_COMMON.
* i386-opc.tbl: Add SM4 EVEX instructions.
* i386-init.h: Regenerated.
* i386-tbl.h: Ditto.
As soon as I committed Zhaoxin's patch, I realized that I did not
include the regen file. Regenerate them and commit as obvious.
opcodes/ChangeLog:
* i386-tbl.h: Regenerated.
* i386-mnem.h: Ditto.
* i386-init.h: Ditto.
In the patch, in order to support ymm rounding for AVX10.2, we derive
evex attribute for all cases instead of only for rc_none to encode U bit.
Also changed some bad_opcode return due to the share of U bit with APX_F.
gas/ChangeLog:
* config/tc-i386.c
(cpu_flags_match): Handle AVX10_2.
(build_evex_prefix): Handle U bit. Derive evex attribute
for all cases.
(check_VecOperands): Handle AVX10.2 and ymm roundings.
* doc/c-i386.texi: Document .avx10.2.
* testsuite/gas/i386/i386.exp: Run AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-rounding-intel.d: New test.
* testsuite/gas/i386/avx10_2-rounding-inval.l: Ditto.
* testsuite/gas/i386/avx10_2-rounding-inval.s: Ditto.
* testsuite/gas/i386/avx10_2-rounding.d: Ditto.
* testsuite/gas/i386/avx10_2-rounding.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding.s: Ditto.
opcodes/ChangeLog:
* i386-dis.c (struct instr_info): Add U bit.
(get_valid_dis386): Handle U bit.
* i386-gen.c (isa_dependencies): Add AVX10.2.
(cpu_flags): Ditto.
* i386-init.h: Regenerated.
* i386-opc.h (CpuAVX10_2): New.
(i386_cpu_flags): Add cpuavx10_2.
* i386-opc.tbl: Add rounding to old entries which do not
permit rounding previously. Also eliminate the redundant
RegXMM for vcvtps2uqq.
* i386-tbl.h: Regenerated.
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:
1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
author I haven't committed, 'Kalray SA.', to cover gas testsuite
files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
First of all we want to also accumulate its reverse dependencies, such
that we can use them in cpu_flags_match(). This is in particular in
preparation of APX additions, such that e.g. BMI VEX-encoding templates
can become combined VEX/EVEX ones.
Once we have the reverse dependencies, we can further leverage them to
omit explicit "&x64" from any insn templates dealing with 64-bit-mode-
only ISA extensions. Besides helping readability for several insn
templates we already have, this will also help with what is going to be
added for APX (as all of the new templates would otherwise need to have
"&x64").
Note that rather than leaving a meaningless CPU_64_FLAGS (which is
unused anyway), its emitting is now also suppressed.
Now that CpuLM is used solely in cpu_arch_flags and cpu_arch[] while
Cpu64 is solely used in insn templates, they no longer need to be
treated different from other "ordinary" flags; the only "unusual" one
left if CpuNo64. Fold both, leaving just Cpu64.
Since this is merely a re-branding of certain AVX512* features, there's
little code to be added.
The main aspect here are new testcases. In order to be able to re-use
some of the existing testcases, several of them need their start symbols
adjusted. Note that 256- and 128-bit tests want adding here, as these
need to work right away. Subsequently they'll gain vector length
constraints.
Since it was missing and is wanted here, also add an AVX512VL+VPOPCNTDQ
test.
These probably should have been put in place already anyway, but they're
very much wanted in order to then put AVX10.1 support on top. Note that
to avoid reverse dependencies towards SSE (just like we already do for
AVX and XOP), add_isa_dependencies() needs some further tweaking.
While there also address a related anomaly: Disabling AES but neither
AVX nor VAES (similarly for {,V}PCLMULQDQ) would better keep the 128-bit
VEX-encoded forms available. Note that for this the VAES insns are moved
past the AVX+AES ones, to avoid the property-11 test suddenly failing.
The test really is wrong, but let's not also make things inconsistent:
Without the movement, YMM use would be correctly recorded for the
128-bit forms simply because the first template already matches, as long
as VAES wasn't disabled. Yet it still wouldn't be if only AVX+AES were
enabled. Nor would behavior here then be the same as for VPCLMUL* insns.
The name we use internally isn't in line with the SDM, and also isn't in
line with CpuVPCLMULQDQ. Add the missing suffix, but of course leave
alone user facing names.
The table constantly growing in two dimensions (number of table entries
times number of ISA extension flags) doesn't scale very well. Use a more
compact representation: Only identifiers which need to combine with
other identifiers retain individual flag bits. All others are combined
into an enum, with a new helper added to transform the table entries
into the original i386_cpu_flags layout. This way the table in the final
binary shrinks by almost a third (the generated source code shrinks by
about half), and isn't likely to grow again in that dimension any time
soon.
While moving the 3DNow! fields, drop the stray inner 'a' from their
names.
The feature isn't universally available on 64-bit CPUs.
Note that in i386-gen.c:isa_dependencies[] I'm only adding it to models
where I'm certain the functionality exists. For Nocona and Core I'm
uncertain in particular.
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
TSXLDTRK takes RTM as a prereq. Additionally introduce an umbrella "tsx"
extension option covering both RTM and HLE, paralleling the "abm" one we
already have.
SEV-ES is an extension to SVME. SNP in turn is an extension to SEV-ES,
and yet in turn RMPQUERY is a SNP extension.
Note that cpu_arch[] has no SNP entry, so CPU_ANY_SNP_FLAGS remains
unused (just like CPU_SNP_FLAGS already is).
Like various other features AMX-TILE takes XSAVE as a prereq.
XSAVES, unconditionally using compacted format, in turn effectively
takes XSAVEC as a prereq (an SDM clarification to this effect is in the
works).
Like AVX512-FP16, several other extensions require wider than 16-bit
mask registers. As a result they take AVX512BW as a prereq, not (just)
AVX512F. Which in turn points out wrong expectations in the noavx512-1
testcase.
SSE itself takes FXSR as a prereq. Like AES, PCLMUL, and SHA both GFNI
and KL take SSE2 as a prereq, for operating on packed integers. And
while correcting KL also record it as a prereq to WIDEKL.
Getting both forward and reverse ISA dependencies right / consistent has
been a permanent source of mistakes. Reduce what needs specifying
manually to just the direct forward dependencies. Transitive forward
dependencies as well as reverse ones are now derived and hence cannot go
out of sync anymore (at least in the vast majority of cases; there are a
few special cases to still take care of manually). In the course of this
several CPU_ANY_*_FLAGS disappear, requiring adjustment to the
assembler's cpu_arch[].
Note that to retain the correct reverse dependency of AVX512F wrt
AVX512-VP2INTERSECT, the latter has the previously missing AVX512F
prereq added.
Note further that to avoid adding the following undue prereqs:
* ATHLON, K8, and AMDFAM10 gain CMOV and FXSR,
* IAMCU gains 387,
auxiliary table entries (including a colon-separated modifier) are
introduced in addition to the ones representing from converting the old
table.
To maintain forward-only dependencies between AVX (XOP) and SSE* (SSE4a)
(i.e. "nosse" not disabling AVX), reverse dependency tracking is
artifically suppressed.
As a side effect disabling of SSE or SSE2 will now also disable AES,
PCLMUL, and SHA (respective elements were missing from
CPU_ANY_SSE2_FLAGS).