Avoid the alignment hackery necessary when obstack_alloc is used.
obstack_alloc expands to obstack_blank plus obstack_finish, and the
latter call is where alignment of the tail of the obstack happens.
The docs say obstack_alloc "is invoked almost like malloc", which
implies a fixed size allocation and you don't need other obstack calls
in its use. So I think trying to use obstack_alloc in frag_alloc was
always a poor choice.
* frags.c (frag_alloc): Replace obstack_alloc with obstack_blank.
"the weird alignment hackery" comment doesn't help anyone understand
the code. Explain what is going on. Replace the zero length
obstack_alloc with obstack_finish, which by inspection of obstack.h is
all the zero length alloc does.
* frags.c (frag_alloc): Comment. Replace zero length
obstack_alloc with obstack_finish.
(frag_new): Remove unnecessary obstack_finish.
* write.c (compress_frag, compress_debug): Likewise.
Many frag_var calls have unnecessary casts on arguments, no doubt from
the days when binutils was written for K&R C. (ie. functions were not
prototyped so you needed to cast anything that didn't match the
expected type after default promotions, as you still do for args
matching a function ellipsis.) Remove those casts.
* config/tc-alpha.c (s_alpha_comm): Use offset_T for cur_size
to avoid need for casts. Remove casts from frag_var args.
* config/tc-ia64.c (obj_elf_vms_common): Remove casts from
frag_var args.
* config/tc-m32r.c (m32r_scomm): Likewise.
* config/tc-m68hc11.c (build_jump_insn): Likewise.
(build_dbranch_insn): Likewise.
* config/tc-m68k.c (md_assemble): Likewise.
* config/tc-microblaze.c (microblaze_s_lcomm): Likewise.
* config/tc-mmix.c (s_loc): Likewise.
* config/tc-ppc.c (ppc_elf_lcomm, ppc_comm): Likewise.
* config/tc-score.c (s3_s_score_lcomm): Likewise.
* config/tc-score7.c (s7_s_score_lcomm): Likewise.
* config/tc-sh.c (sh_cons_align): Likewise.
* config/tc-sparc.c (s_reserve, s_common): Likewise.
(sparc_cons_align): Likewise.
* config/tc-tic4x.c (tic4x_seg_alloc, tic4x_bss): Likewise.
* config/tc-tic54x.c (tic54x_bss, tic54x_space): Likewise.
(tic54x_usect, tic54x_field): Likewise.
* config/tc-tic6x.c (s_tic6x_scomm): Likewise.
* config/tc-v850.c (v850_offset, v850_comm): Likewise.
* frags.c (frag_align, frag_align_pattern, frag_align_code): Likewise.
* gen-sframe.c (output_sframe_row_entry): Likewise.
(output_sframe_funcdesc): Likewise.
* read.c (s_fill, do_org, s_space, emit_leb128_expr): Likewise.
* symbols.c (colon)): Likewise.
On x86, MAX_MEM_FOR_RS_ALIGN_CODE is 35, when the most common
alignment is 2**3 or 2**4, where the max memory required for the
alignment nops is 7 and 15 bytes respectively. So there is some
memory wasted since commit 83d94ae428. It's not a large amount,
especially considering that frag overhead on x86_46 is 144 bytes,
but even so I'd rather not be blamed for increasing gas memory usage.
So to reduce the memory we'd like to take the alignment into
consideration when initialising an rs_align_code frag. The only
difficulty here is start_bundle making an rs_align_code frag with an
alignment of zero initially, then later increasing the alignment. We
change that to use the bundle alignment when setting up the frag. I
think that is sufficient as bundle_align_p2 can't change in the middle
of a start_bundle/finish_bundle sequence.
I haven't modified any targets other than x86 in this patch. Most
won't benefit much due to using fairly small MAX_MEM_FOR_RS_ALIGN_CODE.
* read.c (start_bundle): Create rs_align_code frag with
bundle_align_p2 alignment, then set to zero alignment.
(finish_bundle): Adjust comment.
* frags.c (MAX_MEM_FOR_RS_ALIGN_CODE): Pass p2align and max
to macro.
* config/tc-i386.h (HANDLE_ALIGN): Assert that max_bytes is
sufficient for nop padding.
(max_mem_for_rs_align_code): New inline function.
(MAX_MEM_FOR_RS_ALIGN_CODE): Use it.
* config/tc-aarch64.h: Adjust MAX_MEM_FOR_RS_ALIGN_CODE.
* config/tc-alpha.h: Likewise.
* config/tc-arc.h: Likewise.
* config/tc-arm.h: Likewise.
* config/tc-epiphany.h: Likewise.
* config/tc-frv.h: Likewise.
* config/tc-ia64.h: Likewise.
* config/tc-kvx.h: Likewise.
* config/tc-loongarch.h: Likewise.
* config/tc-m32r.h: Likewise.
* config/tc-metag.h: Likewise.
* config/tc-mips.h: Likewise.
* config/tc-nds32.h: Likewise.
* config/tc-ppc.h: Likewise.
* config/tc-riscv.h: Likewise.
* config/tc-rl78.h: Likewise.
* config/tc-rx.h: Likewise.
* config/tc-score.h: Likewise.
* config/tc-sh.h: Likewise.
* config/tc-sparc.h: Likewise.
* config/tc-spu.h: Likewise.
* config/tc-tilegx.h: Likewise.
* config/tc-tilepro.h: Likewise.
* config/tc-visium.h: Likewise.
* config/tc-xtensa.h: Likewise.
avr, kvx, metag, mn10300, nds32, v850, visium, and wasm32 targets
defined HANDLE_ALIGN without defining MAX_MEM_FOR_RS_ALIGN_CODE. This
can result in a rather large chunk of memory being allocated. Fix
that by a combination of changing the default allocation to one byte
and/or defining a target MAX_MEM_FOR_RS_ALIGN_CODE.
arm wanted to write out the entire set of nops, and limited allowed
code alignment to 64 bytes to prevent large memory allocations.
Fix that by making use of the fact that rs_align_code frags repeat
fr_var bytes at fr_literal + fr_fix to fill out the required area.
Fix metag, nds32 and kvx too, which it seems copied either arm or x86
in similarly not making use of repeating patterns.
It's worth mentioning that my tidy to kvx changed the order of nop
bundles, placing a short bundle first rather than last.
epiphany was totally broken in that uninitialised data was written out
for any alignment requiring more than three bytes of fill.
ppc created a new frag to handle a branch over a large number of nops.
This saves 4 bytes per rs_align_code frag, and most times the branch
isn't used so it is generally a win for memory usage, but I figured
the extra code complexity wasn't worth it. So that code of mine goes.
visium copied the same scheme, so that goes too.
This leaves x86 as the only target making large allocations for
alignment frags.
* frags.c (MAX_MEM_FOR_RS_ALIGN_CODE): Default to 1.
* config/tc-aarch64.c (aarch64_handle_align): Remove always true
condition.
* config/tc-aarch64.h (MAX_MEM_FOR_RS_ALIGN_CODE): Move to be
adjacent to HANDLE_ALIGN define.
* config/tc-arm.c (arm_handle_align): Allow alignment of more
than MAX_MEM_FOR_RS_ALIGN_CODE bytes. Just write one repeat
of nop pattern to frag.
(arm_frag_align_code): Delete function.
* config/tc-arm.h (MAX_MEM_ALIGNMENT_BYTES): Don't define.
(MAX_MEM_FOR_RS_ALIGN_CODE): Set to 7.
(md_do_align): Don't define.
(arm_frag_align_code): Don't declare.
* config/tc-epiphany.c (epiphany_handle_align): Correct frag
so that nop_pattern repeats rather than random data.
* config/tc-epiphany.h (MAX_MEM_FOR_RS_ALIGN_CODE): Define.
* config/tc-kvx.c (kvx_make_nops): Merge into..
(kvx_handle_align): ..here. Put short nop bundle first,
followed by repeated full nop bundle.
* config/tc-kvx.h (MAX_MEM_FOR_RS_ALIGN_CODE): Define.
* config/tc-m32c.h (HANDLE_ALIGN, MAX_MEM_FOR_RS_ALIGN_CODE):
Don't define.
* config/tc-metag.c (metag_handle_align): Just write one
repeat of nop pattern to frag.
* config/tc-metag.h (MAX_MEM_FOR_RS_ALIGN_CODE): Define.
* config/tc-nds32.c (nds32_handle_align): Just write one
repeat of nop pattern to frag.
* config/tc-nds32.h (MAX_MEM_FOR_RS_ALIGN_CODE): Define.
* config/tc-ppc.c (ppc_handle_align): Don't make a new frag
for branch.
* config/tc-ppc.h (MAX_MEM_FOR_RS_ALIGN_CODE): Increase to 8.
* config/tc-visium.c (visium_handle_align): Don't make a new
frag for branch.
* config/tc-visium.h (MAX_MEM_FOR_RS_ALIGN_CODE): Define.
* config/tc-wasm32.h (HANDLE_ALIGN): Don't define.
* testsuite/gas/epiphany/nop.d,
* testsuite/gas/epiphany/nop.s: New test.
* testsuite/gas/epiphany/allinsn.exp: Run it.
* testsuite/gas/kvx/nop-align.d: Adjust.
Besides it being somewhat off to have three decls scattered across the
code base, it is generally bad practice for the definition of a symbol
to not also observe its declaration (making sure the two won't go out of
sync).
This adds the section to HANDLE_ALIGN args, so that the frag created
by the ppc backend can be properly allocated on the frag obstack.
I've added an extra param to frag_alloc too, for cases where we know
the frag requires at least some bytes in fr_literal. This simplifies
some existing code, for example in compress_debug and relax_segment.
In the case of the relax_segment code, I think we may have had a bug
there in using obstack_blank_fast, which doesn't check that the frag
has room.
* config/tc-ppc.c (ppc_handle_align): Add section param,
use frag obstack to allocate frag.
* config/tc-ppc.h (HANDLE_ALIGN, ppc_handle_align): Add extra
param.
* config/tc-aarch64.h (HANDLE_ALIGN): Add extra param.
* config/tc-alpha.h: Likewise.
* config/tc-arc.h: Likewise.
* config/tc-arm.h: Likewise.
* config/tc-avr.h: Likewise.
* config/tc-epiphany.h: Likewise.
* config/tc-frv.h: Likewise.
* config/tc-i386.h: Likewise.
* config/tc-ia64.h: Likewise.
* config/tc-kvx.h: Likewise.
* config/tc-loongarch.h: Likewise.
* config/tc-m32c.h: Likewise.
* config/tc-m32r.h: Likewise.
* config/tc-metag.h: Likewise.
* config/tc-mips.h: Likewise.
* config/tc-mn10300.h: Likewise.
* config/tc-nds32.h: Likewise.
* config/tc-riscv.h: Likewise.
* config/tc-rl78.h: Likewise.
* config/tc-rx.h: Likewise.
* config/tc-sh.h: Likewise.
* config/tc-sparc.h: Likewise.
* config/tc-spu.h: Likewise.
* config/tc-tilegx.h: Likewise.
* config/tc-tilepro.h: Likewise.
* config/tc-v850.h: Likewise.
* config/tc-visium.h: Likewise.
* config/tc-wasm32.h: Likewise.
* config/tc-xtensa.h: Likewise.
* frags.h (frag_alloc): Update prototype.
* frags.c (frag_alloc): Add extra size param, allocate extra.
(frag_new): Update.
* subsegs.c (subseg_set_rest): Update frag_alloc call.
* write.c: Formatting.
(cvt_frag_to_fill): Pass sec to HANDLE_ALIGN.
(compress_frag): Update frag_alloc call.
(compress_debug): Use new frag_alloc to simplify frag sizing.
(relax_segment): Likewise.
Avoid any possibility of signed overflow. (Seen on oss-fuzz).
* frags.c (totalfrags): Make unsigned.
(get_frag_count): Return unsigned.
* frags.h (get_frag_count): Likewise.
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:
1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
author I haven't committed, 'Kalray SA.', to cover gas testsuite
files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
This:
.struct -1
x:
.fill 1
y:
results in an internal error in frag_new due to abs_section_offset
wrapping from -1 to 0. Frags in the absolute section don't do much so
I think we can allow the address wrap.
* frags.c (frag_new): Allow address wrap in absolute section.
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.
The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
All symbols, sizes and relocations in this section are octets instead of
bytes. Required for DWARF debug sections as DWARF information is
organized in octets, not bytes.
bfd/
* section.c (struct bfd_section): New flag SEC_ELF_OCTETS.
* archures.c (bfd_octets_per_byte): New parameter sec.
If section is not NULL and SEC_ELF_OCTETS is set, one octet es
returned [ELF targets only].
* bfd.c (bfd_get_section_limit): Provide section parameter to
bfd_octets_per_byte.
* bfd-in2.h: regenerate.
* binary.c (binary_set_section_contents): Move call to
bfd_octets_per_byte into section loop. Provide section parameter
to bfd_octets_per_byte.
* coff-arm.c (coff_arm_reloc): Provide section parameter
to bfd_octets_per_byte.
* coff-i386.c (coff_i386_reloc): likewise.
* coff-mips.c (mips_reflo_reloc): likewise.
* coff-x86_64.c (coff_amd64_reloc): likewise.
* cofflink.c (_bfd_coff_link_input_bfd): likewise.
(_bfd_coff_reloc_link_order): likewise.
* elf.c (_bfd_elf_section_offset): likewise.
(_bfd_elf_make_section_from_shdr): likewise.
Set SEC_ELF_OCTETS for sections with names .gnu.build.attributes,
.debug*, .zdebug* and .note.gnu*.
* elf32-msp430.c (rl78_sym_diff_handler): Provide section parameter
to bfd_octets_per_byte.
* elf32-nds.c (nds32_elf_get_relocated_section_contents): likewise.
* elf32-ppc.c (ppc_elf_addr16_ha_reloc): likewise.
* elf32-pru.c (pru_elf32_do_ldi32_relocate): likewise.
* elf32-s12z.c (opru18_reloc): likewise.
* elf32-sh.c (sh_elf_reloc): likewise.
* elf32-spu.c (spu_elf_rel9): likewise.
* elf32-xtensa.c (bfd_elf_xtensa_reloc): likewise
* elf64-ppc.c (ppc64_elf_brtaken_reloc): likewise.
(ppc64_elf_addr16_ha_reloc): likewise.
(ppc64_elf_toc64_reloc): likewise.
* elflink.c (bfd_elf_final_link): likewise.
(bfd_elf_perform_complex_relocation): likewise.
(elf_fixup_link_order): likewise.
(elf_link_input_bfd): likewise.
(elf_link_sort_relocs): likewise.
(elf_reloc_link_order): likewise.
(resolve_section): likewise.
* linker.c (_bfd_generic_reloc_link_order): likewise.
(bfd_generic_define_common_symbol): likewise.
(default_data_link_order): likewise.
(default_indirect_link_order): likewise.
* srec.c (srec_set_section_contents): likewise.
(srec_write_section): likewise.
* syms.c (_bfd_stab_section_find_nearest_line): likewise.
* reloc.c (_bfd_final_link_relocate): likewise.
(bfd_generic_get_relocated_section_contents): likewise.
(bfd_install_relocation): likewise.
For section which have SEC_ELF_OCTETS set, multiply output_base
and output_offset with bfd_octets_per_byte.
(bfd_perform_relocation): likewise.
include/
* coff/ti.h (GET_SCNHDR_SIZE, PUT_SCNHDR_SIZE, GET_SCN_SCNLEN),
(PUT_SCN_SCNLEN): Adjust bfd_octets_per_byte calls.
binutils/
* objdump.c (disassemble_data): Provide section parameter to
bfd_octets_per_byte.
(dump_section): likewise
(dump_section_header): likewise. Show SEC_ELF_OCTETS flag if set.
gas/
* as.h: Define SEC_OCTETS as SEC_ELF_OCTETS if OBJ_ELF.
* dwarf2dbg.c: (dwarf2_finish): Set section flag SEC_OCTETS for
.debug_line, .debug_info, .debug_abbrev, .debug_aranges, .debug_str
and .debug_ranges sections.
* write.c (maybe_generate_build_notes): Set section flag
SEC_OCTETS for .gnu.build.attributes section.
* frags.c (frag_now_fix): Don't divide by OCTETS_PER_BYTE if
SEC_OCTETS is set.
* symbols.c (resolve_symbol_value): Likewise.
ld/
* ldexp.c (fold_name): Provide section parameter to
bfd_octets_per_byte.
* ldlang (init_opb): New argument s. Set opb_shift to 0 if
SEC_ELF_OCTETS for the current section is set.
(print_input_section): Pass current section to init_opb.
(print_data_statement,print_reloc_statement,
print_padding_statement): Likewise.
(lang_check_section_addresses): Call init_opb for each
section.
(lang_size_sections_1,lang_size_sections_1,
lang_do_assignments_1): Likewise.
(lang_process): Pass NULL to init_opb.
Targets such as xtensa incur a much higher overhead to resolve
location view numbers than e.g. x86, because the expressions used to
compute view numbers cannot be resolved soon enough.
Each view number is computed by incrementing the previous view, if
they are both at the same address, or by resetting it to zero
otherwise. If PV is the previous view number, PL is its location, and
NL is the location of the next view, its number is computed by
evaluating NV = !(NL > PL) * (PV + 1).
set_or_check_view uses resolve_expression to decide whether portions
of this expression can be simplified to constants. The (NL > PL)
subexpression is one that can often be resolved to a constant,
breaking chains of view number computations at instructions of nonzero
length, but not after alignment that might be unnecessary.
Alas, when nearly every frag ends with a relaxable instruction,
frag_offset_fixed_p will correctly fail to determine a known offset
between two unresolved addresses in neighboring frags, so the
unresolved symbolic operation will be constructed and used in the
computation of most view numbers. This results in very deep
expressions.
As view numbers get referenced in location view lists, each operand in
the list goes through symbol_clone_if_forward_ref, which recurses on
every subexpression. If each view number were to be referenced, this
would exhibit O(n^2) behavior, where n is the depth of the view number
expressions, i.e., the length of view number sequences without an
early resolution that cuts the expression short.
This patch enables address compares used by view numbering to be
resolved even when exact offsets are not known, using new logic to
determine when the location either remained the same or changed for
sure, even with the possibility of relaxation. This enables most view
number expressions to be resolved with a small, reasonable depth.
PR gas/24444
* frags.c (frag_gtoffset_p): New.
* frags.h (frag_gtoffset_p): Declare it.
* expr.c (resolve_expression): Use it.
ommit 3ae729d5a4
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Mar 7 04:18:45 2018 -0800
x86: Rewrite NOP generation for fill and alignment
increased MAX_MEM_FOR_RS_ALIGN_CODE to 4095 which resulted in increase
of assembler time and memory usage by 5 times for inputs with many
.p2align directives, which is typical for LTO output. This patch passes
max_bytes to TC_FRAG_INIT so that MAX_MEM_FOR_RS_ALIGN_CODE can be set
as needed and tracked by backend it so that HANDLE_ALIGN can check the
maximum alignment for each rs_align_code frag. Wall time to assemble
the same cc1plus.s:
before:
423.78user 0.89system 7:05.71elapsed 99%CPU
after:
102.35user 0.27system 1:42.89elapsed 99%CPU
PR gas/24165
* frags.c (frag_var_init): Pass max_chars to TC_FRAG_INIT as
max_bytes.
* config/tc-aarch64.h (TC_FRAG_INIT): Add and pass max_bytes to
aarch64_init_frag.
* /config/tc-arm.h (TC_FRAG_INIT): And and pass max_bytes to
arm_init_frag.
* config/tc-avr.h (TC_FRAG_INIT): And and ignore max_bytes.
* config/tc-ia64.h (TC_FRAG_INIT): Likewise.
* config/tc-mmix.h (TC_FRAG_INIT): Likewise.
* config/tc-nds32.h (TC_FRAG_INIT): Likewise.
* config/tc-ns32k.h (TC_FRAG_INIT): Likewise.
* config/tc-rl78.h (TC_FRAG_INIT): Likewise.
* config/tc-rx.h (TC_FRAG_INIT): Likewise.
* config/tc-score.h (TC_FRAG_INIT): Likewise.
* config/tc-tic54x.h (TC_FRAG_INIT): Likewise.
* config/tc-tic6x.h (TC_FRAG_INIT): Likewise.
* config/tc-xtensa.h (TC_FRAG_INIT): Likewise.
* config/tc-i386.h (MAX_MEM_FOR_RS_ALIGN_CODE): Set to
(alignment ? ((1 << alignment) - 1) : 1)
(i386_tc_frag_data): Add max_bytes.
(TC_FRAG_INIT): Add and track max_bytes.
(HANDLE_ALIGN): Replace MAX_MEM_FOR_RS_ALIGN_CODE with
fragP->tc_frag_data.max_bytes.
* doc/internals.texi: Update TC_FRAG_TYPE with max_bytes.
Use size_t in a few places involved with obstacks, and don't include
obstack.h in files that don't use obstacks.
gas/
* config/bfin-parse.y: Don't include obstack.h.
* config/obj-aout.c: Likewise.
* config/obj-coff.c: Likewise.
* config/obj-som.c: Likewise.
* config/tc-bfin.c: Likewise.
* config/tc-i960.c: Likewise.
* config/tc-rl78.c: Likewise.
* config/tc-rx.c: Likewise.
* config/tc-tic4x.c: Likewise.
* expr.c: Likewise.
* listing.c: Likewise.
* config/obj-elf.c (elf_file_symbol): Make name_length a size_t.
* config/tc-aarch64.c (symbol_locate): Likewise.
* config/tc-arm.c (symbol_locate): Likewise.
* config/tc-mmix.c (mmix_handle_mmixal): Make len_0 a size_t.
* config/tc-score.c (s3_build_score_ops_hsh): Make len a size_t.
(s3_build_dependency_insn_hsh): Likewise.
* config/tc-score7.c (s7_build_score_ops_hsh): Likewise.
(s7_build_dependency_insn_hsh): Likewise.
* frags.c (frag_grow): Make parameter a size_t, and use size_t locals.
(frag_new): Make parameter a size_t.
(frag_var_init): Make max_chars and var parameters size_t.
(frag_var, frag_variant): Likewise.
(frag_room): Return a size_t.
(frag_align_pattern): Make n_fill parameter a size_t.
* frags.h: Update function prototypes.
* symbols.c (save_symbol_name): Make name_length a size_t.