binutils-gdb

Andrew Burgess 6deb7a8185 gdb/disasm: better intel flavour disassembly styling with Pygments

This commit was inspired by this stackoverflow post:

  https://stackoverflow.com/questions/73491793/why-is-there-a-%C2%B1-in-lea-rax-rip-%C2%B1-0xeb3

One of the comments helpfully links to this Python test case:

  from pygments import formatters, lexers, highlight

  def colorize_disasm(content, gdbarch):
      try:
          lexer = lexers.get_lexer_by_name("asm")
          formatter = formatters.TerminalFormatter()
          return highlight(content, lexer, formatter).rstrip().encode()
      except:
          return None

  print(colorize_disasm("lea [rip+0x211]  # COMMENT", None).decode())

Run the test case and you should see that the '+' character is
underlined, and could be confused with a combined +/- symbol.

What's happening is that Pygments is failing to parse the input text,
and the '+' is actually being marked in the error style.  The error
style is red and underlined.

It is worth noting that the assembly instruction being disassembled
here is an x86-64 instruction in the 'intel' disassembly style, rather
than the default att style.  Clearly the Pygments module expects the
att syntax by default.

If we change the test case to this:

  from pygments import formatters, lexers, highlight

  def colorize_disasm(content, gdbarch):
      try:
          lexer = lexers.get_lexer_by_name("asm")
          lexer.add_filter('raiseonerror')
          formatter = formatters.TerminalFormatter()
          return highlight(content, lexer, formatter).rstrip().encode()
      except:
          return None

  res = colorize_disasm("lea rax,[rip+0xeb3] # COMMENT", None)
  if res:
      print(res.decode())
  else:
      print("No result!")

Here I've added the call lexer.add_filter('raiseonerror'), and I now
check whether the result is None.  Running this, the test now prints
'No result!'; instead of styling the '+' in the error style, we give
up on the styling attempt.

There are two things we need to fix relating to this disassembly
text.  First, Pygments is expecting att style disassembly, not the
intel style that this example uses.  Fortunately, Pygments also
supports the intel style; all we need to do is use the 'nasm' lexer
instead of the 'asm' lexer.

However, this leads to the second problem: our disassembler line
contains '# COMMENT'.  The "official" Intel disassembler style uses ';'
as its comment character, but gas and libopcodes use '#' instead,
because gas uses ';' as an instruction separator.

Unfortunately, Pygments expects ';' as the comment character and
treats '#' as an error.  With the 'raiseonerror' filter added, this
means that any line containing a '#' comment will not be styled at
all.

However, as the i386 disassembler never produces a '#' character other
than for comments, we can easily "fix" Pygments' parsing of the
disassembly line.  This is done by creating a filter.  The filter
looks for an Error token with the value '#' and changes it into a
comment token.  Every token after this (until the end of the line)
is also converted into a comment.
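
A minimal sketch of such a filter, assuming Pygments' Filter base
class and the Error and Comment.Single token types.  The class name
HashCommentFilter is made up for illustration and is not necessarily
what this commit adds:

```python
from pygments.filter import Filter
from pygments.token import Comment, Error


class HashCommentFilter(Filter):
    """Hypothetical filter: treat an Error token '#' as a comment start."""

    def filter(self, lexer, stream):
        in_comment = False
        for ttype, value in stream:
            # The lexer reports an unparsable '#' as a one-character Error.
            if ttype is Error and value == "#":
                in_comment = True
            if in_comment:
                if "\n" in value:
                    # End of line: the comment stops here.
                    in_comment = False
                else:
                    # Restyle '#' and everything after it as a comment.
                    ttype = Comment.Single
            yield ttype, value
```

A filter like this has to be added to the lexer before 'raiseonerror':
Pygments runs filters in the order they were added, so the '#' must be
rewritten before anything raises on it.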

In this commit I do the following:

  1. Check the 'disassembly-flavor' setting and select between the
  'asm' and 'nasm' lexers based on the setting.  If the setting is not
  available then the 'asm' lexer is used by default,

  2. Use "add_filter('raiseonerror')" to ensure that the formatted
  output will not include any error text, which would be underlined,
  and might be confusing,

  3. If the 'nasm' lexer is selected, then add an additional filter
  that will format '#', and all text after it on the line, as a
  comment, and

  4. If Pygments throws an exception, instead of returning None,
  return the original, unmodified content.  This will mean that this
  one instruction is printed without styling, but GDB will continue to
  call into the Python code to style later instructions.
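
The four steps above can be sketched roughly as follows.  This is an
illustration that runs outside GDB, not the code from this commit:
select_lexer_name and the 'flavor' and 'comment_filter' parameters are
hypothetical, and inside GDB the flavour would presumably be read from
the 'disassembly-flavor' setting instead of being passed in.

```python
from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import get_lexer_by_name


def select_lexer_name(flavor):
    # Step 1: 'nasm' for the intel flavour, 'asm' for anything else,
    # including a missing setting (None).
    return "nasm" if flavor == "intel" else "asm"


def colorize_disasm(content, flavor=None, comment_filter=None):
    """Style one line of disassembly (a str); always return bytes."""
    try:
        name = select_lexer_name(flavor)
        lexer = get_lexer_by_name(name)
        # Step 3: for the nasm lexer only, let a caller-supplied filter
        # rewrite '#' comments.  It must be added before 'raiseonerror'.
        if name == "nasm" and comment_filter is not None:
            lexer.add_filter(comment_filter)
        # Step 2: raise on any Error token rather than emit error styling.
        lexer.add_filter("raiseonerror")
        formatter = TerminalFormatter()
        return highlight(content, lexer, formatter).rstrip().encode()
    except Exception:
        # Step 4: give up on this one line, but hand back the original
        # text so the caller can keep asking for later lines.
        return content.encode()
```

Returning the unstyled text instead of None is what keeps a single
unparsable instruction from turning styling off for the rest of the
output.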

I haven't included a test specifically for the above error case,
though I have manually checked that the above case now styles
correctly (with no underline).  The existing style tests check that
the disassembler styling still works though, so I know I've not
generally broken things.

One final thought I have after looking at this issue is that I wonder
now if using Pygments for styling disassembly from every architecture
is actually a good idea?

Clearly, the 'asm' lexer is OK with att style x86-64, but not OK with
intel style x86-64, so who knows how well it will handle other random
architectures?

When I first added this feature I tested it against some random
RISC-V, ARM, and X86-64 (att style) code, and it seemed fine, but I
never tried to make an exhaustive check of all instructions, so it's
quite possible that there are corner cases where things are styled
incorrectly.

With the above changes I think that things should be a bit better
now.  If a particular instruction doesn't parse correctly then our
Pygments based styling code will just not style that one instruction.
This is combined with the fact that many architectures are now moving
to libopcodes based styling, which is much more reliable.

So, I think it is fine to keep using Pygments as a fallback mechanism
for styling all architectures, even if we know it might not be perfect
in all cases.

2022-10-02 17:30:04 +01:00

bfd

Automatic date update in version.in

2022-10-02 00:00:18 +00:00

binutils

objcopy: avoid "shadowing" of remove() function name

2022-09-30 10:55:02 +02:00

config

egrep in binutils

2022-09-28 13:37:31 +09:30

contrib

…

cpu

Add markers for 2.39 branch

2022-07-08 10:41:07 +01:00

elfcpp

Add gold support for --package-metadata option.

2022-08-04 17:37:32 -07:00

etc

…

gas

RISC-V: Relax "fmv.[sdq]" requirements

2022-09-30 15:10:27 +00:00

gdb

gdb/disasm: better intel flavour disassembly styling with Pygments

2022-10-02 17:30:04 +01:00

gdbserver

Renenerate {gdb,gdbserver}/configure

2022-09-28 13:06:06 +01:00

gdbsupport

gdbsupport: move fileio_errno_to_host to fileio.{h,cc} and rename

2022-09-21 14:11:03 -04:00

gnulib

gnulib: update to bd11400942d6

2022-05-02 10:54:19 -04:00

gold

egrep in binutils

2022-09-28 13:37:31 +09:30

gprof

Add -B to the help output from gprof, and add suitable documentation.

2022-09-29 13:12:37 +01:00

gprofng

gprofng: fix cppcheck warnings

2022-09-29 22:00:02 -07:00

include

LoongArch: Update ELF e_flags handling according to specification.

2022-09-30 14:00:47 +08:00

intl

egrep in binutils

2022-09-28 13:37:31 +09:30

RISC-V: re-arrange opcode table for consistent alias handling

2022-09-30 10:19:00 +02:00

libbacktrace

…

libctf

libctf: Add ZSTD_LIBS to LIBS so that ac_cv_libctf_bfd_elf can be true

2022-09-26 20:41:42 -07:00

libdecnumber

Merge config/ changes from GCC, to enable DFP on AArch64

2022-05-24 10:47:29 +01:00

libiberty

Add markers for 2.39 branch

2022-07-08 10:41:07 +01:00

opcodes

RISC-V: Relax "fmv.[sdq]" requirements

2022-09-30 15:10:27 +00:00

readline

gdb/readline: fix extra 'quit' message problem

2022-05-07 10:49:27 +01:00

sim

sim: Link ZSTD_LIBS

2022-09-27 11:42:32 -07:00

texinfo

…

zlib

Regenerate with automake-1.15.1

2022-07-09 20:10:47 +09:30

.cvsignore

…

.editorconfig

…

.gitattributes

binutils-gdb/git: highlight whitespace errors in source files

2022-07-25 14:35:41 +01:00

.gitignore

…

ar-lib

…

ChangeLog

Maintainer mode: wrong gettext version?

2022-09-08 10:03:04 +01:00

compile

…

config-ml.in

…

config.guess

…

config.rpath

…

config.sub

…

configure

binutils, gdb: support zstd compressed debug sections

2022-09-26 19:50:13 -07:00

configure.ac

binutils, gdb: support zstd compressed debug sections

2022-09-26 19:50:13 -07:00

COPYING

…

COPYING3

…

COPYING3.LIB

…

COPYING.LIB

…

COPYING.LIBGLOSS

…

COPYING.NEWLIB

…

depcomp

…

djunpack.bat

…

install-sh

…

libtool.m4

…

lt~obsolete.m4

…

ltgcc.m4

…

ltmain.sh

…

ltoptions.m4

…

ltsugar.m4

…

ltversion.m4

…

MAINTAINERS

…

Makefile.def

…

Makefile.in

Pass PKG_CONFIG_PATH down from top-level Makefile

2022-04-08 10:56:41 -04:00

Makefile.tpl

Pass PKG_CONFIG_PATH down from top-level Makefile

2022-04-08 10:56:41 -04:00

makefile.vms

…

missing

…

mkdep

…

mkinstalldirs

…

move-if-change

…

multilib.am

…

README

…

README-maintainer-mode

Maintainer mode: wrong gettext version?

2022-09-08 10:03:04 +01:00

setup.com

…

src-release.sh

…

symlink-tree

…

test-driver

…

ylwrap

…

README

		   README for GNU development tools

This directory contains various GNU compilers, assemblers, linkers, 
debuggers, etc., plus their support routines, definitions, and documentation.

If you are receiving this as part of a GDB release, see the file gdb/README.
If with a binutils release, see binutils/README;  if with a libg++ release,
see libg++/README, etc.  That'll give you info about this
package -- supported targets, how to use it, how to report bugs, etc.

It is now possible to automatically configure and build a variety of
tools with one command.  To build all of the tools contained herein,
run the ``configure'' script here, e.g.:

	./configure 
	make

To install them (by default in /usr/local/bin, /usr/local/lib, etc.),
then do:
	make install

(If the configure script can't determine your type of computer, give it
the name as an argument, for instance ``./configure sun4''.  You can
use the script ``config.sub'' to test whether a name is recognized; if
it is, config.sub translates it to a triplet specifying CPU, vendor,
and OS.)

If you have more than one compiler on your system, it is often best to
explicitly set CC in the environment before running configure, and to
also set CC when running make.  For example (assuming sh/bash/ksh):

	CC=gcc ./configure
	make

A similar example using csh:

	setenv CC gcc
	./configure
	make

Much of the code and documentation enclosed is copyright by
the Free Software Foundation, Inc.  See the file COPYING or
COPYING.LIB in the various directories, for a description of the
GNU General Public License terms under which you can copy the files.

REPORTING BUGS: Again, see gdb/README, binutils/README, etc., for info
on where and how to report problems.

Languages

C 50.6%

Makefile 22.6%

Assembly 13.2%

C++ 5.9%

Roff 1.5%

Other 5.6%