gdb: disassembler opcode display formatting

This commit changes the format of 'disassemble /r' to match GNU objdump. Specifically, GDB will now display the instruction bytes in as 'objdump --wide --disassemble' does. Here is an example for RISC-V before this patch: (gdb) disassemble /r 0x0001018e,0x0001019e Dump of assembler code from 0x1018e to 0x1019e: 0x0001018e <call_me+66>: 03 26 84 fe lw a2,-24(s0) 0x00010192 <call_me+70>: 83 25 c4 fe lw a1,-20(s0) 0x00010196 <call_me+74>: 61 65 lui a0,0x18 0x00010198 <call_me+76>: 13 05 85 6a addi a0,a0,1704 0x0001019c <call_me+80>: f1 22 jal 0x10368 <printf> End of assembler dump. And here's an example after this patch: (gdb) disassemble /r 0x0001018e,0x0001019e Dump of assembler code from 0x1018e to 0x1019e: 0x0001018e <call_me+66>: fe842603 lw a2,-24(s0) 0x00010192 <call_me+70>: fec42583 lw a1,-20(s0) 0x00010196 <call_me+74>: 6561 lui a0,0x18 0x00010198 <call_me+76>: 6a850513 addi a0,a0,1704 0x0001019c <call_me+80>: 22f1 jal 0x10368 <printf> End of assembler dump. There are two differences here. First, the instruction bytes after the patch are grouped based on the size of the instruction, and are byte-swapped to little-endian order. Second, after the patch, GDB now uses the bytes-per-line hint from libopcodes to add whitespace padding after the opcode bytes, this means that in most cases the instructions are nicely aligned. It is still possible for a very long instruction to intrude into the disassembled text space. The next example is x86-64, before the patch: (gdb) disassemble /r main Dump of assembler code for function main: 0x0000000000401106 <+0>: 55 push %rbp 0x0000000000401107 <+1>: 48 89 e5 mov %rsp,%rbp 0x000000000040110a <+4>: c7 87 d8 00 00 00 01 00 00 00 movl $0x1,0xd8(%rdi) 0x0000000000401114 <+14>: b8 00 00 00 00 mov $0x0,%eax 0x0000000000401119 <+19>: 5d pop %rbp 0x000000000040111a <+20>: c3 ret End of assembler dump. And after the patch: (gdb) disassemble /r main Dump of assembler code for function main: 0x0000000000401106 <+0>: 55 push %rbp 0x0000000000401107 <+1>: 48 89 e5 mov %rsp,%rbp 0x000000000040110a <+4>: c7 87 d8 00 00 00 01 00 00 00 movl $0x1,0xd8(%rdi) 0x0000000000401114 <+14>: b8 00 00 00 00 mov $0x0,%eax 0x0000000000401119 <+19>: 5d pop %rbp 0x000000000040111a <+20>: c3 ret End of assembler dump. Most instructions are aligned, except for the very long instruction. Notice too that for x86-64 libopcodes doesn't request that GDB group the instruction bytes. This matches the behaviour of objdump. In case the user really wants the old behaviour, I have added a new modifier 'disassemble /b', this displays the instruction byte at a time. For x86-64, which never groups instruction bytes, /b and /r are equivalent, but for RISC-V, using /b gets the old layout back (except that the whitespace for alignment is still present). Consider our original RISC-V example, this time using /b: (gdb) disassemble /b 0x0001018e,0x0001019e Dump of assembler code from 0x1018e to 0x1019e: 0x0001018e <call_me+66>: 03 26 84 fe lw a2,-24(s0) 0x00010192 <call_me+70>: 83 25 c4 fe lw a1,-20(s0) 0x00010196 <call_me+74>: 61 65 lui a0,0x18 0x00010198 <call_me+76>: 13 05 85 6a addi a0,a0,1704 0x0001019c <call_me+80>: f1 22 jal 0x10368 <printf> End of assembler dump. Obviously, this patch is a potentially significant change to the behaviour or /r. I could have added /b with the new behaviour and left /r alone. However, personally, I feel the new behaviour is significantly better than the old, hence, I made /r be what I consider the "better" behaviour. The reason I prefer the new behaviour is that, when I use /r, I almost always want to manually decode the instruction for some reason, and having the bytes displayed in "instruction order" rather than memory order, just makes this easier. The 'record instruction-history' command also takes a /r modifier, and has been modified in the same way as disassemble; /r gets the new behaviour, and /b has been added to retain the old behaviour. Finally, the MI command -data-disassemble, is unchanged in behaviour, this command now requests the raw bytes of the instruction, which is equivalent to the /b modifier. This means that the MI output will remain backward compatible.
2025-12-27 01:28:46 +00:00 · 2022-06-21 20:23:35 +01:00
parent d309a8f9b3
commit d4ce49b7ac
8 changed files with 109 additions and 16 deletions
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -457,7 +457,7 @@ gdb_pretty_print_disassembler::pretty_print_insn (const struct disasm_insn *insn
 	throw ex;
      }

-    if (flags & DISASSEMBLY_RAW_INSN)
+    if ((flags & (DISASSEMBLY_RAW_INSN | DISASSEMBLY_RAW_BYTES)) != 0)
      {
 	/* Build the opcodes using a temporary stream so we can
 	   write them out in a single go for the MI.  */
@@ -467,14 +467,51 @@ gdb_pretty_print_disassembler::pretty_print_insn (const struct disasm_insn *insn
 	m_opcode_data.resize (size);
 	read_code (pc, m_opcode_data.data (), size);

-	for (int i = 0; i < size; ++i)
+	/* The disassembler provides information about the best way to
+	   display the instruction bytes to the user.  We provide some sane
+	   defaults in case the disassembler gets it wrong.  */
+	const struct disassemble_info *di = m_di.disasm_info ();
+	int bytes_per_line = std::max (di->bytes_per_line, size);
+	int bytes_per_chunk = std::max (di->bytes_per_chunk, 1);
+
+	/* If the user has requested the instruction bytes be displayed
+	   byte at a time, then handle that here.  Also, if the instruction
+	   is not a multiple of the chunk size (which probably indicates a
+	   disassembler problem) then avoid that causing display problems
+	   by switching to byte at a time mode.  */
+	if ((flags & DISASSEMBLY_RAW_BYTES) != 0
+	    || (size % bytes_per_chunk) != 0)
+	  bytes_per_chunk = 1;
+
+	/* Print the instruction opcodes bytes, grouped into chunks.  */
+	for (int i = 0; i < size; i += bytes_per_chunk)
 	  {
 	    if (i > 0)
 	      m_opcode_stb.puts (" ");
-	    m_opcode_stb.printf ("%02x", (unsigned) m_opcode_data[i]);
+
+	    if (di->display_endian == BFD_ENDIAN_LITTLE)
+	      {
+		for (int k = bytes_per_chunk; k-- != 0; )
+		  m_opcode_stb.printf ("%02x", (unsigned) m_opcode_data[i + k]);
+	      }
+	    else
+	      {
+		for (int k = 0; k < bytes_per_chunk; k++)
+		  m_opcode_stb.printf ("%02x", (unsigned) m_opcode_data[i + k]);
+	      }
+	  }
+
+	/* Calculate required padding.  */
+	int nspaces = 0;
+	for (int i = size; i < bytes_per_line; i += bytes_per_chunk)
+	  {
+	    if (i > size)
+	      nspaces++;
+	    nspaces += bytes_per_chunk * 2;
 	  }

 	m_uiout->field_stream ("opcodes", m_opcode_stb);
+	m_uiout->spaces (nspaces);
 	m_uiout->text ("\t");
      }