[gdb/symtab] Cover letter -- Lazy expansion of full symbol table

2025-12-27 01:28:46 +00:00 · 2021-06-13 08:42:40 +02:00
parent 013270a16a
commit 1c7ef55252
1 changed files with 104 additions and 0 deletions
--- a/104
+++ b/104
@@ -0,0 +1,104 @@
+[gdb/symtab] Cover letter -- Lazy expansion of full symbol table
+
+[ I'm not posting the experimental patch series as such for now.  It is
+available here ( https://github.com/vries/gdb/commits/lazy-full-symtab-v3 ). ]
+
+In PR23710, the stated problem is that gdb is slow and memory hungry when
+consuming debug information generated by GCC with LTO.
+
+I. Measurements.
+
+Taking the final releases from 8.1.1 to 10.2, as well as a recent trunk
+commit (3633d4fb446), and timing the following experiment on cc1:
+...
+$ gdb -q -batch cc1 -ex "b do_rpo_vn"
+...
+we get:
+...
++---------+----------------+
+| Version | real (seconds) |
++---------+----------------+
+| 8.1.1   | 9.42           |
+| 8.2.1   | - (PR23712)    |
+| 8.3.1   | 9.31           |
+| 9.2     | 8.50           |
+| 10.2    | 5.86           |
+| trunk   | 6.36           |
++---------+----------------+
+...
+which shows nice progress across the releases.  The regression on trunk since
+10.2 has been filed as PR27937.
+
+[ The 10.2 score can be further improved to 5.23 by setting dwarf
+max-cache-age to 1000.  It defaults to 5, see PR25703. ]
+
+However, the best score is still more than a factor of 3 slower than lldb:
+...
++-------------+----------------+
+| Debugger    | real (seconds) |
++-------------+----------------+
+| gdb 10.2    | 5.86           |
+| lldb 10.0.1 | 1.74           |
++-------------+----------------+
+...
+
+II. Analysis.
+
+Breaking down the 10.2 time of 5.86, we have:
+...
++-----------------+----------------+
+| Symbol kind     | real (seconds) |
++-----------------+----------------+
+| Minimal symbols | 0.18           |
+| Partial symbols | 2.34           |
+| Full symbols    | 3.34           |
++-----------------+----------------+
+...
+
+So:
+- the minimal symbols and partial symbols are processed for all CUs, while
+  the full symbols are processed only for the necessary CUs
+- still, the majority of the time is spent on the full symbols
+
+This is due to the combination of:
+- the one-CU-at-a-time strategy of gdb, and
+- the code generation for LTO which combines several CUs into an
+  artificial CU.
+In other words, LTO increases the scope of processing from individual CUs to
+larger artificial CUs, and consequently things take much longer.
+
+III. Proposed solution.
+
+One way to fix this is to process the full symbols lazily.
+
+This patch series implements a first attempt at this.
+
+IV. How to implement.
+
+The current state of trunk is that expanding full symbols is a two-part
+process:
+- a builder is created during expansion
+- after expansion, the builder delivers the end result, a symbol table,
+  and is then destroyed.
+
+The problem is that we need a way to do this gradually instead:
+- expand a few symbols
+- get the corresponding symbol table
+- expand a few more symbols
+- get the updated symbol table containing all expanded symbols.
+
+This patch series takes the following approach: it throws away incomplete full
+symbols when it needs to expand more symbols.
+
+V. Resulting performance improvement.
+
+[ TODO: remeasure. ]
+
+With current trunk (commit 987610f2d68), we get 3.44 seconds, instead of the
+6.44 seconds without this patch series.
+
+VI. Patch series.
+
+The patch series consists of:
+
+[ Output of "git log --reverse --pretty=%s origin/master..HEAD". ]
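
[ Illustration for section IV.  The approach of throwing away the incomplete
full symbols and expanding again can be sketched as the toy model below.
This is not gdb code: the names LazySymtab, lookup and expansions are
hypothetical, and the model assumes that a rebuild re-expands every symbol
requested so far, which is one possible reading of the description. ]

```python
class LazySymtab:
    """Toy model of discard-and-rebuild lazy expansion for one CU.

    Hypothetical names throughout; this does not mirror gdb's classes.
    """

    def __init__(self, cu_symbols):
        # cu_symbols maps name -> payload and stands in for the full
        # debug info of one (possibly LTO-merged) compilation unit.
        self._cu_symbols = dict(cu_symbols)
        self._requested = set()  # names that must be present in the table
        self._table = {}         # current, possibly incomplete, symbol table
        self.expansions = 0      # total number of symbol expansions done

    def _rebuild(self):
        # A fresh "builder" is used for every rebuild; the previous
        # incomplete table is simply dropped (assumption of this sketch).
        builder = {}
        for name in sorted(self._requested):
            builder[name] = self._cu_symbols[name]
            self.expansions += 1
        self._table = builder

    def lookup(self, name):
        if name not in self._cu_symbols:
            return None
        if name not in self._requested:
            # More symbols are needed: throw the table away and expand again.
            self._requested.add(name)
            self._rebuild()
        return self._table[name]


cu = LazySymtab({"do_rpo_vn": "func", "main": "func", "unused": "var"})
cu.lookup("do_rpo_vn")
cu.lookup("main")
print(cu.expansions)  # 3: one for the first lookup, two for the rebuild
```

[ In this model "unused" is never expanded, which is the intended saving.
The cost is that earlier symbols are expanded again on each rebuild. ]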