Symbols map generator performance improvements#4934

Draft
tarek-y-ismail wants to merge 3 commits into
mainfrom
symbols-map-generator-performance-improvements

Conversation

@tarek-y-ismail
Contributor

TODO

Previously, process_directory() called clang.cindex.Index.create() inside
the per-file loop, creating a brand-new libclang context for every header.
Each fresh Index discards all internal state (file-content caches, identifier
tables) accumulated while processing previous files.

Move the Index creation outside the loop so that libclang can share that
internal state across all translation units parsed in the same directory
pass.  The unnecessary per-iteration args.copy() is removed at the same time.

Pass PARSE_SKIP_FUNCTION_BODIES to clang.cindex.Index.parse().  Symbol
extraction only needs declarations (function/method names, types, and
virtual-ness), so parsing inline function bodies is wasted work.

This option tells libclang to skip the body of every function definition it
encounters, which meaningfully reduces parse time for headers that contain
non-trivial inline implementations (e.g. template helpers or lambdas).
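
The two changes above (one shared Index, plus PARSE_SKIP_FUNCTION_BODIES) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `parse_headers` and its `index`/`options` parameters are hypothetical names, and the lazy import is only there so the sketch can be exercised with a stand-in index on a machine without libclang.

```python
def parse_headers(header_paths, args, index=None, options=None):
    """Parse each header with one shared Index (hypothetical helper)."""
    if index is None or options is None:
        # Imported lazily so a stand-in index can be injected for testing.
        import clang.cindex

        if index is None:
            # Created once, outside the per-file loop.
            index = clang.cindex.Index.create()
        if options is None:
            # Skip function bodies; declarations are still parsed.
            options = clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES

    units = []
    for path in header_paths:
        # args is passed as-is; no per-iteration copy is made here.
        units.append(index.parse(path, args=args, options=options))
    return units
```

Called as `parse_headers(paths, ["-std=c++20"])`, this would create a single Index and reuse it for every translation unit; the compiler flag shown is only an example.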
…essPoolExecutor

Previously, process_directory() iterated over header files serially.
Parsing individual headers is embarrassingly parallel — each file is
independent — so the loop is replaced with concurrent.futures.ProcessPoolExecutor.

A top-level _parse_single_header() worker function is introduced because
ProcessPoolExecutor must pickle the callable it runs, and closures are not
picklable.  Each worker creates its own clang.cindex.Index and inherits the
already-configured libclang SO path from the parent process via fork (the
Linux default before Python 3.14, which switched to forkserver).

On an 8-core machine parsing the 56 miral public headers this gives close
to an 8x reduction in wall time for process_directory().
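
A rough sketch of the worker-plus-executor shape described above, under stated assumptions: `parse_directory_in_parallel`, its `worker`/`executor_cls` parameters, and the choice to return top-level declaration names are all hypothetical, not taken from this PR. The worker returns plain strings because results travel back to the parent process by pickling.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor


def _parse_single_header(path, args):
    """Top-level worker, so ProcessPoolExecutor can pickle a reference to it."""
    import clang.cindex  # imported in the worker; each process owns its Index

    index = clang.cindex.Index.create()
    tu = index.parse(
        path,
        args=args,
        options=clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES,
    )
    # Return plain data rather than cursor objects.
    return path, [child.spelling for child in tu.cursor.get_children()]


def parse_directory_in_parallel(header_paths, args,
                                worker=_parse_single_header,
                                executor_cls=ProcessPoolExecutor):
    """Parse headers concurrently and return {path: top-level names}."""
    with executor_cls() as pool:
        # map() keeps input order; repeat() supplies the same args each call.
        results = pool.map(worker, header_paths, itertools.repeat(args))
        return dict(results)
```

When the start method is spawn or forkserver, the call site needs the usual `if __name__ == "__main__":` guard, and any libclang path configuration done in the parent would have to be repeated inside the worker.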

TODO annotations are also added throughout for the remaining improvements
identified in the performance analysis:

tools/symbols_map_generator/main.py:
  - has_vtable: base-class lookup uses node.semantic_parent.get_children()
    instead of node.get_children(); memoisation by cursor USR.
  - search_class_hierarchy_for_virtual_thunk: memoisation by USR pair.
  - traverse_ast: move clang_Location_isInSystemHeader guard to the top of
    the function before any attribute access.
  - process_directory: umbrella-header approach (parse once, filter by file).
  - process_directory: PARSE_PRECOMPILED_PREAMBLE for a future watch mode.
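
The "memoisation by cursor USR" items above could look something like the decorator below. It is only a sketch: `memoise_by_usr` is a hypothetical name, and the real has_vtable logic is not reproduced here. It relies on `Cursor.get_usr()` from the libclang Python bindings and assumes the wrapped function depends only on the entity the cursor refers to.

```python
import functools


def memoise_by_usr(func):
    """Cache func(cursor) results keyed by the cursor's USR (hypothetical)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(cursor):
        usr = cursor.get_usr()
        if not usr:
            # An empty USR cannot identify the entity, so skip the cache.
            return func(cursor)
        if usr not in cache:
            cache[usr] = func(cursor)
        return cache[usr]

    wrapper.cache = cache  # exposed so callers can inspect or clear it
    return wrapper
```

Applied as `@memoise_by_usr` on a single-cursor predicate, repeated queries for the same class would be answered from the dictionary. The USR-pair variant would key on a tuple of two USRs instead.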

src/miral/check-and-update-debian-symbols.py,
src/miroil/check-and-update-debian-symbols.py:
  - Replace the c++filt subprocess with in-process demangling (cxxfilt).
  - Pipe dpkg-gensymbols output directly instead of via a /tmp file.
  - Remove the ALL default from the regenerate-*-debian-symbols CMake target
    so the check only runs on explicit request.
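
For the "pipe output directly" item, one possible shape is to capture the child's stdout through a pipe rather than writing to a /tmp file and reading it back. The helper name `capture_stdout` is hypothetical, and the actual dpkg-gensymbols command line is left to the caller because it is not shown here. The exit code is returned instead of raised, on the assumption that the caller wants to interpret a non-zero status itself.

```python
import subprocess


def capture_stdout(command):
    """Run command and return (exit code, stdout text) without a temp file."""
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr through pipes
        text=True,            # decode bytes to str
        check=False,          # leave exit-code handling to the caller
    )
    return completed.returncode, completed.stdout
```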
@tarek-y-ismail tarek-y-ismail self-assigned this May 7, 2026