Symbols map generator performance improvements #4934
Draft
tarek-y-ismail wants to merge 3 commits into
Previously, process_directory() called clang.cindex.Index.create() inside the per-file loop, creating a brand-new libclang context for every header. Each fresh Index discards all internal state (file-content caches, identifier tables) accumulated while processing previous files. Move the Index creation outside the loop so that libclang can share that internal state across all translation units parsed in the same directory pass. The unnecessary per-iteration args.copy() is removed at the same time.
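The hoisting described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the `index_factory` parameter and the list return value are hypothetical, added only so the loop shape can be exercised without libclang installed.

```python
from typing import Callable, Iterable, List, Sequence


def _create_clang_index():
    # Imported lazily so this sketch loads even where libclang is absent.
    import clang.cindex
    return clang.cindex.Index.create()


def process_directory(
    headers: Iterable[str],
    args: Sequence[str],
    index_factory: Callable[[], object] = _create_clang_index,  # hypothetical hook
) -> List[object]:
    # One Index for the whole directory pass, created before the loop.
    index = index_factory()
    translation_units = []
    for header in headers:
        # The same args object is reused; no per-iteration copy is made.
        translation_units.append(index.parse(header, args=args))
    return translation_units
```

With the default factory, every header in the pass is parsed through the same `clang.cindex.Index`.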
Pass PARSE_SKIP_FUNCTION_BODIES to clang.cindex.Index.parse(). Symbol extraction only needs declarations — function/method names, types, and virtual-ness — so parsing inline function bodies is wasted work. This option tells libclang to skip the body of every function definition it encounters, which meaningfully reduces parse time for headers that contain non-trivial inline implementations (e.g. template helpers, lambdas in headers).
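As a rough sketch of the call shape, the option is passed through the `options` argument of `Index.parse()`. The helper name `parse_declarations_only` is hypothetical, and the `ImportError` fallback exists only so the snippet can be loaded on a machine without the clang Python bindings.

```python
try:
    from clang.cindex import TranslationUnit
    SKIP_BODIES = TranslationUnit.PARSE_SKIP_FUNCTION_BODIES
except ImportError:
    # Fallback for environments without the bindings: the numeric value of
    # CXTranslationUnit_SkipFunctionBodies in libclang's C API.
    SKIP_BODIES = 0x40


def parse_declarations_only(index, header, args):
    """Parse `header`, asking libclang to skip function bodies (hypothetical helper)."""
    return index.parse(header, args=args, options=SKIP_BODIES)
```

Declarations, including their names, types and virtual-ness, remain in the resulting AST; only the statements inside function definitions are left out.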
…essPoolExecutor
Previously, process_directory() iterated over header files serially.
Parsing individual headers is embarrassingly parallel — each file is
independent — so the loop is replaced with concurrent.futures.ProcessPoolExecutor.
A top-level _parse_single_header() worker function is introduced (closures
are not picklable, which is required by ProcessPoolExecutor). Each worker
creates its own clang.cindex.Index and inherits the already-configured
libclang SO path from the parent process via fork (Linux default).
On an 8-core machine parsing the 56 miral public headers this gives close
to an 8x reduction in wall time for process_directory().
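A minimal sketch of the worker pattern described above, under a few assumptions: the compiler arguments, the `worker` and `max_workers` parameters, and the choice to return top-level declaration names are all illustrative, not taken from the PR. The fork start method is requested explicitly here because the default start method can differ between platforms and Python versions.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


def _parse_single_header(path: str) -> List[str]:
    """Top-level worker, so the executor can pickle a reference to it."""
    import clang.cindex  # lazy import; each worker process builds its own Index
    index = clang.cindex.Index.create()
    tu = index.parse(path, args=["-x", "c++"])  # illustrative arguments
    # Return plain strings: cursors wrap native libclang handles and are
    # not meant to cross process boundaries.
    return [child.spelling for child in tu.cursor.get_children()]


def process_directory(
    headers: Sequence[str],
    worker: Callable[[str], object] = _parse_single_header,
    max_workers: Optional[int] = None,
) -> Dict[str, object]:
    # Fork lets workers inherit state already configured in the parent.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        # map() yields results in the same order as the input headers.
        results = list(pool.map(worker, headers))
    return dict(zip(headers, results))
```

Because the worker is a parameter, the parallel plumbing can be tried with any picklable function, for example `process_directory(["a.h", "bb.h"], worker=len)`, before wiring in the libclang worker.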
TODO annotations are also added throughout for the remaining improvements
identified in the performance analysis:
tools/symbols_map_generator/main.py:
- has_vtable: base-class lookup uses node.semantic_parent.get_children()
  instead of node.get_children(); memoisation by cursor USR.
- search_class_hierarchy_for_virtual_thunk: memoisation by USR pair.
- traverse_ast: move clang_Location_isInSystemHeader guard to the top of
  the function before any attribute access.
- process_directory: umbrella-header approach (parse once, filter by file).
- process_directory: PARSE_PRECOMPILED_PREAMBLE for a future watch mode.
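The "memoisation by cursor USR" items could look roughly like the sketch below. It is an assumption about how such a cache might be shaped, not the planned implementation: `has_vtable_memoised`, the module-level cache and the `compute` callback are hypothetical names. `get_usr()` is the cursor method that returns the Unified Symbol Resolution string.

```python
from typing import Callable, Dict

# Hypothetical cache: USR string -> previously computed answer.
_vtable_cache: Dict[str, bool] = {}


def has_vtable_memoised(node, compute: Callable[[object], bool]) -> bool:
    """Return compute(node), caching the answer under the node's USR."""
    usr = node.get_usr()
    if not usr:
        # Without a usable key, fall back to computing every time.
        return compute(node)
    if usr not in _vtable_cache:
        _vtable_cache[usr] = compute(node)
    return _vtable_cache[usr]
```

A cache keyed on a pair, as suggested for search_class_hierarchy_for_virtual_thunk, would follow the same shape with a tuple of two USR strings as the key.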
src/miral/check-and-update-debian-symbols.py,
src/miroil/check-and-update-debian-symbols.py:
- Replace the c++filt subprocess with in-process demangling (cxxfilt).
- Pipe dpkg-gensymbols output directly instead of via a /tmp file.
- Remove the ALL default from the regenerate-*-debian-symbols CMake target
  so the check only runs on explicit request.
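For the "pipe output directly" item, a generic sketch of reading a child process's standard output in memory instead of via a temporary file is shown below. The helper name `run_and_capture_lines` is hypothetical and the command is supplied by the caller; no particular dpkg-gensymbols invocation is assumed here.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_and_capture_lines(command: Sequence[str]) -> Tuple[int, List[str]]:
    """Run `command` and return (exit code, stdout lines) without a temp file."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # The exit code is returned rather than raised, so the caller decides
    # how to treat a non-zero status.
    return completed.returncode, completed.stdout.splitlines()


if __name__ == "__main__":
    # Stand-in command for demonstration only.
    code, lines = run_and_capture_lines(
        [sys.executable, "-c", "print('sym_a'); print('sym_b')"]
    )
    print(code, lines)
```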