
CoreJust/StackGraphExporter

StackGraph Exporter

A CLI utility that generates stack graphs from source code folders. It can write the generated stack graph to DOT or JSON files, or convert it into a CFL-reachability (CFL-r) task, for which it generates the corresponding DOT/CSV graphs and CFG/Kotlin grammars. CFG grammars and CSV graphs are generated in the format supported by KotGLL; Kotlin grammars and DOT graphs are generated in the format supported by UCFS.

Supported languages: Java (complete support) and Python (partial support: stack graph generation only).

Implementation notes

For Java, some language features are not supported (because the tree-sitter-stack-graphs-java crate does not support them):

  1. Static imports
  2. Static scopes (i.e. static { /* some code */ } in the class scope)
  3. Imports with asterisks (import some.package.*)
  4. C-style arrays
  5. Some comments, e.g.
public int f(int v,
// int deprecated,
    int q);

You can remove these features from the code automatically with this utility (see the --remove-unsupported flag below).
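For illustration, here is a minimal Rust sketch of how items 1-4 above could be detected with a naive line-based scan. It is not the utility's actual implementation: find_unsupported, Unsupported and the heuristics are hypothetical, item 5 (comments inside parameter lists) is not covered, and a robust tool would inspect the tree-sitter syntax tree instead of raw lines.

```rust
/// Hypothetical categories mirroring items 1-4 of the list above.
#[derive(Debug, PartialEq, Eq)]
pub enum Unsupported {
    StaticImport,
    StaticBlock,
    WildcardImport,
    CStyleArray,
}

/// Naive line-based scan returning (1-based line number, feature) pairs.
/// Heuristic only: it ignores strings, comments and multi-line constructs.
pub fn find_unsupported(source: &str) -> Vec<(usize, Unsupported)> {
    let mut found = Vec::new();
    for (i, raw) in source.lines().enumerate() {
        let line = raw.trim();
        let n = i + 1;
        if line.starts_with("import static ") {
            found.push((n, Unsupported::StaticImport));
        } else if line.starts_with("import ") && line.ends_with(".*;") {
            found.push((n, Unsupported::WildcardImport));
        } else if line == "static {" || line == "static{" {
            found.push((n, Unsupported::StaticBlock));
        } else if is_c_style_array_decl(line) {
            found.push((n, Unsupported::CStyleArray));
        }
    }
    found
}

/// Detects declarations like `int values[] = ...;` (brackets after the name).
fn is_c_style_array_decl(line: &str) -> bool {
    match line.find("[]") {
        Some(pos) if pos > 0 => {
            let before = &line[..pos];
            // Start of the identifier directly preceding "[]".
            let name_start = before
                .char_indices()
                .rev()
                .find(|(_, c)| !(c.is_alphanumeric() || *c == '_'))
                .map_or(0, |(p, c)| p + c.len_utf8());
            // An identifier preceded by a space (rather than a bare type such
            // as `int[]` at the start of the line) suggests a C-style array.
            name_start > 0 && name_start < pos && before[..name_start].ends_with(' ')
        }
        _ => false,
    }
}
```

Being line-based, it can report false positives, for example on an array creation expression such as foo(new int[]{1}).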

Sample workflow

> cargo run -- open ../JsonPath/json-path/src/main/java/com/jayway/jsonpath
[00:00:05] [########################################] 100% [5834ms] Stack graph built successfully
success: Loaded project at ../JsonPath/json-path/src/main/java/com/jayway/jsonpath

> e ucfs
info: Enabled ucfs

> q evaluate
[00:00:01] [########################################] 100% [1632ms] SGGraph built successfully
[00:00:04] [########################################] 100% [3962ms] Database built successfully
[00:00:00] [########################################] 100% [339ms] Indexed nodes at partial path start
[00:00:00] [########################################] 100% [18ms] Found 103 references and 33 definitions for symbol 'evaluate'
info: Found 33 references:
  [0] node 17125 at ../JsonPath/json-path/src/main/java/com/jayway/jsonpath\internal\filter\EvaluatorFactory.java:60:58
  [1] node 17265 at ../JsonPath/json-path/src/main/java/com/jayway/jsonpath\internal\filter\EvaluatorFactory.java:67:60
  ...
  [32] node 114196 at ../JsonPath/json-path/src/main/java/com/jayway/jsonpath\JsonPath.java:348:51
Enter index to resolve (or 'a' for all, empty to cancel):

> 13
[00:00:00] [########################################] 100% [2ms] Paths stitched successfully
info: [0] Node 71622 resolves to 2 definitions:
  - ../JsonPath/json-path/src/main/java/com/jayway/jsonpath\internal\path\CompiledPath.java:105:29 local_id 1373
  - ../JsonPath/json-path/src/main/java/com/jayway/jsonpath\internal\path\CompiledPath.java:90:29 local_id 1071
[00:00:00] [########################################] 100% [189ms] CFL graph built successfully
[00:00:00] [########################################] 100% [23ms] UCFS query grammar generated
info: UCFS query DOT generated at .\query.cfl_ucfs.dot
info: UCFS query grammar generated at .\UCFSGrammar.kt

> exit

CLI

To run the CLI, enter <executable> open <path-to-source-files-root>. This loads the code into a stack graph and starts interactive mode, where you can run commands to configure the project, generate artifacts and run queries.

<executable> here is either the actual executable or cargo run --, which builds the project and runs it with the arguments that follow.

Run <executable> open --help to see all available flags.
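As a sketch of the cargo run -- form, the Rust snippet below assembles an open invocation with std::process::Command without running it. The helpers open_command and args_of are hypothetical; the flags passed in the usage example are taken from the list below.

```rust
use std::process::Command;

/// Builds (but does not run) `cargo run -- open <source_root> <flags...>`.
pub fn open_command(source_root: &str, extra_flags: &[&str]) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["run", "--", "open", source_root]);
    cmd.args(extra_flags);
    cmd
}

/// Collects the command's arguments as strings, e.g. for logging.
pub fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}
```

For example, open_command("../project/src", &["--java", "--sg-dot", "-o", "out"]) describes the same call as typing it in a shell; calling .status() on the result would actually start the utility.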

Available flags for open:

  1. Loading flags:
      --remove-unsupported

Enables removal of the aforementioned unsupported language features before the code is loaded into the stack graph. Note that this modifies the source files and is likely to break them (the code may no longer compile), so make sure you have a backup. Once the code is clean of those features, the removal stage is skipped on subsequent runs (you can force it by deleting the autogenerated .unsupported_features_cleaned file).

Currently only supported for Java.
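The skip logic described above can be pictured with a small Rust sketch. It assumes the marker file lives in the project root, which the text does not state; should_remove_unsupported is a hypothetical helper, not the utility's code.

```rust
use std::path::Path;

/// Marker file name mentioned above.
const MARKER: &str = ".unsupported_features_cleaned";

/// Runs the removal stage only if the flag is set and the marker is absent.
/// Assumption: the marker is stored directly in the project root, so
/// deleting it forces removal to run again.
pub fn should_remove_unsupported(project_root: &Path, flag_enabled: bool) -> bool {
    flag_enabled && !project_root.join(MARKER).exists()
}
```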

  2. Language choice (the language of the code to be loaded into the stack graph):
  -j, --java
  -p, --python

By default Java is assumed.

  3. Backend choice:
      --kotgll
      --ucfs

By default both are disabled and all queries run solely against stack graphs.

When KotGLL is enabled, the corresponding artifacts are produced and the KotGLL JAR is invoked directly; its output is then parsed and printed.

For UCFS, only the corresponding artifacts (Kotlin grammar and DOT graph) are produced; you need to integrate them with UCFS yourself.

  4. KotGLL-related flags:
      --sppf
      --kotgll-path <KOTGLL_PATH>

--sppf enables output in SPPF (shared packed parse forest) format. --kotgll-path is required when KotGLL is used and must point to an executable JAR.

  5. Artifact generation flags:
      --cfg
      --csv
      --stack-graph-dot or --sg-dot
      --dot-ucfs
      --kt
      --stack-graph-json or --sg-json

Enable generation of the corresponding artifacts (those without stack-graph or sg in their name are CFL artifacts).

Note that the artifacts are not generated automatically: you need to either run create without arguments later in interactive mode or make an immediate query (see Immediate query flags below).

  6. Artifact output path flags:
  -o, --output <OUTPUT>
      --output-cfg <OUTPUT_CFG>
      --output-csv <OUTPUT_CSV>
      --output-stack-graph-dot or --output-sg-dot <OUTPUT_STACK_GRAPH_DOT>
      --output-dot-ucfs <OUTPUT_DOT_UCFS>
      --output-kt <OUTPUT_KT>
      --output-stack-graph-json or --output-sg-json <OUTPUT_STACK_GRAPH_JSON>

-o, --output sets the directory for all artifacts; the other flags override the paths of specific artifacts. By default, all artifacts are written to the current directory (./).
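The precedence rule can be sketched in Rust as follows. This assumes a per-artifact flag such as --output-kt takes a full file path (the text does not say whether it is a file or a directory), and the default file names in the example are only illustrative; artifact_path is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// A per-artifact override (e.g. --output-kt) wins; otherwise the file is
/// placed in the common output directory (-o/--output, default ./).
/// `default_file_name` is an assumed per-artifact default.
pub fn artifact_path(
    output_dir: Option<&Path>,
    specific_override: Option<&Path>,
    default_file_name: &str,
) -> PathBuf {
    match specific_override {
        Some(path) => path.to_path_buf(),
        None => output_dir.unwrap_or(Path::new(".")).join(default_file_name),
    }
}
```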

  7. Immediate query flags:
  -s, --symbol <SYMBOL>
      --source <SOURCE>

Immediately generates all the requested artifacts.

For --source, specify the full location of the symbol (<path-to-file>:<line>:<column>); the query is executed immediately and the app then exits.

For --symbol, specify the symbol name; you then enter query mode (see Query mode below), and the app exits afterwards.
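A --source value can be split from the right, so that a Windows drive prefix such as C: stays part of the path. The Rust sketch below shows this; parse_source and SourceLocation are hypothetical, and whether the utility itself accepts drive-letter paths is not stated above.

```rust
/// A location in the form <path-to-file>:<line>:<column>.
#[derive(Debug, PartialEq, Eq)]
pub struct SourceLocation {
    pub file: String,
    pub line: usize,
    pub column: usize,
}

/// Splits on the last two ':' characters; returns None on malformed input.
pub fn parse_source(arg: &str) -> Option<SourceLocation> {
    let mut parts = arg.rsplitn(3, ':');
    let column = parts.next()?.parse().ok()?;
    let line = parts.next()?.parse().ok()?;
    let file = parts.next()?;
    if file.is_empty() {
        return None;
    }
    Some(SourceLocation { file: file.to_string(), line, column })
}
```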

      --pick-queries <COUNT>

Immediately runs queries for a large number of symbols and picks <COUNT> of them that produce more complex queries than average, then generates a stack graph exporter queries (SGEQ) file. If <COUNT> is greater than the number of symbols in the project, only as many queries as there are symbols are generated.

Incompatible with other immediate queries.
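The complexity metric used for picking is not documented here. As an illustration only, this Rust sketch keeps the top <COUNT> candidates by some numeric score and naturally caps the result at the number of candidates; Candidate, complexity and pick_queries are hypothetical.

```rust
/// Hypothetical record: a symbol with a score for how complex its query was.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub symbol: String,
    pub complexity: u64,
}

/// Keeps the `count` most complex candidates; if `count` exceeds the number
/// of candidates, all of them are returned.
pub fn pick_queries(mut candidates: Vec<Candidate>, count: usize) -> Vec<Candidate> {
    // Most complex first; the sort is stable, so ties keep their order.
    candidates.sort_by(|a, b| b.complexity.cmp(&a.complexity));
    candidates.truncate(count);
    candidates
}
```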

Stores the output in ./queries.sgeq in the following format:

{
  "project_path": "<path to the project root>",
  "stack_graph": {
    "built_in": <duration in milliseconds>,
    "vertices": <vertices count in the graph>,
    "edges": <edges count in the graph>,
    "symbols": <symbols count in the graph>,
  },
  "partial_database_built_in": <duration in milliseconds>,
  "cfl_graph": {
    "path": "<path to the generated graph file>",
    "built_in": <duration in milliseconds>, // includes grammar build time
    "file_size": <graph file size in bytes>,
    "vertices": <vertices count in the graph>,
    "edges": <edges count in the graph>
  },
  "cfl_graph_simplified": {
    "path": "<path to the generated graph file>",
    "built_in": <duration in milliseconds>, // includes grammar build time
    "file_size": <graph file size in bytes>,
    "vertices": <vertices count in the graph>,
    "edges": <edges count in the graph>
  },
  "cfl_grammar": {
    "path": "<path to the generated grammar file>",
    "file_size": <grammar file size in bytes>,
    "rules": <rules count in the grammar>
  },
  "queries": [
    {
      "symbol": {
        "name": "<symbol name>",
        "cfl_index": <node index in CFL graph (the one you need to make queries)>,
        "cfl_index_simplified": <node index in simplified CFL graph (the one you need to make queries)>,
        "file": "<the file the symbol is located at>",
        "line": <the line the symbol is located at within the file>,
        "column": <the column the symbol is located at within the line>
      },
      "resolved_to": [
        {
          <file, line, column as in symbol above>
        },
        ...
      ],
      "resolution_time": [ <list of resolution duration in milliseconds, 7 items> ]
    },
    ...
  ]
}

For grammar files, a placeholder is added: <placeholder nt=\"{start_symbol}\"/>. To query some symbol X, remove .asStart() from that line if it is present, and add a new line val Q by Nt(Term(\"push_{X}\") * {start_symbol} * Term(\"pop_{X}\")).asStart() right before or after the line containing the placeholder.
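The manual grammar edit can be automated along these lines. The Rust sketch assumes the generated file contains the placeholder literally as <placeholder nt="S"/> (unescaped, with S being the start non-terminal). It strips .asStart() from every line, which also covers the placeholder line, and inserts the query rule directly after the placeholder line. add_query_rule and placeholder_nt are hypothetical helpers; the Kotlin DSL names are copied from the description above.

```rust
/// Strips `.asStart()` from existing lines and inserts
/// `val Q by Nt(Term("push_X") * S * Term("pop_X")).asStart()`
/// right after the placeholder line. Returns None if there is no placeholder.
pub fn add_query_rule(grammar: &str, symbol: &str) -> Option<String> {
    let mut out = Vec::new();
    let mut inserted = false;
    for line in grammar.lines() {
        out.push(line.replace(".asStart()", ""));
        if !inserted {
            if let Some(start) = placeholder_nt(line) {
                out.push(format!(
                    "val Q by Nt(Term(\"push_{symbol}\") * {start} * Term(\"pop_{symbol}\")).asStart()"
                ));
                inserted = true;
            }
        }
    }
    if inserted { Some(out.join("\n")) } else { None }
}

/// Extracts S from a line containing `<placeholder nt="S"/>`.
fn placeholder_nt(line: &str) -> Option<&str> {
    const OPEN: &str = "<placeholder nt=\"";
    let rest = &line[line.find(OPEN)? + OPEN.len()..];
    rest.find('"').map(|end| &rest[..end])
}
```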

For graph files, no placeholder is needed. Just add one or more lines start -> {n}; after the opening { to run queries from the nodes with index {n} (or add none to run the query against the whole graph).
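A minimal Rust sketch of that edit, assuming the first { in the file opens the graph body (true for a plain digraph { ... } header, but not if a quoted graph name contains a brace). add_start_nodes is a hypothetical helper.

```rust
/// Inserts `start -> {n};` lines right after the first `{` of a DOT graph.
/// Returns None if the text contains no `{`.
pub fn add_start_nodes(dot: &str, nodes: &[usize]) -> Option<String> {
    let brace = dot.find('{')?;
    let mut out = String::with_capacity(dot.len() + nodes.len() * 16);
    out.push_str(&dot[..=brace]);
    for n in nodes {
        out.push_str(&format!("\n    start -> {n};"));
    }
    out.push_str(&dot[brace + 1..]);
    Some(out)
}
```

Passing an empty slice leaves the graph unchanged, which corresponds to querying the whole graph.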

  8. Other flags:
      --verify

Enables verification. When verification is enabled and the query is run with KotGLL, the results are parsed and compared with the results produced by stack graphs. Verification is not implemented for UCFS yet.
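How the two result lists are compared is not specified above. One plausible approach, sketched here in Rust as an assumption, treats both as sets of (file, line, column) locations and reports what each side has that the other lacks. Location, Mismatch and verify are hypothetical.

```rust
use std::collections::BTreeSet;

/// A resolved definition location: (file, line, column).
pub type Location = (String, usize, usize);

/// Differences between the two result sets.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Mismatch {
    pub missing_in_kotgll: Vec<Location>,
    pub extra_in_kotgll: Vec<Location>,
}

/// Compares results as sets (order and duplicates ignored).
/// Returns None when both backends agree.
pub fn verify(stack_graphs: &[Location], kotgll: &[Location]) -> Option<Mismatch> {
    let sg: BTreeSet<&Location> = stack_graphs.iter().collect();
    let kg: BTreeSet<&Location> = kotgll.iter().collect();
    let report = Mismatch {
        missing_in_kotgll: sg.difference(&kg).map(|l| (*l).clone()).collect(),
        extra_in_kotgll: kg.difference(&sg).map(|l| (*l).clone()).collect(),
    };
    if report == Mismatch::default() { None } else { Some(report) }
}
```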

      --all-symbols

By default, query mode only shows nodes that are at the beginning of at least one partial path. This flag disables that filtering, so you see all the nodes for the requested symbol.

Note: even in medium-sized projects, a single symbol may have hundreds or thousands of nodes. Keeping only the nodes that start at least one partial path can reduce this number several-fold. It was verified empirically that nodes without partial paths resolve to nothing, but this needs further investigation.
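The filtering step amounts to a set-membership test. A Rust sketch, assuming node indices fit in u32 and that the set of partial-path start nodes has already been computed (in the sample session this corresponds to the "Indexed nodes at partial path start" step); candidate_nodes is hypothetical.

```rust
use std::collections::HashSet;

/// Keeps the symbol's nodes that start at least one partial path,
/// unless `all_symbols` (the --all-symbols behaviour) is set.
pub fn candidate_nodes(
    symbol_nodes: &[u32],
    partial_path_starts: &HashSet<u32>,
    all_symbols: bool,
) -> Vec<u32> {
    symbol_nodes
        .iter()
        .copied()
        .filter(|n| all_symbols || partial_path_starts.contains(n))
        .collect()
}
```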

      --simplify-cfl

Currently produced CFL graphs contain many epsilon edges. Some of them can easily be pruned, which this flag enables.
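The exact pruning rule behind --simplify-cfl is not documented here. As an illustration only, this Rust sketch applies one generic epsilon-contraction rule: an epsilon edge v -> w is removed when v has no other outgoing edge and is not protected, and every edge that entered v is redirected to w; epsilon self-loops are dropped. Paths through v keep the same label word, but v itself disappears, so query start nodes and any node whose reachability matters must be protected. Edge, prune_epsilon and the None-means-epsilon encoding are hypothetical.

```rust
use std::collections::HashSet;

/// A labelled edge; `label: None` encodes an epsilon edge.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Edge {
    pub from: usize,
    pub label: Option<String>,
    pub to: usize,
}

/// Repeatedly contracts `v -eps-> w` where `v` has exactly one outgoing
/// edge and is not protected. Each step removes an edge, so it terminates.
pub fn prune_epsilon(mut edges: Vec<Edge>, protected: &HashSet<usize>) -> Vec<Edge> {
    loop {
        // Epsilon self-loops never change a path's label word.
        edges.retain(|e| !(e.label.is_none() && e.from == e.to));

        let candidate = edges.iter().find_map(|e| {
            if e.label.is_some() || protected.contains(&e.from) {
                return None;
            }
            let out_degree = edges.iter().filter(|o| o.from == e.from).count();
            if out_degree == 1 { Some((e.from, e.to)) } else { None }
        });
        let (v, w) = match candidate {
            Some(pair) => pair,
            None => return edges,
        };
        // Drop v's only (epsilon) outgoing edge and reroute v's incoming edges to w.
        edges.retain(|e| !(e.from == v && e.label.is_none()));
        for e in edges.iter_mut() {
            if e.to == v {
                e.to = w;
            }
        }
    }
}
```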

  -v, --verbose
  -h, --help

Self-explanatory. The former enables verbose output, the latter prints help information.

Interactive mode

Available commands:

create, c [<artifact>]
clean [<artifact>]
query, q, run, r <symbol>
enable, e <feature>
disable, d <feature>
output, o [<artifact>] <path>
state, s
help, h
quit, exit, halt
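The command list above maps naturally onto an enum. The Rust sketch below parses one input line, accepting every alias listed; ReplCommand and parse_command are hypothetical, and the artifact and feature names used in the example (kt, ucfs) are only illustrative.

```rust
/// Interactive commands as listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplCommand {
    Create(Option<String>),
    Clean(Option<String>),
    Query(String),
    Enable(String),
    Disable(String),
    Output { artifact: Option<String>, path: String },
    State,
    Help,
    Quit,
}

/// Returns None for unknown commands or a wrong number of arguments.
pub fn parse_command(input: &str) -> Option<ReplCommand> {
    let mut words = input.split_whitespace();
    let name = words.next()?;
    let args: Vec<String> = words.map(str::to_string).collect();
    let first = args.first().cloned();
    match (name, args.len()) {
        ("create" | "c", 0 | 1) => Some(ReplCommand::Create(first)),
        ("clean", 0 | 1) => Some(ReplCommand::Clean(first)),
        ("query" | "q" | "run" | "r", 1) => Some(ReplCommand::Query(args[0].clone())),
        ("enable" | "e", 1) => Some(ReplCommand::Enable(args[0].clone())),
        ("disable" | "d", 1) => Some(ReplCommand::Disable(args[0].clone())),
        ("output" | "o", 1) => Some(ReplCommand::Output { artifact: None, path: args[0].clone() }),
        ("output" | "o", 2) => Some(ReplCommand::Output {
            artifact: Some(args[0].clone()),
            path: args[1].clone(),
        }),
        ("state" | "s", 0) => Some(ReplCommand::State),
        ("help" | "h", 0) => Some(ReplCommand::Help),
        ("quit" | "exit" | "halt", 0) => Some(ReplCommand::Quit),
        _ => None,
    }
}
```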

There is a known issue where UCFS artifacts may get corrupted by sequential queries within the same session. Running clean resolves it.

Query mode

Triggered either by the --symbol argument or by running a query in interactive mode.

First, all the nodes that correspond to the symbol are found. Then they are filtered, keeping only those at which at least one partial path begins (no filtering is done if the all-symbols feature is enabled, see --all-symbols above).

These nodes are shown to the user, who can either query all of them at once (enter a), query one specific node (enter its index in the list shown), or leave the input empty to cancel.

Then the query is run depending on enabled backends and features.
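The prompt answer ("Enter index to resolve (or 'a' for all, empty to cancel)" in the sample session) can be parsed as in this Rust sketch. Selection and parse_selection are hypothetical names, and rejecting out-of-range indices is an assumption about how invalid input is handled.

```rust
/// The user's answer at the node-selection prompt.
#[derive(Debug, PartialEq, Eq)]
pub enum Selection {
    All,
    One(usize),
    Cancel,
}

/// `a` selects all nodes, an index selects one of the `shown` nodes,
/// an empty line cancels; anything else yields None.
pub fn parse_selection(input: &str, shown: usize) -> Option<Selection> {
    match input.trim() {
        "" => Some(Selection::Cancel),
        "a" => Some(Selection::All),
        s => s.parse::<usize>().ok().filter(|&i| i < shown).map(Selection::One),
    }
}
```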

Examples

About

A CLI utility to export stack graphs from source code, run queries against them, convert them to CFL, run queries for CFL using KotGLL or UCFS
