feat(manifest): --facts mode emits Socket facts JSON for Gradle projects (REA-442) #1318
Draft
Jeppe Fredsgaard Blaabjerg (jfblaa) wants to merge 13 commits into
Generates a per-(sub)project .socket.facts.json describing the resolved compile/runtime dependency graph, matching the canonical SocketFacts schema consumed by depscan's SBOM_Resolve pipeline.
The new init script (socket-facts.init.gradle) registers a socketFacts task on every project, walks resolvable compile/runtime classpaths via LenientConfiguration so unresolved deps degrade gracefully instead of failing the build, dedupes Gradle's variant artifacts (java-classes-directory vs jar) into one component per logical Maven coordinate, and classifies prod/dev correctly (prod-seen-anywhere wins; production deps are no longer flagged dev just because test classpaths inherit them).
Output filename matches Coana's .socket.facts.json so the depscan **/*.socket.facts.json glob picks both pre- and post-reachability versions out of a scan tarball.
The TS wrapper (convert-gradle-to-facts.mts) routes --facts through the project's ./gradlew by default; the init script is bundled into dist/ alongside the existing pom-generating init.gradle.
Verified end-to-end across JDK 8/Gradle 5.6.4, JDK 11/Gradle 6.9.4, JDK 17/Gradle 7.6.4, JDK 21/Gradle 8.10.2, JDK 21/Gradle 9.2.1: outputs are byte-identical after JSON normalization across the matrix on the test fixtures.
Test fixtures cover single-module Java, multi-module with project deps, and an unresolvable-dep scenario. The integration test (socket-facts-init-gradle.e2e.test.mts) is e2e-only and auto-skips when no gradle binary is on PATH; CI doesn't yet install JDK/Gradle.
Open follow-ups (tracked in REA-442):
- AGP-aware variant config discovery and an Android fixture
- Kotlin Multiplatform fixture to exercise kotlin.targets.compilations
- Mirror --facts onto cmd-manifest-kotlin and the auto-detect path
- Promote the sdkman matrix sweep to a CI job
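The variant dedup described above can be sketched roughly as follows. This is an illustration in TypeScript, not the Groovy init script itself: the `ResolvedArtifact` shape and the `dedupeByCoordinate` name are hypothetical, and preferring the `jar` variant over directory variants is an assumption made for the example.

```typescript
// Hypothetical shape for illustration; not the init script's actual data model.
interface ResolvedArtifact {
  group: string
  name: string
  version: string
  type: string // e.g. 'jar' or 'java-classes-directory'
}

// Collapse Gradle's variant artifacts into one entry per Maven coordinate.
// Assumption: when both variants are seen, the 'jar' variant is kept.
function dedupeByCoordinate(artifacts: ResolvedArtifact[]): ResolvedArtifact[] {
  const byCoordinate = new Map<string, ResolvedArtifact>()
  for (const artifact of artifacts) {
    const key = `${artifact.group}:${artifact.name}:${artifact.version}`
    const existing = byCoordinate.get(key)
    if (!existing || (existing.type !== 'jar' && artifact.type === 'jar')) {
      byCoordinate.set(key, artifact)
    }
  }
  return Array.from(byCoordinate.values())
}
```

Keying on `group:name:version` is what makes a project dep that surfaces as both a classes directory and a jar end up as a single component.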
…urface tests
The previous commit added an e2e test that runs the socket-facts init script against real Gradle fixtures. That's broader coverage than the sibling `socket manifest gradle` / `kotlin` / `auto` commands currently have in CI: those are tested only via `--help` snapshot and a `--dry-run` short-circuit, and never actually invoke gradle.
Bring the new `--facts` mode in line: keep the help-text snapshot (already covers the new flag) and add a `--facts --dry-run` case that mirrors the existing dry-run test pattern. Removes the e2e test and the gradle-facts fixtures; drops the matching .gitignore entries that no longer have anywhere to apply.
The matrix sweep and integration coverage stay as an open follow-up in REA-442, to be picked up alongside `setup-java`/`setup-gradle` in CI if/when we want any of the gradle commands actually exercised end-to-end.
The socketFacts init script was adding the project being scanned as a component in its own .socket.facts.json. With no parent edges this shows up downstream as `orphaned component not reachable from any direct dependency`. The project is the SBOM target, not one of its own dependencies — drop the root node entirely and let `directIds` carry the first-level edges. afterEvaluate is no longer needed since project coordinates were only used to populate that root entry.
cmd-manifest-kotlin.mts: add the --facts flag (also overridable via defaults.manifest.gradle.facts in socket.json) and route through convertGradleToFacts when set. Test pair (--help snapshot + --facts --dry-run) mirrors what we did for cmd-manifest-gradle. generate_auto_manifest.mts: when defaults.manifest.gradle.facts is true, the auto-detect path now generates Socket facts instead of pom files, matching what the explicit subcommands do. Brings the new mode to feature parity with the existing pom path, which is exposed through gradle, kotlin and auto.
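A minimal sketch of how the flag and the `socket.json` default might combine. The `SocketJson` type and the `resolveFactsMode` helper are hypothetical, and the precedence shown (an explicit CLI flag wins, otherwise the config default, otherwise pom output) is an assumption rather than something this commit message spells out.

```typescript
// Hypothetical subset of socket.json; only the path named in the commit message.
interface SocketJson {
  defaults?: { manifest?: { gradle?: { facts?: boolean } } }
}

// Assumption: an explicit --facts / --no-facts beats the socket.json default,
// and the pom path stays the fallback when neither is set.
function resolveFactsMode(
  cliFacts: boolean | undefined,
  socketJson: SocketJson,
): boolean {
  if (cliFacts !== undefined) {
    return cliFacts
  }
  return socketJson.defaults?.manifest?.gradle?.facts ?? false
}
```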
`socket manifest setup` now asks whether gradle should emit Socket facts instead of pom.xml files, writing the answer to defaults.manifest.gradle.facts in socket.json. The prompt sits next to the existing --verbose toggle and follows the same yes/no/leave-default ternary shape.
…adle in e2e CI
Brings back the e2e test and gradle-facts fixtures we dropped earlier in this branch and wires `setup-java@v4` + `gradle/actions/setup-gradle@v4` into .github/workflows/e2e-tests.yml so the test actually exercises the init script on every PR (it was previously auto-skipping for lack of a gradle binary).
Fixtures keep the guava 31.1-jre / slf4j 1.7.36 pins so resolution stays clean across Gradle 5.6.4 through 9.2.1 in the local sdkman matrix. The e2e CI uses Gradle 9.2.1 / JDK 21 as a single baseline; wider Gradle version coverage in CI is still tracked as a follow-up.
Also restores the .gitignore entries for the .gradle/, build/, .socket.facts.json, and pom.xml outputs that integration runs produce.
Adds an android-library fixture (AGP 8.7.3, compileSdk 34) and the minimum machinery the init script needs to scrape AGP-flavored classpaths without blowing up.
Two changes to socket-facts.init.gradle:
- Skip configurations matching *AndroidTest* (instrumented tests). Their resolution needs device-vs-host target attributes the init script doesn't set, and they fail before producing useful data.
- Wrap per-configuration resolution in try/catch. AGP unit-test classpaths (releaseUnitTestCompileClasspath etc.) pull in the project's own debugApiElements, which exposes multiple variants (android-classes-jar, r-class-jar, android-lint, ...); without consumer-side build-type attributes we hit "variant ambiguity" errors. We log "[socket-facts] skipping <cfg>: ..." and continue so other classpaths still produce output. Production (release + debug compile/runtime) variants resolve fine.
The e2e test skips the Android case when neither ANDROID_HOME nor ANDROID_SDK_ROOT is set, the same auto-skip posture as the rest of the gradle suite. It asserts that androidx.annotation:annotation is captured as a direct dep, confirming AGP variant configs are being walked.
Still pending: principled discovery via androidComponents.onVariants (AGP 7+) or android.libraryVariants. Current name-pattern matching catches Android variant configs by suffix and gets the job done, but isn't AGP-aware in the strict sense.
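The two changes amount to a skip-then-tolerate loop over configurations. Below is a rough TypeScript rendering of that control flow; the real code is Groovy in the init script, and the `Configuration` interface, the `collectClasspaths` name, and the case-sensitive substring match are all stand-ins chosen for the example.

```typescript
// Hypothetical stand-in for a Gradle configuration: a name plus a resolver
// that may throw (e.g. on AGP "variant ambiguity").
interface Configuration {
  name: string
  resolve: () => string[]
}

function collectClasspaths(
  configs: Configuration[],
  log: (message: string) => void = console.log,
): Map<string, string[]> {
  const resolved = new Map<string, string[]>()
  for (const cfg of configs) {
    // Instrumented-test configurations are never resolved.
    if (cfg.name.includes('AndroidTest')) {
      continue
    }
    try {
      resolved.set(cfg.name, cfg.resolve())
    } catch (e) {
      // One failing configuration must not stop the others from producing output.
      const reason = e instanceof Error ? e.message : String(e)
      log(`[socket-facts] skipping ${cfg.name}: ${reason}`)
    }
  }
  return resolved
}
```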
A minimal KMP project (jvm + js targets) exercises per-target compile and runtime classpaths (jvmMainCompileClasspath, jsTestRuntimeClasspath, ...) that aren't surfaced through Java's SourceSetContainer. Our name-pattern selection picks them up by suffix. The fixture pulls kotlinx-serialization-core (commonMain) so it shows up in both jvm and js target variants of the artifact, and slf4j-api (jvmMain-only) to confirm target-specific classpaths flow through. The test asserts both deps are present in the resulting components array.
911fa30 to
bfd09fb
Compare
Widens what `socket manifest gradle --facts` emits: instead of
whitelisting only `*CompileClasspath` / `*RuntimeClasspath`
configurations and silently dropping the rest, we now walk every
resolvable configuration and tag each artifact with two independent
boolean flags:
- `dev: true` ← artifact only ever appeared in test-named configs
- `tooling: true` ← artifact only ever appeared outside compile/runtime
  classpaths (annotation processors, linters, code-gen plugins,
  gradle plugin internals)
"Only ever" is strict: a single counterexample clears the flag. If a dep
also appears in a non-test classpath, `dev` is cleared; if it also appears
in a compile/runtime classpath, `tooling` is cleared. So a dep that shows
up as both an `api` and an `annotationProcessor` ends up flagged as
neither dev nor tooling: the production usage wins.
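The "only ever" rule can be sketched as two flags that start out true and are cleared by the first counterexample. This is an illustrative TypeScript version, not the init script: `Sighting`, `classify`, and the two name-based predicates are hypothetical stand-ins for whatever checks the Groovy code actually performs.

```typescript
// One observation of an artifact in a resolvable configuration (hypothetical shape).
interface Sighting {
  coordinate: string
  configuration: string
}

interface Flags {
  dev: boolean
  tooling: boolean
}

// Assumption: simple name heuristics stand in for the script's real detection.
const isTestConfig = (name: string): boolean => /test/i.test(name)
const isClasspathConfig = (name: string): boolean =>
  /(compileclasspath|runtimeclasspath)$/i.test(name)

function classify(sightings: Sighting[]): Map<string, Flags> {
  const flags = new Map<string, Flags>()
  for (const { coordinate, configuration } of sightings) {
    // Both flags start true and can only ever be cleared, never set again.
    const current = flags.get(coordinate) ?? { dev: true, tooling: true }
    if (!isTestConfig(configuration)) {
      current.dev = false
    }
    if (isClasspathConfig(configuration)) {
      current.tooling = false
    }
    flags.set(coordinate, current)
  }
  return flags
}
```

Because the flags are independent and only ever cleared, the order in which configurations are visited does not affect the result.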
Motivation: downstream reachability scanners (depscan) want to
suppress reachability analysis for tooling artifacts, while still
including them in the SBOM for non-reachability alerts (malware,
license, supply chain). This was previously impossible because the
script dropped tooling deps entirely.
Schema: relies on a new `tooling: z.boolean().optional()` on
SF_ArtifactSchema in depscan, separate work. The cli side emits the
field regardless; older consumers that ignore it stay unaffected.
Fixture/test: single-module-java now declares
`annotationProcessor 'org.projectlombok:lombok:1.18.30'`, exercising
the tooling path. A new test case asserts lombok is emitted with
tooling=true while guava (api) and junit (testImplementation) are
not.
Effect on existing fixtures:
- single-module-java: 11 components total (10 non-tooling + lombok)
- kotlin-multiplatform: 29 components (11 non-tooling + 18 Kotlin
compiler plugin classpath deps as tooling)
- android-library: 83 components (5 non-tooling, 78 AGP internals
as tooling) — previously these AGP internals were dropped
- multi-module-java, unresolved-deps: unchanged shape
Today both the pom path and the facts path print "(It will show no output, you can use --verbose to see its output)", but --verbose only printed the captured stdout *after* gradle finished. For large multi-project builds (elasticsearch-scale), that means the user stared at a spinner for many minutes with no signal that anything was happening.
When --verbose is set, spawn gradle with `stdio: 'inherit'` so the build's stdout/stderr stream live to the user's terminal. The spinner is skipped (it would conflict with inherited tty output) and the post-run "Reported exports:" / "POM file copied to:" summary is skipped too, since those lines were already visible inline during the streamed run. Non-verbose runs are unchanged: spinner + captured stdout + summary.
Also corrects the misleading "(It will show no output, you can use --verbose to see its output)" message to "(No live output. Pass --verbose to stream gradle output instead.)".
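The verbose/non-verbose split boils down to three coupled decisions. A small sketch, assuming a hypothetical `planGradleRun` helper (the actual wrapper may structure this differently, and `'pipe'` for the captured case is an assumption):

```typescript
// Hypothetical plan object; field names are illustrative only.
interface GradleRunPlan {
  stdio: 'inherit' | 'pipe'
  showSpinner: boolean
  printSummary: boolean
}

function planGradleRun(verbose: boolean): GradleRunPlan {
  if (verbose) {
    // Live streaming: the child writes straight to the user's terminal, so a
    // spinner would fight over the tty and the summary would repeat lines.
    return { stdio: 'inherit', showSpinner: false, printSummary: false }
  }
  // Assumption: captured output corresponds to piped stdio.
  return { stdio: 'pipe', showSpinner: true, printSummary: true }
}
```

The `stdio` value is what would be passed as the `stdio` option to Node's `child_process.spawn`.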
`dep.moduleArtifacts` access combined with `.file.isDirectory()` filtering was forcing Gradle to *download* every resolved artifact file. On large multi-project builds (e.g. elasticsearch) this pulled hundreds of MB of distribution archives: .deb / .tar.gz / .zip packaging outputs that some configurations expose as dependencies of the build target itself, not as library deps. User observed `> :qa:packaging:socketFacts > elasticsearch-7.17.22-amd64.deb > 88.3 MiB/310.4 MiB downloaded` mid-task.
Fix: read `artifact.type` / `artifact.extension` / `artifact.classifier` from already-fetched POM/GMM metadata. Never touch `artifact.file`, since that's what triggers the actual file download. Replace the `!file.isDirectory()` filter (which forced fetch) with a name-based filter (`INTERNAL_ARTIFACT_TYPES`: java-classes-directory, java-resources-directory, android-classes-directory, android-resources-directory) that drops Gradle-internal variants we don't want to surface as `qualifiers.ext`.
Verified locally:
- commons-io:commons-io:2.15.1 resolves cleanly under cleared cache, emits qualifiers.ext='jar', no jar in ~/.gradle/caches/modules-2/files-2.1/commons-io after the run
- 11/11 e2e fixture tests still green, qualifiers preserved across single-module / multi-module / Android / KMP fixtures
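The name-based filter can be illustrated like this. The set contents come from the commit message; the `ArtifactMetadata` and `Qualifiers` shapes and the `toQualifiers` helper are hypothetical, and the real filter lives in the Groovy init script.

```typescript
// Gradle-internal variant types listed in the commit message.
const INTERNAL_ARTIFACT_TYPES = new Set<string>([
  'java-classes-directory',
  'java-resources-directory',
  'android-classes-directory',
  'android-resources-directory',
])

// Metadata-only view of an artifact: deliberately no `file` field, mirroring
// the rule that the file must never be touched.
interface ArtifactMetadata {
  type: string
  extension: string
  classifier?: string
}

interface Qualifiers {
  ext: string
  classifier?: string
}

function toQualifiers(artifacts: ArtifactMetadata[]): Qualifiers[] {
  return artifacts
    .filter(artifact => !INTERNAL_ARTIFACT_TYPES.has(artifact.type))
    .map(artifact =>
      artifact.classifier
        ? { ext: artifact.extension, classifier: artifact.classifier }
        : { ext: artifact.extension },
    )
}
```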
…deps
Previously each subproject's `socketFacts` task emitted its own
`.socket.facts.json`, and intra-project deps (`project(':lib')`)
flowed into every consuming subproject's file with no signal that
they're project-local. Downstream consumers (coana `mvn dependency:get`)
would then try to resolve them against Maven Central and fail because
those coordinates don't publish to a repository.
Restructured to mirror the pattern at
program-analysis/plugins/coana-gradle-script/coana-workspaces.init.gradle:
- shared accumulators (`nodes`, `directIds`, `reportedUnresolved`,
`projectKeys`) live on `gradle.ext.socketFactsState` as
synchronized collections, so --parallel-enabled builds don't race
- per-subproject `socketFactsCollect` tasks resolve that
subproject's configurations and contribute to the shared state
- a root `socketFacts` task depends on every collector (via
`gradle.projectsEvaluated`) and serializes the aggregated graph
to a single `.socket.facts.json` at the build root
Intra-project deps are dropped at visit time: when a ResolvedDependency
matches a known project's `group:name`, we return an empty
producedIds set, don't emit a node, and don't recurse. The externals
those project deps expose are picked up via the intra-project's own
collector instead (Gradle's classpath inheritance gives the consumer
subproject those externals directly, and the producer subproject's
collector emits them with `direct: true` from its own classpath
position).
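The visit-time drop can be sketched as a recursive walk that returns an empty id set for project-local deps. This is a simplified TypeScript illustration: the `ResolvedDependency` shape here is a hypothetical stand-in for Gradle's type of the same name, and `visit` is not the script's actual function.

```typescript
// Simplified stand-in for a resolved dependency node.
interface ResolvedDependency {
  group: string
  name: string
  version: string
  children: ResolvedDependency[]
}

// Returns the ids produced by this dependency; `nodes` maps id -> child ids.
function visit(
  dep: ResolvedDependency,
  projectKeys: Set<string>,
  nodes: Map<string, string[]>,
): string[] {
  // Intra-project dep: no node, no recursion, nothing produced.
  if (projectKeys.has(`${dep.group}:${dep.name}`)) {
    return []
  }
  const id = `${dep.group}:${dep.name}:${dep.version}`
  if (!nodes.has(id)) {
    // Register before recursing so dependency cycles terminate.
    nodes.set(id, [])
    const childIds: string[] = []
    for (const child of dep.children) {
      childIds.push(...visit(child, projectKeys, nodes))
    }
    nodes.set(id, childIds)
  }
  return [id]
}
```

In this sketch the externals behind a dropped project dep are not walked at all, which matches the description that they are contributed by the producer subproject's own collector.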
Verified across the fixture matrix:
- single-module-java: unchanged shape (no intra-project deps)
- multi-module-java: emits ONE file at build root, no
`com.example.socket:lib` or `com.example.socket:app` entries,
guava (lib api) + slf4j (app impl) both present
- unresolved-deps, kotlin-multiplatform, android-library: unchanged
Side benefits: no more empty `.socket.facts.json` files on aggregator
subprojects, and the unresolved-dep warning dedupes across the build.
Zizmor's `impostor-commit` audit failed because the SHA I used (0b6dd653...) was the v4 *tag object* SHA, not the commit it dereferences to. gradle/actions uses nested annotated tags — v4 points to another tag (48b5f213...) which points to commit ed408507eac0... That last hop is what should be pinned. setup-java@v4 was already correct (resolves directly to a commit).
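The nested-tag situation can be modelled as following `target` pointers until a commit is reached. The sketch below uses an in-memory object table with placeholder ids rather than real SHAs; `GitObject` and `peelToCommit` are hypothetical names for illustration.

```typescript
// Minimal model of the two object kinds involved.
type GitObject = { kind: 'tag'; target: string } | { kind: 'commit' }

// Follow annotated-tag hops until a commit is found; that id is the one to pin.
function peelToCommit(id: string, objects: Map<string, GitObject>): string {
  const seen = new Set<string>()
  let current = id
  while (!seen.has(current)) {
    seen.add(current)
    const obj = objects.get(current)
    if (!obj) {
      throw new Error(`unknown object ${current}`)
    }
    if (obj.kind === 'commit') {
      return current
    }
    current = obj.target
  }
  throw new Error(`tag cycle detected at ${current}`)
}
```

On a real checkout, `git rev-parse 'v4^{commit}'` performs the same peeling.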
Closes REA-442.
Summary
Adds a `--facts` mode to `socket manifest gradle/kotlin/auto` that emits a `.socket.facts.json` file per (sub)project describing the resolved compile/runtime dependency graph. Output matches the canonical `SocketFacts` schema consumed by depscan's SBOM_Resolve pipeline.
Highlights
- `src/commands/manifest/socket-facts.init.gradle` walks resolvable compile/runtime classpath configurations via `LenientConfiguration` (handles unresolved deps gracefully), dedupes Gradle's variant artifacts so project deps don't get split across `java-classes-directory` + `jar` entries, and classifies `prod` vs `dev` correctly (prod-seen-anywhere wins so production deps aren't flagged dev just because test classpaths inherit them).
- `groupId → namespace`, `artifactId → name`, `artifact.extension → qualifiers.ext`, `artifact.classifier → qualifiers.classifier`. Strict on `MavenQualifiersSchema`.
- `.socket.facts.json` to match Coana's convention and depscan's `**/*.socket.facts.json` glob.
- `--facts` available on `socket manifest gradle`, `socket manifest kotlin`, and the auto-detect path (via `defaults.manifest.gradle.facts` in `socket.json`).
- `socket manifest setup` prompts for the toggle.
Local verification
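Verified end-to-end across a JDK × Gradle matrix (sdkman):
- JDK 8 / Gradle 5.6.4
- JDK 11 / Gradle 6.9.4
- JDK 17 / Gradle 7.6.4
- JDK 21 / Gradle 8.10.2
- JDK 21 / Gradle 9.2.1
5 fixtures × 5 versions all produced byte-identical output after JSON normalization.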
Test fixtures (added)
- `single-module-java/`: `java-library`, `api` / `implementation` / `testImplementation`. Versions pinned (guava 31.1-jre, slf4j 1.7.36) to resolve cleanly across all matrix Gradle versions.
- `multi-module-java/`: root + `lib` + `app` with a `project(':lib')` dep, exercises variant dedup.
- `unresolved-deps/`: paired real + intentionally-unresolvable dep, exercises lenient `unresolvedModuleDependencies`.
- `android-library/`: AGP 8.7.3 + compileSdk 34. Exercises per-variant classpaths (`debugCompileClasspath`, `releaseRuntimeClasspath`, …). The AGP test falls under `describe.skip` when no `ANDROID_HOME` / `ANDROID_SDK_ROOT` is set.
- `kotlin-multiplatform/`: JVM + JS targets. Exercises per-target classpaths (`jvmMainRuntimeClasspath`, `jsTestCompileClasspath`, …).
Tests
- `cmd-manifest-gradle.test.mts` / `cmd-manifest-kotlin.test.mts`: `--help` snapshot + `--dry-run` + `--facts --dry-run` cases. Same shape as existing pom-path tests.
- `socket-facts-init-gradle.e2e.test.mts`: 10 tests across the 5 fixtures, asserts schema shape, direct/dev attribution, edge integrity, variant dedup, AGP and KMP coverage. Auto-skips when no `gradle` binary is on PATH. Lives in the e2e suite.
CI
`.github/workflows/e2e-tests.yml` now installs Temurin JDK 21 + Gradle 9.2.1 via the official actions so the integration test actually runs in CI. The Android sub-test will skip on CI until `android-actions/setup-android@v3` is added; non-Android coverage is exercised regardless.
Open follow-ups
Test plan
Note
Medium Risk
Introduces a new Gradle execution path (bundled init script + CLI flag/defaults) and adds networked Gradle-based e2e tests/CI tooling, which can affect build stability and output correctness across Gradle variants.
Overview
Adds `--facts` output for Gradle projects. `socket manifest gradle` and `socket manifest kotlin` now accept `--facts` (and `socket.json` default `defaults.manifest.gradle.facts`) to generate `.socket.facts.json` instead of `pom.xml`, and `generate_auto_manifest` routes detected Gradle projects through this mode when enabled.
Implements facts generation via a new bundled Gradle init script. Adds `convertGradleToFacts` (invokes Gradle with `socket-facts.init.gradle` and task `socketFacts`) and updates the Rollup dist build to ship `socket-facts.init.gradle` alongside the existing `init.gradle`.
Adds integration coverage and CI support. Introduces Gradle e2e tests plus new Gradle/Java/Kotlin/Android fixtures, updates `.gitignore` for generated fixture artifacts, and updates the e2e GitHub workflow to install JDK 21 and Gradle so the new tests run.
Reviewed by Cursor Bugbot for commit 911fa30.