A reproducible benchmark showing that CI architecture, not just SHA pinning, is what materially limits supply-chain attacks like CVE-2025-30066 (tj-actions/changed-files).
| Tier | Architecture | Score | Annual Cost |
|---|---|---|---|
| 1 | No security | 10/100 | $0 |
| 2 | SHA-pinned (typical AI advice) | 20/100 | $0 |
| 3 | Trusted Release Boundary | 75/100 | $0 |
| 4 | Enterprise (egress + attestation) | 83/100 | enterprise-style overhead |
Tier 3 closes the largest security gap, a 55-point gain over Tier 2, at zero tooling cost; Tier 4 adds a further 8 points.
Most CI hardening advice stops at:
- pin actions by SHA
- reduce `GITHUB_TOKEN` permissions
- add selective hardening later
Those are useful, but they do not solve the core problem:
If untrusted third-party code runs in the same job as secrets and release authority, a compromised action can still steal secrets, poison artifacts, and exfiltrate data.
This benchmark isolates that exact question by running the same malicious action against four different workflow architectures.
A simulated malicious GitHub Action, modeled on the behavior class exposed by CVE-2025-30066, ran the same six attack behaviors in every tier:
- Environment variable dumping
- `GITHUB_TOKEN` permission probing
- Process memory access checks
- Network exfiltration attempts
- Artifact poisoning
- Source enumeration
The only changing variable was the workflow design.
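To make the first behavior in the list concrete, here is a minimal shell sketch of an environment probe. It is not the repository's actual `malicious-tool` code: the function name `probe_env_names` and the name patterns are assumptions, and it reports variable names only, never values.

```shell
#!/bin/sh
# Hypothetical sketch of an "environment variable dumping" probe.
# It lists the NAMES of variables that look like credentials and never
# prints their values. The name patterns below are assumptions.
probe_env_names() {
  awk 'BEGIN {
    for (name in ENVIRON)
      if (name ~ /TOKEN|SECRET|KEY|PASSWORD/)
        print name
  }' | sort
}
```

In a job that shares secrets with third-party code, a probe of this kind would be expected to list them. In a lane that receives no secrets, it should find nothing beyond what the runner itself exposes.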
- Fork this repository
- Add four dummy secrets
- Create a `production` environment with required reviewers
- Run the workflows in order
- Compare the artifact hashes and logs
Step-by-step reproduction guide: see `REPRODUCE.md`.
| Rule | Name | Purpose |
|---|---|---|
| 0 | PIN | Use immutable SHA references for all external actions |
| 1 | QUARANTINE | Untrusted lane gets no secrets and no write authority |
| 2 | ISOLATE | Trusted lane is separate and first-party only |
| 3 | REBUILD | Trusted lane rebuilds from source on a fresh runner |
| 4 | ARTIFACT QUARANTINE | Only metadata crosses the boundary, never untrusted binaries |
| 5 | VALIDATE | Outputs crossing the boundary are explicitly sanitized |
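Rules 0 and 5 lend themselves to mechanical checks. The sketch below is one possible shape, not the repository's own tooling: `find_unpinned` and `validate_version` are hypothetical helper names, and the idea that the metadata crossing the boundary is a `MAJOR.MINOR.PATCH` version string is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch of Rule 0 (PIN) and Rule 5 (VALIDATE) as mechanical checks.
# Helper names and the accepted metadata format are assumptions.

# Rule 0: print every `uses:` reference in the given workflow files that is
# not pinned to a full 40-character lowercase commit SHA.
# Local actions (./path) are skipped; quoted references are not handled.
# Like grep, this returns non-zero when nothing is printed.
find_unpinned() {
  grep -hE '^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*' "$@" \
    | sed -E 's/^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' \
    | grep -vE '^\./' \
    | grep -vE '@[0-9a-f]{40}$'
}

# Rule 5: accept a value crossing the boundary only if it is exactly
# MAJOR.MINOR.PATCH made of digits and dots. Everything else is rejected,
# including values that contain spaces, newlines, or shell metacharacters.
validate_version() {
  case "$1" in
    ''|*[!0-9.]*) return 1 ;;        # empty, or any character other than digits and dots
    .*|*.|*..*|*.*.*.*) return 1 ;;  # leading, trailing, or doubled dot; more than two dots
    *.*.*) return 0 ;;               # exactly three numeric components
    *) return 1 ;;                   # fewer than three components
  esac
}
```

Under these assumptions, `find_unpinned .github/workflows/*.yml` could run as a pre-merge check, and the trusted lane could call `validate_version` on each value it receives before using it.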
GitHub-hosted runners built the artifacts from Linux checkouts with LF line endings, so the SHA-256 of the LF-normalized source is the ground truth for clean artifact comparison:

    $ sha256sum src/app.js
    c4657bc50ab6be26c54354f5304097ead527c46dbf2d72e0efbc35b1727b5988  src/app.js

- Tier 1 scores
- Tier 2 scores
- Tier 3 scores
- Tier 4 scores
- Comparison table
- Pre-registered expected results
- Anomaly review
- This repo is public, so source exposure impact is capped relative to a private-repo benchmark
- The malicious action is a controlled simulation, not a live attacker
- Windows `CRLF` vs Linux `LF` normalization affected raw local hashes; artifact verification used the normalized Linux hash above
- Tier 4 initially failed attestation until Sigstore endpoints were allowlisted through the hardened release egress policy
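The line-ending caveat can be checked locally. The following sketch hashes a file as if it had been checked out with `LF` endings and compares the result to an expected value. It assumes `sha256sum` is available and that the file contains no carriage returns other than those in `CRLF` pairs; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compute a line-ending-independent SHA-256 and compare it to an
# expected hash. Function names are illustrative, not part of the repository.

# Hash a file after deleting every carriage return, so a CRLF checkout and
# an LF checkout of the same source produce the same digest.
normalized_sha256() {
  tr -d '\r' < "$1" | sha256sum | cut -d ' ' -f 1
}

# Compare a file's normalized hash against an expected hex digest.
# Prints OK on a match; prints MISMATCH to stderr and returns 1 otherwise.
verify_source_hash() {
  actual=$(normalized_sha256 "$1")
  if [ "$actual" = "$2" ]; then
    echo "OK: $1 matches"
  else
    echo "MISMATCH: $1 is $actual, expected $2" >&2
    return 1
  fi
}
```

For example, `verify_source_hash src/app.js c4657bc50ab6be26c54354f5304097ead527c46dbf2d72e0efbc35b1727b5988` would compare a local checkout against the ground-truth hash quoted earlier.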

| Path | Contents |
|---|---|
| `.github/actions/malicious-tool/` | simulated compromised action |
| `.github/workflows/` | baseline and tier workflows |
| `evidence/` | extracted logs, artifacts, score sheets, comparison table |
| `scripts/` | local verification helpers |
| `src/app.js` | deterministic source input |
| `REPRODUCE.md` | end-to-end rerun instructions |
| `RESULTS.md` | narrative benchmark results |
| `PRESENTATION.md` | short presentation / video notes |

License: MIT