IR Design and Implementation Plan

### Problem

The current `EncryptedBytesScenario` encrypts raw bytes and sends them as BOLT messages.  This mostly exercises message parsing and almost never reaches deeper protocol states.

To fuzz deeper, we need the fuzzer to become structure-aware: able to send and receive sequences of valid (or semi-valid) messages that get past the initial parsing code and exercise deeper protocol logic.  Ideally the fuzzer would be smart enough to generate the common message sequences that open channels, send or receive HTLCs, close channels, etc.  Many of these sequences have dependencies between messages -- e.g., `commitment_signed` must contain a channel ID that matches a previously opened channel, as well as signatures generated from the previously negotiated keys and commitment state.  We want the fuzzer to be able to satisfy these dependencies as well.

### Solution

We can use an intermediate representation (IR) to capture the type and structure knowledge needed to fuzz deeper.  The fuzzer can then use this IR to generate and mutate short *programs* to be executed in the Nyx VM.

The following design is inspired by ideas from both [syzkaller](https://github.com/google/syzkaller) and [fuzzamoto](https://github.com/dergoegge/fuzzamoto).

## Architecture

We add a new crate `smite-ir/` to contain the IR, custom mutators, and program generators, and a new scenario `IrScenario<T, S>` under `smite-scenarios/` to execute IR programs in the Nyx VM.

![Architecture diagram](https://github.com/user-attachments/assets/8f278252-b429-45eb-b32e-ee622eb2647a)

The `smite-ir` crate is intended to be *engine-agnostic* -- no dependency on AFL++ or LibAFL.  We intend to use AFL++ at first with a thin custom mutator wrapper library `libsmite_ir_mutator.so` loaded via `AFL_CUSTOM_MUTATOR_LIBRARY`.  In the future, we may migrate to LibAFL by simply replacing the wrapper.  See "Appendix: Fuzzing Engine Tradeoffs" for details.

Our custom mutators and generators create new `Program`s and serialize them for the fuzzing engine, which then sends each `Program` to the Nyx VM via shared memory.  Inside the VM, our `IrScenario` deserializes the `Program` and executes it line-by-line.  `IrScenario` then checks for and reports any crashes or hangs and resets the VM snapshot before processing the next `Program`.

## Example IR Program

This program executes the channel funding flow up to the point where the target sends `funding_signed`.  We will refer to this example in subsequent sections.

```
# Generate channel keys (6 key pairs)
v0 = GeneratePrivateKey(0x00)
v1 = DerivePoint(v0)
v2 = GeneratePrivateKey(0x01)
v3 = DerivePoint(v2)
v4 = GeneratePrivateKey(0x02)
v5 = DerivePoint(v4)
v6 = GeneratePrivateKey(0x03)
v7 = DerivePoint(v6)
v8 = GeneratePrivateKey(0x04)
v9 = DerivePoint(v8)
v10 = GeneratePrivateKey(0x05)
v11 = DerivePoint(v10)

# Channel parameters
v12 = GenerateTemporaryChannelId(0x00)
v13 = LoadChainHash()
v14 = LoadAmount(100_000)
v15 = LoadAmount(0)
v16 = LoadAmount(546)
v17 = LoadAmount(10_000_000)
v18 = LoadAmount(1_000)
v19 = LoadAmount(1)
v20 = LoadFeeratePerKw(2_500)
v21 = LoadCsvDelay(144)
v22 = LoadFeatures()

# Build and send open_channel
v23 = BuildOpenChannel(v13, v12, v14, v15, v16, v17, v18, v19, v20, v21, v1, v3, v5, v7, v9, v11, v22)
SendMessage(v23)

# Receive accept_channel, extract target's keys
v25 = RecvAcceptChannel()
v26 = ExtractFundingPubkey(v25)
v27 = ExtractFirstPerCommitmentPoint(v25)

# Create funding output via bitcoind
v28 = CreateFundingOutput(v1, v26, v14)

# Sign commitment transaction
v29 = BuildCommitmentTx(v28, v1, v26, v21)
v30 = ComputeCommitmentSighash(v29)
v31 = Sign(v0, v30)

# Build and send funding_created
v32 = BuildFundingCreated(v12, v28, v31)
SendMessage(v32)

# Receive funding_signed
v34 = RecvFundingSigned()
...
```

Key things to notice:
- **SSA form**: each instruction produces at most one variable, numbered by instruction index.  `v1 = DerivePoint(v0)` means instruction 1 takes `v0` as input and produces `v1`.
- **Variable gaps** (no `v24`, `v33`) indicate void instructions like `SendMessage` that have side effects but no output.
- **Compound variables**: `RecvAcceptChannel()` produces `v25`, an `AcceptChannelData` containing all parsed response fields.  `ExtractFundingPubkey(v25)` pulls one field into a primitive `Point` for use by later Build operations.
- **Possible mutations**: `InputSwapMutator` could swap `v3` (`revocation_basepoint`) with `v5` (`payment_basepoint`) in `BuildOpenChannel`, since both are `Point`.  `OperationParamMutator` could change `LoadAmount(100_000)` to `LoadAmount(7_654)`.

---

## Core Concepts

### Program and Instructions

A `Program` is an ordered list of `Instruction`s.  Programs are serialized with [postcard](https://docs.rs/postcard/latest/postcard/) for transport between AFL++ and the VM.  Snapshot state (target pubkey, chain hash, block height, channel keys if a channel is already open) lives in a separate `ProgramContext` that is supplied to the executor at run time.

An `Instruction` is an `Operation` plus input variable indices.  In the example, `BuildFundingCreated(v12, v28, v31)` has operation `BuildFundingCreated` and inputs `[12, 28, 31]`.
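To make the SSA numbering concrete, here is a minimal sketch of the `Program`/`Instruction` shape using only the standard library.  The operation subset, the `produces_output()` helper, and `validate()` are hypothetical (the real types would also derive serde traits for postcard).  `validate()` checks that every input refers to an earlier instruction that actually produces a value.

```rust
/// A small, hypothetical subset of operations, enough to illustrate SSA rules.
#[derive(Debug, Clone, PartialEq)]
enum Operation {
    GeneratePrivateKey(u8),
    DerivePoint,
    LoadAmount(u64),
    /// Inputs trimmed for brevity; the example program passes it 17 inputs.
    BuildOpenChannel,
    SendMessage,
}

impl Operation {
    /// Void operations have side effects only and leave a gap in the numbering.
    fn produces_output(&self) -> bool {
        !matches!(self, Operation::SendMessage)
    }
}

/// An operation plus the indices of the instructions whose outputs it consumes.
#[derive(Debug, Clone, PartialEq)]
struct Instruction {
    op: Operation,
    inputs: Vec<usize>,
}

/// An ordered list of instructions; variable `vN` is the output of instruction `N`.
#[derive(Debug, Clone, Default, PartialEq)]
struct Program {
    instructions: Vec<Instruction>,
}

impl Program {
    /// Checks the SSA invariant: every input refers to an earlier,
    /// value-producing instruction.
    fn validate(&self) -> Result<(), String> {
        for (idx, instr) in self.instructions.iter().enumerate() {
            for &input in &instr.inputs {
                if input >= idx {
                    return Err(format!("instruction {idx} uses v{input} before it is defined"));
                }
                if !self.instructions[input].op.produces_output() {
                    return Err(format!("instruction {idx} uses v{input}, which is void"));
                }
            }
        }
        Ok(())
    }
}

fn ins(op: Operation, inputs: &[usize]) -> Instruction {
    Instruction { op, inputs: inputs.to_vec() }
}

/// A trimmed, renumbered version of the start of the funding example.
fn example_program() -> Program {
    Program {
        instructions: vec![
            ins(Operation::GeneratePrivateKey(0x00), &[]), // v0
            ins(Operation::DerivePoint, &[0]),             // v1
            ins(Operation::LoadAmount(100_000), &[]),      // v2
            ins(Operation::BuildOpenChannel, &[2, 1]),     // v3
            ins(Operation::SendMessage, &[3]),             // no v4: void
        ],
    }
}

fn main() {
    println!("{:?}", example_program().validate()); // Ok(())
}
```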

### Operations

An `Operation` is one of four categories:

1. **Load**: produce a variable from embedded data or snapshot context (`LoadAmount(100_000)`, `LoadChainHash`, `LoadContextChannelId`).
2. **Compute**: derive a variable from inputs (`DerivePoint`, `Sign`, `HashPaymentPreimage`, `Extract*`).
3. **Build**: construct a BOLT message from inputs (`BuildOpenChannel`, `BuildCommitmentSigned`).
4. **Act**: produce side effects against the target (`SendMessage`, `RecvAcceptChannel`, `MineBlocks`, `Reconnect`).

Embedded literal data lives in the Operation itself (e.g., `LoadAmount(100_000)`, `GeneratePrivateKey(0x05)`).
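A sketch of how the four categories might map onto an `Operation` enum, assuming a hypothetical `category()` method.  The text doesn't say where `Generate*` operations belong, so this sketch treats `GeneratePrivateKey` as a Load; `MineBlocks(u32)` carrying a block count is also an assumption.  `has_mutable_param()` marks the operations with an embedded literal, i.e. the ones `OperationParamMutator` could target.

```rust
/// The four operation categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Load,
    Compute,
    Build,
    Act,
}

/// A representative (hypothetical) subset; literals are embedded in the variants.
#[derive(Debug, Clone, PartialEq)]
enum Operation {
    LoadAmount(u64),
    LoadChainHash,
    LoadContextChannelId,
    GeneratePrivateKey(u8),
    DerivePoint,
    Sign,
    ExtractFundingPubkey,
    BuildOpenChannel,
    BuildCommitmentSigned,
    SendMessage,
    RecvAcceptChannel,
    MineBlocks(u32),
    Reconnect,
}

impl Operation {
    fn category(&self) -> Category {
        use Operation::*;
        match self {
            LoadAmount(_) | LoadChainHash | LoadContextChannelId | GeneratePrivateKey(_) => {
                Category::Load
            }
            DerivePoint | Sign | ExtractFundingPubkey => Category::Compute,
            BuildOpenChannel | BuildCommitmentSigned => Category::Build,
            SendMessage | RecvAcceptChannel | MineBlocks(_) | Reconnect => Category::Act,
        }
    }

    /// True if the operation embeds a literal that a parameter mutator may change.
    fn has_mutable_param(&self) -> bool {
        matches!(
            self,
            Operation::LoadAmount(_) | Operation::GeneratePrivateKey(_) | Operation::MineBlocks(_)
        )
    }
}

fn main() {
    for op in [Operation::LoadAmount(100_000), Operation::DerivePoint, Operation::SendMessage] {
        println!("{op:?} -> {:?}", op.category());
    }
}
```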

### Recv Operations

`Recv*` operations (e.g., `RecvAcceptChannel`) read from the connection and produce *compound variables* containing all parsed fields.  The executor's receive loop automatically answers `ping` with `pong` and returns the first non-ping message.  If that message has the expected type, it is parsed into a compound variable.  If not (e.g., the target sends `error` instead of `accept_channel`), the executor stops immediately.
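The receive loop might look roughly like the following sketch.  The `Connection` trait, `MockConn`, and `recv_expected` are hypothetical stand-ins for the encrypted connection and executor code.  The message type numbers (`error` = 17, `ping` = 18, `pong` = 19, `accept_channel` = 33) and the ping/pong payload layout follow BOLT #1 and BOLT #2, including the rule that a ping requesting 65532 or more pong bytes is ignored.

```rust
use std::collections::VecDeque;

const MSG_ERROR: u16 = 17;
const MSG_PING: u16 = 18;
const MSG_PONG: u16 = 19;
const MSG_ACCEPT_CHANNEL: u16 = 33;

/// A decrypted BOLT message: 2-byte type plus payload.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    msg_type: u16,
    payload: Vec<u8>,
}

/// Hypothetical stand-in for the Noise-encrypted connection.
trait Connection {
    fn recv(&mut self) -> Option<Message>;
    fn send(&mut self, msg: Message);
}

#[derive(Debug, PartialEq)]
enum RecvError {
    Closed,
    UnexpectedType(u16),
}

/// Answers pings, then returns the first non-ping message if it has the expected type.
fn recv_expected(conn: &mut impl Connection, expected: u16) -> Result<Message, RecvError> {
    loop {
        let msg = conn.recv().ok_or(RecvError::Closed)?;
        if msg.msg_type == MSG_PING {
            // ping payload: num_pong_bytes (u16), byteslen (u16), ignored bytes.
            if msg.payload.len() >= 2 {
                let num_pong_bytes = u16::from_be_bytes([msg.payload[0], msg.payload[1]]);
                if num_pong_bytes < 65532 {
                    // pong payload: byteslen (u16) followed by that many zero bytes.
                    let mut payload = num_pong_bytes.to_be_bytes().to_vec();
                    payload.resize(2 + num_pong_bytes as usize, 0);
                    conn.send(Message { msg_type: MSG_PONG, payload });
                }
            }
            continue;
        }
        if msg.msg_type != expected {
            // e.g. `error` instead of `accept_channel`: the executor stops here.
            return Err(RecvError::UnexpectedType(msg.msg_type));
        }
        return Ok(msg);
    }
}

/// In-memory mock used to exercise the loop.
struct MockConn {
    inbox: VecDeque<Message>,
    outbox: Vec<Message>,
}

impl Connection for MockConn {
    fn recv(&mut self) -> Option<Message> {
        self.inbox.pop_front()
    }
    fn send(&mut self, msg: Message) {
        self.outbox.push(msg);
    }
}

fn main() {
    let mut conn = MockConn {
        inbox: VecDeque::from(vec![
            Message { msg_type: MSG_PING, payload: vec![0, 4, 0, 0] },
            Message { msg_type: MSG_ACCEPT_CHANNEL, payload: vec![] },
        ]),
        outbox: Vec::new(),
    };
    let result = recv_expected(&mut conn, MSG_ACCEPT_CHANNEL);
    println!("{result:?}, pongs sent: {}", conn.outbox.len());
}
```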

To minimize gossip noise, we can disable `option_gossip_queries` in our init features and drain any initial gossip messages received during pre-snapshot setup.  In some scenarios it may also be helpful to use `gossip_timestamp_filter` to request the target to refrain from sending us gossip.

### Variables

A `Variable` is a typed *runtime* value produced by the executor -- `ChannelId`, `Point`, `PrivateKey`, `Signature`, `Amount`, `Message`, etc.  Variables correspond to the runtime SSA outputs produced by each executed `Instruction` in a `Program`.

*Compound variables* (e.g., `AcceptChannelData`, `FundingSignedData`) bundle all fields from a parsed target response.  `Extract*` operations pull individual fields into primitive types:

```
v25 = RecvAcceptChannel()              # -> AcceptChannelData (compound)
v26 = ExtractFundingPubkey(v25)        # -> Point (primitive)
```

Each `Variable` has a `VariableType` we can use to ensure mutations are type-safe.
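A sketch of how compound variables and `Extract*` could stay type-safe, using a hypothetical three-member `VariableType` and an `AcceptChannelData` trimmed to a few fields.  Extraction returns a primitive `Variable` on success and a `TypeError` (also hypothetical) otherwise, which is the information an executor or mutator would use to reject ill-typed inputs.

```rust
/// A few variable types; the real IR would have many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VariableType {
    Point,
    Amount,
    AcceptChannelData,
}

/// Compound variable holding parsed `accept_channel` fields (trimmed).
#[derive(Debug, Clone, PartialEq)]
struct AcceptChannelData {
    funding_pubkey: [u8; 33],
    first_per_commitment_point: [u8; 33],
    dust_limit_satoshis: u64,
}

#[derive(Debug, Clone, PartialEq)]
enum Variable {
    Point([u8; 33]),
    Amount(u64),
    AcceptChannel(AcceptChannelData),
}

impl Variable {
    fn var_type(&self) -> VariableType {
        match self {
            Variable::Point(_) => VariableType::Point,
            Variable::Amount(_) => VariableType::Amount,
            Variable::AcceptChannel(_) => VariableType::AcceptChannelData,
        }
    }
}

#[derive(Debug, PartialEq)]
struct TypeError {
    expected: VariableType,
    found: VariableType,
}

/// `ExtractFundingPubkey`: compound `AcceptChannelData` -> primitive `Point`.
fn extract_funding_pubkey(v: &Variable) -> Result<Variable, TypeError> {
    match v {
        Variable::AcceptChannel(data) => Ok(Variable::Point(data.funding_pubkey)),
        other => Err(TypeError {
            expected: VariableType::AcceptChannelData,
            found: other.var_type(),
        }),
    }
}

fn main() {
    let v25 = Variable::AcceptChannel(AcceptChannelData {
        funding_pubkey: [0x02; 33],
        first_per_commitment_point: [0x03; 33],
        dust_limit_satoshis: 546,
    });
    let v26 = extract_funding_pubkey(&v25).unwrap();
    println!("{:?}", v26.var_type()); // Point
}
```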

### Executor

The `Executor` is used by `IrScenario` to walk a `Program` instruction by instruction, executing the specified actions and maintaining a `Vec<Variable>` store.

Unlike fuzzamoto's `Compiler` (which pre-compiles the full program into flat actions on the host), we choose to directly interpret `Program`s in the VM.  This simplifies fuzzing of the many interactive flows in the Lightning protocol, which require us to construct later messages using data sent by the target in earlier messages.

```rust
fn run(&mut self, input: &[u8]) -> ScenarioResult {
    // Inputs that don't deserialize into a `Program` are not meaningful; skip them.
    let program = match postcard::from_bytes::<Program>(input) {
        Ok(p) => p,
        Err(_) => return ScenarioResult::Skip,
    };
    let mut executor = Executor::new(&self.context);
    match executor.execute(&program, &mut self.conn, &mut self.bitcoind) {
        Ok(()) => {}
        // A dropped connection is only a crash if the target process died.
        Err(ExecuteError::Connection(_)) => {
            if self.target.check_alive().is_err() {
                return ScenarioResult::Fail("target crashed".into());
            }
        }
        // The target stopped responding.
        Err(ExecuteError::Timeout) => {
            return ScenarioResult::Fail("target hung".into());
        }
        // Any other execution error ends this program without reporting a bug.
        Err(_) => return ScenarioResult::Skip,
    }
    // Final ping-pong catches delayed crashes
    if let Err(e) = ping_pong(&mut self.conn) { ... }
    ScenarioResult::Ok
}
```
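The executor's variable store can be sketched as a plain interpreter loop.  In this sketch (toy variable payloads, hypothetical `arg` helper and error variants), the store holds one `Option<Variable>` slot per instruction so that variable numbers keep matching instruction indices even across void instructions like `SendMessage`, which is what produces the gaps (`v24`, `v33`) in the example program.

```rust
/// Toy runtime values; real variables would hold keys, points, messages, etc.
#[derive(Debug, Clone, PartialEq)]
enum Variable {
    PrivateKey(u8),
    Point(u8),
    Amount(u64),
}

#[derive(Debug, Clone)]
enum Operation {
    GeneratePrivateKey(u8),
    DerivePoint,
    LoadAmount(u64),
    SendMessage,
}

struct Instruction {
    op: Operation,
    inputs: Vec<usize>,
}

#[derive(Debug, PartialEq)]
enum ExecuteError {
    MissingInput(usize),
    MissingVariable(usize),
    TypeMismatch(usize),
}

struct Executor {
    /// One slot per executed instruction; `None` marks a void instruction.
    store: Vec<Option<Variable>>,
    /// Stand-in for the connection: values "sent" by `SendMessage`.
    sent: Vec<Variable>,
}

impl Executor {
    fn new() -> Self {
        Executor { store: Vec::new(), sent: Vec::new() }
    }

    /// Resolves input `n` of `instr` to a previously produced variable.
    fn arg(&self, instr: &Instruction, n: usize) -> Result<&Variable, ExecuteError> {
        let idx = *instr.inputs.get(n).ok_or(ExecuteError::MissingInput(n))?;
        self.store
            .get(idx)
            .and_then(Option::as_ref)
            .ok_or(ExecuteError::MissingVariable(idx))
    }

    /// Interprets the program in order, stopping at the first error.
    fn execute(&mut self, program: &[Instruction]) -> Result<(), ExecuteError> {
        for (idx, instr) in program.iter().enumerate() {
            let output = match &instr.op {
                Operation::GeneratePrivateKey(seed) => Some(Variable::PrivateKey(*seed)),
                Operation::LoadAmount(sat) => Some(Variable::Amount(*sat)),
                Operation::DerivePoint => match self.arg(instr, 0)? {
                    // Toy stand-in for secp256k1 point derivation.
                    Variable::PrivateKey(k) => Some(Variable::Point(k.wrapping_add(1))),
                    _ => return Err(ExecuteError::TypeMismatch(idx)),
                },
                Operation::SendMessage => {
                    let msg = self.arg(instr, 0)?.clone();
                    self.sent.push(msg);
                    None // void: leaves a gap in the numbering
                }
            };
            self.store.push(output);
        }
        Ok(())
    }
}

fn main() {
    let program = vec![
        Instruction { op: Operation::GeneratePrivateKey(0x05), inputs: vec![] },
        Instruction { op: Operation::DerivePoint, inputs: vec![0] },
        Instruction { op: Operation::SendMessage, inputs: vec![1] },
    ];
    let mut executor = Executor::new();
    let result = executor.execute(&program);
    println!("{result:?} {:?}", executor.store);
}
```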

---

## Generators

Generators produce type-correct instruction sequences that represent protocol interactions.  Each generator knows the shape of a protocol flow (what messages to construct, what keys to generate, what order to send/recv) but delegates value selection and variable reuse to `ProgramBuilder`.

### ProgramBuilder

`ProgramBuilder` is the shared infrastructure that all generators use.  It maintains:

- The instruction list being built (append-only, SSA)
- A type-indexed variable registry tracking all produced variables (direct primitives and extractable compound fields)
- The `pick_variable()` method for probabilistic variable selection

Generators call builder methods -- they never manipulate instruction indices or variable references directly:

```rust
// Generator asks builder for variables.
// Builder decides: reuse existing? extract from compound? generate fresh?
let chan_id = builder.pick_variable(VariableType::ChannelId, rng);
let feerate = builder.pick_variable(VariableType::Amount, rng);

// Generator tells builder what instruction to emit.
let msg_idx = builder.append(Operation::BuildUpdateFee, &[chan_id, feerate]);
```

This separation means generators encode *protocol knowledge* (e.g., `open_channel` needs 6 key pairs) while the builder encodes *fuzzing strategy* (e.g., which existing variables to reuse and when to generate fresh ones).
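A minimal sketch of the builder's bookkeeping, assuming each operation knows its output type via a hypothetical `output_type()` method.  `append()` returns the new instruction's index (which is also its SSA variable number) and records the output in a type-indexed registry.  `pick_variable()` and the RNG are left out to keep the example short; as in the snippet above, the feerate is treated as an `Amount`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum VariableType {
    ChannelId,
    Amount,
    Message,
}

#[derive(Debug, Clone, PartialEq)]
enum Operation {
    LoadContextChannelId,
    LoadAmount(u64),
    BuildUpdateFee,
    SendMessage,
}

impl Operation {
    /// Output type of the operation, or `None` for void operations.
    fn output_type(&self) -> Option<VariableType> {
        match self {
            Operation::LoadContextChannelId => Some(VariableType::ChannelId),
            Operation::LoadAmount(_) => Some(VariableType::Amount),
            Operation::BuildUpdateFee => Some(VariableType::Message),
            Operation::SendMessage => None,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Instruction {
    op: Operation,
    inputs: Vec<usize>,
}

#[derive(Default)]
struct ProgramBuilder {
    /// Append-only instruction list.
    instructions: Vec<Instruction>,
    /// Variables of each type, in creation order (oldest first).
    registry: HashMap<VariableType, Vec<usize>>,
}

impl ProgramBuilder {
    /// Appends an instruction and returns its index, i.e. its SSA variable number.
    fn append(&mut self, op: Operation, inputs: &[usize]) -> usize {
        let idx = self.instructions.len();
        if let Some(ty) = op.output_type() {
            self.registry.entry(ty).or_default().push(idx);
        }
        self.instructions.push(Instruction { op, inputs: inputs.to_vec() });
        idx
    }

    /// All variables of the given type produced so far.
    fn variables_of(&self, ty: VariableType) -> &[usize] {
        self.registry.get(&ty).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut builder = ProgramBuilder::default();
    let chan_id = builder.append(Operation::LoadContextChannelId, &[]);
    let feerate = builder.append(Operation::LoadAmount(2_500), &[]);
    let msg = builder.append(Operation::BuildUpdateFee, &[chan_id, feerate]);
    builder.append(Operation::SendMessage, &[msg]);
    println!("{:?}", builder.variables_of(VariableType::Amount)); // [1]
}
```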

### Resource-Aware Variable Selection

When a generator needs a variable, `ProgramBuilder::pick_variable()` chooses one of the following strategies at random, weighted as shown:

- **Reuse recent (75%)**: Recently-created variables are more likely to be useful for exercising multi-message protocol flows.
- **Reuse any (15%)**: Cross-pollinates between protocol flows.
- **Generate fresh (10%)**: Emits instructions that produce a new valid value of the requested type.
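One way to implement the weighting, sketched with a tiny xorshift PRNG so it has no dependencies.  `RECENT_WINDOW` (what counts as "recent") and the `Pick::Fresh` signal (telling the caller to emit a fresh-value instruction) are assumptions for illustration.  If no variable of the requested type exists yet, the sketch always falls back to generating a fresh one.

```rust
/// Tiny xorshift64 PRNG (seed must be non-zero); fine for a sketch, not for crypto.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..n (modulo bias is acceptable here).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    ReuseRecent,
    ReuseAny,
    GenerateFresh,
}

/// Maps a roll in 0..100 onto the 75/15/10 weights.
fn strategy_for_roll(roll: u64) -> Strategy {
    match roll {
        0..=74 => Strategy::ReuseRecent,
        75..=89 => Strategy::ReuseAny,
        _ => Strategy::GenerateFresh,
    }
}

/// Hypothetical size of the "recent" window.
const RECENT_WINDOW: usize = 3;

#[derive(Debug, PartialEq)]
enum Pick {
    Existing(usize),
    Fresh,
}

/// `candidates` holds existing variables of the requested type, oldest first.
fn pick_variable(candidates: &[usize], rng: &mut XorShift64) -> Pick {
    if candidates.is_empty() {
        return Pick::Fresh;
    }
    match strategy_for_roll(rng.below(100)) {
        Strategy::ReuseRecent => {
            let start = candidates.len().saturating_sub(RECENT_WINDOW);
            let recent = &candidates[start..];
            Pick::Existing(recent[rng.below(recent.len() as u64) as usize])
        }
        Strategy::ReuseAny => {
            Pick::Existing(candidates[rng.below(candidates.len() as u64) as usize])
        }
        Strategy::GenerateFresh => Pick::Fresh,
    }
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let amounts = [14, 15, 16, 17, 18, 19]; // e.g. v14..v19 in the example
    for _ in 0..5 {
        println!("{:?}", pick_variable(&amounts, &mut rng));
    }
}
```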

### Generator Types

Generators come in three types.

*Message generators* (e.g., `OpenChannelMsg`, `FundingCreatedMsg`, `ChannelReadyMsg`, `UpdateAddHtlcMsg`) emit the instructions for a single protocol message: load parameters, build the message, send it, and optionally receive the response.  These are the building blocks for interesting protocol flows.

*Action generators* (e.g., `MineBlocksAction`) perform a single action, such as mining blocks via bitcoind.

*Flow generators* (e.g., `ChannelOpenFlow`, `HtlcAddFlow`, `HtlcFulfillFlow`, `InteractiveTxFlow`) compose message and action generators in sequence, threading variables between them via `ProgramBuilder`.  They are the easiest way for the fuzzer to reach deep protocol states where many constraints (matching keys, valid signatures, correct sequencing) must align.
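A sketch of how a flow generator might compose message and action generators.  Everything here is hypothetical: the `Generator` trait, string operation names, and a builder reduced to a list.  To keep it short, each step receives only the previous step's output; in the design described here, a generator would instead ask `ProgramBuilder` for the variables it needs.

```rust
/// Minimal builder: records operation names and hands out SSA indices.
#[derive(Default)]
struct ProgramBuilder {
    ops: Vec<(String, Vec<usize>)>,
}

impl ProgramBuilder {
    fn append(&mut self, op: &str, inputs: &[usize]) -> usize {
        self.ops.push((op.to_string(), inputs.to_vec()));
        self.ops.len() - 1
    }
}

/// Emits instructions and returns the variable the next step may need, if any.
trait Generator {
    fn generate(&self, b: &mut ProgramBuilder, prev: Option<usize>) -> Option<usize>;
}

struct OpenChannelMsg;
struct FundingCreatedMsg;
struct MineBlocksAction(u32);
struct ChannelOpenFlow;

impl Generator for OpenChannelMsg {
    fn generate(&self, b: &mut ProgramBuilder, _prev: Option<usize>) -> Option<usize> {
        let amount = b.append("LoadAmount", &[]);
        let msg = b.append("BuildOpenChannel", &[amount]); // other inputs elided
        b.append("SendMessage", &[msg]);
        Some(b.append("RecvAcceptChannel", &[]))
    }
}

impl Generator for FundingCreatedMsg {
    fn generate(&self, b: &mut ProgramBuilder, prev: Option<usize>) -> Option<usize> {
        let accept = prev?; // needs the accept_channel response
        let pubkey = b.append("ExtractFundingPubkey", &[accept]);
        let msg = b.append("BuildFundingCreated", &[pubkey]); // other inputs elided
        b.append("SendMessage", &[msg]);
        Some(b.append("RecvFundingSigned", &[]))
    }
}

impl Generator for MineBlocksAction {
    fn generate(&self, b: &mut ProgramBuilder, prev: Option<usize>) -> Option<usize> {
        b.append(&format!("MineBlocks({})", self.0), &[]);
        prev // an action produces nothing new for the next step
    }
}

impl Generator for ChannelOpenFlow {
    fn generate(&self, b: &mut ProgramBuilder, prev: Option<usize>) -> Option<usize> {
        let steps: [Box<dyn Generator>; 3] = [
            Box::new(OpenChannelMsg),
            Box::new(FundingCreatedMsg),
            Box::new(MineBlocksAction(6)),
        ];
        let mut last = prev;
        for step in &steps {
            last = step.generate(b, last);
        }
        last
    }
}

fn main() {
    let mut b = ProgramBuilder::default();
    ChannelOpenFlow.generate(&mut b, None);
    for (idx, (op, inputs)) in b.ops.iter().enumerate() {
        println!("{idx}: {op}{inputs:?}");
    }
}
```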

### Standalone vs. Insertion

Generators are used both to generate programs from scratch and, via `GeneratorInsertionMutator`, to insert new instruction subsequences into existing programs.
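Inserting a generated snippet into the middle of a program shifts the SSA numbering of everything after it.  A minimal sketch, assuming the generator already numbered the snippet as if it were appended to the prefix `program[..pos]`: later instructions that reference variables at or past `pos` are shifted by the snippet length, while references into the prefix stay unchanged.  `insert_snippet` is a hypothetical helper name.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Instruction {
    op: &'static str,
    inputs: Vec<usize>,
}

/// Inserts `snippet` at position `pos`, shifting later references.
///
/// Assumes the snippet is numbered as if appended to `program[..pos]`: its
/// inputs point either into that prefix or at its own earlier instructions.
fn insert_snippet(program: &mut Vec<Instruction>, pos: usize, snippet: Vec<Instruction>) {
    let shift = snippet.len();
    // References into the prefix are unaffected; anything at or past `pos` moves.
    for instr in program[pos..].iter_mut() {
        for input in instr.inputs.iter_mut() {
            if *input >= pos {
                *input += shift;
            }
        }
    }
    let tail = program.split_off(pos);
    program.extend(snippet);
    program.extend(tail);
}

fn main() {
    let mut program = vec![
        Instruction { op: "LoadAmount", inputs: vec![] },           // v0
        Instruction { op: "LoadContextChannelId", inputs: vec![] }, // v1
        Instruction { op: "BuildUpdateFee", inputs: vec![1, 0] },   // v2
        Instruction { op: "SendMessage", inputs: vec![2] },
    ];
    // Generated against program[..2], so it starts numbering at v2.
    let snippet = vec![
        Instruction { op: "LoadAmount", inputs: vec![] },         // new v2
        Instruction { op: "BuildUpdateFee", inputs: vec![1, 2] }, // new v3
        Instruction { op: "SendMessage", inputs: vec![3] },
    ];
    insert_snippet(&mut program, 2, snippet);
    for (idx, instr) in program.iter().enumerate() {
        println!("{idx}: {} {:?}", instr.op, instr.inputs);
    }
}
```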

## Mutators

Mutators transform existing programs while preserving structural validity.

### Planned Mutators

| Mutator | What it does |
|---|---|
| `OperationParamMutator` | Pick a random instruction with mutable parameters and mutate its embedded literal.  Type-aware: amounts get boundary values (0, 1, `u64::MAX`) and random ranges; byte arrays get bit flips, insertions, and deletions; feerates/delays get truncated or maximized. |
| `InputSwapMutator` | Replace a variable reference in a random instruction with a different variable of the same `VariableType`. |
| `InstructionReorderMutator` | Swap two Act instructions (`SendMessage`, `MineBlocks`, etc.) that have no data dependency between them. |
| `SpliceMutator` | Pick a random program from the corpus and interleave its instruction subsequence into the current program at a random point, adjusting variable indices. |
| `InstructionDeleteMutator` | Remove a random instruction. |
| `GeneratorInsertionMutator` | Insert a freshly generated instruction subsequence (via a generator) at a random point. |
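A sketch of `InputSwapMutator` showing how `VariableType` keeps the swap type-safe.  The `output` field on `Instruction` and the `choose` callback (standing in for an RNG) are hypothetical.  Only instructions before the target are considered, so the program stays in SSA form.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VariableType {
    Point,
    Amount,
    Message,
}

#[derive(Debug, Clone, PartialEq)]
struct Instruction {
    op: &'static str,
    inputs: Vec<usize>,
    /// Output type, or `None` for void instructions (hypothetical field;
    /// a real IR would derive this from the operation).
    output: Option<VariableType>,
}

/// Rewires input `slot` of instruction `target` to a different, earlier
/// variable of the same type. Returns `false` if no such variable exists.
fn input_swap(
    program: &mut [Instruction],
    target: usize,
    slot: usize,
    choose: impl FnOnce(&[usize]) -> usize,
) -> bool {
    // The variable currently wired into this slot.
    let current = match program.get(target).and_then(|i| i.inputs.get(slot)) {
        Some(&idx) => idx,
        None => return false,
    };
    // Its type, which any replacement must match.
    let ty = match program.get(current).and_then(|i| i.output) {
        Some(ty) => ty,
        None => return false,
    };
    let candidates: Vec<usize> = program[..target]
        .iter()
        .enumerate()
        .filter(|(idx, instr)| *idx != current && instr.output == Some(ty))
        .map(|(idx, _)| idx)
        .collect();
    if candidates.is_empty() {
        return false;
    }
    let pick = candidates[choose(&candidates[..]) % candidates.len()];
    program[target].inputs[slot] = pick;
    true
}

fn main() {
    let mut program = vec![
        // v0, v2: Points (their own inputs elided for brevity)
        Instruction { op: "DerivePoint", inputs: vec![], output: Some(VariableType::Point) },
        Instruction { op: "LoadAmount", inputs: vec![], output: Some(VariableType::Amount) },
        Instruction { op: "DerivePoint", inputs: vec![], output: Some(VariableType::Point) },
        Instruction { op: "BuildOpenChannel", inputs: vec![1, 0, 2], output: Some(VariableType::Message) },
    ];
    let swapped = input_swap(&mut program, 3, 1, |_| 0);
    println!("{swapped} {:?}", program[3].inputs); // true [1, 2, 2]
}
```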

---

## Snapshot Setup

Different fuzzing goals require different starting states.  `IrScenario<T, S>` is parameterized by a `SnapshotSetup` trait:

```rust
trait SnapshotSetup<T: Target> {
    fn setup(target: &T, conn: &mut NoiseConnection) -> Result<ProgramContext, ScenarioError>;
}
```

Snapshot setup is Rust code that drives the target through an initial deterministic protocol sequence using `NoiseConnection` and `bitcoin-cli` directly.  `IrScenario` calls `setup()` before snapshotting the VM state.

Sample snapshot variants:

| Setup | Snapshot state | IR fuzzes... |
|-------|---------------|-------------|
| `PostInitSetup` | After handshake + `init` exchange | `open_channel`, gossip, any first message |
| `PostChannelOpenSetup` | After channel is funded + ready | HTLCs, commitment rounds, fees, closure |
| `InteractiveTxSetup` | Mid-negotiation (after `open_channel2` + `accept_channel2`) | `tx_[add/remove]_[input/output]`, `tx_complete` |

`ProgramContext` carries setup state into the executor.  For example:

```rust
struct ProgramContext {
    // Always present
    target_pubkey: [u8; 33],
    chain_hash: [u8; 32],
    block_height: u32,
    target_features: Vec<u8>,

    // Present after PostChannelOpenSetup
    channel_id: Option<[u8; 32]>,
    local_keys: Option<ChannelKeys>,
    remote_keys: Option<ChannelKeys>,
    funding_outpoint: Option<OutPoint>,
    commitment_number: Option<u64>,
}
```

`LoadContext*` operations access these fields, erroring at execution time if absent for the current snapshot variant.  Each setup variant is a separate binary (e.g., `ldk_ir_post_init`, `ldk_ir_post_channel`), enabling independent fuzzing campaigns with different corpora.
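A sketch of a `LoadContext*` operation failing cleanly when the snapshot variant lacks the field, using a hypothetical `ProgramContext` reduced to two fields and a hypothetical `ExecuteError::MissingContext` variant.

```rust
/// Hypothetical `ProgramContext`, reduced to two fields.
#[derive(Debug, Clone, Default)]
struct ProgramContext {
    /// Always present.
    chain_hash: [u8; 32],
    /// Present only after `PostChannelOpenSetup`.
    channel_id: Option<[u8; 32]>,
}

#[derive(Debug, PartialEq)]
enum Variable {
    ChainHash([u8; 32]),
    ChannelId([u8; 32]),
}

#[derive(Debug, PartialEq)]
enum ExecuteError {
    /// The current snapshot variant does not provide this context field.
    MissingContext(&'static str),
}

/// `LoadChainHash`: always succeeds.
fn load_chain_hash(ctx: &ProgramContext) -> Variable {
    Variable::ChainHash(ctx.chain_hash)
}

/// `LoadContextChannelId`: errors at execution time if no channel exists.
fn load_context_channel_id(ctx: &ProgramContext) -> Result<Variable, ExecuteError> {
    ctx.channel_id
        .map(Variable::ChannelId)
        .ok_or(ExecuteError::MissingContext("channel_id"))
}

fn main() {
    let post_init = ProgramContext::default();
    let post_open = ProgramContext { channel_id: Some([0x11; 32]), ..Default::default() };
    println!("{:?}", load_context_channel_id(&post_init)); // Err(MissingContext("channel_id"))
    println!("{:?}", load_context_channel_id(&post_open).is_ok()); // true
}
```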

---

## Implementation Plan

I've put together rough milestones structured as "vertical slices" -- each milestone delivers a working end-to-end system for a narrow set of messages.  After Milestone 1 is completed, most of the other milestones could be developed in parallel.

- [ ] Milestone 1: `open_channel` End-to-End
- [ ] Milestone 2: Funding Flow
- [ ] Milestone 3: HTLC and Commitment Operations
- [ ] Milestone 4: Co-op Channel Closes
- [ ] Milestone 5: Channel Reestablish
- [ ] Milestone 6: Gossip Messages
- [ ] Milestone 7: Advanced Mutators
- [ ] Milestone 8: Interactive Tx Protocol
- [ ] Milestone 9+: Advanced Features

### Milestone 1: `open_channel` End-to-End

The "minimum viable product".  Minimal implementations of IR, mutators, generators, executor, etc. to enable basic fuzzing.  The fuzzer can generate structurally valid `open_channel` messages via IR, send them to the target, read the `accept_channel` response, and extract variables from it.

### Milestone 2: Funding Flow

The fuzzer can complete the channel establishment sequence through funding confirmation, including valid signing and mining of the funding transaction.

### Milestone 3: HTLC and Commitment Operations

The fuzzer can add, fulfill, and fail HTLCs on an open channel, including the commitment dance.

### Milestone 4: Co-op Channel Closes

The fuzzer can co-op close channels.

### Milestone 5: Channel Reestablish

The fuzzer can disconnect, reconnect, and successfully resume channels via `channel_reestablish`.

### Milestone 6: Gossip Messages

The fuzzer can send gossip messages with valid signatures.

### Milestone 7: Advanced Mutators

Mutators exist for instruction reordering, instruction deletion, inserting generated snippets, and splicing two programs together.

### Milestone 8: Interactive Tx Protocol

The fuzzer can complete dual-funded channel negotiations.

### Milestone 9+: Advanced Features

Endless possibilities here, but some ideas:

- Build valid onion packets.
- Channel state oracle: detect various protocol violations during execution (e.g., accepting HTLCs on a shutting-down channel).
- Add "constraint-based generators" that always respect protocol constraints (e.g., increasing commitment numbers, valid HTLC IDs).
- Multi-channel scenarios.

---

## Appendix: Fuzzing Engine Tradeoffs

Smite currently uses AFL++ with Nyx for all targets.  The IR design is intended to work with AFL++ today and is structured to enable LibAFL migration later.

**AFL++ with custom mutator** -- current approach:
- Advantages: No migration needed.  Nyx integration, queue management, crash dedup, and UI all work today.  Simple C ABI.
- Disadvantages: Must use `AFL_CUSTOM_MUTATOR_ONLY=1` (AFL++ byte-level havoc would corrupt IR structure).  Serialization overhead on every mutation round.  Must implement `afl_custom_trim` for instruction-level trimming (byte-level trimming destroys IR).
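Instruction-level trimming could work greedily: try deleting each instruction (together with everything that depends on it), and keep the smaller program if it is still interesting.  A minimal, engine-agnostic sketch; `still_interesting` is a hypothetical predicate that in practice would re-run the target and compare coverage, and wiring this into `afl_custom_trim` is not shown.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Instruction {
    op: &'static str,
    inputs: Vec<usize>,
}

/// Removes instruction `victim` and everything that transitively depends on it,
/// renumbering the surviving inputs so the program stays in SSA form.
fn remove_with_dependents(program: &[Instruction], victim: usize) -> Vec<Instruction> {
    let mut removed = vec![false; program.len()];
    removed[victim] = true;
    // Inputs always point backwards, so a single forward pass finds all dependents.
    for idx in victim + 1..program.len() {
        if program[idx].inputs.iter().any(|&i| removed[i]) {
            removed[idx] = true;
        }
    }
    // new_index[i] is where instruction i ends up once removed ones are gone.
    let mut new_index = vec![0usize; program.len()];
    let mut next = 0usize;
    for (i, &gone) in removed.iter().enumerate() {
        new_index[i] = next;
        if !gone {
            next += 1;
        }
    }
    program
        .iter()
        .enumerate()
        .filter(|(i, _)| !removed[*i])
        .map(|(_, instr)| Instruction {
            op: instr.op,
            inputs: instr.inputs.iter().map(|&i| new_index[i]).collect(),
        })
        .collect()
}

/// Greedy instruction-level trim, from the last instruction to the first.
fn trim(
    mut program: Vec<Instruction>,
    still_interesting: impl Fn(&[Instruction]) -> bool,
) -> Vec<Instruction> {
    let mut idx = program.len();
    while idx > 0 {
        idx -= 1;
        let candidate = remove_with_dependents(&program, idx);
        if still_interesting(candidate.as_slice()) {
            program = candidate;
        }
    }
    program
}

/// Example predicate: the program still sends an `open_channel` it built.
fn sends_open_channel(program: &[Instruction]) -> bool {
    program.iter().any(|instr| {
        instr.op == "SendMessage"
            && instr.inputs.first().map_or(false, |&m| program[m].op == "BuildOpenChannel")
    })
}

fn main() {
    let program = vec![
        Instruction { op: "LoadAmount", inputs: vec![] },
        Instruction { op: "LoadFeeratePerKw", inputs: vec![] },
        Instruction { op: "BuildOpenChannel", inputs: vec![0] },
        Instruction { op: "SendMessage", inputs: vec![2] },
        Instruction { op: "MineBlocks", inputs: vec![] },
    ];
    for instr in trim(program, sends_open_channel) {
        println!("{} {:?}", instr.op, instr.inputs);
    }
}
```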

**LibAFL with Nyx executor** -- future approach:
- Advantages: First-class IR support (mutators operate directly on `Program` structs, no serialization per mutation).  Structural trimming and splicing.  Feedback-driven generation.  Fuzzamoto uses this approach.
- Disadvantages: More implementation work.
