
Emit telemetry spans for move_in_query snapshot operations#4303

Open
robacourt wants to merge 4 commits into main from rob/move-in-telemetry

@robacourt
Contributor

@robacourt robacourt commented May 10, 2026

Summary

shape_snapshot.execute_for_shape and shape_snapshot.query_fn spans were missing from production traces for move_in_query operations, while initial_snapshot and subset_query worked fine. This wires up a parent span so the existing child spans get emitted.

Problem

Sampling for shape_snapshot.query_fn is a plain included?(_) -> true (no rate limiting), so the absence of move_in_query spans wasn't a sampling issue — the spans were never being created.
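To make the distinction between "sampled out" and "never created" concrete, here is a minimal sketch. It is not Electric's telemetry code: `ToySpans`, `with_root_span/2`, `with_child_span/2` and the `"move_in_query"` root span name are hypothetical, and the rule that a child span is only recorded when a parent context is active is an assumption drawn from the summary above.

```elixir
defmodule ToySpans do
  # Hypothetical toy tracer, not Electric's telemetry module.
  # The "current span" lives in the process dictionary.

  # Mirrors the sampling rule quoted above: every span name is included.
  def included?(_name), do: true

  # Starts a root span, runs `fun`, then restores the previous context.
  def with_root_span(name, fun) do
    previous = Process.get(:toy_current_span)
    Process.put(:toy_current_span, name)
    record(name, nil)

    try do
      fun.()
    after
      restore(previous)
    end
  end

  # Assumption: a child span is only created when a parent is active.
  def with_child_span(name, fun) do
    case Process.get(:toy_current_span) do
      nil ->
        # No parent context: the span is never created, whatever the sampler says.
        fun.()

      parent ->
        if included?(name), do: record(name, parent)
        Process.put(:toy_current_span, name)

        try do
          fun.()
        after
          Process.put(:toy_current_span, parent)
        end
    end
  end

  # Returns recorded spans as {name, parent} tuples, oldest first.
  def recorded, do: Enum.reverse(Process.get(:toy_recorded, []))

  defp record(name, parent) do
    Process.put(:toy_recorded, [{name, parent} | Process.get(:toy_recorded, [])])
  end

  defp restore(nil), do: Process.delete(:toy_current_span)
  defp restore(previous), do: Process.put(:toy_current_span, previous)
end

# Without a parent, the always-true sampler is never consulted.
ToySpans.with_child_span("shape_snapshot.query_fn", fn -> :ok end)
IO.inspect(ToySpans.recorded())
# prints []

# With a parent wired up, the same child span is recorded.
ToySpans.with_root_span("move_in_query", fn ->
  ToySpans.with_child_span("shape_snapshot.query_fn", fn -> :ok end)
end)
IO.inspect(ToySpans.recorded())
# prints [{"move_in_query", nil}, {"shape_snapshot.query_fn", "move_in_query"}]
```

In this toy model an `included?/1` that always returns true cannot explain missing spans, which matches the reasoning above: the fix has to be a parent span, not a sampling change.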

@codecov

codecov Bot commented May 10, 2026

❌ 17 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 3723 | 17 | 3706 | 52 |
View the top 3 failed test(s) by shortest run time
Elixir.Electric.ShapeCacheTest::test get_or_create_shape_handle/2 against real db crashes when initial snapshot query fails to return data quickly enough
Stack Traces | 0s run time
22) test get_or_create_shape_handle/2 against real db crashes when initial snapshot query fails to return data quickly enough (Electric.ShapeCacheTest)
     test/electric/shape_cache_test.exs:507
     ** (EXIT from #PID<0.21571.0>) killed
Elixir.Electric.ShapeCacheTest::test await_snapshot_start/4 does not recursively loop forever if the snapshot fails to start
Stack Traces | 0.0982s run time
26) test await_snapshot_start/4 does not recursively loop forever if the snapshot fails to start (Electric.ShapeCacheTest)
     test/electric/shape_cache_test.exs:1023
     Expected truthy, got false
     code: assert String.contains?(
             log,
             "shape_handle=#{shape_handle} [error] No consumer process when waiting on initial snapshot creation"
           )
     arguments:

         # 1
         ""

         # 2
         "shape_handle=50617496-1779100178069616 [error] No consumer process when waiting on initial snapshot creation"

     stacktrace:
       test/electric/shape_cache_test.exs:1071: (test)
test/runtime-dsl.test.ts > N: wake primitives verification > N5: runFinished wake records the finished child on the parent stream
Stack Traces | 15.1s run time
Error: Timeout (15000ms) waiting for entity history on /wake-summary-parent-n4/wake-summary-1
[
  {
    "args": {},
    "entityType": "wake-summary-parent-n4",
    "operation": "insert",
    "type": "entity_created"
  },
  {
    "from": "/principal/system%3Aruntime-dsl-test",
    "operation": "insert",
    "payload": "spawn trio",
    "type": "inbox"
  },
  {
    "key": "run-0",
    "operation": "insert",
    "status": "started",
    "type": "run"
  },
  {
    "key": "step-0",
    "operation": "insert",
    "status": "started",
    "stepNumber": 1,
    "type": "step"
  },
  {
    "key": "msg-0",
    "operation": "insert",
    "status": "streaming",
    "type": "text"
  },
  {
    "delta": "spawned:3",
    "key": "msg-0:0",
    "operation": "insert",
    "runId": "run-0",
    "textId": "msg-0",
    "type": "text_delta"
  },
  {
    "key": "msg-0",
    "operation": "update",
    "status": "completed",
    "type": "text"
  },
  {
    "finishReason": "stop",
    "key": "step-0",
    "operation": "update",
    "status": "completed",
    "stepNumber": 1,
    "type": "step"
  },
  {
    "finishReason": "stop",
    "key": "run-0",
    "operation": "update",
    "status": "completed",
    "type": "run"
  },
  {
    "key": "child:wake-summary-child-n4:wake-summary-1-alpha",
    "manifest": {
      "entityType": "wake-summary-child-n4",
      "entityUrl": "/wake-summary-child-n4/wake-summary-1-alpha",
      "id": "wake-summary-1-alpha",
      "key": "child:wake-summary-child-n4:wake-summary-1-alpha",
      "kind": "child",
      "wake": "runFinished"
    },
    "operation": "insert",
    "type": "manifest"
  },
  {
    "key": "child:wake-summary-child-n4:wake-summary-1-bravo",
    "manifest": {
      "entityType": "wake-summary-child-n4",
      "entityUrl": "/wake-summary-child-n4/wake-summary-1-bravo",
      "id": "wake-summary-1-bravo",
      "key": "child:wake-summary-child-n4:wake-summary-1-bravo",
      "kind": "child",
      "wake": "runFinished"
    },
    "operation": "insert",
    "type": "manifest"
  },
  {
    "key": "child:wake-summary-child-n4:wake-summary-1-charlie",
    "manifest": {
      "entityType": "wake-summary-child-n4",
      "entityUrl": "/wake-summary-child-n4/wake-summary-1-charlie",
      "id": "wake-summary-1-charlie",
      "key": "child:wake-summary-child-n4:wake-summary-1-charlie",
      "kind": "child",
      "wake": "runFinished"
    },
    "operation": "insert",
    "type": "manifest"
  }
]
 ❯ waitForHistory test/runtime-dsl.ts:664:11
 ❯ test/runtime-dsl.test.ts:6261:27
test/runtime-dsl.test.ts > M: deep researcher coordination > M3: separate researcher entities keep child results isolated across later wakes
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:4968:3
test/runtime-dsl.test.ts > K: wiki coordination > K4: create_wiki rejects switching the topic on an existing wiki
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5546:3
test/runtime-dsl.test.ts > I: peer review coordination > I1: peer review aggregates three reviewer writes through shared state
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5049:3
test/runtime-dsl.test.ts > K: wiki coordination > K8: wiki keeps durable child metadata and shared articles carry topic and author details
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5688:3
test/runtime-dsl.test.ts > K: wiki coordination > K9: idempotent wiki recreation does not duplicate shared article rows
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5760:3
test/runtime-dsl.test.ts > J: debate coordination > J3: debate with only one durable side stays partial until the missing side arrives
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5328:3
test/runtime-dsl.test.ts > I: peer review coordination > I3: peer review with one configured reviewer summarizes only that durable row
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5143:3
test/runtime-dsl.test.ts > J: debate coordination > J1: debate parent reads both sides from shared state before issuing a ruling
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5221:3
test/runtime-dsl.test.ts > K: wiki coordination > K3: get_wiki_status reports complete coverage after specialist articles land
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5521:3
test/runtime-dsl.test.ts > I: peer review coordination > I4: peer review with two configured reviewers summarizes only those durable rows
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5181:3
test/runtime-dsl.test.ts > K: wiki coordination > K10: same-topic wiki expansion adds only the missing article and updates later query coverage
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5791:3
test/runtime-dsl.test.ts > K: wiki coordination > K6: repeating create_wiki with the same topic and subtopics is idempotent
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5630:3
test/runtime-dsl.test.ts > K: wiki coordination > K2: repeating create_wiki reuses existing specialists and only spawns missing subtopics
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:5434:3
test/runtime-dsl.test.ts > M: deep researcher coordination > M1: researcher workers start from spawn initialMessage without an extra send
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:4915:3
test/runtime-dsl.test.ts > D: shared state > D9: a setup-registered shared-state effect fires on the first wake write and survives a later wake
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/runtime-dsl.test.ts:3396:3
test/runtime-dsl.test.ts > K: wiki coordination > K1: wiki specialists accumulate shared articles that a later query can read
Stack Traces | 30.1s run time
Error: Timeout (30000ms) waiting for shared:wiki_article x2 on shared state wiki-wiki-1
[]
 ❯ waitForHistory test/runtime-dsl.ts:664:11
 ❯ test/runtime-dsl.test.ts:5391:5


@alco
Member

alco commented May 12, 2026

I thought 686ebdb had fixed that. Just now reviewing the changes, I see we have two functions named query_move_in_async: one in PartialModes and the other one in Effects. My commit fixed the former one (where the root span comes from the shape serving plug).

The one defined in Effects is invoked in response to a materializer_changes callback in the Consumer module. We could start a span right in the handle_info callback instead; that seems like a less ad-hoc approach than creating a span right inside the inner function.
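As a rough illustration of opening the span at the callback boundary, here is a self-contained sketch. `ToyConsumer`, the `{:materializer_changes, changes}` message shape, the `"consumer.materializer_changes"` span name, and the process-dictionary `with_span/2` helper are all hypothetical stand-ins, not Electric's Consumer or telemetry API. The inner function is synchronous here for simplicity, unlike the real `query_move_in_async`.

```elixir
defmodule ToyConsumer do
  # Hypothetical stand-in for the Consumer module; not Electric's code.
  use GenServer

  def start_link(test_pid), do: GenServer.start_link(__MODULE__, test_pid)

  @impl true
  def init(test_pid), do: {:ok, %{test_pid: test_pid}}

  # The span is opened once, at the callback boundary, so everything the
  # message triggers runs inside it.
  @impl true
  def handle_info({:materializer_changes, changes}, state) do
    with_span("consumer.materializer_changes", fn ->
      run_move_in_query(changes, state)
    end)

    {:noreply, state}
  end

  # The inner function stays free of span setup; it only reports which
  # span it ran under so the effect is observable from outside.
  defp run_move_in_query(changes, state) do
    send(state.test_pid, {:queried, changes, Process.get(:toy_current_span)})
  end

  # Toy span helper (assumption): tracks the active span name in the
  # process dictionary and clears it afterwards.
  defp with_span(name, fun) do
    Process.put(:toy_current_span, name)

    try do
      fun.()
    after
      Process.delete(:toy_current_span)
    end
  end
end

{:ok, pid} = ToyConsumer.start_link(self())
send(pid, {:materializer_changes, [:row_1]})

receive do
  {:queried, changes, span} -> IO.inspect({changes, span})
after
  1_000 -> IO.puts("no reply")
end
# prints {[:row_1], "consumer.materializer_changes"}
```

One caveat of this sketch: the process dictionary is per-process, so if the inner work is handed to another process, the active span would have to be passed along explicitly.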

Looks like the two implementations of query_move_in_async overlap. Can we deduplicate them into one?

@netlify

netlify Bot commented May 18, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 3a0c609
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/6a0ae6fae1adf900081fdbcc
😎 Deploy Preview https://deploy-preview-4303--electric-next.netlify.app

@alco
Member

alco commented May 18, 2026

I have confirmed that PartialModes.query_move_in_async() is dead code since #4051. I'm removing it completely in #4344.

Took the liberty of cleaning up the changes in your branch to avoid repeating the shape attrs construction and excessive nesting.

LGTM!
