
mana_driver: vf reconfiguration revokes vtl0 vf faster#3164

Open
erfrimod wants to merge 1 commit into microsoft:main from erfrimod:erfrimod/eqe-135-fast-timeout

Contributor

@erfrimod erfrimod commented Mar 31, 2026

When VF Reconfiguration attempts to revoke the VTL0 VF, it can get stuck sending HWC commands that have no chance of succeeding. This causes try_notify_guest_and_revoke_vtl0_vf() to time out, leaving the Guest in an inconsistent state in which the Reconfiguration can fail to restore the VTL0 VF.

  • Once vf_reconfiguration_pending is set, request_version(), report_hwc_timeout(), and device drop() all exit early to avoid making calls to the SoC.
  • test_gdma_reconfig_vf test updated to exercise the new logic.
  • Modify resource destroy to skip teardown for HWC resources when vf_reconfiguration_pending is set. The requests would fail anyway, and skipping them leaves a single info trace instead of many ignorable error traces.
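
The early-exit behavior in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `GdmaSketch`, `HwcError`, `on_vf_reconfiguration_event`, and the `commands_sent` counter are hypothetical names, and the real `request_version()` is async and has a different signature.

```rust
/// Hypothetical error type; the real driver's error type is not shown in this PR.
#[derive(Debug, PartialEq)]
pub enum HwcError {
    VfReconfigurationPending,
}

/// Hypothetical, stripped-down stand-in for the GDMA driver state.
pub struct GdmaSketch {
    vf_reconfiguration_pending: bool,
    /// Counts commands that would have been sent to the SoC.
    pub commands_sent: u32,
}

impl GdmaSketch {
    pub fn new() -> Self {
        Self {
            vf_reconfiguration_pending: false,
            commands_sent: 0,
        }
    }

    /// Assumed hook: called when the VF reconfiguration EQE is observed.
    pub fn on_vf_reconfiguration_event(&mut self) {
        self.vf_reconfiguration_pending = true;
    }

    /// Stand-in for an HWC request. Once reconfiguration is pending it
    /// returns immediately instead of waiting on a SoC that will not answer.
    pub fn request_version(&mut self) -> Result<u32, HwcError> {
        if self.vf_reconfiguration_pending {
            return Err(HwcError::VfReconfigurationPending);
        }
        self.commands_sent += 1;
        Ok(1) // placeholder protocol version
    }
}
```

The point of the guard is that a caller sees an immediate error rather than a timeout, so the revoke path is not held up by commands that cannot complete.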

@erfrimod erfrimod requested a review from a team as a code owner March 31, 2026 00:06
Copilot AI review requested due to automatic review settings March 31, 2026 00:06
Contributor Author

erfrimod commented Mar 31, 2026

Traces from my lab machine. Note that the time between the EQE and the VF appearing in the Guest is now about 1.39888 seconds.
A full second of that is the VF_DEVICE_DELAY, which is there for older Linux guests.

[18.229904] mana_driver::gdma_driver: INFO  HWC VF reconfiguration event
[18.230228] underhill_core::emuplat::netvsp: INFO  VTL2 VF reconfiguration requested vtl2_vfid=0x594ee6be
[18.230400] underhill_core::emuplat::netvsp: WARN  VTL0 VF being removed as a result of VF Reconfiguration. vtl2_vfid=0x594ee6be vtl0_vfid=0x618fb7ca
[18.233386] mana_driver::bnic_driver: ERROR  error=VF reconfiguration pending
[18.233463] netvsp: WARN  Failed setting data path back to synthetic after guest VF was removed. err=VF reconfigurationpending
[18.233760] netvsp: INFO  sending VF association message available=false
[18.294514] vmbus_client: INFO  received rescind state=Connected channel_id=0xa key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0
[18.294899] vmbus_server: INFO  revoking channel id.offer_id=OfferId(11) key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0
[18.295162] vmbus_server::channels: INFO  rescinding channel from guest channel_id=0x12
[18.295670] vmbus_client: INFO  releasing channel channel_id=0xa key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0
[18.304157] netvsp: INFO  Query data path state
[18.304473] underhill_core::emuplat::netvsp: INFO disconnecting all endpoints{ vtl2_vfid=0x594ee6be num_endpoints=0x1}: Network endpoint disconnected vtl2_vfid=0x594ee6be
[18.307327] mana_driver::bnic_driver: ERROR disconnecting all endpoints{ vtl2_vfid=0x594ee6be num_endpoints=0x1}:  error=VF reconfiguration pending
[18.307502] net_mana: WARN disconnecting all endpoints{ vtl2_vfid=0x594ee6be num_endpoints=0x1}:  failed to stop rx error=VF reconfiguration pending
[18.307671] mana_driver::resources: INFO disconnecting all endpoints{ vtl2_vfid=0x594ee6be num_endpoints=0x1}:  skipping HWC resource teardown during VF reconfiguration count=0x20
[18.310974] mana_driver::gdma_driver: INFO shutdown vtl2 device{ vtl2_vfid=0x594ee6be keep_vf_alive=false}:  dropping gdma driver self.state_saved=false self.hwc_failure=false
[18.311330] underhill_core::emuplat::netvsp: WARN  Destroying MANA device vtl2_vfid=0x594ee6be error=VF reconfigurationpending
[18.311501] underhill_core::emuplat::netvsp: INFO  Attempt to reset device via FLR on next teardown. vtl2_vfid=0x594ee6be
[18.332014] vmbus_server::channels: INFO  client released channel dropped_ratelimited=0xc offer_id=OfferId(11) key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0
[18.422489] vfio-pci 6c34:00:00.0: All device reset methods disabled by user
[18.423181] user_driver::vfio: INFO new_mana_vfio_device{ vtl2_vfid=0x594ee6be pci_id="6c34:00:00.0"}:  device arrived pci_id="6c34:00:00.0" keepalive=false
[18.433588] vfio-pci 6c34:00:00.0: vfio-noiommu device opened by user (tp:52)
[18.435340] underhill_core::emuplat::netvsp: INFO  Creating MANA device vtl2_vfid=0x594ee6be pci_id="6c34:00:00.0"
[18.440758] mana_driver::gdma_driver: INFO new_mana_device{ vtl2_vfid=0x594ee6be pci_id="6c34:00:00.0"}:new_gdma_driver:  created HWC eq_id=0x10 msix=0
[18.441194] mana_driver::gdma_driver: INFO new_mana_device{ vtl2_vfid=0x594ee6be pci_id="6c34:00:00.0"}:new_gdma_driver:  Max VF resources: GdmaQueryMaxResourcesResp { status: 0, max_sq: 4096, max_rq: 4096, max_cq: 4096, max_eq: 256, max_db: 4096, max_mst: 16384, max_cq_mod_ctx: 2, max_mod_cq: 16, max_msix: 6 }
[18.441531] mana_driver::gdma_driver: INFO new_mana_device{ vtl2_vfid=0x594ee6be pci_id="6c34:00:00.0"}:  GDMA PF capability flags gdma_protocol_ver=0x1 pf_cap_flags1=0x1d pf_cap_flags2=0x0 pf_cap_flags3=0x0 pf_cap_flags4=0x0
[18.441869] mana_driver::mana: INFO new_mana_device{ vtl2_vfid=0x594ee6be pci_id="6c34:00:00.0"}:  mana_dev_config=ManaQueryDeviceCfgResp { pf_cap_flags1: BasicNicDriverFlags { query_link_status: 1, ethertype_enforcement: 1, query_filter_state: 1, reserved: 1 }, pf_cap_flags2: 0, pf_cap_flags3: 0, pf_cap_flags4: 0, max_num_vports: 1, reserved: 0, max_num_eqs: ffffffff }
[18.442459] underhill_core::emuplat::netvsp: INFO connecting endpoints{ vtl2_vfid=0x594ee6be num_endpoints=0x1}:  Network endpoint connected vtl2_vfid=0x594ee6be mac_address=02-04-33-0f-3b-e1 adapter_index=1
[18.442608] underhill_core::emuplat::netvsp: INFO  VTL2 device restarted after VF reconfiguration vtl2_vfid=0x594ee6be attempts=0x1
[18.453258] netvsp: INFO  Query data path state is_data_path_switched=false
[18.455299] mana_driver::gdma_driver: INFO  retargeting EQ 1 to cpu: 3
[18.455502] mana_driver::gdma_driver: INFO  retargeting EQ 2 to cpu: 0
[18.456941] mana_driver::gdma_driver: INFO  retargeting EQ 3 to cpu: 1
[18.458446] mana_driver::gdma_driver: INFO  retargeting EQ 0 to cpu: 2
[18.458624] netvsp: INFO  sending VF association message available=true serial_number=0x618fb7ca
[19.459189] underhill_core::emuplat::netvsp: INFO  Adding VF to VTL0 vtl2_vfid=0x594ee6be vtl0_vfid=0x618fb7ca
[19.461251] vmbus_client: INFO  received offer state=Connected channel_id=0xa interface_id=44c4f61d-4444-4400-9d52-802e27ede19f instance_id=618fb7ca-a386-46eb-b257-9a8f67af0bd5 subchannel_index=0x0
[19.461479] vmbus_server::channels: INFO  sending offer to guest channel_id=0x12 connection_id=0x2012 key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0
[19.461574] vmbus_server::channels: INFO  new channel offer_id=OfferId(11) key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0 confidential_ring_buffer=false confidential_external_memory=false
[19.463543] vmbus_client: INFO  opening channel on host channel_id=0xa key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0
[19.463937] vmbus_server::channels: INFO  opened channel dropped_ratelimited=0x8 offer_id=0x11 channel_id=0x12 key={44c4f61d-4444-4400-9d52-802e27ede19f}-{618fb7ca-a386-46eb-b257-9a8f67af0bd5}-0 result=0
[19.636356] mana_driver::mana: INFO  switch data path for mac mac_address=02-04-33-0f-3b-e1 direction_to_vtl0=0x1 hwc_activity_id=0x87dd0030

Contributor

Copilot AI left a comment

Pull request overview

Improves VF Reconfiguration handling for the MANA driver / Underhill NetVSP path by treating the HWC channel as unavailable immediately after the VF reconfig EQE, avoiding long teardown timeouts and reducing noisy failing teardown attempts.

Changes:

  • Set hwc_timeout_in_ms to 0 on VF reconfiguration EQE to make subsequent HWC waits fail fast, and gate timeout reporting to avoid extra work when timeout is 0.
  • Update driver teardown behavior to skip HWC resource teardown after hwc_failure is detected.
  • Update test_gdma_reconfig_vf to validate the new “timeout becomes 0 after EQE 135” behavior and that teardown fails fast.
  • In VF reconfig handling, remove the VTL0 VF via remove_vtl0_vf() and clear saved datapath/filter state.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
vm/devices/net/mana_driver/src/tests.rs | Extends VF reconfig test to assert timeout transitions to 0 and hwc_failure behavior during deregister.
vm/devices/net/mana_driver/src/resources.rs | Skips HWC teardown commands when hwc_failure is set to avoid repeated failing requests/log spam.
vm/devices/net/mana_driver/src/gdma_driver.rs | Sets timeout to 0 on VF reconfig EQE; adds early timeout exit; preserves 0 timeout through deregister; exposes hwc_failure() and test getter.
openhcl/underhill_core/src/emuplat/netvsp.rs | Switches VF reconfig VTL0 VF removal path to remove_vtl0_vf() and clears saved direction_to_vtl0 state.

Comment on lines +66 to +68
// When HWC has already failed, skip sending teardown commands for HWC resources:
// DmaRegion, Eq, BnicQueue. HWC requests all fail: "Previous hardware failure".
// Device should reclaim resources on its own reset.
Contributor

I would recommend moving this comment down to the '_ if skip_hwc ...' code because this one line will be easy to miss when skimming the code, so the comment will help draw attention to it

Contributor

@ben-zen ben-zen left a comment

Looks good to me with the change Brian suggested to the comment location. That's a subtle choice; calling it out with a comment makes the logic clear.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Brian-Perkins previously approved these changes Mar 31, 2026
self.report_hwc_timeout(wait_failed, interrupt_loss, eqe_wait_result.elapsed as u32)
// Don't report the timeout once VF reconfiguration is pending,
// since the SoC will not respond.
if !self.vf_reconfiguration_pending {
Member

It would be good to keep a consistent check. In other places we are checking for hwc_failure. We should do the same here.

Member

And, if there is hwc_failure, why not return an error?

Contributor Author

@erfrimod erfrimod Apr 3, 2026

We would lose reports in the case where the wait times out (hwc_failure is set), then the EQE is found and wait_failed is false: the SoC is alive, but slow. This case is reflected in the check at the end of the function, where hwc_failure is set back to false if a reconfig is not pending. On one hand, making the change is entirely safe because all we lose is a log sent to the SoC. On the other hand, I think the intent of the log is to help the SoC diagnose when responses are slow and timing out.

Edit: It's a little frustrating that hwc_failure is set to true and then back. A stronger design might have multiple states, or maybe a bool to track hwc_timeout that could get cleared...
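
One way the "multiple states" idea could look is sketched below. Everything here is hypothetical: `HwcState` and its methods do not exist in the driver, and the transitions only encode the behavior described in this thread (a timeout is recoverable if the SoC later responds, while a pending reconfiguration is not).

```rust
/// Hypothetical replacement for a single `hwc_failure: bool`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HwcState {
    /// HWC is responding normally.
    Healthy,
    /// A wait timed out. The SoC may only be slow, so this can recover.
    TimedOut,
    /// The VF reconfiguration EQE was seen. The SoC will not respond.
    ReconfigPending,
}

impl HwcState {
    /// A wait on the HWC timed out.
    pub fn on_wait_timeout(self) -> Self {
        match self {
            HwcState::ReconfigPending => HwcState::ReconfigPending,
            _ => HwcState::TimedOut,
        }
    }

    /// A response arrived after all (SoC alive, but slow).
    pub fn on_late_response(self) -> Self {
        match self {
            HwcState::TimedOut => HwcState::Healthy,
            other => other,
        }
    }

    /// The VF reconfiguration EQE arrived.
    pub fn on_reconfig_event(self) -> Self {
        HwcState::ReconfigPending
    }

    /// Timeout reports are only useful while the SoC might still read them.
    pub fn should_report_timeout(self) -> bool {
        self != HwcState::ReconfigPending
    }
}
```

With explicit states, the "set to true and then back" step becomes a named transition from `TimedOut` to `Healthy`, and the reconfiguration case can never be cleared by a late response.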

// When HWC has already failed, skip sending teardown commands for HWC resources:
// DmaRegion, Eq, BnicQueue. HWC requests all fail: "Previous hardware failure".
// Device should reclaim resources on its own reset.
_ if skip_hwc => continue,
Member

I am not sure we need to do anything here. GDMA driver has the logic to handle what to do when HWC has been marked for failure (during reconfig) and will handle this. So, for example, when this code makes a call to disable EQ or DMA region, gdma driver will error out if HWC failure is set to true (which will be in the case of reconfig)

Contributor Author

When testing on my lab machine, this check removed a dozen or so ignorable "previous hardware failure" traces. They are expected, but I would prefer future failure triage doesn't see them.
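
For readers skimming the thread, the effect of the `_ if skip_hwc => continue` arm can be illustrated with a small standalone sketch. The `Resource` enum, its payload types, the `Local` variant, and the `teardown` function are assumptions made for the example; only the three HWC resource kinds and the guarded wildcard arm come from the snippet under review.

```rust
/// Hypothetical resource list entry. `Local` stands in for any resource
/// that can be released without sending an HWC command.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Resource {
    DmaRegion(u64),
    Eq(u32),
    BnicQueue(u64),
    Local(u32),
}

/// Returns the resources for which teardown was actually attempted.
pub fn teardown(resources: &[Resource], skip_hwc: bool) -> Vec<Resource> {
    let mut attempted = Vec::new();
    for &res in resources {
        match res {
            // Released locally, so always handled.
            Resource::Local(_) => attempted.push(res),
            // Everything below needs the HWC. Skip it all when the HWC is
            // known to be unusable; the device reclaims it on reset.
            _ if skip_hwc => continue,
            Resource::DmaRegion(_) | Resource::Eq(_) | Resource::BnicQueue(_) => {
                attempted.push(res)
            }
        }
    }
    attempted
}
```

Because the guarded wildcard sits above the HWC arms, a single line silences every HWC teardown request, which is why the reviewers asked for the comment to sit right next to it.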

Copilot AI review requested due to automatic review settings April 3, 2026 21:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings April 3, 2026 22:30
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment on lines +66 to +72
let skip_hwc = gdma.get_vf_reconfiguration_pending();
if skip_hwc {
tracing::info!(
count = self.resources.len(),
"skipping HWC resource teardown during VF reconfiguration"
);
}
Copilot AI Apr 3, 2026

PR description says teardown is skipped when hwc_failure is set, but the implementation gates skipping on vf_reconfiguration_pending. If the intended contract is actually hwc_failure, consider switching the condition (and adding/accessing an hwc_failure getter) or update the PR description to match the code’s behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +789 to 792
pub fn get_vf_reconfiguration_pending(&self) -> bool {
self.vf_reconfiguration_pending
}

Copilot AI Apr 3, 2026

This method previously consumed/cleared the pending flag (via mem::take), but now reads without clearing while keeping the same get_... name. That’s a behavioral change that can surprise callers and can change control flow in existing code. Consider either (a) renaming to is_vf_reconfiguration_pending() to make it clearly read-only, and/or (b) reintroducing a separate take_vf_reconfiguration_pending() for the previous consume-and-clear semantics where needed.

Suggested change
- pub fn get_vf_reconfiguration_pending(&self) -> bool {
-     self.vf_reconfiguration_pending
- }
+ pub fn is_vf_reconfiguration_pending(&self) -> bool {
+     self.vf_reconfiguration_pending
+ }
+ pub fn get_vf_reconfiguration_pending(&mut self) -> bool {
+     std::mem::take(&mut self.vf_reconfiguration_pending)
+ }
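
The difference between the read-only and the consume-and-clear accessor can be seen in isolation with a short sketch. `PendingFlag`, its constructor, and the `take_` method name are illustrative only (the `take_` name follows option (b) in the comment above); `std::mem::take` replaces the value with its default, which is `false` for `bool`, and returns the previous value.

```rust
/// Hypothetical holder for the flag; the real field lives on the GDMA driver.
pub struct PendingFlag {
    vf_reconfiguration_pending: bool,
}

impl PendingFlag {
    pub fn new(pending: bool) -> Self {
        Self {
            vf_reconfiguration_pending: pending,
        }
    }

    /// Read-only: repeated calls keep returning the same value.
    pub fn is_vf_reconfiguration_pending(&self) -> bool {
        self.vf_reconfiguration_pending
    }

    /// Consume-and-clear: returns the current value and resets it to false.
    pub fn take_vf_reconfiguration_pending(&mut self) -> bool {
        std::mem::take(&mut self.vf_reconfiguration_pending)
    }
}
```

A caller that relied on the old clearing behavior would observe `true` forever after a switch to the read-only version, which is the control-flow change the comment warns about.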

Comment on lines +691 to +694
// Don't report timeout once VF reconfiguration is pending, SoC will not respond.
if self.vf_reconfiguration_pending {
return;
}
Copilot AI Apr 3, 2026

The VF-reconfiguration guard is implemented both inside report_hwc_timeout() and at the call site. This duplication increases maintenance cost and risks the two checks diverging over time. Prefer keeping the guard in one place: either rely on the internal early-return and always call report_hwc_timeout(), or remove the internal guard and keep the conditional at the call site.

Suggested change
- // Don't report timeout once VF reconfiguration is pending, SoC will not respond.
- if self.vf_reconfiguration_pending {
-     return;
- }

Comment on lines +1148 to +1157
// Don't report the timeout once VF reconfiguration is pending,
// since the SoC will not respond.
if !self.vf_reconfiguration_pending {
self.report_hwc_timeout(
wait_failed,
interrupt_loss,
eqe_wait_result.elapsed as u32,
)
.await;
}
Copilot AI Apr 3, 2026

The VF-reconfiguration guard is implemented both inside report_hwc_timeout() and at the call site. This duplication increases maintenance cost and risks the two checks diverging over time. Prefer keeping the guard in one place: either rely on the internal early-return and always call report_hwc_timeout(), or remove the internal guard and keep the conditional at the call site.

Suggested change
- // Don't report the timeout once VF reconfiguration is pending,
- // since the SoC will not respond.
- if !self.vf_reconfiguration_pending {
-     self.report_hwc_timeout(
-         wait_failed,
-         interrupt_loss,
-         eqe_wait_result.elapsed as u32,
-     )
-     .await;
- }
+ self.report_hwc_timeout(
+     wait_failed,
+     interrupt_loss,
+     eqe_wait_result.elapsed as u32,
+ )
+ .await;
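
Keeping the guard in exactly one place, here inside the reporting function, might look like the sketch below. It is synchronous for brevity (the real `report_hwc_timeout()` is awaited), and `HwcSketch`, `finish_wait`, and the `reports` vector are hypothetical stand-ins used to make the effect observable.

```rust
/// Hypothetical stand-in for the driver state relevant to timeout reporting.
pub struct HwcSketch {
    vf_reconfiguration_pending: bool,
    /// Records (wait_failed, interrupt_loss, elapsed_ms) for each report sent.
    pub reports: Vec<(bool, bool, u32)>,
}

impl HwcSketch {
    pub fn new(vf_reconfiguration_pending: bool) -> Self {
        Self {
            vf_reconfiguration_pending,
            reports: Vec::new(),
        }
    }

    /// The only place that checks the flag: no report once VF
    /// reconfiguration is pending, since the SoC will not respond.
    fn report_hwc_timeout(&mut self, wait_failed: bool, interrupt_loss: bool, elapsed_ms: u32) {
        if self.vf_reconfiguration_pending {
            return;
        }
        self.reports.push((wait_failed, interrupt_loss, elapsed_ms));
    }

    /// Call site: always calls the reporter and relies on its internal guard.
    pub fn finish_wait(&mut self, wait_failed: bool, interrupt_loss: bool, elapsed_ms: u32) {
        self.report_hwc_timeout(wait_failed, interrupt_loss, elapsed_ms);
    }
}
```

The trade-off is that the call site no longer documents the skip, so the comment inside the function has to carry that explanation.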

@erfrimod erfrimod force-pushed the erfrimod/eqe-135-fast-timeout branch from 460bdb3 to 52bd2a2 Compare April 3, 2026 22:50