storage: add usermode storvsc disk backend for OpenHCL #3193

Open
juantian8seattle wants to merge 6 commits into microsoft:main from juantian8seattle:user/juantian/storvsc-usermode

@juantian8seattle
Contributor

Summary

Add disk_storvsc, a DiskIo backend that handles guest SCSI I/O in VTL2
userspace via storvsc_driver over VMBus UIO. When enabled, OpenHCL
intercepts VScsi controller channels and translates block I/O into
SCSI CDBs, bypassing the kernel hv_storvsc driver for relay controllers.

Opt-in via OPENHCL_STORVSC_USERMODE=1 environment variable.
Not enabled by default.
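
The opt-in check can be pictured with a small sketch. This is an illustration only, not the option parsing in underhill_core: the helper names flag_enabled and storvsc_usermode_enabled are hypothetical, and treating only the literal value 1 as "enabled" is an assumption based on the description above.

```rust
/// Hypothetical helper: an opt-in flag counts as enabled only when it is
/// explicitly set to "1". Unset or any other value leaves it disabled.
fn flag_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the flag from the process environment (assumed lookup path).
fn storvsc_usermode_enabled() -> bool {
    flag_enabled(std::env::var("OPENHCL_STORVSC_USERMODE").ok().as_deref())
}
```

Keeping the decision in a pure function makes the default-off behavior easy to test without touching the real environment.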

What's included

New crate: disk_storvsc

  • DiskIo implementation using SCSI CDBs over storvsc_driver
  • Async constructor pre-fetches device metadata (capacity, disk ID, read-only, unmap support)
  • READ(16), WRITE(16), UNMAP, READ_CAPACITY, INQUIRY, MODE_SENSE CDB formatting
  • CD-ROM detection via INQUIRY for READ_CAPACITY(10) vs (16) selection
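
As an illustration of the CDB formatting listed above, here is a minimal sketch of building a READ(16) CDB by hand. It is not the disk_storvsc code, which works through scsi_defs structs; the helper names read16_cdb and byte_range_to_blocks are hypothetical. The layout used (opcode 0x88, 8-byte big-endian LBA at bytes 2..10, 4-byte big-endian transfer length at bytes 10..14) is the standard READ(16) layout.

```rust
/// SCSI READ(16) operation code.
const READ_16: u8 = 0x88;

/// Builds a 16-byte READ(16) CDB for `blocks` logical blocks starting at `lba`.
/// Flags, group number, and control bytes are left as zero.
fn read16_cdb(lba: u64, blocks: u32) -> [u8; 16] {
    let mut cdb = [0u8; 16];
    cdb[0] = READ_16;
    cdb[2..10].copy_from_slice(&lba.to_be_bytes());
    cdb[10..14].copy_from_slice(&blocks.to_be_bytes());
    cdb
}

/// Hypothetical helper: converts a byte range into (lba, block count).
/// Returns None if the range is not sector-aligned or is too long for a u32 count.
fn byte_range_to_blocks(offset: u64, len: u64, sector_size: u32) -> Option<(u64, u32)> {
    let sector = u64::from(sector_size);
    if sector == 0 || offset % sector != 0 || len % sector != 0 {
        return None;
    }
    let blocks = len / sector;
    if blocks > u64::from(u32::MAX) {
        return None;
    }
    Some((offset / sector, blocks as u32))
}
```

WRITE(16) uses the same field positions with a different opcode, so one formatter can usually serve both directions.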

storvsc_driver enhancements

  • DMA buffer allocation for bounce-buffer I/O
  • Resize listener support (host-initiated disk resize)
  • Save/restore for live migration
  • Mesh-based async lifecycle

OpenHCL integration (underhill_core)

  • StorvscManager: actor-based manager following NvmeManager pattern
  • StorvscDiskResolver: resolves StorvscDiskConfig from VTL2 settings
  • On-demand UIO channel claiming (claim_vmbus_device_for_uio)
  • VScsi device routing in vtl2_settings_worker
  • Save/restore state at mesh(10004)

Supporting changes

  • scsi_defs: additional SCSI structs for CDB construction
  • storvsp_protocol: Inspect derive on protocol types
  • vmbus_user_channel: configurable ring buffer sizes for message_pipe
  • guestmem: LockedPages::va() accessor

Testing

  • cargo test -p storvsc_driver: 3 unit tests pass
  • All openvmm_openhcl_linux_x64_storvsp* VMM tests pass with OPENHCL_STORVSC_USERMODE=1
  • DVD tests pass (SenseData decode fix)
  • Boot, reboot, dynamic disk add tests pass
  • No regressions when OPENHCL_STORVSC_USERMODE=0 (default)

Related issues

Closes #3094, closes #3095, closes #3096
Ref #273

Notes for reviewers

  • OPENHCL_STORVSC_USERMODE=1 is temporarily set in openhcl_boot/src/main.rs for CI testing.
    It will be removed before merge so that the feature stays opt-in via the environment variable.

Copilot AI review requested due to automatic review settings April 4, 2026 03:55
@juantian8seattle juantian8seattle requested review from a team as code owners April 4, 2026 03:55
@github-actions github-actions bot added the unsafe Related to unsafe code label Apr 4, 2026
@github-actions
github-actions bot commented Apr 4, 2026

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new OpenHCL storage path for servicing guest SCSI I/O in VTL2 userspace using a new disk_storvsc DiskIo backend backed by storvsc_driver over VMBus UIO, and wires it into Underhill behind the OPENHCL_STORVSC_USERMODE opt-in flag.

Changes:

  • Add new disk_storvsc crate implementing DiskIo by formatting and issuing SCSI CDBs through storvsc_driver.
  • Extend storvsc_driver with DMA buffer support, resize listeners, and save/restore state handling.
  • Integrate a new StorvscManager/resolver into underhill_core, update servicing save/restore, and add VTL2 settings routing for VScsi devices.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 13 comments.

| File | Description |
|---|---|
| vm/vmcore/guestmem/src/lib.rs | Adds LockedPages::va() accessor used by DMA/driver plumbing. |
| vm/devices/vmbus/vmbus_user_channel/src/lib.rs | Makes ring sizes configurable when opening UIO-backed channels. |
| vm/devices/storage/storvsp/src/lib.rs | Sets storvsp channel type to message-mode pipe for the SCSI interface. |
| vm/devices/storage/storvsp_protocol/src/lib.rs | Adds Inspect derive to protocol types for diagnostics. |
| vm/devices/storage/storvsp_protocol/Cargo.toml | Adds inspect dependency to support protocol inspection. |
| vm/devices/storage/storvsc_driver/src/test_helpers.rs | Updates test harness to new request shape (GPN list + resize listener channel). |
| vm/devices/storage/storvsc_driver/src/lib.rs | Major driver enhancements: DMA buffers, resize listeners, save/restore, request API changes. |
| vm/devices/storage/storvsc_driver/Cargo.toml | Adds dependencies needed for DMA, tracing, events, and save/restore. |
| vm/devices/storage/scsi_defs/src/lib.rs | Improves SCSI struct types (e.g., big-endian wrappers, typed opcodes). |
| vm/devices/storage/disk_storvsc/src/lib.rs | New DiskIo backend implementing I/O via SCSI CDBs over usermode storvsc. |
| vm/devices/storage/disk_storvsc/Cargo.toml | Declares the new disk_storvsc crate and its dependencies. |
| openhcl/underhill_core/src/worker.rs | Adds storvsc_usermode env config and instantiates StorvscManager. |
| openhcl/underhill_core/src/storvsc_manager.rs | New actor-style manager + disk resolver + sysfs UIO claiming + save/restore wiring. |
| openhcl/underhill_core/src/servicing.rs | Adds servicing saved-state plumbing for storvsc at mesh(10004). |
| openhcl/underhill_core/src/options.rs | Adds OPENHCL_STORVSC_USERMODE option parsing and config propagation. |
| openhcl/underhill_core/src/lib.rs | Wires new option field and module into worker launch path. |
| openhcl/underhill_core/src/dispatch/vtl2_settings_worker.rs | Routes VScsi devices to StorvscDiskResolver when usermode storvsc is enabled. |
| openhcl/underhill_core/src/dispatch/mod.rs | Adds storvsc manager lifecycle (shutdown + save state) to LoadedVm. |
| openhcl/underhill_core/Cargo.toml | Adds disk_storvsc, storvsc_driver, and storvsp_protocol dependencies. |
| openhcl/openhcl_boot/src/main.rs | Temporarily forces OPENHCL_STORVSC_USERMODE=1 in kernel cmdline for CI testing. |
| Cargo.toml | Adds disk_storvsc to workspace members and workspace dependencies. |
| Cargo.lock | Locks new crate and dependency graph updates. |

Comment on lines +2344 to +2345
pub fn va(&self) -> u64 {
self.pages.first().unwrap().0 as u64
Copilot AI Apr 4, 2026

LockedPages::va() can panic when gpns is empty because it unconditionally unwraps self.pages.first(). Since lock_gpns() can return an empty LockedPages (gpns slice length 0), consider returning Option<u64>/Result<u64, _> or explicitly documenting/enforcing a non-empty invariant before exposing this accessor.

Suggested change
- pub fn va(&self) -> u64 {
-     self.pages.first().unwrap().0 as u64
+ pub fn va(&self) -> Option<u64> {
+     self.pages.first().map(|page| page.0 as u64)

Copilot uses AI. Check for mistakes.
new_request_receiver,
add_resize_listener_receiver,
)?;
storvsc.negotiate().await.unwrap();
Copilot AI Apr 4, 2026

storvsc.negotiate().await.unwrap() will panic on protocol negotiation failure. This is a host-facing boundary and should return an error instead (propagate with ? and map into StorvscErrorInner), so OpenHCL doesn’t crash on unexpected/malformed host responses.

Suggested change
- storvsc.negotiate().await.unwrap();
+ storvsc.negotiate().await?;
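
For the ? operator to work as suggested, the negotiation error has to convert into the driver's error type. The sketch below shows that pattern in isolation; NegotiateError, DriverError, negotiate, and start are hypothetical stand-ins, not the actual StorvscErrorInner definitions, and the real negotiate() is async.

```rust
/// Hypothetical error returned by protocol negotiation.
#[derive(Debug, PartialEq)]
enum NegotiateError {
    UnexpectedResponse(u32),
}

/// Hypothetical driver-level error that wraps negotiation failures.
#[derive(Debug, PartialEq)]
enum DriverError {
    Negotiation(NegotiateError),
}

impl From<NegotiateError> for DriverError {
    fn from(err: NegotiateError) -> Self {
        DriverError::Negotiation(err)
    }
}

/// Stand-in for negotiation: any nonzero host status is a failure.
fn negotiate(host_status: u32) -> Result<(), NegotiateError> {
    if host_status == 0 {
        Ok(())
    } else {
        Err(NegotiateError::UnexpectedResponse(host_status))
    }
}

/// The caller propagates the failure with `?` instead of panicking.
fn start(host_status: u32) -> Result<(), DriverError> {
    negotiate(host_status)?;
    Ok(())
}
```

With the From impl in place, an unexpected host response surfaces as an ordinary Err value that the caller can log or report.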

Comment on lines +847 to +853
+ if byte_len == 0 {}
  let payload_bytes = payload.as_bytes();
- let start_page: u64 = gpa_start / PAGE_SIZE as u64;
- let end_page: u64 = (gpa_start + (byte_len + PAGE_SIZE - 1) as u64) / PAGE_SIZE as u64;
- let gpas: Vec<u64> = (start_page..end_page).collect();
- let pages =
-     PagedRange::new(gpa_start as usize % PAGE_SIZE, byte_len, gpas.as_slice()).unwrap();
+ // Use caller-provided GPNs directly instead of computing a synthetic
+ // contiguous range. DMA allocations may have non-contiguous pages.
+ // gpn_offset handles sub-page-aligned guest buffers (e.g., 512-byte
+ // offset within first page).
+ let pages = PagedRange::new(gpn_offset, byte_len, gpns).unwrap();
Copilot AI Apr 4, 2026

PagedRange::new(...).unwrap() can panic if gpn_offset/byte_len are inconsistent with the provided gpns slice (e.g., empty gpns, offset >= page size, length too large). Please convert this to fallible error handling and return a StorvscError instead of panicking; also remove the empty if byte_len == 0 {} no-op (callers already gate on byte_len > 0).
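
A fallible pre-check along the lines of this comment might look like the sketch below. The three conditions are taken from the examples in the comment (empty page list, offset not within the first page, length exceeding the pages supplied); they are assumptions for illustration and not the actual validation rules of PagedRange::new. The names check_paged_range and RangeError are hypothetical.

```rust
const PAGE_SIZE: usize = 4096;

/// Hypothetical error type for an inconsistent (offset, len, pages) triple.
#[derive(Debug, PartialEq)]
enum RangeError {
    EmptyPageList,
    OffsetTooLarge,
    LengthExceedsPages,
}

/// Checks that `len` bytes starting `offset` bytes into the first page fit
/// within `gpn_count` pages, returning an error instead of panicking.
fn check_paged_range(offset: usize, len: usize, gpn_count: usize) -> Result<(), RangeError> {
    if gpn_count == 0 {
        return Err(RangeError::EmptyPageList);
    }
    if offset >= PAGE_SIZE {
        return Err(RangeError::OffsetTooLarge);
    }
    // Use checked arithmetic so oversized inputs cannot overflow.
    let end = offset.checked_add(len).ok_or(RangeError::LengthExceedsPages)?;
    let capacity = gpn_count
        .checked_mul(PAGE_SIZE)
        .ok_or(RangeError::LengthExceedsPages)?;
    if end > capacity {
        return Err(RangeError::LengthExceedsPages);
    }
    Ok(())
}
```

A caller could map RangeError into its own error type so that a bad request fails that one I/O rather than taking down the process.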

Comment on lines +751 to +761
// Match completion against pending transactions
match self
.transactions
.get_mut(completion.transaction_id as usize)
{
Some(t) => Ok(t),
None => Err(StorvscError(StorvscErrorInner::PacketError(
PacketError::UnexpectedTransaction(completion.transaction_id),
))),
}?
.cancel(StorvscCompleteReason::UnitAttention);
Copilot AI Apr 4, 2026

The UNIT ATTENTION path cancels the pending transaction but never removes it from self.transactions (it uses get_mut() and then .cancel(...)). This will leak slab entries over time and prevent transaction IDs from being reused, eventually growing memory and possibly failing new requests. Consider using remove() (or equivalent) when completing/cancelling a transaction.

Suggested change
- // Match completion against pending transactions
- match self
-     .transactions
-     .get_mut(completion.transaction_id as usize)
- {
-     Some(t) => Ok(t),
-     None => Err(StorvscError(StorvscErrorInner::PacketError(
-         PacketError::UnexpectedTransaction(completion.transaction_id),
-     ))),
- }?
- .cancel(StorvscCompleteReason::UnitAttention);
+ // Match completion against pending transactions and remove the
+ // cancelled transaction so the slot can be reused.
+ let transaction_id = completion.transaction_id as usize;
+ let mut transaction = if self.transactions.contains(transaction_id) {
+     Ok(self.transactions.remove(transaction_id))
+ } else {
+     Err(StorvscError(StorvscErrorInner::PacketError(
+         PacketError::UnexpectedTransaction(completion.transaction_id),
+     )))
+ }?;
+ transaction.cancel(StorvscCompleteReason::UnitAttention);

Comment on lines +763 to +774
// Match completion against pending transactions
match self
.transactions
.get_mut(completion.transaction_id as usize)
{
Some(t) => Ok(t),
None => Err(StorvscError(StorvscErrorInner::PacketError(
PacketError::UnexpectedTransaction(completion.transaction_id),
))),
}?
.complete(result);
}
Copilot AI Apr 4, 2026

The normal completion path calls .complete(result) on a transaction found via self.transactions.get_mut(...), but does not remove the entry afterward. This causes unbounded growth of the Slab<PendingOperation> and can eventually break I/O. Consider removing the transaction from the slab when the completion is handled.
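
The remove-on-completion pattern can be sketched on its own. To stay self-contained this example uses a std HashMap as a stand-in for the driver's Slab<PendingOperation>; Tracker, begin, complete, and CompleteError are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical error for a completion that matches no pending request.
#[derive(Debug, PartialEq)]
enum CompleteError {
    UnexpectedTransaction(u64),
}

/// Tracks in-flight requests by transaction ID (HashMap used as a slab stand-in).
struct Tracker {
    pending: HashMap<u64, &'static str>,
}

impl Tracker {
    fn new() -> Self {
        Tracker { pending: HashMap::new() }
    }

    /// Records a request as in flight.
    fn begin(&mut self, id: u64, label: &'static str) {
        self.pending.insert(id, label);
    }

    /// Handles a completion by removing the entry, so it cannot accumulate.
    fn complete(&mut self, id: u64) -> Result<&'static str, CompleteError> {
        self.pending
            .remove(&id)
            .ok_or(CompleteError::UnexpectedTransaction(id))
    }

    fn in_flight(&self) -> usize {
        self.pending.len()
    }
}
```

Because complete takes ownership of the entry, a duplicate completion for the same ID is reported as unexpected instead of being applied twice.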

Comment on lines +733 to +737
let cdb = scsi_defs::Unmap {
operation_code: ScsiOp::UNMAP,
allocation_length: (size_of::<scsi_defs::UnmapBlockDescriptor>() as u16).into(),
..FromZeros::new_zeroed()
};
Copilot AI Apr 4, 2026

UNMAP CDB allocation_length should be the total parameter list length (header + descriptors), but it’s currently set to only size_of::<UnmapBlockDescriptor>(). This likely makes UNMAP fail (the device will read an incomplete parameter list). Consider setting the CDB length and the data-out byte_len to the actual list size (and only sending that many bytes).
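
The length relationships can be made concrete with a byte-level sketch for a single block descriptor. It follows the standard UNMAP layout (8-byte parameter list header, 16-byte block descriptor, parameter list length at CDB bytes 7..9) rather than the scsi_defs structs used in the PR; unmap_param_list and unmap_cdb are hypothetical helper names.

```rust
/// SCSI UNMAP operation code.
const UNMAP: u8 = 0x42;
const HEADER_LEN: usize = 8;
const DESCRIPTOR_LEN: usize = 16;
const PARAM_LIST_LEN: usize = HEADER_LEN + DESCRIPTOR_LEN;

/// Builds an UNMAP parameter list containing one block descriptor.
fn unmap_param_list(lba: u64, blocks: u32) -> [u8; PARAM_LIST_LEN] {
    let mut list = [0u8; PARAM_LIST_LEN];
    // UNMAP data length: number of bytes following this 2-byte field.
    list[0..2].copy_from_slice(&((PARAM_LIST_LEN - 2) as u16).to_be_bytes());
    // Block descriptor data length: total size of all descriptors.
    list[2..4].copy_from_slice(&(DESCRIPTOR_LEN as u16).to_be_bytes());
    // Descriptor: 8-byte starting LBA, then 4-byte block count.
    list[8..16].copy_from_slice(&lba.to_be_bytes());
    list[16..20].copy_from_slice(&blocks.to_be_bytes());
    list
}

/// Builds the 10-byte UNMAP CDB; the length field covers header + descriptors.
fn unmap_cdb(param_list_len: u16) -> [u8; 10] {
    let mut cdb = [0u8; 10];
    cdb[0] = UNMAP;
    cdb[7..9].copy_from_slice(&param_list_len.to_be_bytes());
    cdb
}
```

For one descriptor the CDB length field is 24, the header's data length is 22, and the descriptor data length is 16; the data-out transfer would carry the same 24 bytes.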

tracing::error!(
error = e.as_ref() as &dyn std::error::Error,
"failed to refetch capacity on resize"
);
Copilot AI Apr 4, 2026

On capacity refetch failure, wait_resize() logs and continues without awaiting the resize event (or any backoff), which can spin in a tight loop if the host keeps returning errors. Consider awaiting listen (or adding a small delay/backoff) on error to avoid CPU burn.

Suggested change
- );
+ );
+ listen.await;

Comment on lines +1037 to +1039
return Ok(Resource::new(StorvscDiskConfig {
instance_guid: controller_instance_id,
lun: sub_device_path as u8,
Copilot AI Apr 4, 2026

sub_device_path is a u32 but is cast to u8 for lun without validation. If sub_device_path > 255, this will silently truncate and route I/O to the wrong LUN. Consider validating the range and returning an error if it doesn’t fit in u8.

Suggested change
- return Ok(Resource::new(StorvscDiskConfig {
-     instance_guid: controller_instance_id,
-     lun: sub_device_path as u8,
+ let lun = u8::try_from(sub_device_path).context(format!(
+     "VScsi sub_device_path {} does not fit in a u8 LUN",
+     sub_device_path
+ ))?;
+ return Ok(Resource::new(StorvscDiskConfig {
+     instance_guid: controller_instance_id,
+     lun,

Comment on lines 180 to +185
"rdinit=/underhill-init",
// Default to user-mode NVMe driver.
"OPENHCL_NVME_VFIO=1",
// TODO(juantian): TEMP -- enable usermode storvsc to validate via CI pipeline.
// MUST be removed before merge. See PR description.
"OPENHCL_STORVSC_USERMODE=1",
Copilot AI Apr 4, 2026

This hard-codes OPENHCL_STORVSC_USERMODE=1 into the boot command line, which enables the feature by default and contradicts the PR’s stated opt-in behavior. Please remove this before merge (or gate it behind a test-only/dev-only configuration) so production images remain opt-in via environment only.

Comment on lines +235 to +253
if !self.save_restore_supported {
async {
join_all(self.drivers.drain().map(|(guid, driver)| {
let guid_str = guid.to_string();
async move {
driver
.stop()
.instrument(tracing::info_span!(
"shutdown_storvsc_driver",
guid = guid_str
))
.await
}
}))
.await
}
.instrument(join_span)
.await;
}
Copilot AI Apr 4, 2026

On shutdown, drivers are only stopped when !save_restore_supported. If save_restore_supported is true, this exits the manager without explicitly stopping the per-controller StorvscDriver tasks, which can leak work/resources and diverges from the NVMe manager shutdown pattern. Consider stopping drivers unconditionally on shutdown (save/restore support should affect servicing behavior, not shutdown cleanup).

Suggested change
- if !self.save_restore_supported {
-     async {
-         join_all(self.drivers.drain().map(|(guid, driver)| {
-             let guid_str = guid.to_string();
-             async move {
-                 driver
-                     .stop()
-                     .instrument(tracing::info_span!(
-                         "shutdown_storvsc_driver",
-                         guid = guid_str
-                     ))
-                     .await
-             }
-         }))
-         .await
-     }
-     .instrument(join_span)
-     .await;
- }
+ async {
+     join_all(self.drivers.drain().map(|(guid, driver)| {
+         let guid_str = guid.to_string();
+         async move {
+             driver
+                 .stop()
+                 .instrument(tracing::info_span!(
+                     "shutdown_storvsc_driver",
+                     guid = guid_str
+                 ))
+                 .await
+         }
+     }))
+     .await
+ }
+ .instrument(join_span)
+ .await;

juantian added 2 commits April 5, 2026 00:23
- Propagate negotiate() errors via ? instead of silently ignoring
- Log decode errors and PagedRange failure inputs for diagnostics
- Fix VPD NAA ID parsing to read from descriptor start
- Add backoff on wait_resize error instead of busy-spinning
- Validate sub_device_path with u8::try_from for LUN routing
- Fix UNMAP allocation_length to include header + descriptor
- Remove unused LockedPages::va() (dead code from prior draft)
- Remove resize_listeners from inspect (internal bookkeeping)
- Stop all drivers unconditionally on shutdown (deep defensive)
- Fix ScsiOp::INQUIRY type in fuzz target
- Add TODO for resize integration tests
Copilot AI review requested due to automatic review settings April 5, 2026 01:54
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 3 comments.

let state = s.state_mut().unwrap();

// Cancel pending operations with save/restore reason.
for mut transaction in state.inner.transactions.drain() {
Copilot AI Apr 5, 2026

state.inner.transactions is a slab::Slab<PendingOperation>, and Slab::drain() yields (usize, PendingOperation) items. The current loop binds the tuple to transaction and then calls transaction.cancel(...), which won't compile. Destructure the drain items (e.g., for (_id, mut op) in ...) before calling cancel.

Suggested change
- for mut transaction in state.inner.transactions.drain() {
+ for (_id, mut transaction) in state.inner.transactions.drain() {

Comment on lines +733 to +790
let cdb = scsi_defs::Unmap {
operation_code: ScsiOp::UNMAP,
allocation_length: ((size_of::<scsi_defs::UnmapListHeader>()
+ size_of::<scsi_defs::UnmapBlockDescriptor>())
as u16)
.into(),
..FromZeros::new_zeroed()
};

let unmap_param_list = scsi_defs::UnmapListHeader {
data_length: ((size_of::<scsi_defs::UnmapListHeader>() - 2
+ size_of::<scsi_defs::UnmapBlockDescriptor>()) as u16)
.into(),
block_descriptor_data_length: (size_of::<scsi_defs::UnmapBlockDescriptor>() as u16)
.into(),
..FromZeros::new_zeroed()
};

let unmap_descriptor = scsi_defs::UnmapBlockDescriptor {
start_lba: sector.into(),
lba_count: u32::try_from(count)
.map_err(|_| DiskError::InvalidInput)?
.into(),
..FromZeros::new_zeroed()
};

// At this time we cannot allocate contiguous pages, but this could be done without an
// assert if we could guarantee that the allocation is contiguous.
const_assert!(
(size_of::<scsi_defs::UnmapListHeader>() + size_of::<scsi_defs::UnmapBlockDescriptor>())
as u64
<= PAGE_SIZE_4K
);
let data_out_size = PAGE_SIZE_4K as usize;
let data_out = match self.driver.allocate_dma_buffer(data_out_size) {
Ok(buf) => buf,
Err(err) => {
tracing::error!(
error = err.to_string(),
"Unable to allocate DMA buffer for UNMAP"
);
return Err(DiskError::Io(std::io::Error::other(err)));
}
};
data_out.write_at(0, unmap_param_list.as_bytes());
data_out.write_at(
size_of::<scsi_defs::UnmapListHeader>(),
unmap_descriptor.as_bytes(),
);

self.send_scsi_request(
cdb.as_bytes(),
cdb.operation_code,
data_out.pfns(),
data_out_size,
false,
0,
)
Copilot AI Apr 5, 2026

UNMAP builds a parameter list whose length matches the CDB allocation_length, but the request sends byte_len = data_out_size (a full 4K page). This makes ScsiRequest.data_transfer_length larger than the UNMAP parameter list length, which can cause protocol/target errors and adds unnecessary transfer. Send only the actual parameter list length (header + descriptor), even if the backing DMA allocation is a full page.

Comment on lines +183 to +185
// TODO(juantian): TEMP -- enable usermode storvsc to validate via CI pipeline.
// MUST be removed before merge. See PR description.
"OPENHCL_STORVSC_USERMODE=1",
Copilot AI Apr 5, 2026

This hard-codes OPENHCL_STORVSC_USERMODE=1 into the default kernel command line, which contradicts the stated "opt-in only" behavior and would enable usermode storvsc by default for all boots that use this image. Please remove this before merge (or gate it behind a test-only build/feature) so the default remains disabled unless the environment variable is explicitly set at runtime.

Suggested change
- // TODO(juantian): TEMP -- enable usermode storvsc to validate via CI pipeline.
- // MUST be removed before merge. See PR description.
- "OPENHCL_STORVSC_USERMODE=1",

@github-actions
github-actions bot commented Apr 5, 2026

…m gate

Remove ChannelType::Pipe from storvsp SCSI offer -- incorrectly ported
from eric135 draft. Windows storvsp uses flags=0 for SCSI (Default).
This broke all UEFI VMM tests with Unrecognized operation COMPLETE_IO.

Add cfg(unix) to disk_storvsc (depends on Linux-only vmbus_user_channel).
Fix UNMAP to send actual param list length instead of full page size.
@github-actions
github-actions bot commented Apr 5, 2026

Set has_negotiated=true after the explicit negotiate() call in
StorvscDriver::run(). Without this, StorvscState::run() sees
has_negotiated==false and calls negotiate() a second time.

storvsp rejects the second BEGIN_INITIALIZATION because it is
already in Ready state, returning INVALID_DEVICE_STATE. This
breaks all storvsp tests that go through the UIO channel path.
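
The failure mode described in this commit message can be modeled with a small state sketch. Host, Client, InitError, and ensure_negotiated are hypothetical stand-ins and not the StorvscDriver or storvsp types; the point is only that the flag must be set after the first successful negotiation so a later code path does not negotiate again.

```rust
#[derive(Debug, PartialEq)]
enum InitError {
    InvalidDeviceState,
}

/// Stand-in for the host side: initialization is accepted only once.
#[derive(Default)]
struct Host {
    ready: bool,
}

impl Host {
    fn begin_initialization(&mut self) -> Result<(), InitError> {
        if self.ready {
            return Err(InitError::InvalidDeviceState);
        }
        self.ready = true;
        Ok(())
    }
}

/// Stand-in for the client side, which remembers whether it has negotiated.
#[derive(Default)]
struct Client {
    has_negotiated: bool,
}

impl Client {
    /// Negotiates at most once; later calls are no-ops.
    fn ensure_negotiated(&mut self, host: &mut Host) -> Result<(), InitError> {
        if !self.has_negotiated {
            host.begin_initialization()?;
            self.has_negotiated = true;
        }
        Ok(())
    }
}
```

Calling ensure_negotiated twice succeeds both times, whereas a second raw begin_initialization is rejected, which mirrors the INVALID_DEVICE_STATE failure the commit fixes.
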
Copilot AI review requested due to automatic review settings April 5, 2026 07:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.

Comment on lines +165 to +172
async fn fetch_capacity(&self) -> anyhow::Result<DiskCapacity> {
// Must fit in a single page -- DMA allocations may not be
// physically contiguous across page boundaries.
const_assert!(size_of::<scsi_defs::ReadCapacity16Data>() as u64 <= PAGE_SIZE_4K);
// Must fit in a single page -- DMA allocations may not be
// physically contiguous across page boundaries.
const_assert!(size_of::<scsi_defs::ReadCapacityData>() as u64 <= PAGE_SIZE_4K);
let data_in_size = PAGE_SIZE_4K as usize;
Copilot AI Apr 5, 2026

size_of::<T>() is used throughout this module but size_of is never imported/qualified, which will fail to compile. Add use core::mem::size_of; (or qualify calls as core::mem::size_of).

Comment on lines +295 to +303

// Cancel pending operations with save/restore reason.
for mut transaction in state.inner.transactions.drain() {
transaction.cancel(StorvscCompleteReason::SaveRestore);
}

Ok(StorvscDriverSavedState {
version: state.version.major_minor,
has_negotiated: state.has_negotiated,
Copilot AI Apr 5, 2026

stop() (and save()) stop/remove the worker task but leave new_request_sender/add_resize_listener_sender intact. If a caller uses an Arc<StorvscDriver<_>> after stop/save begins, send_request() can enqueue a request with a live completion sender and then await forever because the worker is no longer draining the request channel. Consider clearing/closing the senders during stop/save or gating send_request/add_resize_listener on an explicit running-state check so they fail fast when the task is not running.

Suggested change
- // Cancel pending operations with save/restore reason.
- for mut transaction in state.inner.transactions.drain() {
-     transaction.cancel(StorvscCompleteReason::SaveRestore);
- }
- Ok(StorvscDriverSavedState {
-     version: state.version.major_minor,
-     has_negotiated: state.has_negotiated,
+ let version = state.version.major_minor;
+ let has_negotiated = state.has_negotiated;
+ // Cancel pending operations with save/restore reason.
+ for mut transaction in state.inner.transactions.drain() {
+     transaction.cancel(StorvscCompleteReason::SaveRestore);
+ }
+ // Remove the stopped worker state so its request/listener receivers
+ // are dropped and future sends fail fast instead of queueing work
+ // that no task will ever drain.
+ s.remove();
+ Ok(StorvscDriverSavedState {
+     version,
+     has_negotiated,
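
The fail-fast behavior this suggestion relies on can be demonstrated with a standard-library channel: once the receiving side is dropped, sends return an error instead of queueing. This is a sketch using std::sync::mpsc as a stand-in for the driver's async channels; Handle, Worker, new_pair, and NotRunning are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical error returned when the worker is no longer draining requests.
#[derive(Debug, PartialEq)]
struct NotRunning;

/// Caller-facing side: holds only the sender.
struct Handle {
    requests: Sender<u32>,
}

/// Worker side: owns the receiver while it is running.
struct Worker {
    requests: Option<Receiver<u32>>,
}

fn new_pair() -> (Handle, Worker) {
    let (tx, rx) = channel();
    (Handle { requests: tx }, Worker { requests: Some(rx) })
}

impl Handle {
    /// Fails immediately if the receiver has been dropped.
    fn send_request(&self, request: u32) -> Result<(), NotRunning> {
        self.requests.send(request).map_err(|_| NotRunning)
    }
}

impl Worker {
    /// Stopping drops the receiver so later sends cannot queue unseen work.
    fn stop(&mut self) {
        self.requests = None;
    }
}
```

Whether the real channel type reports a dropped receiver in the same way is an assumption that would need to be confirmed against its documentation.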

};

// If CHECK CONDITION with sense UNIT ATTENTION, then notify any resize listeners and
// resend this request
Copilot AI Apr 5, 2026

The comment says the UNIT ATTENTION path will “resend this request”, but the implementation cancels the pending operation with StorvscCompleteReason::UnitAttention and does not resend. Either update the comment to match behavior (caller must retry) or implement the resend logic here.

Suggested change
- // resend this request
+ // cancel the pending request with UnitAttention so the caller can retry it.

Comment on lines 181 to 186
// Default to user-mode NVMe driver.
"OPENHCL_NVME_VFIO=1",
// TODO(juantian): TEMP -- enable usermode storvsc to validate via CI pipeline.
// MUST be removed before merge. See PR description.
"OPENHCL_STORVSC_USERMODE=1",
// The next three items reduce the memory overhead of the storvsc driver.
Copilot AI Apr 5, 2026

This hard-codes OPENHCL_STORVSC_USERMODE=1 into the default kernel command line, which contradicts the PR’s “opt-in only / not enabled by default” requirement. Please remove this before merge (or gate it behind an explicit test-only feature/build flag) so production boots remain opt-in via environment.

Comment on lines +810 to +833
// TODO: Add unit tests for wait_resize -- cover error retry with
// listen.await backoff, and capacity change detection.
async fn wait_resize(&self, sector_count: u64) -> u64 {
loop {
let listen = self.resize_event.listen();
// Refetch capacity from host (we're in async context here)
let capacity = match self.fetch_capacity().await {
Ok(c) => c,
Err(e) => {
tracing::error!(
error = e.as_ref() as &dyn std::error::Error,
"failed to refetch capacity on resize"
);
listen.await;
continue;
}
};
if capacity.num_sectors != sector_count {
break capacity.num_sectors;
}
listen.await;
}
}
}
Copilot AI Apr 5, 2026

disk_storvsc introduces substantial new disk I/O behavior (bounce-buffer vs zero-copy, retry logic on CancelledRetry, capacity refresh on resize), but the crate has no unit tests. Adding targeted tests (e.g., for send_scsi_request retry semantics and wait_resize behavior) would help prevent regressions.

Suggested change
-     // TODO: Add unit tests for wait_resize -- cover error retry with
-     // listen.await backoff, and capacity change detection.
-     async fn wait_resize(&self, sector_count: u64) -> u64 {
-         loop {
-             let listen = self.resize_event.listen();
-             // Refetch capacity from host (we're in async context here)
-             let capacity = match self.fetch_capacity().await {
-                 Ok(c) => c,
-                 Err(e) => {
-                     tracing::error!(
-                         error = e.as_ref() as &dyn std::error::Error,
-                         "failed to refetch capacity on resize"
-                     );
-                     listen.await;
-                     continue;
-                 }
-             };
-             if capacity.num_sectors != sector_count {
-                 break capacity.num_sectors;
-             }
-             listen.await;
-         }
-     }
- }
+     async fn wait_resize(&self, sector_count: u64) -> u64 {
+         loop {
+             let listen = self.resize_event.listen();
+             let next = match self.fetch_capacity().await {
+                 Ok(capacity) => {
+                     wait_resize_next_action(sector_count, Ok(capacity.num_sectors))
+                 }
+                 Err(e) => {
+                     tracing::error!(
+                         error = e.as_ref() as &dyn std::error::Error,
+                         "failed to refetch capacity on resize"
+                     );
+                     wait_resize_next_action(sector_count, Err(()))
+                 }
+             };
+             match next {
+                 WaitResizeAction::ReturnUpdatedCapacity(num_sectors) => break num_sectors,
+                 WaitResizeAction::WaitForNotification => listen.await,
+             }
+         }
+     }
+ }
+ #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+ enum WaitResizeAction {
+     WaitForNotification,
+     ReturnUpdatedCapacity(u64),
+ }
+ fn wait_resize_next_action(
+     sector_count: u64,
+     observed_sector_count: Result<u64, ()>,
+ ) -> WaitResizeAction {
+     match observed_sector_count {
+         Ok(observed_sector_count) if observed_sector_count != sector_count => {
+             WaitResizeAction::ReturnUpdatedCapacity(observed_sector_count)
+         }
+         Ok(_) | Err(()) => WaitResizeAction::WaitForNotification,
+     }
+ }
+ #[cfg(test)]
+ mod tests {
+     use super::WaitResizeAction;
+     use super::wait_resize_next_action;
+     use test_with_tracing::test;
+     #[test]
+     fn wait_resize_retries_after_capacity_refresh_error() {
+         assert_eq!(
+             wait_resize_next_action(1024, Err(())),
+             WaitResizeAction::WaitForNotification
+         );
+     }
+     #[test]
+     fn wait_resize_waits_when_capacity_is_unchanged() {
+         assert_eq!(
+             wait_resize_next_action(1024, Ok(1024)),
+             WaitResizeAction::WaitForNotification
+         );
+     }
+     #[test]
+     fn wait_resize_returns_when_capacity_changes() {
+         assert_eq!(
+             wait_resize_next_action(1024, Ok(2048)),
+             WaitResizeAction::ReturnUpdatedCapacity(2048)
+         );
+     }
+ }

@github-actions
github-actions bot commented Apr 5, 2026

Skip disk-specific metadata queries for optical devices (device_type 0x05).
DVD/CD-ROM only needs capacity for read I/O -- SimpleScsiDvd handles SCSI
protocol (INQUIRY, MODE_SENSE, VPD) itself and only delegates read/eject.

- Split fetch_capacity into fetch_capacity_10/16 for clarity
- Optical path: READ_CAPACITY(10), disk_id=None, read_only=true, unmap=0
- Disk path: READ_CAPACITY(16->10 fallback), VPD, MODE_SENSE, Block Limits
- Fix misleading comment in storvsc_driver (resend -> cancel so caller can retry)
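
The device-type split and the two capacity formats mentioned in this commit message can be sketched at the byte level. This is an illustration based on the standard INQUIRY and READ CAPACITY response layouts, not the fetch_capacity_10/16 code in the PR; the function and type names are hypothetical.

```rust
/// Peripheral device type reported by INQUIRY for CD/DVD devices.
const DEVICE_TYPE_OPTICAL: u8 = 0x05;

#[derive(Debug, PartialEq)]
struct Capacity {
    num_sectors: u64,
    sector_size: u32,
}

/// The device type is the low 5 bits of byte 0 of standard INQUIRY data.
fn peripheral_device_type(inquiry: &[u8]) -> Option<u8> {
    inquiry.first().map(|b| b & 0x1f)
}

fn be_u32(data: &[u8], at: usize) -> Option<u32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(data.get(at..at + 4)?);
    Some(u32::from_be_bytes(buf))
}

fn be_u64(data: &[u8], at: usize) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(data.get(at..at + 8)?);
    Some(u64::from_be_bytes(buf))
}

/// READ CAPACITY(10): 4-byte last LBA, then 4-byte block length.
/// A last LBA of 0xFFFF_FFFF means the size does not fit and (16) is needed.
fn parse_read_capacity_10(data: &[u8]) -> Option<Capacity> {
    let last_lba = be_u32(data, 0)?;
    if last_lba == u32::MAX {
        return None;
    }
    Some(Capacity {
        num_sectors: u64::from(last_lba) + 1,
        sector_size: be_u32(data, 4)?,
    })
}

/// READ CAPACITY(16): 8-byte last LBA, then 4-byte block length.
fn parse_read_capacity_16(data: &[u8]) -> Option<Capacity> {
    Some(Capacity {
        num_sectors: be_u64(data, 0)?.checked_add(1)?,
        sector_size: be_u32(data, 8)?,
    })
}
```

Both commands report the last addressable LBA, so the sector count is that value plus one.
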
@github-actions
github-actions bot commented Apr 5, 2026
