Fix FP8 CT export metadata for KV cache and attention #1752

Open
yiliu30 wants to merge 3 commits into main from fix/issue-1751-fp8-ct-export

Conversation

@yiliu30 (Contributor) commented on Apr 28, 2026

Summary

  • add FP8 static attention config-group export for the compressed-tensors format
  • keep FP8 KV cache metadata in the exported config (a sketch of the resulting metadata follows this list)
  • add targeted tests for FP8 KV-cache and FP8 attention CT export behavior
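
For illustration, here is a minimal sketch of the quantization_config metadata this export is meant to produce in config.json, written as a Python dict in compressed-tensors style; the attention target pattern and the exact scheme fields below are assumptions for illustration, not values copied from the PR diff:

# Hypothetical shape of the exported metadata (illustrative values).
quantization_config = {
    "quant_method": "compressed-tensors",
    "format": "float-quantized",
    "config_groups": {
        # FP8 static attention group targeting non-Linear modules;
        # the target pattern is a guess for illustration.
        "group_attention": {
            "targets": ["re:.*self_attn$"],
            "input_activations": {
                "num_bits": 8,
                "type": "float",
                "strategy": "tensor",
                "dynamic": False,
                "symmetric": True,
            },
        },
    },
    # FP8 KV-cache scheme kept in the exported config by this PR.
    "kv_cache_scheme": {
        "num_bits": 8,
        "type": "float",
        "strategy": "tensor",
        "dynamic": False,
        "symmetric": True,
    },
}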

Verification

  • /home/yiliu4/workspace/ar/bin/python -m pytest test/test_cpu/export/test_llmc_format.py -k 'static_fp8_kv_config or static_fp8_attention_config or static_fp_export_packs_serially' -q -s
    • passed
  • /home/yiliu4/workspace/ar/bin/python -m pytest test/test_cpu/export/test_llmc_format.py -q
    • the issue-specific tests passed; one unrelated mixed-precision load failure remains in the installed compressed-tensors stack (the mxfp8-quantized format is not registered)

Closes #1751 ([Feature]: Add CT export and vLLM inference support for FP8 KV cache and attention)

Copilot AI review requested due to automatic review settings April 28, 2026 11:50

Copilot AI left a comment

Pull request overview

This PR fixes and extends FP8_STATIC export to the llm-compressor / compressed-tensors config format by ensuring FP8 KV-cache metadata is preserved and by exporting an FP8 “static attention” config-group, with regression tests covering both behaviors.

Changes:

  • Add FP8 attention config_groups export (non-Linear targets) when static_attention_dtype requests FP8.
  • Preserve/emit kv_cache_scheme metadata for the FP8 KV-cache and FP8 attention export paths.
  • Add targeted CPU export tests validating the saved config.json metadata and reload behavior (a reload sketch follows this list).
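
As a sanity check of the reload path, a minimal sketch, assuming the installed compressed-tensors package exposes its pydantic QuantizationConfig model and that the export wrote a quantization_config block into config.json:

import json

from compressed_tensors.quantization import QuantizationConfig

# Re-parse the exported metadata with compressed-tensors' own config model.
with open("config.json") as f:
    qcfg = json.load(f)["quantization_config"]

parsed = QuantizationConfig.model_validate(qcfg)  # pydantic v2 API
assert parsed.kv_cache_scheme is not None  # FP8 KV metadata survived the round trip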

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files changed:

  • test/test_cpu/export/test_llmc_format.py: adds regression tests for FP8 KV-cache scheme metadata and FP8 attention config-group export and reload.
  • auto_round/export/export_to_llmcompressor/export_to_static_fp.py: refactors FP8 QuantizationArgs construction, adds the attention config-group export, and adjusts when kv_cache_scheme is included.

Comment on lines +102 to +107
def _use_fp8_attention(static_attention_dtype: str | None) -> bool:
"""Return True if static attention should use FP8."""
if static_attention_dtype in ("fp8", "float8_e4m3fn"):
logger.warning_once("Exporting model with static attention in FP8 dtype.")
return True
return False

Copilot AI commented on Apr 28, 2026:

_use_fp8_attention (and similarly _use_fp8_kv) only matches string values, but the compressor config allows static_attention_dtype/static_kv_dtype to be passed as torch.dtype (e.g., torch.float8_e4m3fn). In that case this check will return False and the FP8 attention group / kv_cache_scheme metadata will be silently omitted from the exported config. Consider normalizing dtype inputs (accept torch.dtype values and/or convert to a canonical string) before the membership check so both str and torch.dtype are supported.
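
A minimal sketch of the normalization the review suggests; _normalize_dtype is a hypothetical helper, not code from this PR:

import torch

def _normalize_dtype(dtype) -> str | None:
    """Map str or torch.dtype inputs to a canonical string name (hypothetical helper)."""
    if dtype is None:
        return None
    if isinstance(dtype, torch.dtype):
        # str(torch.float8_e4m3fn) == "torch.float8_e4m3fn"
        return str(dtype).removeprefix("torch.")
    return str(dtype)

def _use_fp8_attention(static_attention_dtype) -> bool:
    """Accept both "fp8"/"float8_e4m3fn" strings and torch.float8_e4m3fn."""
    return _normalize_dtype(static_attention_dtype) in ("fp8", "float8_e4m3fn")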

@chensuyue (Contributor) commented:

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines commented:

Azure Pipelines successfully started running 1 pipeline(s).
