This repository was archived by the owner on Jan 28, 2026. It is now read-only.

vllm 0.11 and A770 #13323

@savvadesogle

Description


Since Intel has, so far, abandoned ipex-llm and the Arc cards...

vllm v0.11.1rc2.dev221+g49c00fe30 works with 4x A770.


You can build a Docker container from the vllm repository sources (Dockerfile.xpu):
https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.xpu

docker build -f docker/Dockerfile.xpu -t vllm-xpu-0110 --shm-size=32g .
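Once the image is built, a container can be started along these lines. This is only a sketch: the `/dev/dri` device mapping and the `/llm/models` mount path are assumptions for a typical multi-Arc host, not a verified configuration.

```shell
# Expose the Arc GPUs via /dev/dri, keep the large shared-memory segment,
# and mount the host model directory into the container.
docker run -it --rm \
    --device /dev/dri \
    --shm-size=32g \
    --network host \
    -v /llm/models:/llm/models \
    --entrypoint /bin/bash \
    vllm-xpu-0110
```

From the shell inside the container you can then launch `vllm serve` with whatever flags you are experimenting with.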

But I do not know how to properly configure it for the 4x A770, and I am sure the performance could be higher than the ~2 req/s I am seeing now; 10+ req/s should be achievable.


Llama 3.1 8B Instruct FP8.
Sometimes the request processing rate reaches 12 req/s, but the process periodically "hangs" and then speeds up again; I have not found the cause yet.
The benchmark configuration is 1024 input / 512 output tokens.

--max-model-len "2000" 
--max-num-batched-tokens "3000"
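For context, a sketch of how these flags might be combined into a full serve command, sharding the model across the four cards with tensor parallelism. The `--tensor-parallel-size 4` setting is an assumption here, not a configuration the original post confirms works:

```shell
# Serve the FP8 Llama 3.1 8B model across 4 GPUs with the limits above.
vllm serve /llm/models/LLM-Research/Meta-Llama-3.1-8B-Instruct \
    --served-model-name Meta-Llama-3.1-8B-Instruct \
    --tensor-parallel-size 4 \
    --max-model-len 2000 \
    --max-num-batched-tokens 3000 \
    --port 8000
```

`--max-num-batched-tokens` caps the tokens scheduled per engine step, which interacts with the 1024-in/512-out workload below.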

Test command:

vllm bench serve \
    --model /llm/models/LLM-Research/Meta-Llama-3.1-8B-Instruct \
    --served-model-name Meta-Llama-3.1-8B-Instruct \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 512 \
    --ignore-eos \
    --num-prompts 1500 \
    --trust-remote-code \
    --request-rate inf \
    --backend vllm \
    --port 8000
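Before running the benchmark, the endpoint can be sanity-checked with a single request against vLLM's OpenAI-compatible API (the prompt and token count here are arbitrary):

```shell
# Single completion request against the served model on port 8000.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Meta-Llama-3.1-8B-Instruct",
          "prompt": "Hello, my name is",
          "max_tokens": 16
        }'
```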

Ubuntu 25.10, kernel 6.17.3.
My numbers for 4x A770 and 2x Xeon 2699 v3:

115 requests: [screenshot of benchmark results]

1500 requests: [screenshot of benchmark results]
