The code base is derived from https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models. It adds fast unit tests that check numerical discrepancies, end-to-end tests with trained models, and support for additional architectures.
Format and lint the code:

```bash
black . && ruff check .
```

Install in editable mode with the development extras:

```bash
pip install -e .[dev]
```

The table below shows which models have test coverage across the three test
tiers. Fast tests use randomly-initialised weights and run entirely offline.
Trained tests download the real model weights and check numerical
discrepancies against PyTorch. Genai tests additionally verify end-to-end
token generation via onnxruntime-genai.
| Model | Architecture | Fast tests | Trained tests | Genai tests |
|---|---|---|---|---|
| arnir0/Tiny-LLM | LlamaForCausalLM | ✓ | ✓ CPU | ✓ CPU |
| baidu/ERNIE-4.5-0.3B-PT | Ernie4_5ForCausalLM | ✓ | | |
| google/gemma-2b | GemmaForCausalLM | ✓ | | |
| google/gemma-2-2b | Gemma2ForCausalLM | ✓ | | |
| google/gemma-3-4b-it | Gemma3ForCausalLM (text-only) | ✓ | | |
| google/gemma-3-4b-it | Gemma3ForConditionalGeneration (multimodal) | ✓ | | |
| google/gemma-4-E4B-it | Gemma4ForCausalLM | ✓ | | |
| google/gemma-4-E4B-it | Gemma4ForConditionalGeneration (text-only) | | | |
| openai/gpt-oss-20b | GptOssForCausalLM | ✓ | | |
| ibm-granite/granite-3.3-2b-instruct | GraniteForCausalLM | ✓ | | |
| internlm/internlm2-7b | InternLM2ForCausalLM | ✓ | | |
| mistralai/Ministral-3-3B-Instruct-2512 | Ministral3ForCausalLM (text-only) | ✓ | | |
| mistralai/Ministral-3-3B-Instruct-2512 | Mistral3ForConditionalGeneration (multimodal) | ✓ | ✓ CPU | ✓ CPU |
| mistralai/Mistral-Nemo-Instruct-2407 | MistralNeMoForCausalLM | ✓ | ✓ CPU | ✓ CPU |
| nvidia/Minitron-4B-Base | NemotronForCausalLM | ✓ | | |
| nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 | NemotronHForCausalLM | ✓ | | |
| allenai/OLMo-7B | OlmoForCausalLM | ✓ | | |
| allenai/OLMo-2-1124-7B | Olmo2ForCausalLM | ✓ | | |
| allenai/OLMo-3-7B-Instruct | Olmo3ForCausalLM | ✓ | ✓ CPU | ✓ CPU |
| microsoft/phi-2 | PhiForCausalLM | ✓ | | |
| microsoft/Phi-3-mini-4k-instruct | Phi3ForCausalLM | ✓ | | |
| microsoft/Phi-3-mini-128k-instruct | Phi3ForCausalLM (LongRoPE) | ✓ | | |
| microsoft/Phi-3-small-8k-instruct | Phi3SmallForCausalLM | ✓ | | |
| microsoft/Phi-3-vision-128k-instruct | Phi3VForCausalLM | ✓ | | |
| microsoft/Phi-4-multimodal-instruct | Phi4MMForCausalLM | ✓ | | |
| microsoft/Phi-4-multimodal-instruct | Phi4MultimodalForCausalLM (multimodal) | ✓ | | |
| microsoft/Phi-3.5-MoE-instruct | PhiMoEForCausalLM | ✓ | | |
| Qwen/Qwen2.5-VL-7B-Instruct | Qwen2_5_VLForConditionalGeneration | ✓ | | |
| Qwen/Qwen2.5-Omni-3B | Qwen2_5OmniForConditionalGeneration (text) | ✓ | | |
| Qwen/Qwen2.5-Omni-3B | Qwen2_5OmniForConditionalGeneration (multimodal) | ✓ | | |
| Qwen/Qwen3-0.6B | Qwen3ForCausalLM | ✓ | ✓ CPU | ✓ CPU |
| Qwen/Qwen3.5-3B | Qwen3_5ForConditionalGeneration | ✓ | | |
| Qwen/Qwen3-VL-4B-Instruct | Qwen3VLForConditionalGeneration | ✓ | | |
| HuggingFaceTB/SmolLM3-3B | SmolLM3ForCausalLM | ✓ | ✓ CPU | ✓ CPU |
| openai/whisper-tiny | WhisperForConditionalGeneration | ✓ | | |
| THUDM/chatglm3-6b | ChatGLMForConditionalGeneration | ✓ | | |
| zai-org/chatglm3-6b | ChatGLMModel | ✓ | | |
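The genai tier compares the token sequence produced by onnxruntime-genai against the expected one. A minimal sketch of such a comparison is below; the key names mirror those found in `stats/end2end_results.json`, but the helper function and the exact metric definitions are assumptions, not the suite's actual implementation.

```python
def compare_tokens(expected: list[int], generated: list[int]) -> dict:
    """Hypothetical sketch of a generate-tier token comparison.

    Key names mirror stats/end2end_results.json; the real test suite
    may define these metrics differently.
    """
    n = min(len(expected), len(generated))
    # positions (within the overlapping prefix) where the tokens disagree
    diffs = [i for i in range(n) if expected[i] != generated[i]]
    delta = abs(len(generated) - len(expected))
    return {
        "first_diff": diffs[0] if diffs else n,   # first mismatching position
        "delta_length": delta,                    # length difference
        "expected_length": len(expected),
        # mismatching positions plus any length overhang
        "total_diff": len(diffs) + delta,
    }


print(compare_tokens([1, 2, 3, 4], [1, 2, 9, 4, 5]))
# → {'first_diff': 2, 'delta_length': 1, 'expected_length': 4, 'total_diff': 2}
```

A perfect generation yields `total_diff == 0` and `first_diff` equal to the sequence length.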
Run the fast tests:

```bash
pytest tests/fast
```

Run a single trained-model test:

```bash
python tests/trained/test_trained_tiny_llm.py
```

With a better machine, run the full trained suite:

```bash
LONGTEST=1 pytest tests/trained
```

You can see the results in stats/end2end_results.json. Example:
```python
{'first_diff': 0, 'delta_length': 4, 'expected_length': 16, 'total_diff': 16, 'precision': 'fp32', 'model_id': 'HuggingFaceTB/SmolLM3-3B', 'experiment': 'generate', 'provider': 'cpu'}
{'max_abs_err': 1.6875, '%_gt_0.1': np.float64(0.5278832959081836), '%_gt_0.01': np.float64(0.962624750499002), 'avg_abs_discrepancy': 0.1749267578125, 'shape': (1, 5, 128256), 'dtype': dtype('float16'), 'precision': 'fp16', 'model_id': 'HuggingFaceTB/SmolLM3-3B', 'experiment': 'forward'}
```
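The forward-pass record above summarises elementwise differences between the ONNX and PyTorch logits. The metrics can be reproduced with a few NumPy reductions; this is a hedged sketch in which the helper name `discrepancy_stats` and the exact thresholds are assumptions based on the key names, not the suite's actual code.

```python
import numpy as np


def discrepancy_stats(expected: np.ndarray, got: np.ndarray) -> dict:
    """Hypothetical sketch of the forward-tier discrepancy metrics.

    Key names mirror stats/end2end_results.json; the real reductions
    used by the test suite may differ.
    """
    # compute the error in float32 to avoid fp16 rounding in the stats
    err = np.abs(expected.astype(np.float32) - got.astype(np.float32))
    return {
        "max_abs_err": float(err.max()),
        "%_gt_0.1": float((err > 0.1).mean()),    # share of elements off by > 0.1
        "%_gt_0.01": float((err > 0.01).mean()),  # share of elements off by > 0.01
        "avg_abs_discrepancy": float(err.mean()),
        "shape": expected.shape,
        "dtype": got.dtype,
    }


# toy example: reference logits vs. outputs shifted by ~0.05 everywhere
logits_ref = np.zeros((1, 5, 8), dtype=np.float16)
logits_onnx = np.full((1, 5, 8), 0.05, dtype=np.float16)
stats = discrepancy_stats(logits_ref, logits_onnx)
```

In this toy case every element is off by about 0.05, so `%_gt_0.01` is 1.0 while `%_gt_0.1` is 0.0.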