
# ModelBuilder for onnxruntime-genai


The code base comes from https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models. On top of it, this fork adds fast unit tests that check numerical discrepancies, end-to-end tests with the trained models, and support for more architectures.

## Style

```bash
black . && ruff check .
```

## Development

```bash
pip install -e .[dev]
```

## Supported Models

The table below shows which models have test coverage across the three test tiers. Fast tests use randomly-initialised weights and run entirely offline. Trained tests download the real model weights and check numerical discrepancies against PyTorch. Genai tests additionally verify end-to-end token generation via onnxruntime-genai.

| Model | Architecture | Fast tests | Trained tests | Genai tests |
|---|---|---|---|---|
| arnir0/Tiny-LLM | LlamaForCausalLM | ✓ CPU | ✓ CPU | |
| baidu/ERNIE-4.5-0.3B-PT | Ernie4_5ForCausalLM | | | |
| google/gemma-2b | GemmaForCausalLM | | | |
| google/gemma-2-2b | Gemma2ForCausalLM | | | |
| google/gemma-3-4b-it | Gemma3ForCausalLM (text-only) | | | |
| google/gemma-3-4b-it | Gemma3ForConditionalGeneration (multimodal) | | | |
| google/gemma-4-E4B-it | Gemma4ForCausalLM | | | |
| google/gemma-4-E4B-it | Gemma4ForConditionalGeneration (text-only) | | | |
| openai/gpt-oss-20b | GptOssForCausalLM | | | |
| ibm-granite/granite-3.3-2b-instruct | GraniteForCausalLM | | | |
| internlm/internlm2-7b | InternLM2ForCausalLM | | | |
| mistralai/Ministral-3-3B-Instruct-2512 | Ministral3ForCausalLM (text-only) | | | |
| mistralai/Ministral-3-3B-Instruct-2512 | Mistral3ForConditionalGeneration (multimodal) | ✓ CPU | ✓ CPU | |
| mistralai/Mistral-Nemo-Instruct-2407 | MistralNeMoForCausalLM | ✓ CPU | ✓ CPU | |
| nvidia/Minitron-4B-Base | NemotronForCausalLM | | | |
| nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 | NemotronHForCausalLM | | | |
| allenai/OLMo-7B | OlmoForCausalLM | | | |
| allenai/OLMo-2-1124-7B | Olmo2ForCausalLM | | | |
| allenai/OLMo-3-7B-Instruct | Olmo3ForCausalLM | ✓ CPU | ✓ CPU | |
| microsoft/phi-2 | PhiForCausalLM | | | |
| microsoft/Phi-3-mini-4k-instruct | Phi3ForCausalLM | | | |
| microsoft/Phi-3-mini-128k-instruct | Phi3ForCausalLM (LongRoPE) | | | |
| microsoft/Phi-3-small-8k-instruct | Phi3SmallForCausalLM | | | |
| microsoft/Phi-3-vision-128k-instruct | Phi3VForCausalLM | | | |
| microsoft/Phi-4-multimodal-instruct | Phi4MMForCausalLM | | | |
| microsoft/Phi-4-multimodal-instruct | Phi4MultimodalForCausalLM (multimodal) | | | |
| microsoft/Phi-3.5-MoE-instruct | PhiMoEForCausalLM | | | |
| Qwen/Qwen2.5-VL-7B-Instruct | Qwen2_5_VLForConditionalGeneration | | | |
| Qwen/Qwen2.5-Omni-3B | Qwen2_5OmniForConditionalGeneration (text) | | | |
| Qwen/Qwen2.5-Omni-3B | Qwen2_5OmniForConditionalGeneration (multimodal) | | | |
| Qwen/Qwen3-0.6B | Qwen3ForCausalLM | ✓ CPU | ✓ CPU | |
| Qwen/Qwen3.5-3B | Qwen3_5ForConditionalGeneration | | | |
| Qwen/Qwen3-VL-4B-Instruct | Qwen3VLForConditionalGeneration | | | |
| HuggingFaceTB/SmolLM3-3B | SmolLM3ForCausalLM | ✓ CPU | ✓ CPU | |
| openai/whisper-tiny | WhisperForConditionalGeneration | | | |
| THUDM/chatglm3-6b | ChatGLMForConditionalGeneration | | | |
| zai-org/chatglm3-6b | ChatGLMModel | | | |

## Fast Unit tests

```bash
pytest tests/fast
```
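The pattern behind these fast tests, reduced to a numpy-only toy: run the same computation through a full-precision reference path and a reduced-precision "exported" path with a fixed seed, then bound the discrepancy. This sketch is illustrative only; the real tests build tiny random-weight transformers models and compare them against the ONNX export.

```python
import numpy as np

def reference_forward(x, w):
    # "PyTorch" side of the comparison: plain fp32 matmul.
    return x @ w

def exported_forward(x, w):
    # "Exported" side: the same computation carried out in fp16,
    # as an fp16 export would do, then cast back to fp32.
    return (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)

rng = np.random.default_rng(0)  # fixed seed: the test is deterministic and offline
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 8)).astype(np.float32)

max_abs_err = float(np.abs(reference_forward(x, w) - exported_forward(x, w)).max())
assert max_abs_err < 0.1  # fp16 rounding keeps the discrepancy small
```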

## Long Unit tests

```bash
python tests/trained/test_trained_tiny_llm.py
```

On a machine with more resources, run the whole suite:

```bash
LONGTEST=1 pytest tests/trained
```

You can see the results in `stats/end2end_results.json`. Example:

```python
{'first_diff': 0, 'delta_length': 4, 'expected_length': 16, 'total_diff': 16, 'precision': 'fp32', 'model_id': 'HuggingFaceTB/SmolLM3-3B', 'experiment': 'generate', 'provider': 'cpu'}
{'max_abs_err': 1.6875, '%_gt_0.1': np.float64(0.5278832959081836), '%_gt_0.01': np.float64(0.962624750499002), 'avg_abs_discrepancy': 0.1749267578125, 'shape': (1, 5, 128256), 'dtype': dtype('float16'), 'precision': 'fp16', 'model_id': 'HuggingFaceTB/SmolLM3-3B', 'experiment': 'forward'}
```
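A hypothetical numpy sketch of how such statistics could be computed, to make the metric names concrete. The function names and the exact semantics of `delta_length` and `total_diff` are assumptions for illustration, not the repository's actual implementation.

```python
import numpy as np

def generation_stats(expected: list, got: list) -> dict:
    """Compare two generated token sequences ('generate' experiment).
    first_diff is the first position where the sequences disagree;
    the other fields follow the assumed semantics described above."""
    overlap = min(len(expected), len(got))
    first_diff = next((i for i in range(overlap) if expected[i] != got[i]), overlap)
    mismatches = sum(expected[i] != got[i] for i in range(overlap))
    return {
        "first_diff": first_diff,
        "delta_length": abs(len(expected) - len(got)),
        "expected_length": len(expected),
        "total_diff": mismatches + abs(len(expected) - len(got)),
    }

def forward_stats(expected: np.ndarray, got: np.ndarray) -> dict:
    """Summarise element-wise logit discrepancies ('forward' experiment)."""
    diff = np.abs(expected.astype(np.float32) - got.astype(np.float32))
    return {
        "max_abs_err": float(diff.max()),
        "%_gt_0.1": float((diff > 0.1).mean()),
        "%_gt_0.01": float((diff > 0.01).mean()),
        "avg_abs_discrepancy": float(diff.mean()),
        "shape": expected.shape,
        "dtype": got.dtype,
    }

stats = generation_stats([1, 2, 3, 4], [9, 2, 3])
# stats["first_diff"] == 0: the sequences already differ at the first token
```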
