models.tinyllama.tinyllama_export

Source: models/tinyllama/tinyllama_export.py

Export TinyLlama to ONNX for OPU benchmarking.

Uses the optimum library for reliable HuggingFace→ONNX export with proper handling of KV cache, attention masks, and rotary embeddings.

Usage

conda run -n merlin-dev uv run python models/tinyllama/tinyllama_export.py

export_with_optimum(model_id, output_dir, seq_len=128)

Export using optimum-cli, which handles the tricky parts of the conversion (KV cache, attention masks, rotary embeddings).
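As a sketch, the optimum-cli invocation can be driven via subprocess. The model id, output directory, and task name below are illustrative assumptions, not values taken from tinyllama_export.py; `text-generation-with-past` is the optimum task that exports the decoder with KV-cache inputs/outputs.

```python
import subprocess


def build_optimum_cmd(model_id: str, output_dir: str,
                      task: str = "text-generation-with-past") -> list[str]:
    """Build the optimum-cli command for a HuggingFace->ONNX export.

    The task name is an assumption: "text-generation-with-past" asks optimum
    to wire up KV-cache inputs/outputs, which a latency benchmark wants.
    """
    return ["optimum-cli", "export", "onnx",
            "--model", model_id,
            "--task", task,
            output_dir]


if __name__ == "__main__":
    # Hypothetical model id; substitute whatever the script actually uses.
    cmd = build_optimum_cmd("TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                            "models/tinyllama/onnx")
    subprocess.run(cmd, check=True)
```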

export_with_torch(model_id, output_path, seq_len=128)

Fallback: export directly with torch.onnx.export.
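A minimal sketch of what a torch.onnx.export fallback can look like, assuming a causal-LM whose forward takes input_ids and attention_mask. The helper, argument names, and dynamic-axes layout are illustrative assumptions, not the script's actual code.

```python
def dummy_input_shapes(batch: int = 1,
                       seq_len: int = 128) -> dict[str, tuple[int, int]]:
    # Shapes of the two decoder inputs traced during export.
    return {"input_ids": (batch, seq_len),
            "attention_mask": (batch, seq_len)}


def export_with_torch(model, output_path: str, seq_len: int = 128) -> None:
    # Imported lazily so the shape helper above stays dependency-free.
    import torch

    shapes = dummy_input_shapes(seq_len=seq_len)
    dummy = {name: torch.ones(shape, dtype=torch.long)
             for name, shape in shapes.items()}
    torch.onnx.export(
        model,
        (dummy["input_ids"], dummy["attention_mask"]),
        output_path,
        input_names=["input_ids", "attention_mask"],
        output_names=["logits"],
        # Mark batch and sequence dims dynamic so the exported graph is
        # not pinned to the trace-time seq_len.
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "attention_mask": {0: "batch", 1: "sequence"},
            "logits": {0: "batch", 1: "sequence"},
        },
    )
```

A direct torch.onnx.export like this skips the KV cache entirely, which is why the optimum path is preferred and this is only a fallback.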