models.tinyllama.tinyllama_export
Source: models/tinyllama/tinyllama_export.py
Export TinyLlama to ONNX for OPU benchmarking.
Uses the optimum library for reliable HuggingFace→ONNX export with
proper handling of KV cache, attention masks, and rotary embeddings.
Usage
conda run -n merlin-dev uv run python models/tinyllama/tinyllama_export.py
export_with_optimum(model_id, output_dir, seq_len=128)
Export via optimum-cli, which handles the tricky parts of the export (KV cache inputs, attention masks, rotary embeddings).
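A minimal sketch of what this wrapper might look like, assuming it shells out to optimum-cli. The exact flag set and how seq_len is threaded through are assumptions, not taken from the source; `build_optimum_export_cmd` is a hypothetical helper:

```python
import subprocess
from pathlib import Path


def build_optimum_export_cmd(
    model_id: str,
    output_dir: str,
    task: str = "text-generation-with-past",
) -> list[str]:
    """Assemble the optimum-cli invocation.

    The "-with-past" task variant asks optimum to export the
    KV-cache-aware graph (past_key_values as inputs/outputs).
    """
    return [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        "--task", task,
        str(Path(output_dir)),
    ]


def export_with_optimum(model_id: str, output_dir: str, seq_len: int = 128) -> None:
    # seq_len mirrors the documented signature; in this sketch the CLI
    # chooses its own tracing shapes, so it is not forwarded (assumption).
    subprocess.run(build_optimum_export_cmd(model_id, output_dir), check=True)
```

The heavy lifting (KV cache, attention masks, rotary embeddings) happens inside optimum's tracing, which is why the CLI route is preferred over hand-rolled `torch.onnx.export`.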
export_with_torch(model_id, output_path, seq_len=128)
Fallback: export directly with torch.onnx.export.
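For the fallback path, the essential part of a direct `torch.onnx.export` call is marking batch and sequence dimensions dynamic so the graph is not frozen to the tracing shapes. A sketch of the keyword arguments such a call would need; `make_torch_export_kwargs` is a hypothetical helper, and the input/output names and opset are assumptions:

```python
def make_torch_export_kwargs(opset: int = 17) -> dict:
    """Keyword arguments for a torch.onnx.export call on a causal LM.

    Without dynamic_axes, ONNX would bake the dummy-input shapes
    (e.g. batch=1, seq_len=128) into the exported graph.
    """
    dyn = {0: "batch", 1: "sequence"}
    return {
        "input_names": ["input_ids", "attention_mask"],
        "output_names": ["logits"],
        "dynamic_axes": {
            "input_ids": dyn,
            "attention_mask": dyn,
            "logits": {0: "batch", 1: "sequence"},
        },
        "opset_version": opset,
    }


# Usage sketch (torch/transformers imports and model loading omitted):
#   dummy = tokenizer("hello", return_tensors="pt", padding="max_length",
#                     max_length=seq_len)
#   torch.onnx.export(model, (dummy["input_ids"], dummy["attention_mask"]),
#                     output_path, **make_torch_export_kwargs())
```

Note this direct route does not export KV-cache inputs, which is why the module treats it as a fallback rather than the primary path.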