models.tinyllama.tinyllama_export
Source: models/tinyllama/tinyllama_export.py
Export TinyLlama to ONNX for OPU benchmarking.
Uses the optimum library for reliable HuggingFace→ONNX export with
proper handling of KV cache, attention masks, and rotary embeddings.
Usage
conda run -n merlin-dev uv run python models/tinyllama/tinyllama_export.py
export_with_optimum(model_id, output_dir, seq_len=128)
Export via optimum-cli, which handles the tricky parts of the export (KV cache inputs, attention masks, rotary embeddings).
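A minimal sketch of what this wrapper might look like, assuming it shells out to optimum-cli. The exact flag set and how seq_len is threaded through are assumptions, not taken from the source; `build_optimum_export_cmd` is a hypothetical helper:

```python
import subprocess
from pathlib import Path


def build_optimum_export_cmd(
    model_id: str,
    output_dir: str,
    task: str = "text-generation-with-past",
) -> list[str]:
    """Assemble the optimum-cli invocation.

    The "-with-past" task variant asks optimum to export the
    KV-cache-aware graph (past_key_values as inputs/outputs).
    """
    return [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        "--task", task,
        str(Path(output_dir)),
    ]


def export_with_optimum(model_id: str, output_dir: str, seq_len: int = 128) -> None:
    # seq_len mirrors the documented signature; in this sketch the CLI
    # chooses its own tracing shapes, so it is not forwarded (assumption).
    subprocess.run(build_optimum_export_cmd(model_id, output_dir), check=True)
```

The heavy lifting (KV cache, attention masks, rotary embeddings) happens inside optimum's tracing, which is why the CLI route is preferred over hand-rolled `torch.onnx.export`.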
export_with_torch(model_id, output_path, seq_len=128)
Fallback: export directly with torch.onnx.export.
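For the fallback path, the essential part of a direct `torch.onnx.export` call is marking batch and sequence dimensions dynamic so the graph is not frozen to the tracing shapes. A sketch of the keyword arguments such a call would need; `make_torch_export_kwargs` is a hypothetical helper, and the input/output names and opset are assumptions:

```python
def make_torch_export_kwargs(opset: int = 17) -> dict:
    """Keyword arguments for a torch.onnx.export call on a causal LM.

    Without dynamic_axes, ONNX would bake the dummy-input shapes
    (e.g. batch=1, seq_len=128) into the exported graph.
    """
    dyn = {0: "batch", 1: "sequence"}
    return {
        "input_names": ["input_ids", "attention_mask"],
        "output_names": ["logits"],
        "dynamic_axes": {
            "input_ids": dyn,
            "attention_mask": dyn,
            "logits": {0: "batch", 1: "sequence"},
        },
        "opset_version": opset,
    }


# Usage sketch (torch/transformers imports and model loading omitted):
#   dummy = tokenizer("hello", return_tensors="pt", padding="max_length",
#                     max_length=seq_len)
#   torch.onnx.export(model, (dummy["input_ids"], dummy["attention_mask"]),
#                     output_path, **make_torch_export_kwargs())
```

Note this direct route does not export KV-cache inputs, which is why the module treats it as a fallback rather than the primary path.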