benchmarks.SaturnNPU.scripts.generate_npu_golden_tests
Source: benchmarks/SaturnNPU/scripts/generate_npu_golden_tests.py
Generate hierarchical golden test data for SmolVLA NPU kernels.
Golden data is generated at two levels:
- Layer level (PyTorch): full layer forward pass (e.g., GemmaAttention)
- Operator level: each atomic op within the layer (matmul, softmax, etc.)
The operator-level golden data composes to reproduce the layer-level output. This lets RTL teams test individual operators AND verify their composition.
Usage
python tools/generate_npu_golden_tests.py benchmarks/SaturnNPU/smolvla_graph_manifest.json --output-dir benchmarks/SaturnNPU/golden_data/ --generate-programs third_party/npu_model/npu_model/configs/programs/
LayerTrace
dataclass
Full layer golden data with operator decomposition.
OpTrace
dataclass
One operator's golden data within a layer decomposition.
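A minimal sketch of what these two containers plausibly hold; the field names are illustrative assumptions, not the script's actual definitions:

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    import torch

    @dataclass
    class OpTrace:
        name: str                                   # e.g. "q_proj_matmul" (hypothetical)
        op_type: str                                # e.g. "matmul", "softmax"
        inputs: Dict[str, torch.Tensor] = field(default_factory=dict)
        output: Optional[torch.Tensor] = None

    @dataclass
    class LayerTrace:
        layer_name: str                             # e.g. "gemma_attention" (hypothetical)
        inputs: Dict[str, torch.Tensor] = field(default_factory=dict)
        output: Optional[torch.Tensor] = None
        ops: List[OpTrace] = field(default_factory=list)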
generate_action_time_mlp(hidden=1024, time_dim=2048, dtype=torch.bfloat16)
Generate golden data for the action time MLP.
Linear(time_dim→hidden) + SiLU + Linear(hidden→hidden).
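A reference sketch of that computation under the default shapes (hidden=1024, time_dim=2048); the helper and weight names are placeholders, not the script's identifiers:

    import torch

    def action_time_mlp_ref(x, w1, b1, w2, b2):
        h = x @ w1.T + b1                 # Linear(time_dim -> hidden)
        h = h * torch.sigmoid(h)          # SiLU
        return h @ w2.T + b2              # Linear(hidden -> hidden)

    x = torch.randn(1, 2048)
    w1, b1 = torch.randn(1024, 2048), torch.randn(1024)
    w2, b2 = torch.randn(1024, 1024), torch.randn(1024)
    out = action_time_mlp_ref(x, w1, b1, w2, b2)   # shape (1, 1024)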
generate_gemma_attention(seq_len=50, hidden=720, num_heads=15, num_kv_heads=5, dtype=torch.bfloat16)
Generate golden data for one Gemma decoder self-attention layer.
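With 15 query heads and 5 KV heads this is grouped-query attention (head_dim = 720 / 15 = 48, three query heads per KV head). A shape-only sketch of that core, leaving out the q/k/v/o projections, RoPE, and normalization the real layer also traces:

    import torch

    def gqa_core(q, k, v):
        # q: (15, 50, 48), k/v: (5, 50, 48) -> each KV head serves 3 query heads
        k = k.repeat_interleave(3, dim=0)
        v = v.repeat_interleave(3, dim=0)
        scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ v        # (15, 50, 48)

    out = gqa_core(torch.randn(15, 50, 48), torch.randn(5, 50, 48), torch.randn(5, 50, 48))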
generate_gemma_cross_attention(q_seq_len=50, kv_seq_len=241, hidden=720, num_heads=15, num_kv_heads=5, dtype=torch.bfloat16)
Generate golden data for Gemma expert cross-attention.
Unlike self-attention, query and key/value have different sequence lengths: the query comes from the 50 action tokens, the key/value from the 241 vision+language context tokens.
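Only the shapes change relative to the self-attention sketch above (projections and RoPE again omitted; sizes follow the defaults):

    import torch

    q = torch.randn(15, 50, 48)      # queries from the 50 action tokens
    k = torch.randn(5, 241, 48)      # keys from the 241 vision+language tokens
    v = torch.randn(5, 241, 48)
    k, v = k.repeat_interleave(3, dim=0), v.repeat_interleave(3, dim=0)
    attn = torch.softmax(q @ k.transpose(-1, -2) / 48 ** 0.5, dim=-1)   # (15, 50, 241)
    out = attn @ v                                                      # (15, 50, 48)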
generate_gemma_mlp(seq_len=50, hidden=720, intermediate=2048, dtype=torch.bfloat16)
Generate golden data for one Gemma MLP layer (gate + up + GELU + down).
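The standard Gemma gating applies the activation to the gate branch and multiplies by the up branch before the down projection; a sketch assuming that structure (weight names are placeholders):

    import torch

    def gemma_mlp_ref(x, w_gate, w_up, w_down):
        gate = torch.nn.functional.gelu(x @ w_gate.T, approximate="tanh")
        return (gate * (x @ w_up.T)) @ w_down.T

    x = torch.randn(50, 720)
    w_gate, w_up = torch.randn(2048, 720), torch.randn(2048, 720)
    w_down = torch.randn(720, 2048)
    out = gemma_mlp_ref(x, w_gate, w_up, w_down)   # (50, 720)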
generate_siglip_attention(seq_len=1024, hidden=768, num_heads=12, dtype=torch.bfloat16)
Generate golden data for one SigLIP self-attention layer.
generate_siglip_mlp(seq_len=1024, hidden=768, intermediate=3072, dtype=torch.bfloat16)
Generate golden data for one SigLIP MLP layer (fc1 + GELU + fc2).
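A sketch of the two-layer MLP under the default shapes (1024 tokens, 768 hidden, 3072 intermediate); weight and bias names are placeholders, and the tanh GELU variant is an assumption consistent with ref_gelu_tanh below:

    import torch

    def siglip_mlp_ref(x, w1, b1, w2, b2):
        h = torch.nn.functional.gelu(x @ w1.T + b1, approximate="tanh")   # fc1 + GELU
        return h @ w2.T + b2                                              # fc2

    x = torch.randn(1024, 768)
    w1, b1 = torch.randn(3072, 768), torch.randn(3072)
    w2, b2 = torch.randn(768, 3072), torch.randn(768)
    out = siglip_mlp_ref(x, w1, b1, w2, b2)   # (1024, 768)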
generate_siglip_patch_embed(image_size=512, patch_size=16, in_channels=3, hidden=768, dtype=torch.bfloat16)
Generate golden data for SigLIP patch embedding.
Conv2d(3→768, kernel=16, stride=16) + position embedding add.
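With the default sizes, a 512x512 image becomes a 32x32 grid of patches, i.e. 1024 tokens of width 768. A sketch of that computation (random weights and the position-embedding layout are purely illustrative):

    import torch
    import torch.nn.functional as F

    img = torch.randn(1, 3, 512, 512)
    w, b = torch.randn(768, 3, 16, 16), torch.randn(768)
    pos = torch.randn(1, 1024, 768)                 # one embedding per patch

    patches = F.conv2d(img, w, b, stride=16)        # (1, 768, 32, 32)
    tokens = patches.flatten(2).transpose(1, 2)     # (1, 1024, 768)
    out = tokens + pos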
ref_gelu_tanh(x)
GELU with tanh approximation (matches PyTorch nn.GELU(approximate='tanh')).
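The tanh approximation it computes is the usual one:

    import math
    import torch

    def gelu_tanh_ref(x):
        return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))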
ref_rms_norm(x, weight, eps=1e-06)
RMS normalization: x * rsqrt(mean(x^2) + eps) * weight.
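Written out directly from that formula (whether the script upcasts to float32 before the mean is not specified here):

    import torch

    def rms_norm_ref(x, weight, eps=1e-6):
        return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps) * weight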
ref_rope(q, k, cos, sin)
Apply rotary position embedding.
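A sketch assuming the common rotate-half formulation used by Gemma-style models; the exact cos/sin layout is an assumption:

    import torch

    def rotate_half(x):
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    def rope_ref(q, k, cos, sin):
        # cos/sin broadcast over the head dimension; q/k are (..., seq, head_dim)
        return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin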
ref_silu(x)
SiLU / Swish activation.
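Equivalent to:

    import torch

    def silu_ref(x):
        return x * torch.sigmoid(x)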
ref_softmax(x, dim=-1)
Softmax decomposed into max-subtract-exp-sum-div.
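The decomposition mirrors how an NPU would evaluate softmax as separate primitive ops:

    import torch

    def softmax_ref(x, dim=-1):
        m = x.max(dim=dim, keepdim=True).values   # max
        e = torch.exp(x - m)                      # subtract + exp
        s = e.sum(dim=dim, keepdim=True)          # sum
        return e / s                              # div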
save_layer_trace(trace, output_dir)
Save a LayerTrace to disk in a hierarchical directory structure.
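The layout below only illustrates the idea (one directory per layer, one entry per operator in the decomposition); the actual file names and formats are defined by the script, not here:

    golden_data/
      <layer_name>/
        layer_input.*          # layer-level golden input
        layer_output.*         # layer-level golden output
        ops/
          00_<op_name>/        # per-operator inputs and output
          01_<op_name>/
          ...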
verify_composition(trace)
Verify that operator outputs chain correctly to reproduce layer output.
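A hedged sketch of what such a check amounts to, using the illustrative LayerTrace/OpTrace fields sketched above (the real function may recompute each operator rather than only compare recorded tensors):

    import torch

    def verify_composition_sketch(trace, rtol=1e-2, atol=1e-2):
        prev = None
        ok = True
        for op in trace.ops:
            # each operator's recorded output should feed the next operator's input
            if prev is not None and "x" in op.inputs:
                ok = ok and torch.allclose(op.inputs["x"].float(), prev.float(), rtol=rtol, atol=atol)
            prev = op.output
        # the final operator output should reproduce the layer-level output
        return ok and torch.allclose(prev.float(), trace.output.float(), rtol=rtol, atol=atol)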