benchmarks.SaturnNPU.kernel_library.attention
Source: benchmarks/SaturnNPU/kernel_library/attention.py
test()
Single-tile materialized attention: softmax((Q @ K) * scale) @ V.
Q, K, and V are fp8 32x32 tiles. The output is stored as two bf16 32x16 halves, at DRAM addresses 0x1000 and 0x1400.
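The computation above can be modeled with a short NumPy sketch. This is a hypothetical reference model, not the kernel itself: NumPy has no fp8 or bf16 dtypes, so float32 stands in for both, the `scale` value is an assumption (1/sqrt(32) is a common choice for 32-wide tiles), and the DRAM addresses are represented only by splitting the output into its two 32x16 halves.

```python
import numpy as np

def reference_attention(q, k, v, scale):
    """Reference model of single-tile materialized attention:
    softmax((Q @ K) * scale) @ V, computed in float32.

    q, k, v: (32, 32) arrays. The hardware consumes fp8 tiles;
    float32 is used here because NumPy has no fp8 dtype.
    Returns the two 32x16 output halves that the kernel would
    store separately (e.g. at DRAM 0x1000 and 0x1400).
    """
    scores = (q @ k) * scale                       # (32, 32) logits
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability shift
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    out = weights @ v                               # (32, 32) result
    return out[:, :16], out[:, 16:]                 # left/right halves

# Usage with random tiles and an assumed scale of 1/sqrt(32):
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 32)).astype(np.float32)
k = rng.standard_normal((32, 32)).astype(np.float32)
v = rng.standard_normal((32, 32)).astype(np.float32)
left, right = reference_attention(q, k, v, scale=1.0 / np.sqrt(32))
assert left.shape == (32, 16) and right.shape == (32, 16)
```

Because each softmax row sums to 1, the output rows are convex combinations of the rows of V, which is a quick sanity check for any hardware result read back from the two halves.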