benchmarks.SaturnNPU.kernel_library.attention

Source: benchmarks/SaturnNPU/kernel_library/attention.py

test()

Single-tile materialized attention: softmax((Q @ K) * scale) @ V.

Q, K, and V are fp8 32x32 tiles. The 32x32 output is stored as two bf16 32x16 halves at DRAM addresses 0x1000 and 0x1400.
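A minimal NumPy sketch of what this kernel computes, useful as a host-side reference when checking NPU output. It uses fp32 throughout as a stand-in for the fp8 inputs and bf16 outputs, takes `K` as already laid out for `Q @ K` (as the formula above is written), and the `scale` value `1/sqrt(32)` is an assumed default since the source does not state it:

```python
import numpy as np

def attention_reference(q, k, v, scale):
    """Reference for softmax((Q @ K) * scale) @ V on one 32x32 tile.

    fp32 stand-in for the kernel's fp8 inputs / bf16 outputs.
    """
    s = (q @ k) * scale                    # 32x32 score tile
    s = s - s.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(s)
    p /= p.sum(axis=-1, keepdims=True)
    out = p @ v                            # 32x32 output tile
    # The kernel stores this as two 32x16 halves (DRAM 0x1000 and 0x1400).
    return out[:, :16], out[:, 16:]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((32, 32)).astype(np.float32) for _ in range(3))
lo, hi = attention_reference(q, k, v, scale=1 / np.sqrt(32))  # assumed scale
```

Comparing `lo`/`hi` against the bytes read back from 0x1000 and 0x1400 (after bf16-to-fp32 conversion) would require a tolerance loose enough to absorb the fp8/bf16 quantization error.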