benchmarks.SaturnNPU.kernel_library.rope_frequency

Source: benchmarks/SaturnNPU/kernel_library/rope_frequency.py

Per-element cosine on a 32x32 bf16 tile.

Despite the name, this kernel does not compose the full rotary embedding. It computes only the elementwise cosine, y = cos(x), on a 32x32 bf16 tile (stored in VMEM as two 32x16 halves per the bf16_split_halves layout). Pair it with a sibling sin kernel (or torch-side pre-computation) to form the full rotary transform.
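
As a rough host-side illustration of the description above, the following NumPy sketch mimics what the kernel computes and how its output could be paired with a sine tile. It is not the kernel's implementation. Several things here are assumptions: the helper names (`split_halves`, `join_halves`, `cos_tile_reference`, `rotate_pairs`) are hypothetical, the two 32x16 halves are assumed to be the left and right column halves of the tile, float32 stands in for bf16 because NumPy has no native bf16 type, and the pairwise rotation shown is the generic 2D rotation rather than any specific pairing convention used by this library.

```python
import numpy as np

TILE = 32   # tile is TILE x TILE
HALF = 16   # each stored half is TILE x HALF


def split_halves(tile):
    """Split a 32x32 tile into two 32x16 halves.

    Assumption: the halves are the left and right column halves.
    """
    assert tile.shape == (TILE, TILE)
    return tile[:, :HALF].copy(), tile[:, HALF:].copy()


def join_halves(lo, hi):
    """Reassemble a 32x32 tile from its two 32x16 halves."""
    return np.concatenate([lo, hi], axis=1)


def cos_tile_reference(lo, hi):
    """Elementwise y = cos(x), applied to each half independently."""
    return np.cos(lo), np.cos(hi)


def rotate_pairs(a, b, cos_t, sin_t):
    """Generic 2D rotation of the pair (a, b) by the angle behind cos_t/sin_t."""
    return a * cos_t - b * sin_t, a * sin_t + b * cos_t


rng = np.random.default_rng(0)

# Angles for one tile (float32 stands in for bf16 in this sketch).
x = rng.uniform(-np.pi, np.pi, size=(TILE, TILE)).astype(np.float32)

# What this kernel covers: the cosine tile only.
lo, hi = split_halves(x)
y = join_halves(*cos_tile_reference(lo, hi))

# What must come from elsewhere: the sine tile (sibling kernel or precomputed).
s = np.sin(x)

# Combining both gives a rotation of paired values.
a = rng.standard_normal((TILE, TILE)).astype(np.float32)
b = rng.standard_normal((TILE, TILE)).astype(np.float32)
a_rot, b_rot = rotate_pairs(a, b, y, s)
```

Because the operation is elementwise, processing the two halves separately and rejoining them gives the same result as applying cos to the whole tile. The rotation step preserves `a**2 + b**2` per element, which is a convenient sanity check when wiring the cosine output together with a separately computed sine.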